org.apache.pig.data
Class InternalSortedBag
java.lang.Object
org.apache.pig.data.DefaultAbstractBag
org.apache.pig.data.SelfSpillBag
org.apache.pig.data.SortedSpillBag
org.apache.pig.data.InternalSortedBag
- All Implemented Interfaces:
- Serializable, Comparable, Iterable<Tuple>, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable, DataBag, Spillable
public class InternalSortedBag
- extends SortedSpillBag
An ordered collection of Tuples, possibly with duplicates. Data is
stored unsorted as it comes in, and is sorted only when it is time to dump
it to a file or when the first iterator is requested. Experimentation
found this to be faster than storing it sorted to begin with.
We allow a user defined comparator, but provide a default comparator in
cases where the user doesn't specify one.
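
The deferred-sort behavior described above can be illustrated with a small, self-contained sketch. It is not Pig's implementation: the class name `LazySortedBagSketch`, its fields, and the use of a plain `ArrayList` are assumptions made for illustration only. It shows elements being appended without any ordering work, with the sort happening once when an iterator is first requested.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration class; it is not part of Pig.
final class LazySortedBagSketch<T> implements Iterable<T> {
    private final List<T> contents = new ArrayList<>();
    private final Comparator<? super T> comparator;
    private boolean sorted = true; // an empty list is trivially sorted

    LazySortedBagSketch(Comparator<? super T> comparator) {
        this.comparator = comparator;
    }

    // Appending does no ordering work; it only marks the contents as unsorted.
    void add(T item) {
        contents.add(item);
        sorted = false;
    }

    // Sort only if something was added since the last sort, then hand out
    // an iterator over a snapshot so later adds do not disturb it.
    @Override
    public Iterator<T> iterator() {
        if (!sorted) {
            contents.sort(comparator);
            sorted = true;
        }
        return new ArrayList<>(contents).iterator();
    }

    public static void main(String[] args) {
        LazySortedBagSketch<Integer> bag =
                new LazySortedBagSketch<>(Comparator.<Integer>naturalOrder());
        bag.add(3);
        bag.add(1);
        bag.add(2);
        for (int value : bag) {
            System.out.println(value);
        }
    }
}
```

Passing the comparator at construction time mirrors the constructors listed on this page; a caller who has no preference could pass a natural-order comparator as a default.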
This bag is not registered with SpillableMemoryManager. Instead, it calculates
the number of tuples to hold in memory and proactively spills to files.
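
This page does not say how the in-memory tuple count is calculated. One plausible scheme, given here purely as an assumption, is to give each bag an equal share of a fraction of the heap and divide that share by an average tuple size estimated from samples. The class `MemoryLimitSketch` and all of its members are hypothetical; only the parameter names `bagCount` and `percent` come from the constructors documented on this page.

```java
// Hypothetical helper; the formula and the sampling policy are assumptions,
// not a description of Pig's actual behavior.
final class MemoryLimitSketch {
    private final long maxMemoryBytes;
    private long sampledBytes = 0;
    private long sampledTuples = 0;

    MemoryLimitSketch(long heapBytes, int bagCount, float percent) {
        // Take a fraction of the heap and share it evenly among the bags.
        this.maxMemoryBytes = (long) ((double) heapBytes * percent / bagCount);
    }

    // Record the estimated size of one tuple.
    void addSample(long tupleBytes) {
        sampledBytes += tupleBytes;
        sampledTuples++;
    }

    // Number of tuples to keep in memory before spilling.
    // Long.MAX_VALUE means "no estimate yet".
    long tupleLimit() {
        if (sampledTuples == 0) {
            return Long.MAX_VALUE;
        }
        long averageBytes = Math.max(1, sampledBytes / sampledTuples);
        return maxMemoryBytes / averageBytes;
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        MemoryLimitSketch limits = new MemoryLimitSketch(heap, 4, 0.2f);
        limits.addSample(128);
        limits.addSample(256);
        System.out.println("tuples to hold in memory: " + limits.tupleLimit());
    }
}
```

A bag using such a limit would compare its in-memory tuple count against `tupleLimit()` on each add and spill once the count is exceeded, rather than waiting for a memory manager to ask.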
- See Also:
- Serialized Form
Method Summary
- void add(Tuple t): Add a tuple to the bag.
- boolean isDistinct(): Find out if the bag is distinct.
- boolean isSorted(): Find out if the bag is sorted.
- Iterator<Tuple> iterator(): Get an iterator to the bag.
- long proactive_spill(Comparator<Tuple> comp): Sort contents of mContents and write them to disk.
- long spill(): Instructs an object to spill whatever it can to disk and release references to any data structures it spills.
Methods inherited from class org.apache.pig.data.DefaultAbstractBag
addAll, addAll, addAll, clear, compareTo, equals, getMemorySize, getSpillFile, hashCode, incSpillCount, incSpillCount, markSpillableIfNecessary, markStale, readFields, reportProgress, sampleContents, size, toString, warn, write
InternalSortedBag
public InternalSortedBag()
InternalSortedBag
public InternalSortedBag(Comparator<Tuple> comp)
InternalSortedBag
public InternalSortedBag(int bagCount, Comparator<Tuple> comp)
InternalSortedBag
public InternalSortedBag(int bagCount, float percent, Comparator<Tuple> comp)
add
public void add(Tuple t)
- Description copied from class: DefaultAbstractBag
- Add a tuple to the bag.
- Specified by: add in interface DataBag
- Overrides: add in class DefaultAbstractBag
- Parameters: t - tuple to add.
isSorted
public boolean isSorted()
- Description copied from interface: DataBag
- Find out if the bag is sorted.
- Returns: true if this is a sorted data bag, false otherwise.
isDistinct
public boolean isDistinct()
- Description copied from interface: DataBag
- Find out if the bag is distinct.
- Returns: true if the bag is a distinct bag, false otherwise.
iterator
public Iterator<Tuple> iterator()
- Description copied from interface: DataBag
- Get an iterator to the bag. For default and distinct bags,
no particular order is guaranteed. For sorted bags, tuples are
returned in the order defined by the provided comparator.
- Returns: tuple iterator
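
When part of a sorted bag has already been written out in sorted runs, a single ordered iteration has to combine those runs with whatever is still in memory. A common technique for this is a k-way merge driven by a priority queue. The sketch below shows that technique on in-memory lists; it is an assumption about how such an iterator could work, not a description of this class's internals, and `SortedRunMergeSketch` and `Head` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a k-way merge; it is not part of Pig.
final class SortedRunMergeSketch {

    // Pairs the next unread value of a run with the iterator it came from.
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    // Merges runs that are each already sorted by comp into one sorted list.
    static <T> List<T> merge(List<List<T>> sortedRuns, Comparator<? super T> comp) {
        Comparator<Head<T>> byValue = (a, b) -> comp.compare(a.value, b.value);
        PriorityQueue<Head<T>> heap = new PriorityQueue<>(byValue);

        // Seed the heap with the first element of every non-empty run.
        for (List<T> run : sortedRuns) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head<>(it.next(), it));
            }
        }

        // Repeatedly take the smallest head and refill from the same run.
        List<T> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();
            merged.add(smallest.value);
            if (smallest.source.hasNext()) {
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }
}
```

In a real spilling bag the runs would be read from files rather than lists, but the merge step only needs an iterator per run, so the structure stays the same.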
spill
public long spill()
- Description copied from interface: Spillable
- Instructs an object to spill whatever it can to disk and release
references to any data structures it spills.
- Returns: number of objects spilled.
proactive_spill
public long proactive_spill(Comparator<Tuple> comp)
- Description copied from class: SortedSpillBag
- Sort contents of mContents and write them to disk.
- Overrides: proactive_spill in class SortedSpillBag
- Parameters: comp - Comparator to sort contents of mContents
- Returns: number of tuples spilled
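
The contract described for proactive_spill (sort the in-memory contents with the given comparator, write them out, and report how many were written) can be sketched as follows. This is a simplified stand-in that uses strings and a text file; the class `ProactiveSpillSketch`, the method `spillSorted`, and the file format are assumptions for illustration and do not reflect Pig's on-disk tuple format.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; it is not part of Pig.
final class ProactiveSpillSketch {

    // Sorts contents in place, writes one record per line to spillFile,
    // clears the in-memory list, and returns the number of records written.
    static long spillSorted(List<String> contents,
                            Comparator<String> comp,
                            Path spillFile) throws IOException {
        contents.sort(comp);
        long spilled = 0;
        try (BufferedWriter writer =
                     Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8)) {
            for (String record : contents) {
                writer.write(record);
                writer.newLine();
                spilled++;
            }
        }
        // Drop the in-memory references now that the data is on disk.
        contents.clear();
        return spilled;
    }
}
```

Because each spill file is written in sorted order, a later read can treat it as one sorted run without re-sorting it.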
Copyright © 2007-2012 The Apache Software Foundation