java.lang.Object
org.apache.pig.LoadFunc
org.apache.pig.FileInputLoadFunc
org.apache.pig.builtin.PigStorage
org.apache.pig.piggybank.storage.IndexedStorage
public class IndexedStorage
IndexedStorage is a form of PigStorage that supports a per-record seek. IndexedStorage creates a separate (hidden) index file for every data file that is written. The format of the index file is:
| Header | Index Body | Footer |
The Header contains the list of record indices (field numbers) that represent index keys. The Index Body contains a Tuple for each record in the data.
The fields of the Tuple are:
… Tuple
… Tuple in the index.
… Tuple in the index.
IndexedStorage implements IndexableLoadFunc and can be used as the 'right table' in a Pig 'merge' or 'merge-sparse' join.
IndexedStorage does not require the data to be globally partitioned and sorted by index keys. Each partition (each with its own separate index) must be locally sorted.
Note also that IndexedStorage is a loader intended to demonstrate the 'merge-sparse' join.
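The local-sort requirement can be illustrated with a small, self-contained sketch. It does not use any Pig classes: `LocalSortCheck` and its methods are hypothetical names, and each partition is modelled simply as a list of integer keys. It shows that every partition can be sorted on its own even when the partitions taken together are not in global order.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig or piggybank.
final class LocalSortCheck {

    /** Returns true if the keys are in non-decreasing order. */
    static boolean isSorted(List<Integer> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1) > keys.get(i)) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if every partition is sorted on its own. */
    static boolean allLocallySorted(List<List<Integer>> partitions) {
        for (List<Integer> partition : partitions) {
            if (!isSorted(partition)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two partitions, each sorted, but with overlapping key ranges,
        // so the data is not globally sorted across partitions.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 5, 9),
                Arrays.asList(2, 3, 7));
        System.out.println("locally sorted: " + allLocallySorted(partitions));
    }
}
```

In this model, data laid out like the two partitions above would satisfy the stated requirement, while a partition such as (5, 1, 9) would not.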
Nested Class Summary | |
---|---|
static class | IndexedStorage.IndexedStorageInputFormat: Internal InputFormat class. |
static class | IndexedStorage.IndexedStorageOutputFormat: Internal OutputFormat class. |
static class | IndexedStorage.IndexManager: IndexManager manages the index file (both writing and reading). It keeps track of the last index read during reading. |
Nested classes/interfaces inherited from interface org.apache.pig.LoadPushDown |
---|
LoadPushDown.OperatorSet, LoadPushDown.RequiredField, LoadPushDown.RequiredFieldList, LoadPushDown.RequiredFieldResponse |
Field Summary | |
---|---|
protected int | currentReaderIndexStart: Index into the list of readers, pointing to the current reader. |
protected byte | fieldDelimiter: Delimiter to use between fields. |
protected int[] | offsetsToIndexKeys: Offsets to index keys in the tuple. |
protected Comparator<IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader> | readerComparator: Comparator used to compare key tuples. |
protected IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader[] | readers: List of record readers. |
Fields inherited from class org.apache.pig.builtin.PigStorage |
---|
caster, in, mLog, mRequiredColumns, schema, signature, writer |
Constructor Summary | |
---|---|
IndexedStorage(String delimiter, String offsetsToIndexKeys) | Constructs a Pig Storer that uses the specified regex as a field delimiter. |
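The constructor receives the index key offsets as a comma-separated string, while the corresponding field is an int[]. A minimal sketch of such a conversion is shown here. `OffsetParser` and `parseOffsets` are hypothetical names, and the tolerance for whitespace around commas is an assumption; this is not the actual IndexedStorage implementation.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the IndexedStorage source.
final class OffsetParser {

    /**
     * Converts a comma-separated list such as "0,2" into an int array.
     * Assumption: whitespace around each entry is ignored.
     * Throws NumberFormatException if an entry is not an integer.
     */
    static int[] parseOffsets(String offsetsToIndexKeys) {
        String[] parts = offsetsToIndexKeys.split(",");
        int[] offsets = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            offsets[i] = Integer.parseInt(parts[i].trim());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Fields 0 and 2 of each record form the index key.
        System.out.println(Arrays.toString(parseOffsets("0, 2"))); // prints [0, 2]
    }
}
```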
Method Summary | |
---|---|
void | close(): A method called by the Pig runtime to give an opportunity for implementations to perform cleanup actions like closing the underlying input stream. |
org.apache.hadoop.mapreduce.InputFormat | getInputFormat(): This will be called during planning on the front end. |
Tuple | getNext(): Retrieves the next tuple to be processed. |
org.apache.hadoop.mapreduce.OutputFormat | getOutputFormat(): Return the OutputFormat associated with StoreFuncInterface. |
void | initialize(org.apache.hadoop.conf.Configuration conf): IndexableLoadFunc interface implementation. |
void | seekNear(Tuple keys): This method is called by the Pig runtime to indicate to the LoadFunc to position its underlying input stream near the keys supplied as the argument. |
Methods inherited from class org.apache.pig.builtin.PigStorage |
---|
checkSchema, cleanupOnFailure, cleanupOnSuccess, cleanupOutput, equals, equals, getFeatures, getPartitionKeys, getSchema, getStatistics, hashCode, prepareToRead, prepareToWrite, pushProjection, putNext, readField, relToAbsPathForStoreLocation, setLocation, setPartitionFilter, setStoreFuncUDFContextSignature, setStoreLocation, setUDFContextSignature, shouldOverwrite, storeSchema, storeStatistics |
Methods inherited from class org.apache.pig.FileInputLoadFunc |
---|
getSplitComparable |
Methods inherited from class org.apache.pig.LoadFunc |
---|
getAbsolutePath, getLoadCaster, getPathStrings, join, relativeToAbsolutePath, warn |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader[] readers
protected int currentReaderIndexStart
protected byte fieldDelimiter
protected final int[] offsetsToIndexKeys
protected Comparator<IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader> readerComparator
Constructor Detail |
---|
public IndexedStorage(String delimiter, String offsetsToIndexKeys)
Parameters:
delimiter - field delimiter to use
offsetsToIndexKeys - comma-separated list of offsets into the Tuple for index keys
Method Detail |
---|
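public org.apache.hadoop.mapreduce.OutputFormat getOutputFormat()
Description copied from interface: StoreFuncInterface
Return the OutputFormat associated with StoreFuncInterface.
Specified by: getOutputFormat in interface StoreFuncInterface
Overrides: getOutputFormat in class PigStorage
Returns: the OutputFormat associated with StoreFuncInterface
public org.apache.hadoop.mapreduce.InputFormat getInputFormat()
Description copied from class: LoadFunc
This will be called during planning on the front end.
Overrides: getInputFormat in class PigStorage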
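public Tuple getNext() throws IOException
Description copied from class: LoadFunc
Retrieves the next tuple to be processed.
Overrides: getNext in class PigStorage
Throws: IOException - if there is an exception while retrieving the next tuple
public void initialize(org.apache.hadoop.conf.Configuration conf) throws IOException
IndexableLoadFunc interface implementation
Specified by: initialize in interface IndexableLoadFunc
Parameters: conf - The job configuration object
Throws: IOException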
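public void seekNear(Tuple keys) throws IOException
Description copied from interface: IndexableLoadFunc
This method is called by the Pig runtime to tell the LoadFunc to position its underlying input stream near the keys supplied as the argument.
Specified by: seekNear in interface IndexableLoadFunc
Parameters: keys - Tuple with join keys (which are a prefix of the sort keys of the input data). For example, if the data is sorted on the columns in positions 2, 4, 5, any of the following Tuples is a valid argument value:
(fieldAt(2))
(fieldAt(2), fieldAt(4))
(fieldAt(2), fieldAt(4), fieldAt(5))
The following are some invalid cases:
(fieldAt(4))
(fieldAt(2), fieldAt(5))
(fieldAt(4), fieldAt(5))
Throws: IOException - when the LoadFunc is unable to position to the required point in its input stream
public void close() throws IOException
Description copied from interface: IndexableLoadFunc
A method called by the Pig runtime to give implementations an opportunity to perform cleanup actions such as closing the underlying input stream.
Specified by: close in interface IndexableLoadFunc
Throws: IOException - if the LoadFunc is unable to perform its close actions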
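The prefix rule for seekNear keys (valid and invalid cases for data sorted on columns 2, 4, 5) can be checked mechanically. The following is a minimal sketch in plain Java, not Pig code: `KeyPrefixCheck` and `isValidPrefix` are hypothetical names, keys are represented only by their column positions, and treating an empty key as invalid is an assumption.

```java
// Hypothetical helper for illustration only; not part of Pig or piggybank.
final class KeyPrefixCheck {

    /**
     * Returns true if keyColumns is a non-empty leading prefix of sortColumns.
     * Assumption: an empty key is treated as invalid.
     */
    static boolean isValidPrefix(int[] sortColumns, int[] keyColumns) {
        if (keyColumns.length == 0 || keyColumns.length > sortColumns.length) {
            return false;
        }
        for (int i = 0; i < keyColumns.length; i++) {
            if (keyColumns[i] != sortColumns[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] sortColumns = {2, 4, 5};
        // Valid: (fieldAt(2), fieldAt(4)) is a leading prefix of the sort keys.
        System.out.println(isValidPrefix(sortColumns, new int[] {2, 4}));
        // Invalid: (fieldAt(2), fieldAt(5)) skips column 4.
        System.out.println(isValidPrefix(sortColumns, new int[] {2, 5}));
    }
}
```

The check compares positions in order, which is why (fieldAt(4)) and (fieldAt(4), fieldAt(5)) are rejected: neither starts at the first sort column.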