org.apache.pig.piggybank.storage.hiverc
Class HiveRCInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
org.apache.pig.piggybank.storage.hiverc.HiveRCInputFormat
public class HiveRCInputFormat
- extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
HiveRCInputFormat is the InputFormat used by HiveColumnarLoader.
Reasons for implementing a new InputFormat subclass:
- The current RCFileInputFormat uses the old mapred InputFormat interface,
while the Pig load/store design uses the new mapreduce InputFormat classes.
- Splits are calculated by the InputFormat; because HiveColumnarLoader supports
date partitions, the partition filtering is done here.
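The date-partition filtering mentioned above can be pictured as dropping input paths whose partition segment falls outside a requested date range, before splits are ever computed. The sketch below is illustrative only, not the piggybank implementation: the `daydate=` path layout and the helper names are assumptions for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of date-partition filtering: paths carry a partition
// segment (assumed layout "daydate=yyyy-MM-dd"), and paths outside the
// requested range are skipped before split calculation.
public class DatePartitionFilterSketch {

    // Extract the partition date from a path like
    // /logs/daydate=2012-03-01/part-00000
    static LocalDate partitionDate(String path) {
        int i = path.indexOf("daydate=");
        return LocalDate.parse(path.substring(i + 8, i + 18));
    }

    // Keep only paths whose partition date falls in [from, to], inclusive.
    static List<String> filterByDateRange(List<String> paths,
                                          LocalDate from, LocalDate to) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            LocalDate d = partitionDate(p);
            if (!d.isBefore(from) && !d.isAfter(to)) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

In the real loader the same idea would apply to the `FileStatus` entries returned by `listStatus`, so that excluded partitions never produce splits at all.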
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
Method Summary
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx)
    Initialises an instance of HiveRCRecordReader.
protected long getFormatMinSplitSize()
    The input split size should never be smaller than RCFile.SYNC_INTERVAL.
protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext jobContext)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HiveRCInputFormat
public HiveRCInputFormat()
HiveRCInputFormat
public HiveRCInputFormat(String signature)
listStatus
protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext jobContext)
throws IOException
- Overrides:
listStatus
in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
- Throws:
IOException
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext ctx)
throws IOException,
InterruptedException
- Initialises an instance of HiveRCRecordReader.
- Specified by:
createRecordReader
in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
- Throws:
IOException
InterruptedException
getFormatMinSplitSize
protected long getFormatMinSplitSize()
- The input split size should never be smaller than RCFile.SYNC_INTERVAL.
- Overrides:
getFormatMinSplitSize
in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
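To see why this override matters, recall how FileInputFormat sizes splits: it takes the larger of the format's floor (getFormatMinSplitSize) and the job-configured minimum, then clamps the block size between that minimum and the configured maximum. The sketch below reproduces that arithmetic in plain Java; the concrete SYNC_INTERVAL value used here is an illustrative placeholder, not taken from this documentation.

```java
// Illustrative sketch of how FileInputFormat combines the format's minimum
// split size with the job configuration. The SYNC_INTERVAL value below is
// an assumed placeholder for RCFile.SYNC_INTERVAL.
public class SplitSizeSketch {
    // Assumed stand-in for RCFile.SYNC_INTERVAL (bytes).
    static final long SYNC_INTERVAL = 2000L;

    // The floor imposed by the format, as HiveRCInputFormat.getFormatMinSplitSize does.
    static long getFormatMinSplitSize() {
        return SYNC_INTERVAL;
    }

    // Mirrors FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long configuredMin = 1L; // a job-configured minimum split size
        // FileInputFormat takes the larger of the format floor and the job setting,
        // so a tiny configured minimum cannot produce splits below SYNC_INTERVAL.
        long minSize = Math.max(getFormatMinSplitSize(), configuredMin);
        long splitSize = computeSplitSize(128L * 1024 * 1024, minSize, Long.MAX_VALUE);
        System.out.println(minSize);   // 2000
        System.out.println(splitSize); // 134217728 (the block size)
    }
}
```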
Copyright © 2007-2012 The Apache Software Foundation