org.apache.pig.piggybank.storage.hiverc
Class HiveRCInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
org.apache.pig.piggybank.storage.hiverc.HiveRCInputFormat
public class HiveRCInputFormat
- extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
HiveRCInputFormat is the InputFormat used by HiveColumnarLoader.
Reasons for implementing a new InputFormat subclass:
- The current RCFileInputFormat uses the old mapred InputFormat interface,
while the Pig load/store design uses the new mapreduce InputFormat classes.
- Splits are calculated by the InputFormat; because HiveColumnarLoader supports
date partitions, the partition filtering is done here.
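The date-partition filtering mentioned above can be pictured as dropping input paths whose partition segment falls outside a requested date range, before splits are ever computed. The sketch below is illustrative only, not the piggybank implementation: the `daydate=` path layout and the helper names are assumptions for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of date-partition filtering: paths carry a partition
// segment (assumed layout "daydate=yyyy-MM-dd"), and paths outside the
// requested range are skipped before split calculation.
public class DatePartitionFilterSketch {

    // Extract the partition date from a path like
    // /logs/daydate=2012-03-01/part-00000
    static LocalDate partitionDate(String path) {
        int i = path.indexOf("daydate=");
        return LocalDate.parse(path.substring(i + 8, i + 18));
    }

    // Keep only paths whose partition date falls in [from, to], inclusive.
    static List<String> filterByDateRange(List<String> paths,
                                          LocalDate from, LocalDate to) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            LocalDate d = partitionDate(p);
            if (!d.isBefore(from) && !d.isAfter(to)) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

In the real loader the same idea would apply to the `FileStatus` entries returned by `listStatus`, so that excluded partitions never produce splits at all.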
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
Method Summary
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx)
    Initialises an instance of HiveRCRecordReader.
protected long getFormatMinSplitSize()
    The input split size should never be smaller than RCFile.SYNC_INTERVAL.
protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext jobContext)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HiveRCInputFormat
public HiveRCInputFormat()
HiveRCInputFormat
public HiveRCInputFormat(String signature)
listStatus
protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext jobContext)
throws IOException
- Overrides:
listStatus
in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
- Throws:
IOException
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext ctx)
throws IOException,
InterruptedException
- Initialises an instance of HiveRCRecordReader.
- Specified by:
createRecordReader
in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
- Throws:
IOException
InterruptedException
getFormatMinSplitSize
protected long getFormatMinSplitSize()
- The input split size should never be smaller than RCFile.SYNC_INTERVAL.
- Overrides:
getFormatMinSplitSize
in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable>
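To see why this override matters, recall how FileInputFormat sizes splits: it takes the larger of the format's floor (getFormatMinSplitSize) and the job-configured minimum, then clamps the block size between that minimum and the configured maximum. The sketch below reproduces that arithmetic in plain Java; the concrete SYNC_INTERVAL value used here is an illustrative placeholder, not taken from this documentation.

```java
// Illustrative sketch of how FileInputFormat combines the format's minimum
// split size with the job configuration. The SYNC_INTERVAL value below is
// an assumed placeholder for RCFile.SYNC_INTERVAL.
public class SplitSizeSketch {
    // Assumed stand-in for RCFile.SYNC_INTERVAL (bytes).
    static final long SYNC_INTERVAL = 2000L;

    // The floor imposed by the format, as HiveRCInputFormat.getFormatMinSplitSize does.
    static long getFormatMinSplitSize() {
        return SYNC_INTERVAL;
    }

    // Mirrors FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long configuredMin = 1L; // a job-configured minimum split size
        // FileInputFormat takes the larger of the format floor and the job setting,
        // so a tiny configured minimum cannot produce splits below SYNC_INTERVAL.
        long minSize = Math.max(getFormatMinSplitSize(), configuredMin);
        long splitSize = computeSplitSize(128L * 1024 * 1024, minSize, Long.MAX_VALUE);
        System.out.println(minSize);   // 2000
        System.out.println(splitSize); // 134217728 (the block size)
    }
}
```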
Copyright © 2007-2012 The Apache Software Foundation