public abstract class FileInputFormat<K,V> extends Object implements InputFormat<K,V>
FileInputFormat is the base class for all file-based
InputFormats. It provides a generic implementation of
getSplits(JobConf, int).
Subclasses of FileInputFormat can also override the
isSplitable(FileSystem, Path) method to ensure that input files are
not split up and are each processed as a whole by Mappers.
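The splitting behaviour described above can be modelled without Hadoop on the classpath. The sketch below is an illustration under stated assumptions, not the library's code: the class SplitSketch, the method planSplits, and the 1.1 slack factor are hypothetical names and values chosen for the example, and each split is represented as a plain {offset, length} pair rather than an InputSplit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model; none of these names are part of the Hadoop API.
class SplitSketch {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Returns {offset, length} pairs that together cover a file of the given length.
    static List<long[]> planSplits(long fileLength, long splitSize, boolean splitable) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (fileLength == 0) {
            return splits; // simplification: an empty file yields no splits in this model
        }
        if (!splitable) {
            // Corresponds to isSplitable(...) returning false: one split spans the whole file.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        long remaining = fileLength;
        // Emit full-size splits while clearly more than one split's worth of data remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : planSplits(250, 100, true)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
        System.out.println("unsplit count=" + planSplits(250, 100, false).size());
    }
}
```

In this model a 250-byte file with a split size of 100 yields three splits, (0, 100), (100, 100) and (200, 50), while the same file marked as not splitable yields a single split covering all 250 bytes.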

| Modifier and Type | Class and Description |
|---|---|
| static class | FileInputFormat.Counter |

| Modifier and Type | Field and Description |
|---|---|
| static org.apache.commons.logging.Log | LOG |

| Constructor and Description |
|---|
| FileInputFormat() |

| Modifier and Type | Method and Description |
|---|---|
| static void | addInputPath(JobConf conf, Path path) Add a Path to the list of inputs for the map-reduce job. |
| static void | addInputPaths(JobConf conf, String commaSeparatedPaths) Add the given comma-separated paths to the list of inputs for the map-reduce job. |
| protected long | computeSplitSize(long goalSize, long minSize, long blockSize) |
| protected int | getBlockIndex(BlockLocation[] blkLocations, long offset) |
| static PathFilter | getInputPathFilter(JobConf conf) Get a PathFilter instance of the filter set for the input paths. |
| static Path[] | getInputPaths(JobConf conf) Get the list of input Paths for the map-reduce job. |
| abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) Get the RecordReader for the given InputSplit. |
| protected String[] | getSplitHosts(BlockLocation[] blkLocations, long offset, long splitSize, NetworkTopology clusterMap) This function identifies and returns the hosts that contribute most for a given split. |
| InputSplit[] | getSplits(JobConf job, int numSplits) Splits files returned by listStatus(JobConf) when they're too big. |
| protected boolean | isSplitable(FileSystem fs, Path filename) Is the given filename splitable? |
| protected FileStatus[] | listStatus(JobConf job) List input directories. |
| static void | setInputPathFilter(JobConf conf, Class<? extends PathFilter> filter) Set a PathFilter to be applied to the input paths for the map-reduce job. |
| static void | setInputPaths(JobConf conf, Path... inputPaths) Set the array of Paths as the list of inputs for the map-reduce job. |
| static void | setInputPaths(JobConf conf, String commaSeparatedPaths) Sets the given comma-separated paths as the list of inputs for the map-reduce job. |
| protected void | setMinSplitSize(long minSplitSize) |

protected void setMinSplitSize(long minSplitSize)

protected boolean isSplitable(FileSystem fs, Path filename)
Is the given filename splitable?
FileInputFormat implementations can override this and return
false to ensure that individual input files are never split up,
so that Mappers process entire files.
Parameters:
fs - the file system that the file is on
filename - the file name to check

public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect
record boundaries while processing the logical split to present a
record-oriented view to the individual task.
Specified by:
getRecordReader in interface InputFormat<K,V>
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns:
a RecordReader
Throws:
IOException

public static void setInputPathFilter(JobConf conf, Class<? extends PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
filter - the PathFilter class used for filtering the input paths.

public static PathFilter getInputPathFilter(JobConf conf)
Get a PathFilter instance of the filter set for the input paths.

protected FileStatus[] listStatus(JobConf job) throws IOException
List input directories.
Parameters:
job - the job to list input paths for
Throws:
IOException - if zero items.

public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Splits files returned by listStatus(JobConf) when
they're too big.
Specified by:
getSplits in interface InputFormat<K,V>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException

protected long computeSplitSize(long goalSize, long minSize, long blockSize)
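No description of computeSplitSize survives on this page. A plausible reading, consistent with the parameter names, is that the split size is the goal size capped at one block and raised to at least the minimum. The sketch below assumes that rule, and also assumes the goal size is the total input size divided by the numSplits hint; SplitSizeSketch and goalSize are hypothetical names introduced for illustration.

```java
// Hypothetical stand-alone model; only the method name and parameter names
// of computeSplitSize come from the signature documented above.
class SplitSizeSketch {
    // Assumed rule: cap the goal at one block, but never drop below the minimum.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Assumed derivation of the goal size from the numSplits hint given to getSplits.
    static long goalSize(long totalSize, int numSplits) {
        return totalSize / (numSplits == 0 ? 1 : numSplits);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long goal = goalSize(1000 * mb, 4); // 250 MB per requested split

        // The goal exceeds the block size, so the block size is chosen.
        System.out.println(computeSplitSize(goal, 1, 64 * mb) / mb);

        // A minimum larger than the block size overrides both other values.
        System.out.println(computeSplitSize(goal, 128 * mb, 64 * mb) / mb);
    }
}
```

Under these assumptions, asking for more splits lowers the goal size and can produce splits smaller than a block, while raising the minimum (for example through setMinSplitSize) is the only way to obtain splits larger than a block.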
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
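getBlockIndex is likewise listed without a description. Going by its name and parameters, it presumably returns the index of the block that contains the given byte offset. The sketch below models that lookup in plain Java under that assumption: the two arrays stand in for the offset and length of each BlockLocation, and BlockIndexSketch is a hypothetical name. Rejecting an offset that lies outside every block with an IllegalArgumentException is also an assumption of the sketch.

```java
// Hypothetical stand-alone model of a block lookup by byte offset.
class BlockIndexSketch {
    // starts[i] and lengths[i] stand in for the offset and length of block i.
    static int getBlockIndex(long[] starts, long[] lengths, long offset) {
        for (int i = 0; i < starts.length; i++) {
            // Is the offset inside the half-open range [start, start + length)?
            if (starts[i] <= offset && offset < starts[i] + lengths[i]) {
                return i;
            }
        }
        throw new IllegalArgumentException("Offset " + offset + " is outside of file");
    }

    public static void main(String[] args) {
        // A 164-byte file stored as blocks of 64, 64 and 36 bytes.
        long[] starts = {0, 64, 128};
        long[] lengths = {64, 64, 36};
        System.out.println(getBlockIndex(starts, lengths, 0));   // first block
        System.out.println(getBlockIndex(starts, lengths, 64));  // second block
        System.out.println(getBlockIndex(starts, lengths, 163)); // last byte, third block
    }
}
```

A lookup of this kind lets a split that starts at an arbitrary offset be matched to the block holding its first byte, which is the natural starting point for choosing hosts in getSplitHosts.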
public static void setInputPaths(JobConf conf, String commaSeparatedPaths)
Sets the given comma-separated paths as the list of inputs
for the map-reduce job.
Parameters:
conf - Configuration of the job
commaSeparatedPaths - Comma-separated paths to be set as
the list of inputs for the map-reduce job.

public static void addInputPaths(JobConf conf, String commaSeparatedPaths)
Add the given comma-separated paths to the list of inputs for
the map-reduce job.
Parameters:
conf - The configuration of the job
commaSeparatedPaths - Comma-separated paths to be added to
the list of inputs for the map-reduce job.

public static void setInputPaths(JobConf conf, Path... inputPaths)
Set the array of Paths as the list of inputs
for the map-reduce job.
Parameters:
conf - Configuration of the job.
inputPaths - the Paths of the input directories/files
for the map-reduce job.

public static void addInputPath(JobConf conf, Path path)
Add a Path to the list of inputs for the map-reduce job.
Parameters:
conf - The configuration of the job
path - Path to be added to the list of inputs for
the map-reduce job.

public static Path[] getInputPaths(JobConf conf)
Get the list of input Paths for the map-reduce job.
Parameters:
conf - The configuration of the job
Returns:
the list of input Paths for the map-reduce job.

protected String[] getSplitHosts(BlockLocation[] blkLocations, long offset, long splitSize, NetworkTopology clusterMap) throws IOException
This function identifies and returns the hosts that contribute
most for a given split.
Parameters:
blkLocations - The list of block locations
offset -
splitSize -
Throws:
IOException

Copyright © 2009 The Apache Software Foundation