public class ChainReducer extends Object implements Reducer
[MAP+ / REDUCE MAP*]. And
immediate benefit of this pattern is a dramatic reduction in disk IO.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainReducer, this is done by the setReducer or the addMapper for the last
element in the chain.
ChainReducer usage pattern:
...
conf.setJobName("chain");
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
JobConf mapAConf = new JobConf(false);
...
ChainMapper.addMapper(conf, AMap.class, LongWritable.class, Text.class,
Text.class, Text.class, true, mapAConf);
JobConf mapBConf = new JobConf(false);
...
ChainMapper.addMapper(conf, BMap.class, Text.class, Text.class,
LongWritable.class, Text.class, false, mapBConf);
JobConf reduceConf = new JobConf(false);
...
ChainReducer.setReducer(conf, XReduce.class, LongWritable.class, Text.class,
Text.class, Text.class, true, reduceConf);
ChainReducer.addMapper(conf, CMap.class, Text.class, Text.class,
LongWritable.class, Text.class, false, null);
ChainReducer.addMapper(conf, DMap.class, LongWritable.class, Text.class,
LongWritable.class, LongWritable.class, true, null);
FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);
...
JobClient jc = new JobClient(conf);
RunningJob job = jc.submitJob(conf);
...
| 构造器和说明 |
|---|
ChainReducer()
Constructor.
|
| 限定符和类型 | 方法和说明 |
|---|---|
static <K1,V1,K2,V2> |
addMapper(JobConf job,
Class<? extends Mapper<K1,V1,K2,V2>> klass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf mapperConf)
Adds a Mapper class to the chain job's JobConf.
|
void |
close()
Closes the ChainReducer, the Reducer and all the Mappers in the chain.
|
void |
configure(JobConf job)
Configures the ChainReducer, the Reducer and all the Mappers in the chain.
|
void |
reduce(Object key,
Iterator values,
OutputCollector output,
Reporter reporter)
Chains the
reduce(...) |
static <K1,V1,K2,V2> |
setReducer(JobConf job,
Class<? extends Reducer<K1,V1,K2,V2>> klass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf reducerConf)
Sets the Reducer class to the chain job's JobConf.
|
public static <K1,V1,K2,V2> void setReducer(JobConf job, Class<? extends Reducer<K1,V1,K2,V2>> klass, Class<? extends K1> inputKeyClass, Class<? extends V1> inputValueClass, Class<? extends K2> outputKeyClass, Class<? extends V2> outputValueClass, boolean byValue, JobConf reducerConf)
reducerConf, have precedence over the job's JobConf. This
precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainReducer, this is done by the setReducer or the addMapper for the last
element in the chain.job - job's JobConf to add the Reducer class.klass - the Reducer class to add.inputKeyClass - reducer input key class.inputValueClass - reducer input value class.outputKeyClass - reducer output key class.outputValueClass - reducer output value class.byValue - indicates if key/values should be passed by value
to the next Mapper in the chain, if any.reducerConf - a JobConf with the configuration for the Reducer
class. It is recommended to use a JobConf without default values using the
JobConf(boolean loadDefaults) constructor with FALSE.public static <K1,V1,K2,V2> void addMapper(JobConf job, Class<? extends Mapper<K1,V1,K2,V2>> klass, Class<? extends K1> inputKeyClass, Class<? extends V1> inputValueClass, Class<? extends K2> outputKeyClass, Class<? extends V2> outputValueClass, boolean byValue, JobConf mapperConf)
mapperConf, have precedence over the job's JobConf. This
precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain
.job - chain job's JobConf to add the Mapper class.klass - the Mapper class to add.inputKeyClass - mapper input key class.inputValueClass - mapper input value class.outputKeyClass - mapper output key class.outputValueClass - mapper output value class.byValue - indicates if key/values should be passed by value
to the next Mapper in the chain, if any.mapperConf - a JobConf with the configuration for the Mapper
class. It is recommended to use a JobConf without default values using the
JobConf(boolean loadDefaults) constructor with FALSE.public void configure(JobConf job)
super.configure(...) should be
invoked at the beginning of the overwriter method.configure 在接口中 JobConfigurablejob - the configurationpublic void reduce(Object key, Iterator values, OutputCollector output, Reporter reporter) throws IOException
reduce(...) method of the Reducer with the
map(...) methods of the Mappers in the chain.reduce 在接口中 Reducerkey - the key.values - the list of values to reduce.output - to collect keys and combined values.reporter - facility to report progress.IOExceptionpublic void close()
throws IOException
super.close() should be
invoked at the end of the overwriter method.close 在接口中 Closeableclose 在接口中 AutoCloseableIOExceptionCopyright © 2009 The Apache Software Foundation