org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class CombinerOptimizer
java.lang.Object
org.apache.pig.impl.plan.PlanVisitor<MapReduceOper,MROperPlan>
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.MROpPlanVisitor
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer
public class CombinerOptimizer
- extends MROpPlanVisitor
Optimize map reduce plans to use the combiner where possible.
Algebraic functions and distinct in the nested plan of a foreach are partially
computed in the map and combine phases.
A new foreach statement with the initial and intermediate forms of the algebraic
functions is added to the map and combine plans, respectively.
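The split into initial, intermediate and final forms can be illustrated with AVG. The sketch below is hypothetical and self-contained: the names Partial, initial, intermediate and finalValue are illustrative assumptions, not Pig's API. It only shows why partial computation works. Each map task reduces its records to a small (sum, count) pair, so the reducer merges a few pairs instead of every record.

```java
import java.util.List;

/**
 * Hypothetical illustration of an algebraic function (AVG) split into
 * initial (map), intermediate (combine) and final (reduce) forms.
 * These names are not Pig classes or methods.
 */
public class AlgebraicAvgSketch {

    /** Partial result passed between phases: a running sum and count. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Initial form: applied in the map phase to a single input value. */
    static Partial initial(double value) {
        return new Partial(value, 1);
    }

    /** Intermediate form: applied in the combine phase to merge partials. */
    static Partial intermediate(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Final form: applied in the reduce phase to produce the average. */
    static double finalValue(List<Partial> partials) {
        Partial merged = intermediate(partials);
        return merged.sum / merged.count;
    }

    public static void main(String[] args) {
        // Two map tasks, each combining its own records locally.
        Partial task1 = intermediate(List.of(initial(1), initial(2), initial(3)));
        Partial task2 = intermediate(List.of(initial(4), initial(5)));
        // The reducer receives two partials instead of five records.
        System.out.println(finalValue(List.of(task1, task2))); // prints 3.0
    }
}
```

The decomposition is valid only because merging (sum, count) pairs is associative; a function without such a mergeable partial state is not algebraic.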
If the bag portion of the group-by result is projected, or a non-algebraic
expression/UDF takes the bag as input, the combiner will not be used. In such
cases the combiner is likely to degrade performance: the combine stage would
not reduce the data size enough to offset the cost of the additional
(de)serialization it introduces.
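The eligibility rule above can be expressed as a simple check over the expressions in the foreach that follows a group-by. This is a hypothetical sketch: the Kind enum and useCombiner method are illustrative assumptions, not Pig's implementation. As a further simplifying assumption, it also requires at least one algebraic function, since otherwise there is nothing for the combiner to compute.

```java
import java.util.List;

/**
 * Hypothetical sketch of the combiner eligibility rule.
 * Expression kinds and names are illustrative; they are not Pig classes.
 */
public class CombinerEligibilitySketch {

    /** Simplified kinds of expressions in a foreach after group-by. */
    enum Kind { GROUP_KEY, BAG_PROJECTION, ALGEBRAIC_ON_BAG, NON_ALGEBRAIC_ON_BAG }

    /**
     * Returns false if the bag is projected or fed to a non-algebraic UDF,
     * because the whole bag would still have to reach the reducer.
     * Assumption: at least one algebraic function must be present.
     */
    static boolean useCombiner(List<Kind> expressions) {
        boolean sawAlgebraic = false;
        for (Kind k : expressions) {
            if (k == Kind.BAG_PROJECTION || k == Kind.NON_ALGEBRAIC_ON_BAG) {
                return false;
            }
            if (k == Kind.ALGEBRAIC_ON_BAG) {
                sawAlgebraic = true;
            }
        }
        return sawAlgebraic;
    }

    public static void main(String[] args) {
        // e.g. FOREACH g GENERATE group, COUNT(a), SUM(a.x)
        System.out.println(useCombiner(
                List.of(Kind.GROUP_KEY, Kind.ALGEBRAIC_ON_BAG, Kind.ALGEBRAIC_ON_BAG))); // true
        // e.g. FOREACH g GENERATE group, a   (bag projected)
        System.out.println(useCombiner(
                List.of(Kind.GROUP_KEY, Kind.BAG_PROJECTION))); // false
    }
}
```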
Major areas for enhancement:
1. use of the combiner in cogroup
2. queries with order-by, limit or sort in a nested foreach after group-by
3. cases where a group-by is followed by a filter that has an algebraic expression
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CombinerOptimizer
public CombinerOptimizer(MROperPlan plan,
boolean doMapAgg)
CombinerOptimizer
public CombinerOptimizer(MROperPlan plan,
boolean doMapAgg,
CompilationMessageCollector messageCollector)
getMessageCollector
public CompilationMessageCollector getMessageCollector()
visitMROp
public void visitMROp(MapReduceOper mr)
throws VisitorException
- Overrides:
visitMROp
in class MROpPlanVisitor
- Throws:
VisitorException
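visitMROp is a callback in a visitor: the plan walker visits each map-reduce operator, and the operator dispatches back to the visitor's visitMROp. The self-contained sketch below illustrates that double-dispatch pattern with hypothetical stand-ins (Op, OpVisitor, Plan, RecordingVisitor) rather than Pig's MapReduceOper, MROpPlanVisitor and MROperPlan. For simplicity it walks operators in list order; the order used by Pig's own walker is not described here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the visitor pattern an MROpPlanVisitor subclass
 * plugs into. None of these classes are Pig classes.
 */
public class VisitorSketch {

    /** Stand-in for a plan visitor: one callback per operator type. */
    static abstract class OpVisitor {
        void visitOp(Op op) {
            // Default: do nothing, so subclasses override only what they need.
        }
    }

    /** Stand-in for a map-reduce operator: dispatches back to the visitor. */
    static class Op {
        final String name;

        Op(String name) {
            this.name = name;
        }

        void visit(OpVisitor v) {
            v.visitOp(this);
        }
    }

    /** Stand-in for a plan: walks its operators in list order. */
    static class Plan {
        final List<Op> ops = new ArrayList<>();

        void walk(OpVisitor v) {
            for (Op op : ops) {
                op.visit(v);
            }
        }
    }

    /** Concrete visitor overriding the callback, as CombinerOptimizer overrides visitMROp. */
    static class RecordingVisitor extends OpVisitor {
        final List<String> seen = new ArrayList<>();

        @Override
        void visitOp(Op op) {
            seen.add(op.name);
        }
    }

    public static void main(String[] args) {
        Plan plan = new Plan();
        plan.ops.add(new Op("job1"));
        plan.ops.add(new Op("job2"));
        RecordingVisitor v = new RecordingVisitor();
        plan.walk(v);
        System.out.println(v.seen); // prints [job1, job2]
    }
}
```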
Copyright © 2007-2012 The Apache Software Foundation