org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators
Class POMergeJoin
java.lang.Object
org.apache.pig.impl.plan.Operator<PhyPlanVisitor>
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin
- All Implemented Interfaces:
- Serializable, Cloneable, Comparable<Operator>, Illustrable
public class POMergeJoin
- extends PhysicalOperator
This operator implements the merge join algorithm to perform map-side joins.
Currently, only two-way joins are supported. One join input is identified as the left input
and the other as the right input. Left input tuples are the input records of the map.
Right tuples are read from HDFS by opening the right stream.
Outer joins are not supported.
Data is assumed to be sorted in ascending order. The join fails if the data is sorted in descending order.
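The merge join idea described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not Pig's implementation: the class name MergeJoinSketch, the String[] {key, value} row layout, and the explicit ascending-order check are all hypothetical, and both inputs are held in memory, whereas POMergeJoin works on Pig tuples and reads the right input from HDFS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a two-way inner merge join; not part of Pig.
public class MergeJoinSketch {

    // Inner-joins two lists of {key, value} rows, each sorted ascending by key.
    // Emits {key, leftValue, rightValue} for every matching pair.
    public static List<String[]> mergeJoin(List<String[]> left, List<String[]> right) {
        checkAscending(left, "left");
        checkAscending(right, "right");
        List<String[]> out = new ArrayList<>();
        int r = 0; // start of the current candidate run in the right input
        for (String[] leftRow : left) {
            String key = leftRow[0];
            // Skip right rows whose key is smaller than the current left key.
            while (r < right.size() && right.get(r)[0].compareTo(key) < 0) {
                r++;
            }
            // Scan the run of equal keys without moving r, so that a repeated
            // left key is joined against the same right rows again.
            for (int m = r; m < right.size() && right.get(m)[0].equals(key); m++) {
                out.add(new String[] {key, leftRow[1], right.get(m)[1]});
            }
        }
        return out;
    }

    // Rejects input whose keys are not in ascending (non-decreasing) order.
    private static void checkAscending(List<String[]> rows, String side) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i)[0].compareTo(rows.get(i - 1)[0]) < 0) {
                throw new IllegalStateException(side + " input is not sorted ascending at row " + i);
            }
        }
    }
}
```

Because both inputs are consumed in key order, the right-side cursor only moves forward, so no hash table or shuffle is needed. This is also why descending input cannot be handled by the same loop; the sketch rejects it up front.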
- See Also:
- Serialized Form
Nested Class Summary
protected static class
POMergeJoin.TuplesToSchemaTupleList
This is a class that extends ArrayList, making it easy to provide on-the-fly conversion
from Tuple to SchemaTuple.
Fields inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
alias, illustrator, input, inputAttached, inputs, lineageTracer, outputs, parentPlan, pigLogger, requestedParallelism, res, resultType
Fields inherited from class org.apache.pig.impl.plan.Operator
mKey
Methods inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
addOriginalLocation, addOriginalLocation, attachInput, clone, cloneHelper, detachInput, getAlias, getAliasString, getIllustrator, getInputs, getLogger, getNext, getNextBigDecimal, getNextBigInteger, getNextBoolean, getNextDataBag, getNextDataByteArray, getNextDateTime, getNextDouble, getNextFloat, getNextInteger, getNextLong, getNextMap, getNextString, getOriginalLocations, getPigLogger, getReporter, getRequestedParallelism, getResultType, isAccumStarted, isAccumulative, isBlocking, isInputAttached, processInput, reset, setAccumEnd, setAccumStart, setAccumulative, setIllustrator, setInputs, setParentPlan, setPigLogger, setReporter, setRequestedParallelism, setResultType
POMergeJoin
public POMergeJoin(OperatorKey k,
int rp,
List<PhysicalOperator> inp,
MultiMap<PhysicalOperator,PhysicalPlan> inpPlans,
List<List<Byte>> keyTypes,
LOJoin.JOINTYPE joinType,
Schema leftInputSchema,
Schema rightInputSchema,
Schema mergedInputSchema)
throws PlanException
- Parameters:
k
rp
inp
inpPlans
- there can only be two inputs, each being a List
Ex. join A by ($0,$1), B by ($1,$2);
- Throws:
PlanException
getNextTuple
public Result getNextTuple()
throws ExecException
- Overrides:
getNextTuple
in class PhysicalOperator
- Throws:
ExecException
throwProcessingException
public void throwProcessingException(boolean withCauseException,
Exception e)
throws ExecException
- Throws:
ExecException
setupRightPipeline
public void setupRightPipeline(PhysicalPlan rightPipeline)
throws FrontendException
- Throws:
FrontendException
setRightLoaderFuncSpec
public void setRightLoaderFuncSpec(FuncSpec rightLoaderFuncSpec)
getInnerPlansOf
public List<PhysicalPlan> getInnerPlansOf(int index)
visit
public void visit(PhyPlanVisitor v)
throws VisitorException
- Description copied from class:
Operator
- Visit this node with the provided visitor. This should only be called by
the visitor class itself, never directly.
- Specified by:
visit
in class PhysicalOperator
- Parameters:
v
- Visitor to visit with.
- Throws:
VisitorException
- if the visitor has a problem.
name
public String name()
- Specified by:
name
in class Operator<PhyPlanVisitor>
supportsMultipleInputs
public boolean supportsMultipleInputs()
- Description copied from class:
Operator
- Indicates whether this operator supports multiple inputs.
- Specified by:
supportsMultipleInputs
in class Operator<PhyPlanVisitor>
- Returns:
- true if it does, otherwise false.
supportsMultipleOutputs
public boolean supportsMultipleOutputs()
- Description copied from class:
Operator
- Indicates whether this operator supports multiple outputs.
- Specified by:
supportsMultipleOutputs
in class Operator<PhyPlanVisitor>
- Returns:
- true if it does, otherwise false.
setRightInputFileName
public void setRightInputFileName(String rightInputFileName)
- Parameters:
rightInputFileName
- the rightInputFileName to set
getSignature
public String getSignature()
setSignature
public void setSignature(String signature)
setIndexFile
public void setIndexFile(String indexFile)
getIndexFile
public String getIndexFile()
illustratorMarkup
public Tuple illustratorMarkup(Object in,
Object out,
int eqClassIndex)
- Description copied from interface:
Illustrable
- Marks up the input tuple so that it can be illustrated.
- Parameters:
in
- input tuple
out
- output tuple before being wrapped in ExampleTuple
eqClassIndex
- index into equivalence classes in illustrator
- Returns:
- tuple
getJoinType
public LOJoin.JOINTYPE getJoinType()
Copyright © 2007-2012 The Apache Software Foundation