org.apache.pig.newplan.logical.relational
Class LOCube
java.lang.Object
org.apache.pig.newplan.Operator
org.apache.pig.newplan.logical.relational.LogicalRelationalOperator
org.apache.pig.newplan.logical.relational.LOCube
public class LOCube
extends LogicalRelationalOperator
CUBE operator implementation for data cube computation.
Cube operator syntax
alias = CUBE rel BY { CUBE | ROLLUP }(col_ref) [, { CUBE | ROLLUP }(col_ref) ...];
alias - output alias
CUBE - operator
rel - input relation
BY - operator
CUBE | ROLLUP - cube or rollup operation
col_ref - column references, *, or a range of columns from the schema of rel
The cube and rollup computations performed by the UDFs CubeDimensions and RollupDimensions can be expressed as follows:
events = LOAD '/logs/events' USING EventLoader() AS (lang, event, app_id, event_id, total);
eventcube = CUBE events BY CUBE(lang, event), ROLLUP(app_id, event_id);
result = FOREACH eventcube GENERATE FLATTEN(group) as (lang, event, app_id, event_id),
COUNT_STAR(cube), SUM(cube.total);
STORE result INTO 'cuberesult';
In the above example, CUBE(lang, event) will generate all combinations of
aggregations {(lang, event), (lang, ), ( , event), ( , )}.
For n dimensions, 2^n combinations of aggregations will be generated.
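The 2^n growth can be seen by enumerating the groupings directly. The following is a minimal sketch in plain Java, not Pig's own implementation; the class name CubeCombinationsSketch and the method cubeCombinations are hypothetical, and null stands in for the "all" marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this class is hypothetical and not part of Pig.
class CubeCombinationsSketch {

    // Returns every grouping of the given dimensions. A dimension that is
    // aggregated away ("all") is represented by null.
    static List<List<String>> cubeCombinations(List<String> dims) {
        int n = dims.size();
        List<List<String>> result = new ArrayList<>();
        // Each of the 2^n bit masks selects which dimensions are dropped.
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> combo = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // The lowest bit controls the last dimension, so the rightmost
                // dimension is dropped first.
                boolean dropped = (mask & (1 << (n - 1 - i))) != 0;
                combo.add(dropped ? null : dims.get(i));
            }
            result.add(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        for (List<String> combo : cubeCombinations(Arrays.asList("lang", "event"))) {
            System.out.println(combo);
        }
    }
}
```

For the two dimensions (lang, event) this yields four groupings; adding a third dimension doubles that to eight.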
Similarly, ROLLUP(app_id, event_id) will generate aggregations from the most
detailed to the most general (grandtotal) level in the hierarchical order
like {(app_id, event_id), (app_id, ), ( , )}. For n dimensions,
n+1 combinations of aggregations will be generated.
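The hierarchical order of ROLLUP can be sketched the same way. This is plain Java under the same assumptions (hypothetical names RollupCombinationsSketch and rollupCombinations, null for "all"), not the code Pig runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this class is hypothetical and not part of Pig.
class RollupCombinationsSketch {

    // Returns the groupings from the most detailed level down to the grand
    // total. Dimensions are dropped from right to left; null means "all".
    static List<List<String>> rollupCombinations(List<String> dims) {
        int n = dims.size();
        List<List<String>> result = new ArrayList<>();
        // keep = number of leading dimensions retained, from n down to 0,
        // which gives n + 1 groupings in total.
        for (int keep = n; keep >= 0; keep--) {
            List<String> combo = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                combo.add(i < keep ? dims.get(i) : null);
            }
            result.add(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        for (List<String> combo : rollupCombinations(Arrays.asList("app_id", "event_id"))) {
            System.out.println(combo);
        }
    }
}
```

Unlike the cube case, a grouping such as ( , event_id) never appears, because a dimension is only dropped after every dimension to its right has been dropped.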
The output of the above example query will have the following combinations of
aggregations {(lang, event, app_id, event_id), (lang, , app_id, event_id),
( , event, app_id, event_id), ( , , app_id, event_id), (lang, event, app_id, ),
(lang, , app_id, ), ( , event, app_id, ), ( , , app_id, ), (lang, event, , ),
(lang, , , ), ( , event, , ), ( , , , )}
With n CUBE dimensions and m ROLLUP dimensions, the total number of combinations is 2^n * (m+1); in this example, 2^2 * (2+1) = 12.
Since the cube and rollup clauses use null to represent "all" values of a dimension,
any null values in the dimension data are converted to "unknown"
before the cube or rollup is computed.
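The combined count can be checked by pairing every cube grouping with every rollup grouping. The sketch below is plain Java with hypothetical names (CombinedGroupingsSketch, expectedCount, crossProduct); it illustrates the arithmetic only and makes no claim about how Pig builds the groupings internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this class is hypothetical and not part of Pig.
class CombinedGroupingsSketch {

    // Number of groupings for nCube CUBE dimensions and nRollup ROLLUP dimensions.
    static long expectedCount(int nCube, int nRollup) {
        return (1L << nCube) * (nRollup + 1);
    }

    // Concatenates each cube grouping with each rollup grouping.
    static List<List<String>> crossProduct(List<List<String>> cube,
                                           List<List<String>> rollup) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> r : rollup) {
            for (List<String> c : cube) {
                List<String> combined = new ArrayList<>(c);
                combined.addAll(r);
                result.add(combined);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Groupings copied from the example; null stands for "all".
        List<List<String>> cube = Arrays.asList(
            Arrays.asList("lang", "event"), Arrays.asList("lang", null),
            Arrays.asList(null, "event"), Arrays.asList((String) null, null));
        List<List<String>> rollup = Arrays.asList(
            Arrays.asList("app_id", "event_id"), Arrays.asList("app_id", null),
            Arrays.asList((String) null, null));
        List<List<String>> all = crossProduct(cube, rollup);
        System.out.println(all.size() + " groupings, formula gives " + expectedCount(2, 2));
    }
}
```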
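The reason for the substitution is that a genuine null in the data would otherwise be indistinguishable from the null that marks an aggregated dimension. A minimal sketch of the idea in plain Java follows; the names NullDimensionSketch and replaceNulls are hypothetical, and this is not the code path Pig itself uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this class is hypothetical and not part of Pig.
class NullDimensionSketch {

    // Placeholder for a missing dimension value, as described in the text.
    static final String UNKNOWN = "unknown";

    // Replaces null dimension values so that null remains free to mean "all"
    // in the groupings computed afterwards.
    static List<String> replaceNulls(List<String> dimensionValues) {
        List<String> out = new ArrayList<>();
        for (String value : dimensionValues) {
            out.add(value == null ? UNKNOWN : value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(replaceNulls(Arrays.asList("en", null)));
    }
}
```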
Methods inherited from class org.apache.pig.newplan.logical.relational.LogicalRelationalOperator
checkEquality, fixDuplicateUids, getAlias, getCustomPartitioner, getLineNumber, getRequestedParallelism, isPinnedOption, neverUseForRealSetSchema, pinOption, resetSchema, setAlias, setCustomPartitioner, setRequestedParallelism, setSchema, toString
LOCube
public LOCube(LogicalPlan plan)
LOCube
public LOCube(OperatorPlan plan,
MultiMap<Integer,LogicalExpressionPlan> expressionPlans)
getSchema
public LogicalSchema getSchema()
throws FrontendException
- Description copied from class:
LogicalRelationalOperator
- Get the schema for the output of this relational operator. This does
not merely return the schema variable. If schema is not yet set, this
will attempt to construct it. Therefore it is abstract since each
operator will need to construct its schema differently.
- Specified by:
getSchema
in class LogicalRelationalOperator
- Returns:
- the schema
- Throws:
FrontendException
accept
public void accept(PlanVisitor v)
throws FrontendException
- Description copied from class:
Operator
- Accept a visitor at this node in the graph.
- Specified by:
accept
in class Operator
- Parameters:
v
- Visitor to accept.
- Throws:
FrontendException
isEqual
public boolean isEqual(Operator other)
throws FrontendException
- Description copied from class:
Operator
- This is like a shallow equals comparison.
It returns true if two operators have equivalent properties even if they are
different objects. Here properties mean equivalent plan and equivalent name.
- Specified by:
isEqual
in class Operator
- Returns:
- true if the two objects have equivalent properties, else false
- Throws:
FrontendException
getExpressionPlans
public MultiMap<Integer,LogicalExpressionPlan> getExpressionPlans()
setExpressionPlans
public void setExpressionPlans(MultiMap<Integer,LogicalExpressionPlan> plans)
resetUid
public void resetUid()
- Description copied from class:
LogicalRelationalOperator
- Erase all cached uids and regenerate them when the schema is regenerated.
This process is currently used only in ImplicitSplitInsert, which
inserts a split and invalidates some uids in the plan.
- Overrides:
resetUid
in class LogicalRelationalOperator
getInputs
public List<Operator> getInputs(LogicalPlan plan)
getOperations
public List<String> getOperations()
setOperations
public void setOperations(List<String> operations)
Copyright © 2007-2012 The Apache Software Foundation