org.apache.pig.builtin
Class CubeDimensions
java.lang.Object
org.apache.pig.EvalFunc<DataBag>
org.apache.pig.builtin.CubeDimensions
public class CubeDimensions
- extends EvalFunc<DataBag>
Produces a DataBag containing every combination of the argument tuple's members,
as in a data cube. For example, the tuple (a, b, c) produces the following bag:
{ (a, b, c), (null, null, null), (a, b, null), (a, null, c),
(a, null, null), (null, b, c), (null, null, c), (null, b, null) }
The "all" marker is null by default, but can be set to an arbitrary string by
invoking a constructor (via a DEFINE). The constructor takes a single argument,
the string you want to represent "all".
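The expansion described above can be sketched in plain Java, independent of Pig. This is only an illustration of the idea, not the class's actual implementation: the class name CubeSketch and the method cube are hypothetical, and plain lists of strings stand in for Pig tuples. Each bit of a mask decides whether one dimension keeps its value or is replaced by the "all" marker, so n dimensions yield 2^n rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Pig.
class CubeSketch {
    /**
     * Returns every combination of the input dimensions in which each
     * position either keeps its value or is replaced by allMarker.
     * For n dimensions this yields 2^n rows.
     */
    static List<List<String>> cube(List<String> dims, String allMarker) {
        int n = dims.size();
        List<List<String>> result = new ArrayList<>();
        // Bit i of mask decides whether dimension i is rolled up.
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> row = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                boolean rolledUp = (mask & (1 << i)) != 0;
                row.add(rolledUp ? allMarker : dims.get(i));
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        // null plays the role of the default "all" marker.
        for (List<String> row : cube(Arrays.asList("a", "b", "c"), null)) {
            System.out.println(row);
        }
    }
}
```

The rows come out in mask order, which differs from the order of the listing above, but the set of eight combinations is the same. Passing a string such as "*" instead of null mimics the single-argument constructor.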
Usage goes something like this:
events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
cubed = foreach events generate
    FLATTEN(CubeDimensions(lang, event, app_id))
        as (lang, event, app_id),
    measure;
cube = foreach (group cubed
        by (lang, event, app_id) parallel $P)
    generate
        flatten(group) as (lang, event, app_id),
        COUNT_STAR(cubed),
        SUM(cubed.measure);
store cube into 'event_cube';
Note: doing this with non-algebraic aggregations on large data can result
in very slow reducers, because the group in which every dimension is "all"
receives every record in your relation.
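The skew behind this note can be seen in a small plain-Java sketch. It assumes made-up sample records and uses hypothetical names (CubeSkewSketch, groupSizes). It does not use Pig: it expands each record into its cube rows and counts how many rows land in each group key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Pig.
class CubeSkewSketch {
    /** Counts how many cubed rows fall into each group key (null = "all"). */
    static Map<List<String>, Integer> groupSizes(List<List<String>> records) {
        Map<List<String>, Integer> sizes = new LinkedHashMap<>();
        for (List<String> rec : records) {
            int n = rec.size();
            // Expand the record into its 2^n cube rows.
            for (int mask = 0; mask < (1 << n); mask++) {
                List<String> key = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    key.add((mask & (1 << i)) != 0 ? null : rec.get(i));
                }
                sizes.merge(key, 1, Integer::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Made-up sample data: (lang, event, app_id).
        List<List<String>> records = Arrays.asList(
            Arrays.asList("en", "click", "1"),
            Arrays.asList("en", "view", "2"),
            Arrays.asList("fr", "click", "1"));
        groupSizes(records).forEach(
            (key, size) -> System.out.println(key + " -> " + size));
    }
}
```

Whatever the input, the all-null key collects one row per input record, so its group is always as large as the whole relation. A reducer that must hold that whole group in memory, as a non-algebraic aggregation does, becomes the bottleneck.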
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CubeDimensions
public CubeDimensions()
CubeDimensions
public CubeDimensions(String allMarker)
exec
public DataBag exec(Tuple tuple)
throws IOException
- Description copied from class:
EvalFunc
- This callback method must be implemented by all subclasses. This
is the method that will be invoked on every Tuple of a given dataset.
Since the dataset may be divided up in a variety of ways the programmer
should not make assumptions about state that is maintained between
invocations of this method.
- Specified by:
exec
in class EvalFunc<DataBag>
- Parameters:
tuple
- the Tuple to be processed.
- Returns:
- result, of type T.
- Throws:
IOException
convertNullToUnknown
public static void convertNullToUnknown(Tuple tuple)
throws ExecException
- Throws:
ExecException
outputSchema
public Schema outputSchema(Schema input)
- Description copied from class:
EvalFunc
- Report the schema of the output of this UDF. Pig will make use of
this in error checking, optimization, and planning. The schema
of input data to this UDF is provided.
The default implementation interprets the OutputSchema annotation,
if one is present. Otherwise, it returns null (no known output schema).
- Overrides:
outputSchema
in class EvalFunc<DataBag>
- Parameters:
input
- Schema of the input
- Returns:
- Schema of the output
Copyright © 2007-2012 The Apache Software Foundation