org.apache.pig.builtin
Class TOBAG
java.lang.Object
org.apache.pig.EvalFunc<DataBag>
org.apache.pig.builtin.TOBAG
public class TOBAG
extends EvalFunc<DataBag>
This class takes a list of items and puts them into a bag. For example:
T = foreach U generate TOBAG($0, $1, $2);
This is equivalent to:
T = foreach U generate {($0), ($1), ($2)};
All arguments that are not of tuple type are wrapped in a tuple before
being added to the bag, because a bag is always a bag of tuples.
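The wrapping rule above can be modeled in plain Java. This is only a sketch of the described behavior, not Pig's implementation: `ToBagWrapSketch`, `TupleModel`, and `toBag` are hypothetical names, and a `List` stands in for Pig's `DataBag`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java model of the wrapping rule; none of these types are Pig classes.
final class ToBagWrapSketch {

    // Hypothetical stand-in for a Pig tuple: an ordered list of fields.
    static final class TupleModel {
        final List<Object> fields;

        TupleModel(Object... fields) {
            this.fields = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(fields)));
        }

        @Override
        public String toString() {
            return fields.stream().map(String::valueOf).collect(Collectors.joining(", ", "(", ")"));
        }
    }

    // Builds the "bag": tuples are added as-is, anything else is wrapped
    // in a single-field tuple first, so the bag only ever holds tuples.
    static List<TupleModel> toBag(Object... args) {
        List<TupleModel> bag = new ArrayList<>();
        for (Object arg : args) {
            if (arg instanceof TupleModel) {
                bag.add((TupleModel) arg);
            } else {
                bag.add(new TupleModel(arg));
            }
        }
        return bag;
    }

    public static void main(String[] unused) {
        // An int, an existing two-field tuple, and a string.
        System.out.println(toBag(1, new TupleModel(2, 3), "x"));
    }
}
```

In this model the existing two-field tuple is kept intact, while the int and the string each end up in a one-field tuple of their own.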
Output schema:
The output schema for this UDF depends on the schema of its arguments.
If all the arguments have the same type and the same inner
schema (for bag/tuple columns), then the UDF output schema is a bag
of tuples having a column of the type and inner schema (if any) of the
arguments.
If the arguments are of type tuple/bag, then their inner schemas must match,
though schema field aliases may differ.
If these conditions are not met, the output schema is a bag with a null
inner schema.
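The schema rule above can be sketched as a small plain-Java model. It is an illustration under stated assumptions, not Pig's `outputSchema` code: `ToBagSchemaSketch`, `FieldModel`, `sameShape`, and `bagElementSchema` are hypothetical names, types are represented as plain strings, and returning null for an empty argument list is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of the output schema rule; none of these types are Pig classes.
final class ToBagSchemaSketch {

    // Hypothetical stand-in for a field schema: alias, type name, optional inner fields.
    static final class FieldModel {
        final String alias;
        final String type;
        final List<FieldModel> inner; // empty for simple types such as int

        FieldModel(String alias, String type, FieldModel... inner) {
            this.alias = alias;
            this.type = type;
            this.inner = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(inner)));
        }

        // True when types and inner schemas match recursively; aliases are ignored.
        boolean sameShape(FieldModel other) {
            if (!type.equals(other.type) || inner.size() != other.inner.size()) {
                return false;
            }
            for (int i = 0; i < inner.size(); i++) {
                if (!inner.get(i).sameShape(other.inner.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Returns the schema shared by every bag element, taken from the first
    // argument (so its aliases win), or null when the arguments disagree.
    // Returning null for an empty argument list is an assumption of this sketch.
    static FieldModel bagElementSchema(List<FieldModel> args) {
        if (args.isEmpty()) {
            return null;
        }
        FieldModel first = args.get(0);
        for (FieldModel arg : args.subList(1, args.size())) {
            if (!first.sameShape(arg)) {
                return null;
            }
        }
        return first;
    }

    public static void main(String[] unused) {
        FieldModel a0 = new FieldModel("a0", "tuple", new FieldModel("x", "int"));
        FieldModel a1 = new FieldModel("a1", "tuple", new FieldModel("y", "int"));
        FieldModel element = bagElementSchema(Arrays.asList(a0, a1));
        System.out.println(element == null ? "null inner schema" : element.inner.get(0).alias);
    }
}
```

The model keeps the first argument's aliases when the shapes agree and reports a null inner schema when any argument has a different type or inner schema.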
example 1
grunt> describe a;
a: {a0: int,a1: int}
grunt> b = foreach a generate TOBAG(a0,a1);
grunt> describe b;
b: {{int}}
example 2
grunt> describe a;
a: {a0: (x: int),a1: (x: int)}
grunt> b = foreach a generate TOBAG(a0,a1);
grunt> describe b;
b: {{(x: int)}}
example 3
grunt> describe a;
a: {a0: (x: int),a1: (y: int)}
-- note that the inner schemas have matching types but different field aliases.
-- the aliases of the first argument (a0) will be used in output schema:
grunt> b = foreach a generate TOBAG(a0,a1);
grunt> describe b;
b: {{(x: int)}}
example 4
grunt> describe a;
a: {a0: (x: int),a1: (x: chararray)}
-- here the inner schemas do not match, so output schema is not well defined:
grunt> b = foreach a generate TOBAG(a0,a1);
grunt> describe b;
b: {{NULL}}
Constructor Summary
TOBAG()
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TOBAG
public TOBAG()
exec
public DataBag exec(Tuple input)
throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses. This
is the method that will be invoked on every Tuple of a given dataset.
Since the dataset may be divided up in a variety of ways the programmer
should not make assumptions about state that is maintained between
invocations of this method.
Specified by: exec in class EvalFunc<DataBag>
Parameters: input - the Tuple to be processed.
Returns: result, of type T.
Throws: IOException
outputSchema
public Schema outputSchema(Schema inputSch)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. Pig will make use of
this in error checking, optimization, and planning. The schema
of input data to this UDF is provided.
The default implementation interprets the OutputSchema annotation,
if one is present. Otherwise, it returns null (no known output schema).
Overrides: outputSchema in class EvalFunc<DataBag>
Parameters: inputSch - Schema of the input
Returns: Schema of the output
Copyright © 2007-2012 The Apache Software Foundation