|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectorg.apache.pig.EvalFunc<DataBag>
org.apache.pig.piggybank.evaluation.Over
public class Over
Given an aggregate function, a bag, and possibly a window definition, produce output that matches SQL OVER. It is the reponsibility of the caller to have already ordered the bag as required by their operation. The aggregate, and window definition are passed in the constructor. The bag is passed to exec each time.
Usage: Over(bag, function_to_call[, window_start, window_end[, function specific args]])
bag - The bag to be called. Most functions assume this is a bag with tuples of a single field.
function_to_call - Can be one of the following:
window_start - optional - Record to start window on for the function. -1 indicates 'unbounded preceding', i.e. the beginning of the bag. A positive integer indicates that number of records before the current record. 0 indicates the current record. If not specified -1 is the default.
window_end - optional - Record to end window on for the function. -1 indicates 'unbounded following', i.e. the end of the bag. A positive integer indicates that number of records after the current record. 0 indicates teh current record. If not specified 0 is the default.
function_specific_args - maybe optional - The following functions accept require additional arguments:
Example Usage:
To do a cumulative sum:
A = load 'T' AS (si:chararray, i:int, d:long, f:float, s:chararray); C = foreach (group A by si) { Aord = order A by d; generate flatten(Stitch(Aord, Over(Aord.f, 'sum(float)'))); } D = foreach C generate s, $5;
This is equivalent to the SQL statement
select s, sum(f) over (partition by si order by d) from T;
To find the record 3 ahead of the current record, using a window between the current row and 3 records ahead and a default value of 0.
A = load 'T' AS (si:chararray, i:int, d:long, f:float, s:chararray); C = foreach (group A by si) { Aord = order A by i; generate flatten(Stitch(Aord, Over(Aord.i, 'lead', 0, 3, 3, 0))); } D = foreach C generate s, $9;
This is equivalent to the SQL statement
select s, lead(i, 3, 0) over (partition by si order by i rows between current row and 3 following) over T;
Over accepts a constructor argument specifying the name and type, colon-separated, of its return schema.
DEFINE IOver org.apache.pig.piggybank.evaluation.Over('state_rk:int'); cities = LOAD 'cities' AS (city:chararray, state:chararray, pop:int); -- Decorate each city with its population rank within the state it belongs to: ranked = FOREACH(GROUP cities BY state) { c_ord = ORDER cities BY pop DESC; GENERATE FLATTEN(Stitch(c_ord, IOver(c_ord, 'rank', -1, -1, 2))); -- beginning (-1) to end (-1) on third field (2) }; DESCRIBE ranked; -- ranked: {stitched::city: chararray,stitched::state: chararray,stitched::pop: int,stitched::state_rk: int} DUMP ranked; -- ... -- (Nashville,Tennessee,609644,2) -- (Houston,Texas,2145146,1) -- (San Antonio,Texas,1359758,2) -- (Dallas,Texas,1223229,3) -- (Austin,Texas,820611,4) -- ...
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.pig.EvalFunc |
---|
EvalFunc.SchemaType |
Field Summary |
---|
Fields inherited from class org.apache.pig.EvalFunc |
---|
log, pigLogger, reporter |
Constructor Summary | |
---|---|
Over()
|
|
Over(String typespec)
|
Method Summary | |
---|---|
DataBag |
exec(Tuple input)
This callback method must be implemented by all subclasses. |
Schema |
outputSchema(Schema inputSch)
Report the schema of the output of this UDF. |
Methods inherited from class org.apache.pig.EvalFunc |
---|
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Over()
public Over(String typespec)
Method Detail |
---|
public DataBag exec(Tuple input) throws IOException
EvalFunc
exec
in class EvalFunc<DataBag>
input
- the Tuple to be processed.
IOException
public Schema outputSchema(Schema inputSch)
EvalFunc
The default implementation interprets the OutputSchema
annotation,
if one is present. Otherwise, it returns null
(no known output schema).
outputSchema
in class EvalFunc<DataBag>
inputSch
- Schema of the input
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |