java.lang.Object
  org.apache.pig.EvalFunc<T>
    org.apache.pig.builtin.BuildBloomBase<DataByteArray>
      org.apache.pig.builtin.BuildBloom
public class BuildBloom
extends BuildBloomBase<DataByteArray>
Builds a Bloom filter for later use with the Bloom UDF. This UDF is intended to run
in a GROUP ALL job. For example:
define bb BuildBloom('jenkins', '100', '0.1');
A = load 'foo' as (x, y);
B = group A all;
C = foreach B generate bb(A.x);
store C into 'mybloom';
The Bloom filter can be built on multiple keys by passing more than one field
(or the entire bag) to BuildBloom.
The resulting file can then be used with the Bloom UDF:
define bloom Bloom('mybloom');
A = load 'foo' as (x, y);
B = load 'bar' as (z);
C = filter B by bloom(z);
D = join C by z, A by x;
It uses Hadoop's BloomFilter (org.apache.hadoop.util.bloom.BloomFilter).
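The build-then-filter flow above can be illustrated with a minimal, self-contained Java Bloom filter. This is a sketch under stated assumptions, not Hadoop's BloomFilter and not Pig's code: the class name BloomSketch and its FNV-1a-style double hashing are hypothetical. `add` plays the role of the build step that BuildBloom performs over the grouped bag, and `mightContain` plays the role of the membership test that Bloom applies in the FILTER statement.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical minimal Bloom filter. NOT Hadoop's BloomFilter: the
// FNV-1a-style double hashing below is an illustrative assumption.
public class BloomSketch {
    private final BitSet bits;
    private final int vectorSize; // number of bits (m)
    private final int numHash;    // number of hash functions (k)

    public BloomSketch(int vectorSize, int numHash) {
        this.bits = new BitSet(vectorSize);
        this.vectorSize = vectorSize;
        this.numHash = numHash;
    }

    // 32-bit FNV-1a-style hash over the key bytes; the seed yields a second hash.
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // k bit positions via double hashing: (h1 + i * h2) mod m.
    private int[] positions(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(bytes, 0);
        int h2 = fnv1a(bytes, 0x5bd1e995);
        int[] pos = new int[numHash];
        for (int i = 0; i < numHash; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, vectorSize);
        }
        return pos;
    }

    // Build phase: set the k bits for this key.
    public void add(String key) {
        for (int p : positions(key)) {
            bits.set(p);
        }
    }

    // Filter phase: false means "definitely not added"; true means "possibly added".
    public boolean mightContain(String key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1024, 3);
        for (String x : new String[] {"apple", "banana", "cherry"}) {
            filter.add(x);
        }
        System.out.println(filter.mightContain("banana")); // true: added keys are never missed
        System.out.println(filter.mightContain("durian")); // false unless "durian" is a false positive
    }
}
```

Because `mightContain` can return true for keys that were never added, the join after the filter is still needed. The filter only discards rows that certainly have no match, which shrinks the input to the join.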
Nested Class Summary | |
---|---|
static class | BuildBloom.Final |
static class | BuildBloom.Initial |
static class | BuildBloom.Intermediate |
Nested classes/interfaces inherited from class org.apache.pig.EvalFunc |
---|
EvalFunc.SchemaType |
Field Summary |
---|
Fields inherited from class org.apache.pig.builtin.BuildBloomBase |
---|
filter, hType, numHash, vSize |
Fields inherited from class org.apache.pig.EvalFunc |
---|
log, pigLogger, reporter, returnType |
Constructor Summary | |
---|---|
BuildBloom(String hashType, String numElements, String desiredFalsePositive) | Construct a Bloom filter based on the expected number of elements and the desired accuracy. |
BuildBloom(String hashType, String mode, String vectorSize, String nbHash) | Build a Bloom filter with a fixed vector size and number of hash functions. |
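The (numElements, desiredFalsePositive) constructor must turn an expected element count n and a false-positive rate p into a vector size m and a hash count k. A common way to do this uses the textbook formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2. The sketch below applies them to the values from the class-level example ('100', '0.1'). Whether BuildBloomBase computes vSize and numHash with exactly these formulas and this rounding is an assumption, and BloomSizing is a hypothetical helper.

```java
// Hypothetical helper applying textbook Bloom filter sizing formulas.
// That BuildBloomBase uses exactly these formulas/rounding is an assumption.
public class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    static int vectorSize(int numElements, double falsePositive) {
        double ln2 = Math.log(2);
        return (int) Math.ceil(-numElements * Math.log(falsePositive) / (ln2 * ln2));
    }

    // k = (m / n) * ln 2, rounded to the nearest integer and at least 1
    static int numHash(int vectorSize, int numElements) {
        return Math.max(1, (int) Math.round((double) vectorSize / numElements * Math.log(2)));
    }

    public static void main(String[] args) {
        int n = 100;      // numElements from the example
        double p = 0.1;   // desiredFalsePositive from the example
        int m = vectorSize(n, p);
        int k = numHash(m, n);
        System.out.println("vectorSize=" + m + ", numHash=" + k); // prints vectorSize=480, numHash=3
    }
}
```

The fixed-size constructor skips this derivation: vectorSize and nbHash are supplied directly.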
Method Summary | | |
---|---|---|
DataByteArray | exec(Tuple input) | This callback method must be implemented by all subclasses. |
String | getFinal() | Get the final function. |
String | getInitial() | Get the initial function. |
String | getIntermed() | Get the intermediate function. |
Schema | outputSchema(Schema input) | Report the schema of the output of this UDF. |
Methods inherited from class org.apache.pig.builtin.BuildBloomBase |
---|
bloomIn, bloomOr, bloomOut |
Methods inherited from class org.apache.pig.EvalFunc |
---|
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BuildBloom(String hashType, String mode, String vectorSize, String nbHash)
Build a Bloom filter with a fixed vector size and number of hash functions.
Parameters:
hashType - type of the hashing function (see Hash), for example "jenkins" or "murmur".
mode - Ignored, though by convention it should be "fixed" or "fixedsize".
vectorSize - The vector size of this filter.
nbHash - The number of hash functions to consider.
public BuildBloom(String hashType, String numElements, String desiredFalsePositive)
Construct a Bloom filter based on the expected number of elements and the desired accuracy.
Parameters:
hashType - type of the hashing function (see Hash), for example "jenkins" or "murmur".
numElements - The number of distinct elements expected to be placed in this filter.
desiredFalsePositive - the acceptable rate of false positives. This should be a floating point value between 0 and 1.0, where 1.0 would be 100% (i.e., a totally useless filter).
Method Detail |
---|
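public DataByteArray exec(Tuple input) throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses.
Specified by: exec in class EvalFunc<DataByteArray>
Parameters:
input - the Tuple to be processed.
Throws:
IOException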
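public String getInitial()
Description copied from interface: Algebraic
Get the initial function.
Specified by: getInitial in interface Algebraic
public String getIntermed()
Description copied from interface: Algebraic
Get the intermediate function.
Specified by: getIntermed in interface Algebraic
public String getFinal()
Description copied from interface: Algebraic
Get the final function.
Specified by: getFinal in interface Algebraic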
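Because BuildBloom is algebraic, getInitial, getIntermed and getFinal return the names of the functions Pig runs in each phase. The nested Initial, Intermediate and Final classes are presumably those functions, so partial filters can be built per input split and combined later. The sketch below assumes the combine step is a bitwise OR of partial filters that share the same vector size and hash functions. That is the standard way to merge Bloom filters and is what the inherited bloomOr name suggests, but it is an assumption here; BloomMergeSketch and its hashing are hypothetical, not Pig code.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of an Initial -> Intermediate -> Final pipeline for a
// Bloom filter. Merging by bitwise OR is an assumption about bloomOr.
public class BloomMergeSketch {
    static final int VECTOR_SIZE = 256; // m: must match across all partial filters
    static final int NUM_HASH = 3;      // k: must match across all partial filters

    // Illustrative double hashing from String.hashCode (hypothetical scheme).
    static int[] positions(String key) {
        int h1 = key.hashCode();
        int h2 = Integer.reverse(h1) | 1; // second hash, forced odd
        int[] pos = new int[NUM_HASH];
        for (int i = 0; i < NUM_HASH; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, VECTOR_SIZE);
        }
        return pos;
    }

    // "Initial": build a partial filter over the keys of one input split.
    static BitSet initial(List<String> split) {
        BitSet bits = new BitSet(VECTOR_SIZE);
        for (String key : split) {
            for (int p : positions(key)) {
                bits.set(p);
            }
        }
        return bits;
    }

    // "Intermediate"/"Final": OR the partial filters together.
    static BitSet merge(List<BitSet> partials) {
        BitSet merged = new BitSet(VECTOR_SIZE);
        for (BitSet partial : partials) {
            merged.or(partial);
        }
        return merged;
    }

    public static void main(String[] args) {
        BitSet a = initial(List.of("apple", "banana"));
        BitSet b = initial(List.of("cherry"));
        BitSet merged = merge(List.of(a, b));
        // Merging partials yields the same bits as one pass over all keys.
        System.out.println(merged.equals(initial(List.of("apple", "banana", "cherry")))); // true

        // Partial results travel between phases as bytes (cf. the inherited
        // bloomOut/bloomIn); BitSet.toByteArray/valueOf stands in for that here.
        BitSet restored = BitSet.valueOf(merged.toByteArray());
        System.out.println(restored.equals(merged)); // true
    }
}
```

The OR merge only works if every partial filter uses the same vector size and hash functions, which is why those parameters are fixed when the UDF is defined rather than chosen per split.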
public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).
Overrides: outputSchema in class EvalFunc<DataByteArray>
Parameters:
input - Schema of the input