public class HyperLogLogUtils extends Object
| Modifier and Type | Field and Description |
|---|---|
| static byte[] | MAGIC |
| Constructor and Description |
|---|
| HyperLogLogUtils() |
| Modifier and Type | Method and Description |
|---|---|
| static HyperLogLog | deserializeHLL(byte[] buf): Deserializes a serialized HyperLogLog from a byte array. |
| static HyperLogLog | deserializeHLL(InputStream in): Refer to serializeHLL() for the serialization format. |
| static long | getEstimatedCountFromSerializedHLL(InputStream in): Gets the estimated cardinality without deserializing the HLL. |
| static float | getRelativeError(long actualCount, long estimatedCount): Returns the relative error between the actual and estimated cardinality. |
| static void | serializeHLL(OutputStream out, HyperLogLog hll): Serializes the HyperLogLog using the format described in the method detail. |
public static void serializeHLL(OutputStream out, HyperLogLog hll) throws IOException
|-4 byte-|------varlong----|varint (optional)|----------|
---------------------------------------------------------
| header | estimated-count | register-length | register |
---------------------------------------------------------
The 4-byte header is encoded as follows:
3 bytes - HLL magic string to identify the serialized stream
4 bits - p (number of bits to be used as the register index)
1 bit - spare (not used)
3 bits - encoding (000 - sparse, 001..110 - n-bit packing, 111 - no bit packing)
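
The header layout above can be sketched in a few lines of Java. This is an illustrative sketch only, not the library's implementation: the class and method names are hypothetical, the magic bytes are assumed to be the ASCII text "HLL", and the fourth byte is assumed to hold p in its high four bits, the spare bit next, and the encoding in its low three bits.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HyperLogLogUtils.
class HllHeaderSketch {
    // Assumption: the 3-byte magic string is the ASCII text "HLL".
    static final byte[] MAGIC = "HLL".getBytes(StandardCharsets.US_ASCII);

    /** Builds a 4-byte header: 3 magic bytes, then one byte packing p and the encoding. */
    static byte[] encodeHeader(int p, int encoding) {
        if (p < 0 || p > 15) {
            throw new IllegalArgumentException("p must fit in 4 bits");
        }
        if (encoding < 0 || encoding > 7) {
            throw new IllegalArgumentException("encoding must fit in 3 bits");
        }
        byte[] header = new byte[4];
        System.arraycopy(MAGIC, 0, header, 0, 3);
        // Assumed bit order of the fourth byte: pppp s eee (spare bit left as zero).
        header[3] = (byte) ((p << 4) | encoding);
        return header;
    }

    /** Returns true if the first three bytes match the assumed magic string. */
    static boolean hasMagic(byte[] header) {
        return header.length >= 4 && Arrays.equals(Arrays.copyOf(header, 3), MAGIC);
    }

    /** Extracts p from the high four bits of the fourth byte. */
    static int decodeP(byte[] header) {
        return (header[3] & 0xFF) >>> 4;
    }

    /** Extracts the encoding from the low three bits of the fourth byte. */
    static int decodeEncoding(byte[] header) {
        return header[3] & 0x07;
    }
}
```

Under these assumptions, a header built with `encodeHeader(14, 0)` would describe a sparse HLL with 14 index bits, and `decodeP` and `decodeEncoding` recover both values from the fourth byte.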
The header is followed by 3 fields that are required to reconstruct
the HyperLogLog:
Estimated count - variable-length long that stores the last computed estimated count.
This is just for quick lookup without deserializing the registers.
Register length - number of entries in the register (required only
for the sparse representation; for bit-packing, the register
length can be found from p)
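
Because the estimated count sits directly after the fixed-size header, it can be read without touching the registers. The sketch below illustrates that idea under stated assumptions: the class and method names are hypothetical, and the varlong is assumed to use a little-endian base-128 encoding (7 data bits per byte, high bit set on every byte except the last). The actual encoding used by the library may differ.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of HyperLogLogUtils.
class HllCountSketch {
    /** Writes a long using the assumed base-128 variable-length encoding. */
    static void writeVarLong(OutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            // More than 7 bits remain: emit the low 7 bits with the continuation bit set.
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Reads a long written by writeVarLong. */
    static long readVarLong(InputStream in) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a varlong");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("varlong is longer than 10 bytes");
    }

    /** Skips the 4-byte header and returns the stored estimate; registers are never read. */
    static long readEstimatedCount(InputStream in) throws IOException {
        for (int i = 0; i < 4; i++) {
            if (in.read() < 0) {
                throw new EOFException("stream ended inside the header");
            }
        }
        return readVarLong(in);
    }
}
```

The design point is that the reader stops as soon as the varlong ends, so the cost of the lookup does not depend on the size of the register section that follows.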
Parameters:
out - output stream to write to
hll - hyperloglog that needs to be serialized
Throws:
IOException

public static HyperLogLog deserializeHLL(InputStream in) throws IOException
Parameters:
in - input stream
Throws:
IOException

public static HyperLogLog deserializeHLL(byte[] buf)
Parameters:
buf - to deserialize

public static long getEstimatedCountFromSerializedHLL(InputStream in) throws IOException
Parameters:
in - serialized HLL
Throws:
IOException

public static float getRelativeError(long actualCount, long estimatedCount)
Parameters:
actualCount - actual count
estimatedCount - estimated count

Copyright © 2019 The Apache Software Foundation. All Rights Reserved.