A Python implementation of HyperLogLog. A maintained fork of python-hll.
python-hll
A Python implementation of HyperLogLog whose goal is to be storage compatible with java-hll, js-hll and postgresql-hll.
NOTE: This is a fairly literal port of java-hll to Python. Internally, bytes are represented as Java-style signed bytes (-128 to 127) rather than Python-style unsigned bytes (0 to 255).
This implementation is also quite slow: for example, the Java HLLSerializationTest takes 12 seconds to run, while the Python test_hll_serialization takes 1.5 hours (about 400x slower).
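To make the byte convention concrete, here is a minimal sketch of converting between the two ranges using two's-complement arithmetic. The helpers to_java_byte and to_python_byte are hypothetical names chosen for illustration, not part of python_hll, and the sketch assumes Python 3.

```python
def to_java_byte(b):
    """Map an unsigned byte (0..255) to a Java-style signed byte (-128..127)."""
    if not 0 <= b <= 255:
        raise ValueError("expected a value in 0..255, got %r" % (b,))
    return b - 256 if b > 127 else b


def to_python_byte(b):
    """Map a Java-style signed byte (-128..127) back to 0..255."""
    if not -128 <= b <= 127:
        raise ValueError("expected a value in -128..127, got %r" % (b,))
    return b & 0xFF


# The first three bytes of the hex example used later in this README.
unsigned = bytes.fromhex("128D7F")
signed = [to_java_byte(b) for b in unsigned]          # [18, -115, 127]
round_trip = bytes(to_python_byte(b) for b in signed)  # same as `unsigned`
```

Only values above 127 change: 0x8D is 141 as an unsigned byte and -115 as a signed one.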
- Runs on: Python 2.7 and 3
- Free software: MIT license
- Documentation: https://python-hll.readthedocs.io
- GitHub: https://github.com/AdRoll/python-hll
Overview
See java-hll for an overview of what HLLs are and how they work.
Usage
Hashing and adding a value to a new HLL:
from python_hll.hll import HLL
import mmh3
value_to_hash = 'foo'
hashed_value = mmh3.hash(value_to_hash)
hll = HLL(13, 5) # log2m=13, regwidth=5
hll.add_raw(hashed_value)
Retrieving the cardinality of an HLL:
cardinality = hll.cardinality()
Unioning two HLLs together (and retrieving the resulting cardinality):
hll1 = HLL(13, 5) # log2m=13, regwidth=5
hll2 = HLL(13, 5) # log2m=13, regwidth=5
# ... (add values to both sets) ...
hll1.union(hll2) # modifies hll1 to contain the union
cardinality_union = hll1.cardinality()
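For intuition about what a union does: when two HLLs with the same log2m and regwidth are both stored as full register arrays, their union is the register-wise maximum. The sketch below shows that idea on plain lists. union_registers is a hypothetical helper, not part of python_hll, and the library's own union may take different paths depending on how each HLL is stored internally.

```python
def union_registers(left, right):
    """Return the register-wise maximum of two equal-length register lists."""
    if len(left) != len(right):
        raise ValueError("both HLLs must have the same number of registers")
    return [max(a, b) for a, b in zip(left, right)]


# Toy example with 4 registers instead of the 2**13 used above.
merged = union_registers([0, 3, 1, 0], [2, 1, 1, 0])
```

Because each register only ever keeps its largest observed value, a union behaves as if every value had been added to a single HLL, and unioning an HLL with itself leaves it unchanged.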
Reading an HLL from its hex representation in storage specification v1.0.0 format (for example, retrieved from a PostgreSQL database):
from python_hll.util import NumberUtil
hex_input = '\\x128D7FFFFFFFFFF6A5C420'
hex_string = hex_input[2:]  # strip the leading '\x' prefix
hll = HLL.from_bytes(NumberUtil.from_hex(hex_string, 0, len(hex_string)))
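The first three bytes of that hex string are a header. The sketch below decodes them by hand, assuming the header layout described in storage specification v1.0.0: a version/type byte, a parameters byte, and a cutoff byte. parse_header is a hypothetical helper written for illustration and does not use python_hll. It assumes Python 3.

```python
def parse_header(hex_string):
    """Decode the 3-byte header of a v1.0.0 HLL hex string (without the '\\x' prefix)."""
    header = bytes.fromhex(hex_string[:6])
    if len(header) < 3:
        raise ValueError("need at least 3 header bytes")
    version_byte, params_byte, cutoff_byte = header
    return {
        "schema_version": version_byte >> 4,        # top nibble
        "type_ordinal": version_byte & 0x0F,        # bottom nibble
        "regwidth": (params_byte >> 5) + 1,         # top 3 bits store regwidth - 1
        "log2m": params_byte & 0x1F,                # bottom 5 bits
        "sparse_enabled": bool(cutoff_byte & 0x40), # second-highest bit
        "explicit_cutoff": cutoff_byte & 0x3F,      # bottom 6 bits
    }


header = parse_header("128D7F")
```

For this header the decoded parameters are log2m=13 and regwidth=5, matching the HLL(13, 5) constructor calls shown earlier.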
Writing an HLL to its hex representation in storage specification v1.0.0 format (for example, to be inserted into a PostgreSQL database):
hll_bytes = hll.to_bytes()
output = "\\x" + NumberUtil.to_hex(hll_bytes, 0, len(hll_bytes))
Also see the API documentation.
Development
See Contributing for how to get started building, testing, and deploying the code.
File details
Details for the file python_hll2-1.0.0.tar.gz.
File metadata
- Download URL: python_hll2-1.0.0.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d934fcfda26a555214d85080c73bafa10908ad96f4e19c7197b1e78f6a92b1d
MD5 | e2ab38798bb414f6439bef0a8849401e
BLAKE2b-256 | 0a7a7d1b3330fd305c38a165061f9b124934a8b8ab68234a836280c12637fc76
File details
Details for the file python_hll2-1.0.0-py3-none-any.whl.
File metadata
- Download URL: python_hll2-1.0.0-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 632146e69f7c8b83ab00f391d320d09265ad6b174b62fcdb5b6edcfb76aeb91b
MD5 | 363c641304d8fb758730f66e1797f3aa
BLAKE2b-256 | 363c6652970b0e8893d39b7a073cc214a2e8a10ba93f17ad1776f5b84a4be908