Python library for the HyperLogLog algorithm

python-hll

A Python implementation of HyperLogLog whose goal is to be storage-compatible with java-hll, js-hll, and postgresql-hll.

NOTE: This is a fairly literal port of java-hll to Python. Internally, bytes are represented as Java-style signed bytes (-128 to 127) rather than Python-style unsigned bytes (0 to 255). This implementation is also quite slow: for example, HLLSerializationTest in Java takes 12 seconds to run, while test_hll_serialization in Python takes 1.5 hours (about 400x slower).
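To make the byte convention concrete, here is a minimal sketch of converting between the two representations. The helper names to_java_byte and to_python_byte are hypothetical and are not part of python-hll; they only illustrate the two's-complement mapping between the ranges mentioned above.

```python
def to_java_byte(value):
    """Map an unsigned byte (0..255) to a Java-style signed byte (-128..127)."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value in 0..255")
    # Values above 127 wrap around to the negative range (two's complement).
    return value - 256 if value > 127 else value


def to_python_byte(value):
    """Map a Java-style signed byte (-128..127) back to an unsigned byte (0..255)."""
    if not -128 <= value <= 127:
        raise ValueError("expected a value in -128..127")
    # Masking with 0xFF keeps the low 8 bits, undoing the sign wrap.
    return value & 0xFF


print(to_java_byte(0xFF))   # -1
print(to_python_byte(-1))   # 255
```

If you inspect byte lists produced by this library, negative entries are therefore expected and can be mapped back with a mask such as the one above.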

Overview

See java-hll for an overview of what HLLs are and how they work.

Usage

Hashing and adding a value to a new HLL:

from python_hll.hll import HLL
import mmh3
value_to_hash = 'foo'
hashed_value = mmh3.hash(value_to_hash)

hll = HLL(13, 5) # log2m=13, regwidth=5
hll.add_raw(hashed_value)

Retrieving the cardinality of an HLL:

cardinality = hll.cardinality()

Unioning two HLLs together (and retrieving the resulting cardinality):

hll1 = HLL(13, 5) # log2m=13, regwidth=5
hll2 = HLL(13, 5) # log2m=13, regwidth=5

# ... (add values to both sets) ...

hll1.union(hll2) # modifies hll1 to contain the union
cardinality_union = hll1.cardinality()
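For intuition about what a union does, the standard HyperLogLog merge of two dense register arrays takes the register-wise maximum. The sketch below is a toy model under that assumption; union_registers is a hypothetical helper, not python-hll's internal code, and the library may use other representations for small sets.

```python
def union_registers(registers_a, registers_b):
    """Merge two dense HLL register arrays by taking the maximum per register."""
    if len(registers_a) != len(registers_b):
        raise ValueError("both HLLs must use the same number of registers")
    return [max(a, b) for a, b in zip(registers_a, registers_b)]


# Toy example with 4 registers instead of 2**13.
print(union_registers([0, 3, 1, 0], [2, 1, 1, 0]))  # [2, 3, 1, 0]
```

This is also why both HLLs in the example above are created with the same log2m and regwidth: register arrays of different shapes cannot be merged position by position.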

Reading an HLL from its hex representation under storage specification v1.0.0 (for example, as retrieved from a PostgreSQL database):

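from python_hll.hll import HLL
from python_hll.util import NumberUtil

# Hex value as returned by PostgreSQL, including the leading "\x" prefix
hex_input = '\\x128D7FFFFFFFFFF6A5C420'
hex_string = hex_input[2:]  # strip the "\x" prefix
hll = HLL.from_bytes(NumberUtil.from_hex(hex_string, 0, len(hex_string)))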
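As a rough illustration of what the serialized form contains, the sketch below decodes the first two header bytes of the example value using only the standard library. It assumes the v1.0.0 layout as I understand it: the first byte holds the schema version in its top nibble and the type ordinal in its bottom nibble, and the second byte holds regwidth - 1 in its top 3 bits and log2m in its bottom 5 bits. The function decode_header is hypothetical and not part of python-hll; consult the storage specification for the authoritative layout.

```python
def decode_header(hex_string):
    """Decode the first two header bytes of a serialized HLL (assumed v1.0.0 layout)."""
    raw = bytes.fromhex(hex_string)
    version_byte, params_byte = raw[0], raw[1]
    return {
        "version": version_byte >> 4,        # top nibble
        "type": version_byte & 0x0F,         # bottom nibble (type ordinal)
        "regwidth": (params_byte >> 5) + 1,  # top 3 bits store regwidth - 1
        "log2m": params_byte & 0x1F,         # bottom 5 bits
    }


# First two bytes of the example value above: 0x12, 0x8D
print(decode_header("128D"))
# {'version': 1, 'type': 2, 'regwidth': 5, 'log2m': 13}
```

Under these assumptions the decoded parameters match the HLL(13, 5) configuration used throughout the usage examples.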

Writing an HLL to its hex representation under storage specification v1.0.0 (for example, to be inserted into a PostgreSQL database):

hll_bytes = hll.to_bytes()  # avoid shadowing the built-in name `bytes`
output = "\\x" + NumberUtil.to_hex(hll_bytes, 0, len(hll_bytes))

Also see the API documentation.

Development

See Contributing for how to get started building, testing, and deploying the code.

History

0.0.0 (2019-06-14)

  • Submitted to AdRoll HackWeek.

0.1.0 (2019-09-12)

  • First release on PyPI.

0.1.1 (2019-09-12)

  • Add missing install_requires: numpy

0.1.2 (2019-12-12)

0.1.3 (2021-01-22)
