
A Python implementation of HyperLogLog. Maintained fork of python-hll.


python-hll


A Python implementation of HyperLogLog whose goal is to be storage compatible with java-hll, js-hll and postgresql-hll.

NOTE: This is a fairly literal translation/port of java-hll to Python. Internally, bytes are represented as Java-style signed bytes (-128 to 127) rather than Python-style bytes (0 to 255). This implementation is also quite slow: for example, java-hll's HLLSerializationTest takes 12 seconds to run, while the corresponding Python test, test_hll_serialization, takes 1.5 hours (roughly 450x slower).
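To make the byte convention concrete, here is a small sketch of converting between the two representations. The helper names `to_java_bytes` and `from_java_bytes` are hypothetical and not part of python_hll2; they only show the arithmetic involved.

```python
from typing import Iterable, List


def to_java_bytes(data: bytes) -> List[int]:
    """Hypothetical helper: map Python bytes (0..255) to Java-style signed values (-128..127)."""
    return [b - 256 if b > 127 else b for b in data]


def from_java_bytes(values: Iterable[int]) -> bytes:
    """Hypothetical helper: map Java-style signed values back to Python bytes."""
    # & 0xFF keeps the low 8 bits, which turns -128..-1 into 128..255
    return bytes(v & 0xFF for v in values)


print(to_java_bytes(bytes.fromhex("128D7F")))  # [18, -115, 127]
```

The mapping is lossless in both directions, so the choice of representation only matters when comparing or printing individual byte values.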

Overview

See java-hll for an overview of what HLLs are and how they work.
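For a rough, self-contained picture of what `log2m`, `regwidth`, `add_raw` and `cardinality` mean in the usage below, here is a toy HLL. It is a sketch under stated assumptions, not python_hll2's implementation and not storage compatible: `ToyHLL` is a hypothetical class, the 64-bit hash is BLAKE2b from the standard library (the usage example uses mmh3 instead), and the estimator is the textbook HyperLogLog formula with linear counting for small cardinalities, which may differ in detail from java-hll's.

```python
import hashlib
import math


class ToyHLL:
    """Toy HyperLogLog for illustration only; NOT storage compatible with python_hll2."""

    def __init__(self, log2m: int = 13, regwidth: int = 5) -> None:
        self.log2m = log2m
        self.m = 1 << log2m                        # number of registers
        self.max_register = (1 << regwidth) - 1    # largest value a register can hold
        self.registers = [0] * self.m

    @staticmethod
    def hash64(value: str) -> int:
        # Assumption: BLAKE2b truncated to 8 bytes stands in for a 64-bit hash.
        digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add_raw(self, raw: int) -> None:
        raw &= (1 << 64) - 1                       # view signed inputs as 64-bit unsigned
        index = raw & (self.m - 1)                 # low log2m bits choose the register
        rest = raw >> self.log2m                   # remaining bits feed the register value
        # 1-based position of the lowest set bit in `rest`, capped by regwidth
        rho = (rest & -rest).bit_length() if rest else self.max_register
        rho = min(rho, self.max_register)
        if rho > self.registers[index]:
            self.registers[index] = rho

    def cardinality(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            estimate = m * math.log(m / zeros)     # linear counting for small cardinalities
        return estimate


hll = ToyHLL(13, 5)
for i in range(10000):
    hll.add_raw(ToyHLL.hash64(f"item-{i}"))
print(round(hll.cardinality()))  # close to 10000
```

With `log2m=13` there are 8192 registers, so the expected relative error is about 1.04/sqrt(8192), roughly 1.1%. Adding a value that was already seen never changes any register, which is why duplicates do not inflate the estimate.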

Usage

Hashing and adding a value to a new HLL:

```python
from python_hll2.hll import HLL
import mmh3

value_to_hash = 'foo'
# add_raw expects an already-hashed value; mmh3.hash returns a signed 32-bit int
hashed_value = mmh3.hash(value_to_hash)

hll = HLL(13, 5)  # log2m=13, regwidth=5
hll.add_raw(hashed_value)
```

Retrieving the cardinality of an HLL:

```python
cardinality = hll.cardinality()
```

Unioning two HLLs together (and retrieving the resulting cardinality):

```python
hll1 = HLL(13, 5)  # log2m=13, regwidth=5
hll2 = HLL(13, 5)  # log2m=13, regwidth=5

# ... (add values to both sets) ...

hll1.union(hll2)  # modifies hll1 in place to contain the union
cardinality_union = hll1.cardinality()
```
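Union is cheap and lossless because each register only stores a maximum: the union of two sketches is the register-wise maximum, which equals the sketch you would get by adding both sets to a single HLL. A minimal sketch over plain dense register lists follows; `union_registers` is a hypothetical helper, and it ignores java-hll's EXPLICIT and SPARSE representations, which a real implementation also has to handle.

```python
from typing import List, Sequence


def union_registers(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Register-wise max of two dense register arrays (hypothetical helper)."""
    if len(a) != len(b):
        # A different register count means a different log2m, so there is no 1:1 mapping.
        raise ValueError("register arrays must have the same length (same log2m)")
    return [max(x, y) for x, y in zip(a, b)]


# Registers from two HLLs built with the same log2m and regwidth
left = [0, 3, 1, 0]
right = [2, 1, 1, 0]
print(union_registers(left, right))  # [2, 3, 1, 0]
```

Because `max` is commutative and idempotent, the order of unions does not matter and unioning a sketch with itself leaves it unchanged.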

Reading an HLL from its hex representation in storage specification v1.0.0 format (for example, a value retrieved from a PostgreSQL database):

```python
from python_hll2.hll import HLL
from python_hll2.util import NumberUtil

hex_input = '\\x128D7FFFFFFFFFF6A5C420'
hex_string = hex_input[2:]  # strip the leading "\x" prefix
hll = HLL.from_bytes(NumberUtil.from_hex(hex_string, 0, len(hex_string)))
```
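The first three bytes of such a hex string form a header. Assuming the v1.0.0 storage specification layout (check the bit positions below against the spec), byte 0 holds the version and type, byte 1 holds `regwidth - 1` and `log2m`, and byte 2 is the cutoff byte. The hypothetical `parse_pg_hll_header` below decodes it with the standard library only, which can be handy for sanity-checking a value before passing it to `HLL.from_bytes`.

```python
from typing import Dict

# Type codes as assumed from the v1.0.0 storage spec.
HLL_TYPES = {0: "UNDEFINED", 1: "EMPTY", 2: "EXPLICIT", 3: "SPARSE", 4: "FULL"}


def parse_pg_hll_header(pg_hex: str) -> Dict[str, object]:
    """Hypothetical helper: decode the header of a '\\x...' HLL hex string."""
    data = bytes.fromhex(pg_hex[2:])           # drop the leading backslash-x
    version, type_code = data[0] >> 4, data[0] & 0x0F
    info: Dict[str, object] = {
        "version": version,
        "type": HLL_TYPES.get(type_code, "UNKNOWN"),
        "regwidth": (data[1] >> 5) + 1,        # top 3 bits store regwidth - 1
        "log2m": data[1] & 0x1F,               # bottom 5 bits store log2m
        "sparse_enabled": bool(data[2] & 0x40),  # second-highest bit of the cutoff byte
        "explicit_cutoff": data[2] & 0x3F,     # assumed: 63 means "auto", 0 means disabled
    }
    if info["type"] == "EXPLICIT":
        # Assumed EXPLICIT payload: big-endian signed 64-bit values
        payload = data[3:]
        info["values"] = [
            int.from_bytes(payload[i:i + 8], "big", signed=True)
            for i in range(0, len(payload), 8)
        ]
    return info


info = parse_pg_hll_header('\\x128D7FFFFFFFFFF6A5C420')
print(info["log2m"], info["regwidth"])  # 13 5
```

Under these assumptions, the example string decodes as version 1, an EXPLICIT HLL with `log2m=13` and `regwidth=5` (matching `HLL(13, 5)`), holding a single value, -156908512.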

Writing an HLL to its hex representation in storage specification v1.0.0 format (for example, to insert it into a PostgreSQL database):

```python
hll_bytes = hll.to_bytes()  # avoid shadowing the built-in `bytes`
output = "\\x" + NumberUtil.to_hex(hll_bytes, 0, len(hll_bytes))
```
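The `\x` prefix followed by hex digits is PostgreSQL's `bytea` hex format, so the encoding step itself needs nothing HLL-specific. A standard-library sketch, assuming the bytes arrive as Java-style signed values; `to_pg_hex` and `from_pg_hex` are hypothetical helper names, not python_hll2 API.

```python
from typing import Iterable, List


def to_pg_hex(signed_bytes: Iterable[int]) -> str:
    """Hypothetical helper: encode Java-style signed bytes as a '\\x...' hex literal."""
    unsigned = bytes(b & 0xFF for b in signed_bytes)  # -128..127 -> 0..255
    return "\\x" + unsigned.hex().upper()


def from_pg_hex(pg_hex: str) -> List[int]:
    """Hypothetical helper: decode a '\\x...' hex literal into Java-style signed bytes."""
    if not pg_hex.startswith("\\x"):
        raise ValueError("expected a leading \\x prefix")
    return [b - 256 if b > 127 else b for b in bytes.fromhex(pg_hex[2:])]


s = "\\x128D7FFFFFFFFFF6A5C420"
signed = from_pg_hex(s)
print(signed[:3])             # [18, -115, 127]
print(to_pg_hex(signed) == s)  # True
```

PostgreSQL accepts either case for the hex digits on input; upper case is used here only to match the example string.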

Also see the API documentation.

Development

See Contributing for how to get started building, testing, and deploying the code.



Download files


Source Distribution

python_hll2-2.0.2.tar.gz (24.6 kB)

Uploaded Source

Built Distribution


python_hll2-2.0.2-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file python_hll2-2.0.2.tar.gz.

File metadata

  • Download URL: python_hll2-2.0.2.tar.gz
  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.13 (CI, Ubuntu 24.04)

File hashes

Hashes for python_hll2-2.0.2.tar.gz
Algorithm Hash digest
SHA256 85dd4b538b98e08c6491d71c7ec19f59df4d1583ce8165b23586e18f94c380b4
MD5 36b511649229c92023412a9c6be3e9e5
BLAKE2b-256 13beb477f5f28b37104e2a57586c2a168d2c722a0aee898b34c794e9f71b8c5c


File details

Details for the file python_hll2-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: python_hll2-2.0.2-py3-none-any.whl
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.13 (CI, Ubuntu 24.04)

File hashes

Hashes for python_hll2-2.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 01f2e782c5eba8b50b9dbd5bfcd6bac6ff678b530b4580e4dbfac420abc27472
MD5 14c55f9debd45fbb602b5b67534c31bc
BLAKE2b-256 ef3fcc90d19cbc64729b0b1c087b961d07ce5f7e0bf1372be5cdbbca22176005

