A Python implementation of HyperLogLog. A maintained fork of python-hll.

python-hll

A Python implementation of HyperLogLog whose goal is to be storage-compatible with java-hll, js-hll, and postgresql-hll.

NOTE: This is a fairly literal port of java-hll to Python. Internally, bytes are represented as Java-style signed bytes (-128 to 127) rather than Python-style unsigned bytes (0 to 255). This implementation is also quite slow: for example, HLLSerializationTest takes 12 seconds to run in Java, while test_hll_serialization takes 1.5 hours to run in Python (roughly 450x slower).
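
To make the byte convention in the note concrete, here is a minimal sketch of converting between the two representations using only the standard library. The helper names `to_java_byte` and `to_python_byte` are hypothetical and are not part of python-hll; they only illustrate the two's-complement mapping between the two ranges.

```python
def to_java_byte(value: int) -> int:
    """Map a Python-style byte (0..255) to a Java-style byte (-128..127)."""
    if not 0 <= value <= 255:
        raise ValueError(f"expected 0..255, got {value}")
    return value - 256 if value > 127 else value


def to_python_byte(value: int) -> int:
    """Map a Java-style byte (-128..127) back to a Python-style byte (0..255)."""
    if not -128 <= value <= 127:
        raise ValueError(f"expected -128..127, got {value}")
    return value & 0xFF


# Hypothetical helpers, not python-hll API: round-trip a few sample bytes.
raw = bytes([0x12, 0x8D, 0x7F, 0xFF])
java_style = [to_java_byte(b) for b in raw]
round_trip = bytes(to_python_byte(b) for b in java_style)
```

Values up to 127 are identical in both conventions; only bytes with the high bit set change sign.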

Overview

See java-hll for an overview of what HLLs are and how they work.

Usage

Hashing and adding a value to a new HLL:

from python_hll.hll import HLL
import mmh3

value_to_hash = 'foo'
hashed_value = mmh3.hash(value_to_hash)  # 32-bit signed MurmurHash3 value

hll = HLL(13, 5)  # log2m=13, regwidth=5
hll.add_raw(hashed_value)  # add the already-hashed value to the set
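
As a rough guide to what `log2m` and `regwidth` imply, the following sketch computes the register count, the size of the register payload in a dense layout, and the standard HyperLogLog relative error estimate of 1.04/sqrt(m). The function `hll_parameters` is a hypothetical helper, not part of python-hll, and the size ignores any header bytes or alternative representations the library may use.

```python
import math


def hll_parameters(log2m: int, regwidth: int) -> dict:
    """Derive basic figures from HLL parameters (illustrative helper only)."""
    registers = 1 << log2m  # m = 2^log2m registers
    return {
        "registers": registers,
        # Register payload only, assuming a dense layout of regwidth bits each.
        "dense_payload_bytes": math.ceil(registers * regwidth / 8),
        # Standard HyperLogLog relative error estimate.
        "relative_error": 1.04 / math.sqrt(registers),
    }


params = hll_parameters(13, 5)
```

For `log2m=13` and `regwidth=5` this gives 8192 registers, 5120 bytes of register payload, and a relative error of about 1.15%.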

Retrieving the cardinality of an HLL:

cardinality = hll.cardinality()

Unioning two HLLs together (and retrieving the resulting cardinality):

hll1 = HLL(13, 5) # log2m=13, regwidth=5
hll2 = HLL(13, 5) # log2m=13, regwidth=5

# ... (add values to both sets) ...

hll1.union(hll2) # modifies hll1 to contain the union
cardinality_union = hll1.cardinality()
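
Conceptually, a HyperLogLog union keeps the larger value of each pair of registers. The sketch below shows that textbook operation on plain lists; it is an illustration under the assumption that both sketches have the same number of registers, not a description of python-hll's internals. The function name `union_registers` is hypothetical.

```python
def union_registers(a, b):
    """Textbook HLL union: element-wise maximum of two register arrays."""
    if len(a) != len(b):
        raise ValueError("register arrays must have the same length")
    return [max(x, y) for x, y in zip(a, b)]


# Toy register arrays (real HLLs have 2^log2m registers).
regs1 = [0, 3, 1, 0]
regs2 = [2, 1, 1, 4]
merged = union_registers(regs1, regs2)
```

Because the operation is an element-wise maximum, merging the same sketch in again leaves the result unchanged, which is why unions can be applied repeatedly without double counting.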

Reading an HLL from its hex representation in storage specification v1.0.0 format (for example, as retrieved from a PostgreSQL database):

from python_hll.hll import HLL
from python_hll.util import NumberUtil

hex_input = '\\x128D7FFFFFFFFFF6A5C420'
hex_string = hex_input[2:]  # strip the leading '\x' prefix
hll = HLL.from_bytes(NumberUtil.from_hex(hex_string, 0, len(hex_string)))
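
For inspecting such a hex string without the library, here is a standard-library sketch that decodes the first two header bytes. It assumes the storage specification v1.0.0 layout in which the first byte holds the schema version (high nibble) and type ordinal (low nibble), and the second byte holds `regwidth - 1` (top 3 bits) and `log2m` (low 5 bits); check that layout against the specification before relying on it. `HllHeader` and `parse_header` are hypothetical names, not python-hll API.

```python
from typing import NamedTuple


class HllHeader(NamedTuple):
    schema_version: int
    type_ordinal: int
    regwidth: int
    log2m: int


def parse_header(pg_hex: str) -> HllHeader:
    """Decode the version and parameter bytes (assumed v1.0.0 layout)."""
    # Drop the '\x' prefix if present, then decode the hex digits.
    hex_string = pg_hex[2:] if pg_hex.startswith("\\x") else pg_hex
    raw = bytes.fromhex(hex_string)
    if len(raw) < 3:
        raise ValueError("expected at least 3 header bytes")
    version_byte, parameter_byte = raw[0], raw[1]
    return HllHeader(
        schema_version=version_byte >> 4,
        type_ordinal=version_byte & 0x0F,
        regwidth=(parameter_byte >> 5) + 1,
        log2m=parameter_byte & 0x1F,
    )


header = parse_header("\\x128D7FFFFFFFFFF6A5C420")
```

Under that assumed layout, the example string decodes to `regwidth=5` and `log2m=13`, matching the parameters used in the snippets above.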

Writing an HLL to its hex representation in storage specification v1.0.0 format (for example, to be inserted into a PostgreSQL database):

hll_bytes = hll.to_bytes()  # avoid shadowing the built-in bytes type
output = "\\x" + NumberUtil.to_hex(hll_bytes, 0, len(hll_bytes))

Also see the API documentation.

Development

See Contributing for how to get started building, testing, and deploying the code.

Download files

Source Distribution

python_hll2-1.0.0.tar.gz (24.5 kB)

Uploaded Source

Built Distribution

python_hll2-1.0.0-py3-none-any.whl (26.0 kB)

Uploaded Python 3

File details

Details for the file python_hll2-1.0.0.tar.gz.

File metadata

  • Download URL: python_hll2-1.0.0.tar.gz
  • Upload date:
  • Size: 24.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for python_hll2-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        2d934fcfda26a555214d85080c73bafa10908ad96f4e19c7197b1e78f6a92b1d
MD5           e2ab38798bb414f6439bef0a8849401e
BLAKE2b-256   0a7a7d1b3330fd305c38a165061f9b124934a8b8ab68234a836280c12637fc76

File details

Details for the file python_hll2-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: python_hll2-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 26.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for python_hll2-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        632146e69f7c8b83ab00f391d320d09265ad6b174b62fcdb5b6edcfb76aeb91b
MD5           363c641304d8fb758730f66e1797f3aa
BLAKE2b-256   363c6652970b0e8893d39b7a073cc214a2e8a10ba93f17ad1776f5b84a4be908
