A small library for in-memory cardinality estimation.
pyhll estimates cardinality, i.e. the number of unique elements in a set, using HyperLogLog. The library is a thin Python wrapper around the HyperLogLog implementation in https://raw.github.com/armon/hlld
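For readers new to HyperLogLog, the idea can be sketched in a few lines of pure Python. This is an illustration only and not the code used by pyhll or hlld: the class name `TinyHLL`, the SHA-1 based 64-bit hash, and the default precision of 12 bits are all assumptions made for this sketch.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog counter (hypothetical, not pyhll's implementation)."""

    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision            # number of registers
        self.registers = [0] * self.m

    def add(self, item: bytes) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1
        # (an arbitrary choice for this sketch).
        h = int.from_bytes(hashlib.sha1(item).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


hll = TinyHLL()
for i in range(10000):
    hll.add(str(i).encode())
print(round(hll.estimate()))
```

Memory use is fixed by the number of registers, regardless of how many items are added, and the result is an estimate rather than an exact count. Adding the same item twice leaves every register unchanged, which is why duplicates do not affect the result.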
Installing
pyhll can be installed from PyPI:
pip install pyhll
Building
Get the source:
git clone https://github.com/blackwithwhite666/pyhll.git
Compile the extension:
python setup.py build_ext --inplace
Usage
from pyhll import Cardinality

c = Cardinality()
c.add(b'foo')
assert 1 == len(c)
c.add(b'bar')
assert 2 == len(c)
c.add(b'bar')
assert 2 == len(c)
c.update([b'bar', b'buzz'])
assert 3 == len(c)
Running the test suite
Use Tox to run the test suite:
tox
References
This library builds on the following work:
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm: http://research.google.com/pubs/pub40671.html
HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.9475
Changelog
0.2.4
Add dump and load support;
0.2.3
Add support for a fluent interface;
0.2.1-0.2.2
Fix build on CentOS;
0.2.0
Add ability to union sets;
Add serialization support;
0.1.1
Exclude autoconf artifacts from sdist.
0.1.0 (initial release)
Prototype.