A fast Count-Min Sketch data structure.
Project description
The Count–min sketch (or CM sketch) is a probabilistic sub-linear space streaming algorithm which can be used to summarize a data stream in many different ways. The algorithm was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan.
Count–min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets and frequency tables. Spectral Bloom filters with multi-set policy, are conceptually isomorphic to the Count-Min Sketch.
This particular implementation has been optimized for speed by utilizing numpy, using the fnv64 hash function, and making use of as much of each hash as possible.
Example Usage
>>> from count_min_sketch import CountMinSketch >>> >>> cms = CountMinSketch(500, 500) >>> cms['foo'] 0 >>> cms['foo'] += 42 >>> cms 42
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file count_min_sketch-1.0.0.tar.gz
.
File metadata
- Download URL: count_min_sketch-1.0.0.tar.gz
- Upload date:
- Size: 2.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d829886b8392abe81525fdf91575d271afcea8e6c0b157d63538d18f6cdf82fd |
|
MD5 | 90efdad0d9389372966695c57bae3f38 |
|
BLAKE2b-256 | 05c8ea8af1c83fc6feb4dbdcb4975406973701f04bf8f2f617ae966500adcb3b |
File details
Details for the file count_min_sketch-1.0.0-py2.7.egg
.
File metadata
- Download URL: count_min_sketch-1.0.0-py2.7.egg
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3a35232fb7b9093f4087907237ed3150563fcdd2915e3e5d136d0941bb04a724 |
|
MD5 | 62a92e99fb689ec2385e1e52ecff39d0 |
|
BLAKE2b-256 | 0dc5819337c2c8bab09ef381733b64bcfc5428f9199f4bd8954880b75687ad53 |