Skip to main content

A fast Count-Min Sketch data structure.

Project description

The Count–min sketch (or CM sketch) is a probabilistic sub-linear space streaming algorithm which can be used to summarize a data stream in many different ways. The algorithm was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan.

Count–min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets and frequency tables. Spectral Bloom filters with multi-set policy, are conceptually isomorphic to the Count-Min Sketch.

This particular implementation has been optimized for speed by utilizing numpy, using the fnv64 hash function, and making use of as much of each hash as possible.

Example Usage

>>> cms = CountMinSketch(200, 500)
>>> cms['foo']
0
>>> cms['foo'] += 5
>>> cms['foo']
5

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

count_min_sketch-1.0.1.tar.gz (3.6 kB view hashes)

Uploaded Source

Built Distribution

count_min_sketch-1.0.1-py2.7.egg (7.8 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page