Skip to main content

Probabilistic data structures

Project description

Probabilistic Structures

Probstructs is easy to use Python wrapper for C++ library probstructs . It supports Exponential Histograms, Count Min Sketch (CM-Sketch), and Exponential Count Min Sketch (ECM-Sketch).

build status Documentation Status Version Py Versions GitHub stars

Installation

With pip:

pip install probstructs

From source:

pip install .

Classes

CountMinSketch

Count–min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions.

C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#countminsketch

from probstructs import CountMinSketch

cm_sketch = CountMinSketch(100, 4)
cm_sketch.inc("aaa", 1)
cm_sketch.inc("bbb", 5)
cm_sketch.inc("aaa", 2)

print(cm_sketch.get("aaa"))
# 3
print(cm_sketch.get("bbb"))
# 5
print(cm_sketch.get("ccc"))
# 0


cm_sketch = CountMinSketch(width=100, depth=4)
cm_sketch.inc(key="bbb", delta=5)
print(cm_sketch.get(key="bbb"))
# 5

ExponentialHistorgram

Exponential histogram (EH) is a probabilistic data structure that serves as a frequency counter for specific elements in the last N elements from stream..

C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#exponentialhistorgram

from probstructs import ExponentialHistorgram


eh = ExponentialHistorgram(1)
eh.inc(1, 1)
print(eh.get(1, 1))
# 1
eh.inc(1, 1)
print(eh.get(1, 1))
# 2
eh.inc(2, 1)
print(eh.get(1, 2))
# 1

eh = ExponentialHistorgram(window=1)
eh.inc(tick=1, delta=1)
print(eh.get(window=1, tick=1))
# 1
eh.inc(tick=1, delta=1)
print(eh.get(window=1, tick=1))
# 2
eh.inc(tick=2, delta=1)
print(eh.get(window=1, tick=2))
# 1

ExponentialCountMinSketch

Exponential count-min sketch (ECM-Sketch) combines CM-Sketch with EH to count number of different elements in the last N elements in the stream.

C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#exponentialcountminsketch

from probstructs import ExponentialCountMinSketch


ecm_sketch = ExponentialCountMinSketch(100, 4, 8)

ts = 0
ecm_sketch.inc("aaa", ts, 1)
ecm_sketch.inc("bbb", ts, 4)
ecm_sketch.inc("ccc", ts, 8)

print(ecm_sketch.get("aaa", 4, ts))
# 1
print(ecm_sketch.get("bbb", 4, ts))
# 4
print(ecm_sketch.get("ccc", 4, ts))
# 8
print(ecm_sketch.get("ddd", 4, ts))
# 0

ecm_sketch = ExponentialCountMinSketch(width=100, depth=4, window=8)

ts = 0
ecm_sketch.inc(key="aaa", tick=ts, delta=1)
ecm_sketch.inc(key="bbb", tick=ts, delta=4)
ecm_sketch.inc(key="ccc", tick=ts, delta=8)

print(ecm_sketch.get(key="aaa", window=4, tick=ts))
# 1
print(ecm_sketch.get(key="bbb", window=4, tick=ts))
# 4
print(ecm_sketch.get(key="ccc", window=4, tick=ts))
# 8
print(ecm_sketch.get(key="ddd", window=4, tick=ts))
# 0

Changelog

0.2.0

  • Introduce named parameters

  • Update documentation to contain examples

0.1.0

  • Initial version

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

probstructs-0.2.6.tar.gz (7.2 kB view hashes)

Uploaded source

Built Distributions

probstructs-0.2.6-pp37-pypy37_pp73-win32.whl (53.6 kB view hashes)

Uploaded pp37

probstructs-0.2.6-pp36-pypy36_pp73-win32.whl (53.6 kB view hashes)

Uploaded pp36

probstructs-0.2.6-pp27-pypy_73-win32.whl (54.7 kB view hashes)

Uploaded pp27

probstructs-0.2.6-cp39-cp39-win_amd64.whl (60.7 kB view hashes)

Uploaded cp39

probstructs-0.2.6-cp39-cp39-win32.whl (54.4 kB view hashes)

Uploaded cp39

probstructs-0.2.6-cp38-cp38-win_amd64.whl (60.6 kB view hashes)

Uploaded cp38

probstructs-0.2.6-cp38-cp38-win32.whl (54.3 kB view hashes)

Uploaded cp38

probstructs-0.2.6-cp37-cp37m-win_amd64.whl (61.3 kB view hashes)

Uploaded cp37

probstructs-0.2.6-cp37-cp37m-win32.whl (55.1 kB view hashes)

Uploaded cp37

probstructs-0.2.6-cp36-cp36m-win_amd64.whl (61.3 kB view hashes)

Uploaded cp36

probstructs-0.2.6-cp36-cp36m-win32.whl (55.1 kB view hashes)

Uploaded cp36

probstructs-0.2.6-cp35-cp35m-win_amd64.whl (61.3 kB view hashes)

Uploaded cp35

probstructs-0.2.6-cp35-cp35m-win32.whl (55.1 kB view hashes)

Uploaded cp35

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page