Skip to main content

Probabilistic data structures in python

Project description

https://badge.fury.io/py/pyprobables.svg https://coveralls.io/repos/github/barrust/pyprobables/badge.svg https://travis-ci.org/barrust/pyprobables.svg?branch=master Documentation Status License Downloads

pyprobables is a pure-python library for probabilistic data structures. The goal is to provide the developer with a pure-python implementation of common probabilistic data-structures to use in their work.

To achieve better raw performance, it is recommended supplying an alternative hashing algorithm that has been compiled in C. This could include using the md5 and sha512 algorithms provided or installing a third party package and writing your own hashing strategy. Some options include the murmur hash mmh3 or those from the pyhash library. Each data object in pyprobables makes it easy to pass in a custom hashing function.

Read more about how to use Supplying a pre-defined, alternative hashing strategies or Defining hashing function using the provided decorators.

Installation

Pip Installation:

$ pip install pyprobables

To install from source:

To install pyprobables, simply clone the repository on GitHub, then run from the folder:

$ python setup.py install

pyprobables supports python versions 2.7 and 3.3 - 3.6

API Documentation

The documentation of is hosted on readthedocs.io

You can build the documentation locally by running:

$ pip install sphinx
$ cd docs/
$ make html

Automated Tests

To run automated tests, one must simply run the following command from the downloaded folder:

$ python setup.py test

Quickstart

Import pyprobables and setup a Bloom Filter

from probables import (BloomFilter)
blm = BloomFilter(est_elements=1000, false_positive_rate=0.05)
blm.add('google.com')
blm.check('facebook.com')  # should return False
blm.check('google.com')  # should return True

Import pyprobables and setup a Count-Min Sketch

from probables import (CountMinSketch)
cms = CountMinSketch(width=1000, depth=5)
cms.add('google.com')  # should return 1
cms.add('facebook.com', 25)  # insert 25 at once; should return 25

Import pyprobables and setup a Cuckoo Filter

from probables import (CuckooFilter)
cko = CuckooFilter(capacity=100, max_swaps=10)
cko.add('google.com')
cko.check('facebook.com')  # should return False
cko.check('google.com')  # should return True

Supplying a pre-defined, alternative hashing strategies

from probables import (BloomFilter)
from probables.hashes import (default_sha256)
blm = BloomFilter(est_elements=1000, false_positive_rate=0.05,
                  hash_function=default_sha256)
blm.add('google.com')
blm.check('facebook.com')  # should return False
blm.check('google.com')  # should return True

Defining hashing function using the provided decorators

import mmh3  # murmur hash 3 implementation (pip install mmh3)
from pyprobables.hashes import (hash_with_depth_bytes)
from pyprobables import (BloomFilter)

@hash_with_depth_bytes
def my_hash(key):
    return mmh3.hash_bytes(key)

blm = BloomFilter(est_elements=1000, false_positive_rate=0.05, hash_function=my_hash)
import mmh3  # murmur hash 3 implementation (pip install mmh3)
from pyprobables.hashes import (hash_with_depth_int)
from pyprobables import (BloomFilter)

@hash_with_depth_int
def my_hash(key, encoding='utf-8'):
    max64mod = UINT64_T_MAX + 1
    val = int(hashlib.sha512(key.encode(encoding)).hexdigest(), 16)
    return val % max64mod

blm = BloomFilter(est_elements=1000, false_positive_rate=0.05, hash_function=my_hash)

See the API documentation for other data structures available and the quickstart page for more examples!

Changelog

Please see the changelog for a list of all changes.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyprobables-0.3.2.tar.gz (23.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyprobables-0.3.2-py2.py3-none-any.whl (31.7 kB view details)

Uploaded Python 2Python 3

File details

Details for the file pyprobables-0.3.2.tar.gz.

File metadata

  • Download URL: pyprobables-0.3.2.tar.gz
  • Upload date:
  • Size: 23.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.9.1 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.2

File hashes

Hashes for pyprobables-0.3.2.tar.gz
Algorithm Hash digest
SHA256 9c61a98a853a708fbe8e487570e6616762dbb43e39c0df28d3b4062d45aa650c
MD5 649d27f4f3aa99a70324881ea3123c98
BLAKE2b-256 3df801e953bd4cb03ab7d9c53ccd06f77dd348587100c78d9aae0d465c87cb87

See more details on using hashes here.

File details

Details for the file pyprobables-0.3.2-py2.py3-none-any.whl.

File metadata

  • Download URL: pyprobables-0.3.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 31.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.9.1 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.2

File hashes

Hashes for pyprobables-0.3.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 7e71f82474cd65483505fc815f8fcfb9828ea9b568ed9872a7520732134d97d5
MD5 aef87ae810d95c9e9878e531c33c78b8
BLAKE2b-256 72447702813484cb24057d07c1a67d5470ac7e691f1843245d02d831927e8bdb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page