Datamaran's fork of Pybloom adapted to Python3

Project description

pybloom is a module that includes a Bloom Filter data structure along with an implementation of Scalable Bloom Filters as discussed in:

P. Almeida, C. Baquero, N. Preguiça, D. Hutchison. Scalable Bloom Filters. GLOBECOM 2007, IEEE, 2007.

Bloom filters work well when you know in advance how many bits to set aside to store your entire set. Scalable Bloom Filters allow the filter's bits to grow as a function of the false positive probability and the number of elements stored.
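To make the up-front sizing concrete, here is a minimal fixed-size Bloom filter sketch. It is not pybloom's implementation: the class name `FixedBloomFilter`, the SHA-256 double-hashing scheme, and the sizing rules (the standard optimal M = -n ln p / (ln 2)^2 bits and k = (M / n) ln 2 hash functions) are assumptions chosen for illustration. Mirroring the README's doctest, `add` returns `False` for a new key and `True` when the key already appears to be present.

```python
import hashlib
import math


class FixedBloomFilter:
    """Hypothetical fixed-size Bloom filter (illustration only, not pybloom's code)."""

    def __init__(self, capacity, error_rate):
        # Bits needed to hold `capacity` keys at `error_rate`: M = -n ln p / (ln 2)^2
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        # Optimal number of hash functions: k = (M / n) ln 2
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        # The bit array is allocated once and never grows.
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, key):
        # Double hashing (Kirsch-Mitzenmacher): derive k bit indexes from
        # two 64-bit halves of a single SHA-256 digest of the key's repr.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def __contains__(self, key):
        # Present only if every one of the key's k bits is set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def add(self, key):
        # Report True (and count nothing) if the key already looks present.
        if key in self:
            return True
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1
        return False

    def __len__(self):
        return self.count


f = FixedBloomFilter(capacity=1000, error_rate=0.001)
print(f.num_bits, f.num_hashes)  # 14378 10
print([f.add(x) for x in range(10)])
print(all(x in f for x in range(10)))  # True: Bloom filters have no false negatives
```

Because the bit array is fixed at construction, inserting far more than `capacity` keys drives the false positive rate above `error_rate`; that is the limitation Scalable Bloom Filters address.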

A filter is “full” when it reaches its capacity, M * ((ln 2)^2 / |ln p|), where M is the number of bits and p is the false positive probability. When capacity is reached, a new filter is created that is exponentially larger than the last, with a tighter false positive probability and a larger number of hash functions.
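The capacity formula and the growth rule can be worked through numerically. This sketch is hypothetical: `capacity`, `growth_schedule`, the growth factor `scale=2`, the tightening ratio `ratio=0.9`, and starting the first layer at `error_rate * (1 - ratio)` are illustrative assumptions, not documented pybloom parameters. Each layer's bit count solves the capacity formula for M (M = -n ln p / (ln 2)^2), and its hash count is k = ceil(log2(1/p)), which grows slowly as p shrinks.

```python
import math


def capacity(num_bits, error_rate):
    """Keys a filter of `num_bits` bits holds at `error_rate`: M * (ln 2)^2 / |ln p|."""
    return int(num_bits * math.log(2) ** 2 / abs(math.log(error_rate)))


def growth_schedule(initial_capacity, error_rate, scale=2, ratio=0.9, layers=6):
    """Hypothetical layer sizing for a scalable filter.

    Each new layer multiplies capacity by `scale` and its error rate by `ratio`.
    Returns a list of (capacity, error_rate, num_bits, num_hashes) tuples.
    """
    schedule = []
    cap = initial_capacity
    p = error_rate * (1 - ratio)  # first layer's share of the overall error budget
    for _ in range(layers):
        bits = math.ceil(-cap * math.log(p) / math.log(2) ** 2)
        hashes = math.ceil(math.log2(1 / p))
        schedule.append((cap, p, bits, hashes))
        cap *= scale   # exponentially larger
        p *= ratio     # tighter false positive probability
    return schedule


print(capacity(14378, 0.001))  # 1000
for cap, p, bits, hashes in growth_schedule(100, 0.001):
    print(f"capacity={cap:>5}  p={p:.2e}  bits={bits:>6}  hashes={hashes}")
```

With these assumed parameters the hash count stays at 14 for the first five layers and reaches 15 at the sixth, since each layer adds only log2(1/0.9) ≈ 0.15 to log2(1/p).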

>>> from pybloom import BloomFilter
>>> f = BloomFilter(capacity=1000, error_rate=0.001)
>>> [f.add(x) for x in range(10)]
[False, False, False, False, False, False, False, False, False, False]
>>> all([(x in f) for x in range(10)])
True
>>> 10 in f
False
>>> 5 in f
True
>>> f = BloomFilter(capacity=1000, error_rate=0.001)
>>> for i in range(0, f.capacity):
...     _ = f.add(i)
>>> (1.0 - (len(f) / float(f.capacity))) <= f.error_rate + 2e-18
True

>>> from pybloom import ScalableBloomFilter
>>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)
>>> count = 10000
>>> for i in range(0, count):
...     _ = sbf.add(i)
...
>>> (1.0 - (len(sbf) / float(count))) <= sbf.error_rate + 2e-18
True

# len(sbf) may not equal the entire input length. 0.01% error is well
# below the default 0.1% error threshold. As the capacity goes up, the
# error will approach 0.1%.
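Why the error approaches, but stays under, the 0.1% target can be checked with a short calculation. Assume, hypothetically, that layer i of a scalable filter uses the error rate P * (1 - r) * r^i for an overall target P and tightening ratio r (0.9 here is an illustrative choice). The union bound caps the compounded false positive probability at the sum of the layer rates, a geometric series equal to P * (1 - r^l) after l layers: always below P, and tending to P as layers are added. The function name `compounded_error_bound` is hypothetical.

```python
def compounded_error_bound(error_rate, ratio=0.9, layers=1):
    """Union-bound estimate of a scalable filter's overall false positive rate.

    Hypothetical layering: layer i uses error_rate * (1 - ratio) * ratio**i,
    so the sum is the geometric series error_rate * (1 - ratio**layers).
    """
    first = error_rate * (1 - ratio)
    return sum(first * ratio ** i for i in range(layers))


for layers in (1, 5, 20, 50):
    bound = compounded_error_bound(0.001, layers=layers)
    closed_form = 0.001 * (1 - 0.9 ** layers)
    print(f"{layers:>2} layers: bound = {bound:.6f} (closed form {closed_form:.6f})")
```

This bounds the probability of a false positive on lookup. Assuming `add` skips keys it reports as already present (which is why `len(sbf)` can fall short of `count`), the doctest's `1.0 - len(sbf) / count` measures the observed false positive rate during insertion, which likewise stays under the target.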

Download files

Source Distribution

dm_pybloom-3.0.3.tar.gz (9.3 kB)

Built Distribution

dm_pybloom-3.0.3-py2.py3-none-any.whl (10.7 kB), for Python 2 and Python 3

File details

Details for the file dm_pybloom-3.0.3.tar.gz.

File metadata

  • Download URL: dm_pybloom-3.0.3.tar.gz
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for dm_pybloom-3.0.3.tar.gz
Algorithm Hash digest
SHA256 66472c3964018735c5dbebed96c3ba0281acd5c14a9be49b82cbe2c87bc48895
MD5 445c6d3555673005fe54c3a965901946
BLAKE2b-256 f09f7a1460fb4a6bf3128f481788f5122d07904dc140b8f50c1800f26b8f314a

File details

Details for the file dm_pybloom-3.0.3-py2.py3-none-any.whl.

File hashes

Hashes for dm_pybloom-3.0.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 02513f34a47dd34d2eefada470f373ccddc0d6537bc530dccdab3769e0b1983e
MD5 6c38f11f8aadce946a6fc954595dab00
BLAKE2b-256 6f72385fbb8f8160eb25dc99a23fa87b7eded7c6a79d10a56cf02b4a387b8afa
