Datamaran's fork of Pybloom adapted to Python3

pybloom is a module that includes a Bloom Filter data structure along with an implementation of Scalable Bloom Filters as discussed in:

P. Almeida, C. Baquero, N. Preguiça, D. Hutchison, Scalable Bloom Filters, (GLOBECOM 2007), IEEE, 2007.

Bloom filters work well when you know up front how many bits to set aside for your entire set. Scalable Bloom Filters let the filter's bits grow as a function of the false positive probability and the number of items stored.
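To make the "how many bits" question concrete, here is a minimal sketch using the textbook sizing formulas for an optimally configured Bloom filter. The helper names `bits_needed` and `hashes_needed` are hypothetical and not part of pybloom, and the library's internal sizing may round differently.

```python
import math


def bits_needed(n_items, error_rate):
    """Textbook bit count for n_items at the given false positive rate."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    # m = n * |ln p| / (ln 2)^2
    return math.ceil(n_items * abs(math.log(error_rate)) / math.log(2) ** 2)


def hashes_needed(error_rate):
    """Textbook hash function count: k = ceil(log2(1 / p))."""
    return math.ceil(math.log2(1.0 / error_rate))


if __name__ == "__main__":
    # The bit budget grows linearly with the expected set size.
    for n in (1_000, 100_000, 10_000_000):
        m = bits_needed(n, 0.001)
        print(n, "items:", m, "bits,", hashes_needed(0.001), "hashes")
```

The bit count must be chosen before any item is added, which is why a plain Bloom filter suits sets whose size is known in advance.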

A filter is “full” when it reaches its capacity: M * ((ln 2)^2 / abs(ln p)), where M is the number of bits and p is the false positive probability. When capacity is reached, a new filter is created that is exponentially larger than the last, with a tighter false positive probability and a larger number of hash functions.
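The capacity formula and the growth rule can be sketched as follows. This is an illustration under stated assumptions, not pybloom's implementation: the function names are hypothetical, and the growth factor of 2 and tightening ratio of 0.9 are assumed values chosen for the example.

```python
import math


def capacity(num_bits, error_rate):
    """Items a filter of num_bits holds at error_rate: M * (ln 2)^2 / |ln p|."""
    return int(num_bits * math.log(2) ** 2 / abs(math.log(error_rate)))


def growth_plan(initial_capacity, error_rate, growth=2, tightening=0.9, stages=4):
    """Hypothetical schedule of (capacity, error rate, hash count) per stage.

    Each stage is `growth` times larger than the previous one and its error
    rate is multiplied by `tightening` (both values are assumptions). The
    (1 - tightening) factor keeps the sum of all stage error rates below
    the overall target error_rate.
    """
    plan = []
    for i in range(stages):
        stage_capacity = initial_capacity * growth ** i
        stage_error = error_rate * (1 - tightening) * tightening ** i
        stage_hashes = math.ceil(math.log2(1.0 / stage_error))
        plan.append((stage_capacity, stage_error, stage_hashes))
    return plan


if __name__ == "__main__":
    print(capacity(14378, 0.001))
    for stage in growth_plan(100, 0.001):
        print(stage)
```

Because the stage error rates form a geometric series, their sum stays below the target no matter how many stages are added, which is what lets the compound false positive probability remain bounded as the filter grows.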

>>> from pybloom import BloomFilter
>>> f = BloomFilter(capacity=1000, error_rate=0.001)
>>> [f.add(x) for x in range(10)]
[False, False, False, False, False, False, False, False, False, False]
>>> all([(x in f) for x in range(10)])
True
>>> 10 in f
False
>>> 5 in f
True
>>> f = BloomFilter(capacity=1000, error_rate=0.001)
>>> for i in range(0, f.capacity):
...     _ = f.add(i)
>>> (1.0 - (len(f) / float(f.capacity))) <= f.error_rate + 2e-18
True

>>> from pybloom import ScalableBloomFilter
>>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)
>>> count = 10000
>>> for i in range(0, count):
...     _ = sbf.add(i)
...
>>> (1.0 - (len(sbf) / float(count))) <= sbf.error_rate + 2e-18
True

# len(sbf) may not equal the entire input length. 0.01% error is well
# below the default 0.1% error threshold. As the capacity goes up, the
# error will approach 0.1%.
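Why len() can fall short of the input length is easiest to see in a toy filter. The sketch below is not pybloom's implementation: `TinyBloom` is a hypothetical class that uses SHA-256 based double hashing, and it assumes, as the doctests above suggest, that `add` returns True and skips counting when the key already appears to be present.

```python
import hashlib
import math


class TinyBloom:
    """Toy Bloom filter for illustration only."""

    def __init__(self, capacity, error_rate=0.001):
        self.num_hashes = math.ceil(math.log2(1.0 / error_rate))
        self.num_bits = math.ceil(
            capacity * abs(math.log(error_rate)) / math.log(2) ** 2
        )
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def add(self, key):
        """Return True if key already looked present, else insert and return False."""
        if key in self:
            # Either a real duplicate or a false positive: not counted.
            return True
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1
        return False

    def __len__(self):
        return self.count


if __name__ == "__main__":
    f = TinyBloom(capacity=1000, error_rate=0.001)
    skipped = sum(f.add(i) for i in range(1000))
    print(len(f), "counted,", skipped, "skipped as apparent duplicates")
```

Every key that hits a false positive on insertion is treated as already present, so the count can lag the number of distinct inputs by roughly the false positive rate, while membership tests for inserted keys still never return False.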
