Datamaran's fork of Pybloom adapted to Python3

dm_pybloom
==========

.. image:: https://travis-ci.org/jaybaird/python-bloomfilter.svg?branch=master
   :target: https://travis-ci.org/jaybaird/python-bloomfilter

``dm_pybloom`` is a module that includes a Bloom Filter data structure along with
an implementation of Scalable Bloom Filters as discussed in:

P. Almeida, C. Baquero, N. Preguiça, D. Hutchison, Scalable Bloom Filters,
(GLOBECOM 2007), IEEE, 2007.

Bloom filters work well when you know up front how many bits to set aside to
store your entire set. Scalable Bloom Filters let the filter's bits grow as a
function of the false positive probability and the number of items stored.
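As a rough illustration of what "knowing the number of bits early" means, the sketch below applies the textbook sizing formulas for a classic Bloom filter. The function name ``bloom_parameters`` is hypothetical and is not part of ``dm_pybloom``; the library's internal bit layout may differ from this calculation.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Return (num_bits, num_hashes) for a classic Bloom filter.

    Textbook sizing only; this is an assumption about how such a filter
    is usually dimensioned, not code taken from dm_pybloom.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    # Optimal bit count: m = -n * ln(p) / (ln 2)^2
    num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
    # Optimal hash count: k = (m / n) * ln 2
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


# Sizing for 1000 items at a 0.1% false positive rate.
num_bits, num_hashes = bloom_parameters(capacity=1000, error_rate=0.001)
print(num_bits, num_hashes)  # 14378 10
```

The bit count depends on both the expected item count and the target error rate, which is why a fixed-size filter has to be sized before any data arrives.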

A filter is "full" when it reaches capacity: ``M * ((ln 2)^2 / abs(ln p))``,
where M is the number of bits and p is the false positive probability. When
capacity is reached, a new filter is created that is exponentially larger than
the last, with a tighter false positive probability and a larger number of
hash functions.
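The capacity formula and the growth behaviour can be sketched in a few lines. The helper names ``capacity_for`` and ``growth_plan`` are hypothetical, and the ``growth`` and ``tightening`` defaults are illustrative assumptions rather than values read from ``dm_pybloom``.

```python
import math


def capacity_for(num_bits, error_rate):
    """Items a filter of num_bits can hold: M * (ln 2)^2 / |ln p|."""
    return int(num_bits * math.log(2) ** 2 / abs(math.log(error_rate)))


def growth_plan(initial_capacity, error_rate, stages, growth=2, tightening=0.9):
    """List (capacity, error_rate, num_hashes) for successive filters.

    growth and tightening are assumed values chosen for illustration.
    """
    plan = []
    capacity, p = initial_capacity, error_rate
    for _ in range(stages):
        # A tighter error rate needs more hash functions: k = ceil(log2(1/p)).
        num_hashes = math.ceil(math.log2(1.0 / p))
        plan.append((capacity, p, num_hashes))
        capacity *= growth   # each new filter is exponentially larger
        p *= tightening      # and gets a tighter false positive target
    return plan


plan = growth_plan(initial_capacity=100, error_rate=0.001, stages=4)
for capacity, p, num_hashes in plan:
    print(capacity, round(p, 6), num_hashes)
```

Each stage multiplies the capacity and shrinks the error target, so the hash count never decreases from one filter to the next.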

.. code-block:: python

    >>> from dm_pybloom import BloomFilter
    >>> f = BloomFilter(capacity=1000, error_rate=0.001)
    >>> [f.add(x) for x in range(10)]
    [False, False, False, False, False, False, False, False, False, False]
    >>> all([(x in f) for x in range(10)])
    True
    >>> 10 in f
    False
    >>> 5 in f
    True
    >>> f = BloomFilter(capacity=1000, error_rate=0.001)
    >>> for i in range(0, f.capacity):
    ...     _ = f.add(i)
    ...
    >>> (1.0 - (len(f) / float(f.capacity))) <= f.error_rate + 2e-18
    True

    >>> from dm_pybloom import ScalableBloomFilter
    >>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)
    >>> count = 10000
    >>> for i in range(0, count):
    ...     _ = sbf.add(i)
    ...
    >>> (1.0 - (len(sbf) / float(count))) <= sbf.error_rate + 2e-18
    True

    # len(sbf) may not equal the entire input length. 0.01% error is well
    # below the default 0.1% error threshold. As the capacity goes up, the
    # error will approach 0.1%.
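
To show why ``add`` returns ``False`` for unseen items and why the length can fall short of the number of inputs, here is a minimal pure-Python sketch. ``TinyBloomFilter`` is a hypothetical teaching class, not the ``dm_pybloom`` implementation; salting SHA-256 to derive bit positions is an assumption made for simplicity.

```python
import hashlib


class TinyBloomFilter:
    """Hypothetical minimal Bloom filter; not the dm_pybloom implementation."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.count = 0

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item!r}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, item):
        # Present only if every one of the item's bits is set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def add(self, item):
        """Set the item's bits; return True if it already appeared present."""
        already_present = item in self
        if not already_present:
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)
            self.count += 1
        return already_present

    def __len__(self):
        return self.count


f = TinyBloomFilter(num_bits=14378, num_hashes=10)
results = [f.add(x) for x in range(10)]
print(all(x in f for x in range(10)))  # True: no false negatives
print(f.add(5))                        # True: 5 was already added
```

Because a false positive makes ``add`` treat a new item as already present, the item is not counted, which is how the length of a filter can end up slightly below the number of distinct inputs.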


Download files

Source Distribution

dm_pybloom-3.0.4.tar.gz (9.3 kB)

Uploaded Source

Built Distribution

dm_pybloom-3.0.4-py2.py3-none-any.whl (10.7 kB)

Uploaded Python 2 Python 3

File details

Details for the file dm_pybloom-3.0.4.tar.gz.

File metadata

  • Download URL: dm_pybloom-3.0.4.tar.gz
  • Upload date:
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

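Hashes for dm_pybloom-3.0.4.tar.gz

===========  ================================================================
Algorithm    Hash digest
===========  ================================================================
SHA256       a98fbd293158b3412696d9585044fb422330b51d6056ed87acdd46158ac15bdb
MD5          7ca03bc999a97ce168b2be08d4781c19
BLAKE2b-256  6e272e1d03cc38f0cb89149a47702e97895d19ac4d29919c25526e7f7b26db12
===========  ================================================================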

File details

Details for the file dm_pybloom-3.0.4-py2.py3-none-any.whl.

File hashes

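Hashes for dm_pybloom-3.0.4-py2.py3-none-any.whl

===========  ================================================================
Algorithm    Hash digest
===========  ================================================================
SHA256       e9be7ca1d6b6e2bf37dff737b5765381da34431d6c376f4b1d089d0d83d84a92
MD5          39ec85b6b0c42cba21a7f9256951c17d
BLAKE2b-256  25650b7828b36aca661594c3765e37409c5d2ed9a523c5e8fbac055f99a73197
===========  ================================================================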
