Datamaran's fork of Pybloom adapted to Python3
dm_pybloom
==========
.. image:: https://travis-ci.org/jaybaird/python-bloomfilter.svg?branch=master
    :target: https://travis-ci.org/jaybaird/python-bloomfilter
``dm_pybloom`` is a module that includes a Bloom Filter data structure along with
an implementation of Scalable Bloom Filters as discussed in:
P. Almeida, C. Baquero, N. Preguiça, D. Hutchison, Scalable Bloom Filters,
(GLOBECOM 2007), IEEE, 2007.
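To make the data structure concrete, here is a minimal pure-Python sketch of a
Bloom filter. It is an illustration only and not the ``dm_pybloom``
implementation: the class name ``TinyBloomFilter``, the salted SHA-256 hashing
scheme, and the constructor parameters are all assumptions made for this example.

```python
import hashlib


class TinyBloomFilter:
    """Illustrative Bloom filter (hypothetical; not dm_pybloom's code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function by salting SHA-256
        # with the hash index (an assumed scheme, chosen for simplicity).
        data = repr(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if every bit was already set."""
        already_present = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                already_present = False
                self.bits[byte] |= mask
        return already_present

    def __contains__(self, item):
        # An item is "possibly present" only if all of its bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


f = TinyBloomFilter(num_bits=14378, num_hashes=10)
for x in range(10):
    f.add(x)
print(all(x in f for x in range(10)))  # added items are never reported missing
```

A lookup can return a false positive (all bits set by other items) but never a
false negative, which is the trade-off that the bit count and hash count control.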
Bloom filters work well when you know in advance how many bits to set aside
to store your entire set. Scalable Bloom Filters let the number of bits grow
as a function of the false positive probability and the number of elements stored.
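As a rough guide to how many bits a fixed-size filter needs, the sketch below
applies the standard Bloom filter sizing formulas: ``M = n * |ln p| / (ln 2)^2``
bits and about ``(M / n) * ln 2`` hash functions for ``n`` elements at false
positive probability ``p``. The function name ``bloom_parameters`` is
hypothetical, and ``dm_pybloom`` may round or partition its bits differently.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Return (num_bits, num_hashes) for a classic Bloom filter.

    Hypothetical helper based on the textbook formulas, not on
    dm_pybloom's internals.
    """
    if capacity <= 0 or not 0 < error_rate < 1:
        raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
    # Bits needed: M = n * |ln p| / (ln 2)^2, rounded up.
    num_bits = math.ceil(capacity * abs(math.log(error_rate)) / math.log(2) ** 2)
    # Optimal number of hash functions: k = (M / n) * ln 2.
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


num_bits, num_hashes = bloom_parameters(capacity=1000, error_rate=0.001)
print(num_bits, num_hashes)  # 14378 10
```

For 1000 elements at a 0.1% error rate this comes to roughly 14.4 bits per
element, independent of the size of the elements themselves.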
A filter is "full" when it reaches capacity: M * ((ln 2) ^ 2) / abs(ln p),
where M is the number of bits and p is the false positive probability. When
capacity is reached, a new filter is created, exponentially larger than the
last, with a tighter false positive probability and a larger number of hash
functions.
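The growth rule can be sketched as a schedule of per-filter capacities and
error rates. The values below (``initial_capacity=100``, ``scale=2``,
``ratio=0.9``) and the function name ``growth_schedule`` are illustrative
assumptions, not confirmed ``dm_pybloom`` defaults. Giving the first filter an
error rate of ``error_rate * (1 - ratio)`` and tightening each later filter by
``ratio`` keeps the sum of all stage error rates below ``error_rate``.

```python
import math


def growth_schedule(initial_capacity=100, error_rate=0.001,
                    scale=2, ratio=0.9, stages=5):
    """Yield (capacity, stage_error_rate, num_hashes) per filter stage.

    Hypothetical sketch of the scalable scheme; parameter values are
    assumptions chosen for illustration.
    """
    # The stage error rates form a geometric series that sums to
    # less than error_rate.
    stage_error = error_rate * (1 - ratio)
    for i in range(stages):
        capacity = initial_capacity * scale ** i
        # A tighter error rate needs at least as many hash functions.
        num_hashes = math.ceil(math.log2(1 / stage_error))
        yield capacity, stage_error, num_hashes
        stage_error *= ratio


schedule = list(growth_schedule())
for capacity, stage_error, num_hashes in schedule:
    print(capacity, f"{stage_error:.3e}", num_hashes)
```

Because a lookup must consult every stage, the overall false positive
probability is bounded by the sum of the stage error rates, which is why each
new stage has to be stricter than the one before it.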

.. code-block:: python

    >>> from dm_pybloom import BloomFilter
    >>> f = BloomFilter(capacity=1000, error_rate=0.001)
    >>> [f.add(x) for x in range(10)]
    [False, False, False, False, False, False, False, False, False, False]
    >>> all([(x in f) for x in range(10)])
    True
    >>> 10 in f
    False
    >>> 5 in f
    True
    >>> f = BloomFilter(capacity=1000, error_rate=0.001)
    >>> for i in range(0, f.capacity):
    ...     _ = f.add(i)
    >>> (1.0 - (len(f) / float(f.capacity))) <= f.error_rate + 2e-18
    True

    >>> from dm_pybloom import ScalableBloomFilter
    >>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)
    >>> count = 10000
    >>> for i in range(0, count):
    ...     _ = sbf.add(i)
    ...
    >>> (1.0 - (len(sbf) / float(count))) <= sbf.error_rate + 2e-18
    True

    # len(sbf) may not equal the entire input length. 0.01% error is well
    # below the default 0.1% error threshold. As the capacity goes up, the
    # error will approach 0.1%.