Scalable bloom filter using different backends written in Python
Project description
python-bloomfilter
Scalable bloom filter using different backends written in Python. Current version only works with Python 3.
Installation
pip install BloomFilterPy
Backends
Currently, BloomFilterPy has the following backends available: numpy
, bitarray
and redis
. The first two are recommended when the expected number of elements in the filter fit in memory. Redis backend is the preferred when:
- Expect huge amount of data in the filter that it doesn't fit in memory.
- You want a distributed filter available (i.e. more than one machine).
Usage & API
BloomFilterPy implements a common API regardless of the backend used. Every backend extends BaseBackend
class that implements the common API. In turn, this base class extends the default set
class of Python, but just add
operation is properly handled.
BloomFilter
class
max_number_of_element_expected
: Size of filter. Number of elements it will contain.error_rate
: rate of error you're willing to assume. Default is 0.0005.backend
:numpy
,bitarray
orredis
. Default is numpy.- Only applies with
redis
backend:redis_connection
: url for redis connection as accepted by redis-py.connection_retries
: max number of connection retries in case of losing the connection with redis. Default is 3.wait
: max waiting time before trying to make a new request against redis.prefix_key
: key used in redis to store bloom filter data.
API
add(element)
: add a new element in the filter.full
: property that indicates if the filter is full.false_positive_probability
: property that indicates current and updated error rate of the filter. This value should match with choosed error_rate when BloomFilterPy was instanciated, but as new items are added, this value will change.
Example
from pybloom.src.bloomfilter import BloomFilter
if __name__ == '__main__':
f = BloomFilter(10, error_rate=0.0000003, backend='bitarray')
for i in range(10):
f.add(i) # or f += i
assert i in f
print(f.false_positive_probability, 11 in f) # 6.431432780588261e-07 False
In the example above, we have created a bloom filter using bitarray
backend, with 10
expected elements and max false probability assumed of 0.0000003
.
How can I extend it?
If you install this library from sources and are interested in build a new backend, like MongoBackend or FileSystemBackend for example, is very simple. You just need extend your new backend from BaseBackend
class and implement the following methods:
_add(*args, **kwargs)
: this method specify the way of adding new elements in the filter using the backend.reset()
: this method is used to delete or purge every element from the filter.
Besides, __init__
method must have two parameters to define the array size and optimal hash. For convention, array_size: int
and optimal_hash: int
are used.
For example, in a hypothetical MongoBackend, the skeleton would be something similar to:
class MongoBackend(BaseBackend):
def __init__(self, array_size: int, optimal_hash: int, **kwargs):
# In kwargs you can put mongodb connection details, like host, port and so on.
self._mongo_connection = MongoClient(**kwargs)
super(MongoBackend, self).__init__(array_size, hash_size)
def _add(self, other):
# perform hashing functions of other and save it in mongo using mongo_connection
def reset(self):
# purge bloom filter using mongo_connection
Once you have a new backend ready, you must add it into BloomFilter factory class:
class BloomFilter(object):
def __new__(cls, max_number_of_element_expected: int, error_rate=.0005, backend='numpy', **kwargs):
...
elif backend == 'MongoBackend':
return MongoBackend(filter_metadata.optimal_size, filter_metadata.optimal_hash, **kwargs)
...
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for BloomFilterPy-1.0.2-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a76cb7b187fdaf897ffbca5042b58870ebb1cc87524b007a93e04341f94d082d |
|
MD5 | 7e4618ac1da279ce77572346a09108e5 |
|
BLAKE2b-256 | d8324688211c48e92147ee6e0abb31d0594305270213104a6839a5fa9791eea6 |