Skip to main content

Scalable bloom filter using different backends written in Python

Project description

python-bloomfilter

Scalable bloom filter using different backends written in Python. Current version only works with Python 3.

Installation

pip install BloomFilterPy

Backends

Currently, BloomFilterPy has the following backends available: numpy, bitarray and redis. The first two are recommended when the expected number of elements in the filter fit in memory. Redis backend is the preferred when:

  • Expect huge amount of data in the filter that it doesn't fit in memory.
  • You want a distributed filter available (i.e. more than one machine).

Usage & API

BloomFilterPy implements a common API regardless of the backend used. Every backend extends BaseBackend class that implements the common API. In turn, this base class extends the default set class of Python, but just add operation is properly handled.

BloomFilter class

  • max_number_of_element_expected: Size of filter. Number of elements it will contain.
  • error_rate: rate of error you're willing to assume. Default is 0.0005.
  • backend: numpy, bitarray or redis. Default is numpy.
  • Only applies with redis backend:
    • redis_connection: url for redis connection as accepted by redis-py.
    • connection_retries: max number of connection retries in case of losing the connection with redis. Default is 3.
    • wait: max waiting time before trying to make a new request against redis.
    • prefix_key: key used in redis to store bloom filter data.

API

  • add(element): add a new element in the filter.
  • full: property that indicates if the filter is full.
  • false_positive_probability: property that indicates current and updated error rate of the filter. This value should match with choosed error_rate when BloomFilterPy was instanciated, but as new items are added, this value will change.

Example

from pybloom.src.bloomfilter import BloomFilter

if __name__ == '__main__':
    f = BloomFilter(10, error_rate=0.0000003, backend='bitarray')

    for i in range(10):
        f.add(i)  # or f += i
        assert i in f

    print(f.false_positive_probability, 11 in f) # 6.431432780588261e-07 False

In the example above, we have created a bloom filter using bitarray backend, with 10 expected elements and max false probability assumed of 0.0000003.

How can I extend it?

If you install this library from sources and are interested in build a new backend, like MongoBackend or FileSystemBackend for example, is very simple. You just need extend your new backend from BaseBackend class and implement the following methods:

  • _add(*args, **kwargs): this method specify the way of adding new elements in the filter using the backend.
  • reset(): this method is used to delete or purge every element from the filter.

Besides, __init__ method must have two parameters to define the array size and optimal hash. For convention, array_size: int and optimal_hash: int are used.

For example, in a hypothetical MongoBackend, the skeleton would be something similar to:

class MongoBackend(BaseBackend):
  def __init__(self, array_size: int, optimal_hash: int, **kwargs):
    # In kwargs you can put mongodb connection details, like host, port and so on.

    self._mongo_connection = MongoClient(**kwargs)
    super(MongoBackend, self).__init__(array_size, hash_size)

  def _add(self, other):
    # perform hashing functions of other and save it in mongo using mongo_connection

  def reset(self):
    # purge bloom filter using mongo_connection

Once you have a new backend ready, you must add it into BloomFilter factory class:

class BloomFilter(object):
    def __new__(cls, max_number_of_element_expected: int, error_rate=.0005, backend='numpy', **kwargs):
      ...

      elif backend == 'MongoBackend':
        return MongoBackend(filter_metadata.optimal_size, filter_metadata.optimal_hash, **kwargs)

      ...

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

BloomFilterPy-1.0.2.tar.gz (8.0 kB view details)

Uploaded Source

Built Distribution

BloomFilterPy-1.0.2-py3-none-any.whl (11.4 kB view details)

Uploaded Python 3

File details

Details for the file BloomFilterPy-1.0.2.tar.gz.

File metadata

  • Download URL: BloomFilterPy-1.0.2.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.20.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for BloomFilterPy-1.0.2.tar.gz
Algorithm Hash digest
SHA256 ac03e255376268189776877bf003e1d181a7ea02eec2dd8760e069e962738a08
MD5 7bac7a5e8f8edc3395b50074fe56715d
BLAKE2b-256 2324cc53acfbd2d51530b1cd3b5b325acbc6b73f5b1958b39e89e7050869e3fe

See more details on using hashes here.

File details

Details for the file BloomFilterPy-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: BloomFilterPy-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 11.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.20.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for BloomFilterPy-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a76cb7b187fdaf897ffbca5042b58870ebb1cc87524b007a93e04341f94d082d
MD5 7e4618ac1da279ce77572346a09108e5
BLAKE2b-256 d8324688211c48e92147ee6e0abb31d0594305270213104a6839a5fa9791eea6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page