
A fast and memory-efficient Bloom Filter implementation with memory mapping support

Project description


Simple and fast Pythonic Bloom filter

From Wikipedia: "A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either 'possibly in set' or 'definitely not in set'. Elements can be added to the set, but not removed (though this can be addressed with a 'counting' filter); the more elements that are added to the set, the larger the probability of false positives."
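To make the quoted definition concrete, here is a minimal pure-Python sketch of a generic Bloom filter: an array of m bits, with k bit positions derived from each element. It is an illustration only, not fastBloomFilter's implementation; the class name `SimpleBloomFilter` and the SHA-256 double-hashing scheme are assumptions chosen for the example.

```python
import hashlib
from typing import Iterator


class SimpleBloomFilter:
    """Illustrative Bloom filter (hypothetical; not the fastBloomFilter class)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # m bits, packed 8 per byte

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme, double hashing: position_i = h1 + i * h2 (mod m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Adding only ever sets bits; nothing is cleared.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def query(self, item: str) -> bool:
        # True = "possibly in set", False = "definitely not in set".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = SimpleBloomFilter(num_bits=8192, num_hashes=5)
for key in ("30000", "1230213", "1"):
    bf.add(key)
print(bf.query("1"))  # True: added items never produce false negatives
```

Because `add` only sets bits, an element that was added always queries as True; a True for an element that was never added is a false positive caused by bits that other elements set.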

This filter supports:

- Saving and reloading with pickle
- Stats
- Entropy analysis
- Internal and external hashing of data
- Raw filter merging
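Persisting a filter with pickle essentially means saving its size and bit array. The sketch below is hypothetical: the `TinyBloom` class and its `to_state`/`from_state` helpers are not part of fastBloomFilter, and it uses a single CRC-32 hash to stay short. It pickles the filter's state so the filter can be rebuilt later, even in another process.

```python
import pickle
import zlib


class TinyBloom:
    """Hypothetical one-hash filter, used only to show saving and reloading."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _pos(self, item: str) -> int:
        # zlib.crc32 is stable across processes; the built-in hash() of a str
        # is randomized per interpreter run and would break reloaded filters.
        return zlib.crc32(item.encode("utf-8")) % self.num_bits

    def add(self, item: str) -> None:
        p = self._pos(item)
        self.bits[p // 8] |= 1 << (p % 8)

    def query(self, item: str) -> bool:
        p = self._pos(item)
        return bool(self.bits[p // 8] & (1 << (p % 8)))

    def to_state(self) -> dict:
        # Everything needed to rebuild the filter: its size and raw bits.
        return {"num_bits": self.num_bits, "bits": bytes(self.bits)}

    @classmethod
    def from_state(cls, state: dict) -> "TinyBloom":
        restored = cls(state["num_bits"])
        restored.bits = bytearray(state["bits"])
        return restored


original = TinyBloom(1024)
original.add("1")

# pickle.dumps returns bytes; use pickle.dump(obj, file) to write to disk instead.
payload = pickle.dumps(original.to_state())
reloaded = TinyBloom.from_state(pickle.loads(payload))
print(reloaded.query("1"))  # True
```

The hash function must be deterministic across runs; otherwise a reloaded filter would look up different bit positions than the ones that were set before saving.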

Installing:

pip install fastbloomfilter

Creating the Bloom filter file externally:

python mkbloom.py /tmp/filter.blf

Importing and creating a filter:

>>> from fastBloomFilter import bloom
>>> bf = bloom.BloomFilter(array_size=1024**3)

Or, loading an existing filter file:

>>> from fastBloomFilter import bloom
>>> bf = bloom.BloomFilter(filename='/tmp/filter.blf')
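When choosing `array_size`, the standard Bloom filter sizing formulas can help: for n expected items and a target false-positive rate p, the optimal number of bits is m = -n ln p / (ln 2)^2, and the optimal number of hash functions is k = (m/n) ln 2. The helper below is a hypothetical sketch. This page does not say whether fastBloomFilter's `array_size` counts bits or bytes, so check that before using the result.

```python
import math


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Hypothetical helper: return (bits, hash_count) for a target false-positive rate."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest integer (at least 1)
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


bits, hashes = bloom_parameters(n_items=1_000_000, fp_rate=0.01)
print(bits, hashes, bits // 8)  # bits, hash count, and the size in bytes
```

For one million items at a 1% false-positive rate, this works out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions.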

Adding data to it:

>>> bf.add('30000')
>>> bf.add('1230213')
>>> bf.add('1')

Printing stats:

>>> bf.stat()

Or:

>>> bf.info()
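This page does not document the exact fields that `bf.stat()` and `bf.info()` print. As a hedged illustration of the kind of statistic a filter can report, the sketch below counts the set bits X in a packed bit array and applies the Swamidass-Baldi estimate n ≈ -(m/k) ln(1 - X/m) for the number of items inserted. The function names are hypothetical.

```python
import math


def count_set_bits(bits: bytes) -> int:
    """Population count of a packed bit array (hypothetical helper)."""
    return sum(bin(byte).count("1") for byte in bits)


def estimate_item_count(bits_set: int, num_bits: int, num_hashes: int) -> float:
    """Swamidass-Baldi estimate: n ~ -(m / k) * ln(1 - X / m)."""
    if bits_set >= num_bits:
        return math.inf  # every bit is set: the filter is saturated
    return -(num_bits / num_hashes) * math.log(1 - bits_set / num_bits)


bits = bytearray(128)  # a hypothetical 1024-bit filter
bits[0] = 0b1011_0001
bits[5] = 0xFF
x = count_set_bits(bits)
fill_ratio = x / (len(bits) * 8)
print(x, round(fill_ratio, 4), round(estimate_item_count(x, 1024, 3), 2))
```

The fill ratio is a useful health check. As it approaches 1, almost every query returns True, and the filter stops being informative.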

Querying data:

>>> print(bf.query('1'))
True
>>> print(bf.query('1230213'))
True
>>> print(bf.query('12'))
False
>>> print(bf['1'])
True
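A query result of True only means "possibly in set". For a filter with m bits and k hash functions that holds n items, the false-positive probability is approximately p ≈ (1 - e^(-kn/m))^k. The hypothetical helper below evaluates this standard approximation and shows how the error rate grows as the filter fills, as the Wikipedia quote above describes.

```python
import math


def false_positive_rate(n_items: int, num_bits: int, num_hashes: int) -> float:
    """Hypothetical helper for the standard approximation p ~ (1 - e^(-k n / m))^k."""
    return (1 - math.exp(-num_hashes * n_items / num_bits)) ** num_hashes


# A hypothetical 100,000-bit filter with 7 hash functions, filling up.
for n in (1_000, 10_000, 20_000):
    print(n, f"{false_positive_rate(n, num_bits=100_000, num_hashes=7):.6f}")
```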

Querying data and adding it in the same call:

>>> print(bf.update('3'))
False
# False means the object was not present and has now been added.
>>> print(bf.update('3'))
True
# True means the object was already present and nothing new was added.
>>> print(bf.update('2'))
False
>>> print(bf.update('2'))
True
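`update` behaves like a test-and-set: it reports whether the element was already (possibly) present and sets its bits in the same pass. The sketch below assumes the library's `update` works this way. The `UpdatableBloom` class and its hashing scheme are hypothetical.

```python
import hashlib
from typing import Iterator


class UpdatableBloom:
    """Hypothetical filter illustrating test-and-set semantics for update()."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed double-hashing scheme over a SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def update(self, item: str) -> bool:
        """Set the item's bits; return True only if all were already set."""
        already_present = True
        for pos in self._positions(item):
            index, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[index] & mask:
                already_present = False  # at least one bit was new
                self.bits[index] |= mask
        return already_present


bf = UpdatableBloom()
print(bf.update("1"))  # False: empty filter, so the item's bits were not yet set
print(bf.update("1"))  # True: every bit was set by the previous call
```

Doing the check and the insert in one pass computes each hash position only once, instead of once for `query` and again for `add`.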

Merging two filters:

Create the first filter:

>>> from fastBloomFilter import bloom
>>> bf1 = bloom.BloomFilter(array_size=1024**3)
>>> bf1.add("1")

Create the second filter:

>>> from fastBloomFilter import bloom
>>> bf2 = bloom.BloomFilter(array_size=1024**3)
>>> bf2.add("2")

Merge the two filters into a third filter:

>>> bf3 = bf1 + bf2

Check the elements in the third filter:

>>> print(bf3["1"])
True
>>> print(bf3["2"])
True
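Merging works because the union of two Bloom filters is the bitwise OR of their bit arrays, provided both were built with the same size and the same hash functions. Every bit set in either input stays set, so no member of either filter can be lost. This is a sketch of the general technique. That `bf1 + bf2` is implemented exactly this way is an assumption, and the helper names below are hypothetical.

```python
import hashlib

# Hypothetical parameters: both filters must share the same size and hash scheme,
# otherwise OR-ing their bits produces a meaningless result.
NUM_BITS = 4096
NUM_HASHES = 4


def positions(item: str) -> list[int]:
    # Assumed double-hashing scheme over a SHA-256 digest.
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % NUM_BITS for i in range(NUM_HASHES)]


def add(bits: bytearray, item: str) -> None:
    for p in positions(item):
        bits[p // 8] |= 1 << (p % 8)


def query(bits: bytes, item: str) -> bool:
    return all(bits[p // 8] & (1 << (p % 8)) for p in positions(item))


def merge(a: bytes, b: bytes) -> bytearray:
    """Union of two filters: bitwise OR of their bit arrays."""
    if len(a) != len(b):
        raise ValueError("filters must have the same size to be merged")
    return bytearray(x | y for x, y in zip(a, b))


bf1 = bytearray(NUM_BITS // 8)
bf2 = bytearray(NUM_BITS // 8)
add(bf1, "1")
add(bf2, "2")
bf3 = merge(bf1, bf2)
print(query(bf3, "1"), query(bf3, "2"))  # True True
```

The merged filter's false-positive rate reflects the combined number of items, so merging many full filters into one of the same size degrades accuracy.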

Contributing

Contributions:
    Are welcome!
    Criteria: - They should not include hidden folders or files from any IDE environment.
              - They should not delete large portions of the project.
              - They should not include files unrelated to the project.
              - They should not change the API. (API changes should be proposed as enhancement Issues.)
              - They should not include any obfuscated code.
              - They should not include binaries.
              - They should be submitted as small PRs for a faster review process.
              - They should include a small test case.
              - Any contribution not honoring these criteria will be rejected until it does.



Download files


Source Distribution

fastbloomfilter-0.0.13.tar.gz (19.6 kB)

Uploaded Source

Built Distribution


fastbloomfilter-0.0.13-py3-none-any.whl (21.1 kB)

Uploaded Python 3

File details

Details for the file fastbloomfilter-0.0.13.tar.gz.

File metadata

  • Download URL: fastbloomfilter-0.0.13.tar.gz
  • Size: 19.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastbloomfilter-0.0.13.tar.gz
Algorithm Hash digest
SHA256 5d548c915ea5e8ce4bbe29445d123aef3562703b46529e4150dbd09bc4b21f05
MD5 d7bf1ce5ab8f1806a410db2b6efb9222
BLAKE2b-256 ed695ec865a3c6b679f139ae14dfc21f18a95bcf7df1a60f5754fcfe88b6800a


Provenance

The following attestation bundles were made for fastbloomfilter-0.0.13.tar.gz:

Publisher: pypi-publish.yml on daedalus/fastBloomFilter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fastbloomfilter-0.0.13-py3-none-any.whl.


File hashes

Hashes for fastbloomfilter-0.0.13-py3-none-any.whl
Algorithm Hash digest
SHA256 943997559608722ad6bce825c6990210368c6c28ccc955c39495919a747c9e96
MD5 b317560742c2b41d75fa2dcb6d13ffdb
BLAKE2b-256 1c8b61d1ffb53d2e8de7229293acfc6a2a1b4c69a1ecf3a762fff0141af00108


Provenance

The following attestation bundles were made for fastbloomfilter-0.0.13-py3-none-any.whl:

Publisher: pypi-publish.yml on daedalus/fastBloomFilter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
