Skip to main content

An implementation of the OkBloomer algorithm, an autoscaling Bloom filter with ultra-low memory footprint for Python.

Project description

Ok Bloomer

An implementation of the OkBloomer algorithm, an autoscaling Bloom filter with ultra-low memory footprint for Python. Ok Bloomer employs a novel layered filtering strategy that allows it to expand while maintaining an upper bound on the false positive rate. As such, Ok Bloomer is suitable for streaming data where the size is not known a priori.

  • Ultra-low memory footprint
  • Autoscaling works on streaming data
  • Bounded maximum false positive rate
  • Open-source and free to use commercially

Installation

Install DNA Hash using a Python package manager, example pip:

pip install okbloomer

Parameters

# Name Default Type Description
1 max_false_positive_rate 0.01 float The upper bound on the false positivity rate.
2 num_hashes 4 int The number of hash functions used, i.e. the number of slices per layer.
3 layer_size 32000000 int The size of each layer of the filter in bits. Ideal sizes can be divided evenly by num_hashes.

Example Usage

from okbloomer import BloomFilter

filter = BloomFilter(
    max_false_positive_rate=0.01,
    num_hashes=4,
    layer_size=32000000,
)

filter.insert('foo')

print(filter.exists('foo'))

print(filter.existsOrInsert('bar'))

print(filter.exists('bar'))

print(filter.false_positive_rate())
True 

False

True

3.906249999999999e-27

References

  • [1] A. DalPino. (2021). OkBloomer, a novel autoscaling Bloom Filter [link].
  • [2] K. Christensen, et al. A New Analysis of the False-Positive Rate of a Bloom Filter.
  • [3] A. Kirsch, et al. Less Hashing, Same Performance: Building a Better Bloom Filter, 2006.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

okbloomer-0.0.4.tar.gz (5.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

okbloomer-0.0.4-py3-none-any.whl (5.0 kB view details)

Uploaded Python 3

File details

Details for the file okbloomer-0.0.4.tar.gz.

File metadata

  • Download URL: okbloomer-0.0.4.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for okbloomer-0.0.4.tar.gz
Algorithm Hash digest
SHA256 74983b51b919113b2df0cc842bd453deff9e39f292fa5bf60a18188585be82b0
MD5 c9a058c20809a5b7068b9ee7e15fb908
BLAKE2b-256 32d9373d66a8c03dea1efabf4c982a2f4337051012e63a78d8231a6d7ace4033

See more details on using hashes here.

File details

Details for the file okbloomer-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: okbloomer-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for okbloomer-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 a7d5bce101877d834477d3412ec953359a1ba03f5fccb0b095136a35c46e12aa
MD5 1237286529d9d10be33e60735bdbdd5f
BLAKE2b-256 8880fd076e3a606d97525e53747cf415b2332842552caec8ab4601e2ba55a256

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page