Skip to main content

A Python library implementing various Bloom filter types

Project description

Profusion

Profusion is a Python library implementing various Bloom filter types: standard, counting, scalable.

Bloom filters are probabilistic data structures for efficient storage and querying of large datasets, trading accuracy for space. They quickly determine if an element is definitely not in a set - useful for caching, spam filtering, and network routing. Bloom filters save space compared to traditional structures but can't definitively prove set membership, delete elements, or return stored items.

Installation

pip install profusion

Usage

Here are examples of how to use the different Bloom filter implementations:

Standard Bloom Filter

from profusion import Bloom

# Create a new Bloom filter
bf = Bloom(capacity=1000000, error_ratio=1e-5)

# Add elements
bf.add("apple")
bf.add("banana")
bf.add("carrot")

# Check if elements are in the filter
print("apple" in bf)  # True
print("donut" in bf)  # False

# Save the filter to a file
bf.save("bloom_filter.gz")

# Load the filter from a file
bf_loaded = Bloom(path="bloom_filter.gz")

# Check if elements are in the loaded filter
print("banana" in bf_loaded)  # True
print("elderberry" in bf_loaded)  # False

Counting Bloom Filter

from profusion import CountingBloom

# Create a new Counting Bloom filter
cbf = CountingBloom(capacity=1000000, error_ratio=1e-5, bin_size=255)

# Add elements with different counts
cbf.add("apple", amount=3)
cbf.add("banana", amount=2)
cbf.add("carrot", amount=1)

# Check the count of elements
print(cbf.value("apple"))  # 3
print(cbf.value("banana"))  # 2
print(cbf.value("carrot"))  # 1
print(cbf.value("donut"))  # 0

# Check if elements meet a certain threshold
print(cbf.check("apple", trigger=2))  # True
print(cbf.check("banana", trigger=3))  # False

# Add more to an existing element
cbf.add("banana", amount=2)
print(cbf.value("banana"))  # 4

Scalable Bloom Filter

from profusion import ScalableBloom

# Create a new Scalable Bloom filter
sbf = ScalableBloom(max_error=1e-5, initial_size=1024, growth_factor=2)

# Add a large number of elements
for i in range(10000):
    sbf.add(f"element_{i}")

# Check if elements are in the filter
print(sbf.check("element_42"))  # True
print(sbf.check("nonexistent"))  # False

# Demonstrate the scalability
print(f"Number of internal filters: {sbf.blooms}")
print(f"Total capacity: {sbf.threshold}")

# Use check_then_add method
print(sbf.check_then_add("new_element"))  # False (element was not present, but is now added)
print(sbf.check_then_add("new_element"))  # True (element is already present)

Memory-mapped Counting Bloom Filter

from profusion import MMCountingBloom

# Create a new Memory-mapped Counting Bloom filter
mmcbf = MMCountingBloom("my_filter", capacity=1000000, error_ratio=1e-5)

# Add elements
mmcbf.add("apple")
mmcbf.add("banana", amount=2)

# Check the value of elements
print(mmcbf.value("apple"))  # 1
print(mmcbf.value("banana"))  # 2

# Check if elements meet a certain threshold
print(mmcbf.check("apple", trigger=1))  # True
print(mmcbf.check("banana", trigger=3))  # False

# The filter persists across different instances
del mmcbf

# Create a new instance with the same name
mmcbf_2 = MMCountingBloom("my_filter")

# The previously added elements are still present
print(mmcbf_2.value("apple"))  # 1
print(mmcbf_2.value("banana"))  # 2

# Clean up (remove the memory-mapped file)
import os
os.remove(mmcbf_2.path)

License

This project is licensed under the CC0 License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

profusion-0.1.1.tar.gz (12.0 kB view details)

Uploaded Source

Built Distribution

profusion-0.1.1-py3-none-any.whl (13.2 kB view details)

Uploaded Python 3

File details

Details for the file profusion-0.1.1.tar.gz.

File metadata

  • Download URL: profusion-0.1.1.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for profusion-0.1.1.tar.gz
Algorithm Hash digest
SHA256 56a19f438505d354cd014a30ededd5e8d5f20e0a327d3bdfe385945323301f54
MD5 4cc5c5791d037f32e27169d1a812b02f
BLAKE2b-256 208bf51b07b8af9b93045de0a98afd055e905ec0c1a23875333f74e98560b4f7

See more details on using hashes here.

File details

Details for the file profusion-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: profusion-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for profusion-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 597e3e04a508760f04787695f9b54eb4313bb0ddad5236ce9d5b5596645e1265
MD5 a8fb1cc16b95defbcfdaaa8ac2643221
BLAKE2b-256 d9438033dc3b594c772ad1dbb571b2a238aa545a3c50a2c709d39d4ccafb8cbc

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page