A Python library implementing various Bloom filter types

Profusion

Profusion is a Python library implementing several Bloom filter types: standard, counting, scalable, and memory-mapped counting.

Bloom filters are probabilistic data structures that represent large sets compactly by trading accuracy for space. A lookup reports either that an element is definitely not in the set or that it is possibly in the set: false positives can occur, false negatives cannot. This makes them useful for caching, spam filtering, and network routing. They use far less space than exact structures such as hash sets, but a standard Bloom filter cannot confirm membership with certainty, delete elements, or return the stored items.
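
To make the "definitely not / possibly yes" behaviour concrete, here is a minimal pure-Python sketch of a Bloom filter. It is not Profusion's implementation: the class name TinyBloom, the SHA-256 double-hashing scheme, and the default sizes are all assumptions chosen for illustration.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter with m bits and k positions per item (illustrative only)."""

    def __init__(self, m=1024, k=4):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        # Set every bit the item maps to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Present only if all of the item's bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


tiny = TinyBloom()
for word in ("apple", "banana", "carrot"):
    tiny.add(word)

# Every added item is reported as present: there are no false negatives.
print(all(word in tiny for word in ("apple", "banana", "carrot")))  # True
```

An item that was never added is usually reported as absent, but it can be reported as present if other items happened to set all of its bits. That is the false-positive case.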

Installation

pip install profusion

Usage

Here are examples of how to use the different Bloom filter implementations:

Standard Bloom Filter

from profusion import Bloom

# Create a new Bloom filter
bf = Bloom(capacity=1000000, error_ratio=1e-5)

# Add elements
bf.add("apple")
bf.add("banana")
bf.add("carrot")

# Check if elements are in the filter
print("apple" in bf)  # True
print("donut" in bf)  # False

# Save the filter to a file
bf.save("bloom_filter.gz")

# Load the filter from a file
bf_loaded = Bloom(path="bloom_filter.gz")

# Check if elements are in the loaded filter
print("banana" in bf_loaded)  # True
print("elderberry" in bf_loaded)  # False
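
The capacity and error_ratio arguments determine how much memory a filter needs. The sketch below uses the textbook formulas for an optimally sized standard Bloom filter, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. Whether Profusion sizes its filters exactly this way is an assumption, and bloom_parameters is a hypothetical helper, not part of the library.

```python
import math


def bloom_parameters(capacity, error_ratio):
    """Textbook bit count m and hash count k for a standard Bloom filter."""
    m = math.ceil(-capacity * math.log(error_ratio) / (math.log(2) ** 2))
    k = max(1, round(m / capacity * math.log(2)))
    return m, k


m, k = bloom_parameters(1_000_000, 1e-5)
print(f"bits: {m}, about {m / 8 / 1e6:.1f} MB, hash functions: {k}")
```

For the values used in the example above, this works out to roughly 24 million bits (about 3 MB) and 17 hash functions.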

Counting Bloom Filter

from profusion import CountingBloom

# Create a new Counting Bloom filter
cbf = CountingBloom(capacity=1000000, error_ratio=1e-5, bin_size=255)

# Add elements with different counts
cbf.add("apple", amount=3)
cbf.add("banana", amount=2)
cbf.add("carrot", amount=1)

# Check the count of elements
print(cbf.value("apple"))  # 3
print(cbf.value("banana"))  # 2
print(cbf.value("carrot"))  # 1
print(cbf.value("donut"))  # 0

# Check if elements meet a certain threshold
print(cbf.check("apple", trigger=2))  # True
print(cbf.check("banana", trigger=3))  # False

# Add more to an existing element
cbf.add("banana", amount=2)
print(cbf.value("banana"))  # 4
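
A counting Bloom filter replaces each bit with a small counter. The following sketch shows one common way to do this: counters saturate at bin_size, and an item's estimated count is the minimum of its counters. The class TinyCountingBloom, the hashing scheme, and the reading of check as "count >= trigger" are assumptions for illustration and may differ from Profusion's internals.

```python
import hashlib


class TinyCountingBloom:
    """Toy counting Bloom filter (illustrative only)."""

    def __init__(self, m=1024, k=4, bin_size=255):
        self.m, self.k, self.bin_size = m, k, bin_size
        self.counters = [0] * m

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item, amount=1):
        # set() prevents incrementing one counter twice for the same item.
        for pos in set(self._positions(item)):
            self.counters[pos] = min(self.bin_size, self.counters[pos] + amount)

    def value(self, item):
        # The smallest counter is the tightest upper bound on the true count.
        return min(self.counters[pos] for pos in self._positions(item))

    def check(self, item, trigger=1):
        return self.value(item) >= trigger


toy = TinyCountingBloom()
toy.add("apple", amount=3)
print(toy.value("apple"))             # 3
print(toy.check("apple", trigger=2))  # True

toy.add("apple", amount=1000)
print(toy.value("apple"))             # 255 (saturated at bin_size)
```

In this sketch, collisions can only raise a counter, so value may overestimate a count but does not underestimate it until a counter saturates.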

Scalable Bloom Filter

from profusion import ScalableBloom

# Create a new Scalable Bloom filter
sbf = ScalableBloom(max_error=1e-5, initial_size=1024, growth_factor=2)

# Add a large number of elements
for i in range(10000):
    sbf.add(f"element_{i}")

# Check if elements are in the filter
print(sbf.check("element_42"))  # True
print(sbf.check("nonexistent"))  # False

# Demonstrate the scalability
print(f"Number of internal filters: {sbf.blooms}")
print(f"Total capacity: {sbf.threshold}")

# Use check_then_add method
print(sbf.check_then_add("new_element"))  # False (element was not present, but is now added)
print(sbf.check_then_add("new_element"))  # True (element is already present)
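
A scalable Bloom filter grows by chaining fixed-size filters. The sketch below shows the general technique: when the newest filter is full, a larger one with a tighter error bound is appended, and a lookup consults every filter. All names here (_Slice, TinyScalableBloom, the tightening parameter) are hypothetical, and Profusion's own growth policy may differ.

```python
import hashlib
import math


class _Slice:
    """One fixed-size Bloom filter sized by the textbook formulas."""

    def __init__(self, capacity, error):
        self.capacity, self.error, self.count = capacity, error, 0
        self.m = math.ceil(-capacity * math.log(error) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item)
        )


class TinyScalableBloom:
    """Toy scalable Bloom filter (illustrative only)."""

    def __init__(self, max_error=1e-5, initial_size=1024, growth_factor=2,
                 tightening=0.5):
        self.growth_factor = growth_factor
        self.tightening = tightening
        # Slice errors form a geometric series whose sum stays below max_error.
        self.slices = [_Slice(initial_size, max_error * (1 - tightening))]

    def add(self, item):
        last = self.slices[-1]
        if last.count >= last.capacity:
            # Newest slice is full: append a larger, stricter one.
            last = _Slice(last.capacity * self.growth_factor,
                          last.error * self.tightening)
            self.slices.append(last)
        last.add(item)

    def check(self, item):
        return any(item in s for s in self.slices)

    def check_then_add(self, item):
        present = self.check(item)
        if not present:
            self.add(item)
        return present


toy_sbf = TinyScalableBloom()
for i in range(10000):
    toy_sbf.add(f"element_{i}")

print(len(toy_sbf.slices))           # 4 (capacities 1024, 2048, 4096, 8192)
print(toy_sbf.check("element_42"))   # True
```

Tightening the error bound on each new slice keeps the combined false-positive rate bounded, at the cost of more bits per element in later slices.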

Memory-mapped Counting Bloom Filter

from profusion import MMCountingBloom

# Create a new Memory-mapped Counting Bloom filter
mmcbf = MMCountingBloom("my_filter", capacity=1000000, error_ratio=1e-5)

# Add elements
mmcbf.add("apple")
mmcbf.add("banana", amount=2)

# Check the value of elements
print(mmcbf.value("apple"))  # 1
print(mmcbf.value("banana"))  # 2

# Check if elements meet a certain threshold
print(mmcbf.check("apple", trigger=1))  # True
print(mmcbf.check("banana", trigger=3))  # False

# The filter persists across different instances
del mmcbf

# Create a new instance with the same name
mmcbf_2 = MMCountingBloom("my_filter")

# The previously added elements are still present
print(mmcbf_2.value("apple"))  # 1
print(mmcbf_2.value("banana"))  # 2

# Clean up (remove the memory-mapped file)
import os
os.remove(mmcbf_2.path)
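
Persistence across instances can be achieved by backing the counters with a memory-mapped file. The sketch below illustrates the idea with the standard-library mmap module, using one byte per counter. The class TinyMMCounters and its file layout are assumptions for illustration and are not Profusion's on-disk format.

```python
import hashlib
import mmap
import os
import tempfile


class TinyMMCounters:
    """Toy counting filter whose counters live in a memory-mapped file."""

    def __init__(self, path, m=1024, k=4):
        self.path, self.k = path, k
        if not os.path.exists(path):
            # Create a zero-filled file with one byte per counter.
            with open(path, "wb") as fh:
                fh.write(bytes(m))
        self._fh = open(path, "r+b")
        self._map = mmap.mmap(self._fh.fileno(), 0)
        self.m = len(self._map)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item, amount=1):
        for pos in set(self._positions(item)):
            self._map[pos] = min(255, self._map[pos] + amount)

    def value(self, item):
        return min(self._map[pos] for pos in self._positions(item))

    def close(self):
        # Flush changes to disk before releasing the mapping.
        self._map.flush()
        self._map.close()
        self._fh.close()


path = os.path.join(tempfile.mkdtemp(), "tiny_filter.bin")

first = TinyMMCounters(path)
first.add("apple")
first.add("banana", amount=2)
first.close()

# A second instance opened on the same file sees the earlier counts.
second = TinyMMCounters(path)
apple_count = second.value("apple")
banana_count = second.value("banana")
print(apple_count, banana_count)
second.close()
os.remove(path)
```

The sketch does not store k in the file, so both instances must be created with the same k. A real implementation would typically persist such parameters in a file header.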

License

This project is licensed under the CC0 License.
