Skip to main content

A Python library implementing various Bloom filter types

Project description

Profusion

Profusion is a Python library implementing various Bloom filter types: standard, counting, scalable.

Bloom filters are probabilistic data structures for efficient storage and querying of large datasets, trading accuracy for space. They quickly determine if an element is definitely not in a set - useful for caching, spam filtering, and network routing. Bloom filters save space compared to traditional structures but can't definitively prove set membership, delete elements, or return stored items.

Installation

pip install profusion

Usage

Here are examples of how to use the different Bloom filter implementations:

Standard Bloom Filter

from profusion import Bloom

# Create a new Bloom filter
bf = Bloom(capacity=1000000, error_ratio=1e-5)

# Add elements
bf.add("apple")
bf.add("banana")
bf.add("carrot")

# Check if elements are in the filter
print("apple" in bf)  # True
print("donut" in bf)  # False

# Save the filter to a file
bf.save("bloom_filter.gz")

# Load the filter from a file
bf_loaded = Bloom(path="bloom_filter.gz")

# Check if elements are in the loaded filter
print("banana" in bf_loaded)  # True
print("elderberry" in bf_loaded)  # False

Counting Bloom Filter

from profusion import CountingBloom

# Create a new Counting Bloom filter
cbf = CountingBloom(capacity=1000000, error_ratio=1e-5, bin_size=255)

# Add elements with different counts
cbf.add("apple", amount=3)
cbf.add("banana", amount=2)
cbf.add("carrot", amount=1)

# Check the count of elements
print(cbf.value("apple"))  # 3
print(cbf.value("banana"))  # 2
print(cbf.value("carrot"))  # 1
print(cbf.value("donut"))  # 0

# Check if elements meet a certain threshold
print(cbf.check("apple", trigger=2))  # True
print(cbf.check("banana", trigger=3))  # False

# Add more to an existing element
cbf.add("banana", amount=2)
print(cbf.value("banana"))  # 4

Scalable Bloom Filter

from profusion import ScalableBloom

# Create a new Scalable Bloom filter
sbf = ScalableBloom(max_error=1e-5, initial_size=1024, growth_factor=2)

# Add a large number of elements
for i in range(10000):
    sbf.add(f"element_{i}")

# Check if elements are in the filter
print(sbf.check("element_42"))  # True
print(sbf.check("nonexistent"))  # False

# Demonstrate the scalability
print(f"Number of internal filters: {sbf.blooms}")
print(f"Total capacity: {sbf.threshold}")

# Use check_then_add method
print(sbf.check_then_add("new_element"))  # False (element was not present, but is now added)
print(sbf.check_then_add("new_element"))  # True (element is already present)

Memory-mapped Counting Bloom Filter

from profusion import MMCountingBloom

# Create a new Memory-mapped Counting Bloom filter
mmcbf = MMCountingBloom("my_filter", capacity=1000000, error_ratio=1e-5)

# Add elements
mmcbf.add("apple")
mmcbf.add("banana", amount=2)

# Check the value of elements
print(mmcbf.value("apple"))  # 1
print(mmcbf.value("banana"))  # 2

# Check if elements meet a certain threshold
print(mmcbf.check("apple", trigger=1))  # True
print(mmcbf.check("banana", trigger=3))  # False

# The filter persists across different instances
del mmcbf

# Create a new instance with the same name
mmcbf_2 = MMCountingBloom("my_filter")

# The previously added elements are still present
print(mmcbf_2.value("apple"))  # 1
print(mmcbf_2.value("banana"))  # 2

# Clean up (remove the memory-mapped file)
import os
os.remove(mmcbf_2.path)

License

This project is licensed under the CC0 License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

profusion-0.1.0.tar.gz (12.0 kB view details)

Uploaded Source

Built Distribution

profusion-0.1.0-py3-none-any.whl (13.4 kB view details)

Uploaded Python 3

File details

Details for the file profusion-0.1.0.tar.gz.

File metadata

  • Download URL: profusion-0.1.0.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for profusion-0.1.0.tar.gz
Algorithm Hash digest
SHA256 39a7d1d9685520a6008389840c5e07276cf539282cb15ebf24f1d117be7c8e3a
MD5 6e1210e05b3cd175374ae7cd5def3b91
BLAKE2b-256 88717570913d2da22c36558c5efeb90db6ca2c0714e3d7f85dd9dd891cf353c9

See more details on using hashes here.

File details

Details for the file profusion-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: profusion-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for profusion-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fbd770a25af090b61668a614cf1942b15000e81acc22e5f1a0ed2a056b8b87da
MD5 b407a614a7e7e0e7a46e7f9a481d47c7
BLAKE2b-256 f1853af870473fbabfd22c410ae5de0a55ce7480a55faa3a5b2219d95a003c51

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page