Skip to main content

Fast, memory-efficient Bloom Filters backed by Redis / Valkey bitmaps.

Project description

ValBloom

Fast, memory-efficient Bloom Filters backed by Redis / Valkey bitmaps.

ValBloom provides three async-first probabilistic data structures for membership testing — ideal for token blacklisting, duplicate detection, rate limiting, and more.

Installation

pip install valbloom

Requires Python ≥ 3.9 and a running Redis or Valkey instance.

Quick Start

from redis.asyncio import Redis
from valbloom import BloomFilter

r = Redis.from_url("redis://localhost")

# Create a filter expecting up to 100k items with 0.1% false-positive rate
bf = BloomFilter(r, "bf:emails", capacity=100_000, error_rate=0.001)

await bf.add("alice@example.com")
await bf.exists("alice@example.com")   # True
await bf.exists("bob@example.com")     # False (probably)

Filter Types

BloomFilter — Standard

The classic space-efficient probabilistic set. Supports add and membership testing but not deletion.

bf = BloomFilter(client, "bf:tokens", capacity=100_000)

await bf.add("token-abc")
await bf.add_many(["token-1", "token-2", "token-3"])

await bf.exists("token-abc")               # True
await bf.exists_many(["token-1", "nope"])   # {"token-1": True, "nope": False}

# Set operations (requires compatible filters — same capacity & error_rate)
merged = await bf1.union(bf2, dest_key="bf:merged")
common = await bf1.intersection(bf2, dest_key="bf:common")

CountingBloomFilter — Supports Deletion

Uses counter buckets (Redis hash) instead of single bits, allowing items to be removed.

from valbloom import CountingBloomFilter

cbf = CountingBloomFilter(client, "cbf:sessions", capacity=50_000)

await cbf.add("session-xyz")
await cbf.exists("session-xyz")   # True

await cbf.remove("session-xyz")
await cbf.exists("session-xyz")   # False

ScalableBloomFilter — Auto-Growing

Automatically creates new filter slices when the current one fills up, so the false-positive rate stays bounded even as data grows beyond the initial capacity.

from valbloom import ScalableBloomFilter

sbf = ScalableBloomFilter(
    client, "sbf:users",
    initial_capacity=1_000,
    growth_factor=2,  # each new slice doubles in capacity
    ratio=0.9,        # each slice tightens the error rate
)

# Add millions of items — ValBloom handles the scaling
for i in range(50_000):
    await sbf.add(f"user-{i}")

Common API

All three filter types share these methods:

Method Description
add(item) Add a single item
exists(item) → bool Check membership (probabilistic)
add_many(items) Bulk add
exists_many(items) → dict Bulk check
count → int Number of items added
info() → dict Filter stats (size, fill ratio, etc.)
clear() Reset the filter
set_ttl(seconds) Set expiration on Redis keys
get_ttl() → int Get remaining TTL

CountingBloomFilter additionally supports:

Method Description
remove(item) Remove a single item
remove_many(items) Bulk remove

BloomFilter additionally supports:

Method Description
union(other, dest_key) Merge filters with BITOP OR
intersection(other, dest_key) Intersect filters with BITOP AND

How It Works

A Bloom filter uses a bit array of m bits and k independent hash functions. When adding an item, all k hash positions are set to 1. To check membership, all k positions are tested — if any is 0, the item is definitely not in the set.

ValBloom automatically computes optimal m and k from your desired capacity and error_rate:

  • Bit-array size: m = -(n × ln p) / (ln 2)²
  • Hash count: k = (m / n) × ln 2
Capacity Error Rate Memory Hashes
10,000 1% ~12 KB 7
100,000 0.1% ~180 KB 10
1,000,000 0.01% ~2.3 MB 13

When to Use Which

Use case Filter
Token blacklisting, email dedup, IP blocklisting BloomFilter
Session tracking, temporary bans (need removal) CountingBloomFilter
Unbounded growth, don't know final size ScalableBloomFilter

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests (uses fakeredis, no real Redis needed)
pytest tests/ -v

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

valbloom-0.1.0.tar.gz (18.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

valbloom-0.1.0-py3-none-any.whl (13.0 kB view details)

Uploaded Python 3

File details

Details for the file valbloom-0.1.0.tar.gz.

File metadata

  • Download URL: valbloom-0.1.0.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for valbloom-0.1.0.tar.gz
Algorithm Hash digest
SHA256 95d95148a3d0d0cde83790636d5debcba7a1eab12414e1ac135b085c3aa06156
MD5 c92174144e25129bace69ff43a459cf9
BLAKE2b-256 1c182b944b87d805d418d282d125a1d915ad947e95fdb91742c613906e7c33c8

See more details on using hashes here.

File details

Details for the file valbloom-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: valbloom-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for valbloom-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 213594e5271b98cb16d566324d531f28feb66eabe7aa07b9e160715dee3dba09
MD5 0b15dbba82e0b538d731faa839dba5cb
BLAKE2b-256 97eaa89d1eb3e86e1e7ed02108a37c9753362b0a2b340eddabf1532612db54e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page