Fast, memory-efficient Bloom Filters backed by Redis / Valkey bitmaps.
Project description
ValBloom
Fast, memory-efficient Bloom Filters backed by Redis / Valkey bitmaps.
ValBloom provides three async-first probabilistic data structures for membership testing — ideal for token blacklisting, duplicate detection, rate limiting, and more.
Installation
pip install valbloom
Requires Python ≥ 3.9 and a running Redis or Valkey instance.
Quick Start
from redis.asyncio import Redis
from valbloom import BloomFilter
r = Redis.from_url("redis://localhost")
# Create a filter expecting up to 100k items with 0.1% false-positive rate
bf = BloomFilter(r, "bf:emails", capacity=100_000, error_rate=0.001)
await bf.add("alice@example.com")
await bf.exists("alice@example.com") # True
await bf.exists("bob@example.com") # False (probably)
Filter Types
BloomFilter — Standard
The classic space-efficient probabilistic set. Supports add and membership testing but not deletion.
bf = BloomFilter(client, "bf:tokens", capacity=100_000)
await bf.add("token-abc")
await bf.add_many(["token-1", "token-2", "token-3"])
await bf.exists("token-abc") # True
await bf.exists_many(["token-1", "nope"]) # {"token-1": True, "nope": False}
# Set operations (requires compatible filters — same capacity & error_rate)
merged = await bf1.union(bf2, dest_key="bf:merged")
common = await bf1.intersection(bf2, dest_key="bf:common")
CountingBloomFilter — Supports Deletion
Uses counter buckets (Redis hash) instead of single bits, allowing items to be removed.
from valbloom import CountingBloomFilter
cbf = CountingBloomFilter(client, "cbf:sessions", capacity=50_000)
await cbf.add("session-xyz")
await cbf.exists("session-xyz") # True
await cbf.remove("session-xyz")
await cbf.exists("session-xyz") # False
[!WARNING] Duplicate-add behaviour: A
CountingBloomFilterdoes not de-duplicate. Eachadd()increments the internal counters independently, so adding the same item N times requires N matchingremove()calls beforeexists()returnsFalse.await cbf.add("session-xyz") await cbf.add("session-xyz") # counter = 2 await cbf.remove("session-xyz") # counter = 1 await cbf.exists("session-xyz") # True — one add still outstandingA
WARNING-level log is emitted when a probable duplicate add is detected. If you need strict single-add semantics, guard withexists()first.
ScalableBloomFilter — Auto-Growing
Automatically creates new filter slices when the current one fills up, so the false-positive rate stays bounded even as data grows beyond the initial capacity.
from valbloom import ScalableBloomFilter
sbf = ScalableBloomFilter(
client, "sbf:users",
initial_capacity=1_000,
growth_factor=2, # each new slice doubles in capacity
ratio=0.9, # each slice tightens the error rate
)
# Add millions of items — ValBloom handles the scaling
for i in range(50_000):
await sbf.add(f"user-{i}")
[!NOTE] TTL and auto-scaling: When
set_ttl()is called, the TTL is persisted in the filter's metadata. New slices created by auto-scaling automatically inherit the remaining TTL so all keys expire at roughly the same wall-clock time. You do not need to callset_ttl()again after scaling occurs.
Common API
All three filter types share these methods:
| Method | Description |
|---|---|
add(item) |
Add a single item |
exists(item) → bool |
Check membership (probabilistic) |
add_many(items) |
Bulk add |
exists_many(items) → dict |
Bulk check |
count → int |
Number of items added |
info() → dict |
Filter stats (size, fill ratio, etc.) |
clear() |
Reset the filter |
set_ttl(seconds) |
Set expiration on Redis keys |
get_ttl() → int |
Get remaining TTL |
CountingBloomFilter additionally supports:
| Method | Description |
|---|---|
remove(item) |
Remove a single item |
remove_many(items) |
Bulk remove |
BloomFilter additionally supports:
| Method | Description |
|---|---|
union(other, dest_key) |
Merge filters with BITOP OR |
intersection(other, dest_key) |
Intersect filters with BITOP AND |
[!IMPORTANT]
union()andintersection()require both filters to have identicalcapacityanderror_rate. Attempting to combine incompatible filters raisesIncompatibleFilterErrorwith a descriptive message showing the mismatched parameters.
How It Works
A Bloom filter uses a bit array of m bits and k independent hash functions. When adding an item, all k hash positions are set to 1. To check membership, all k positions are tested — if any is 0, the item is definitely not in the set.
ValBloom automatically computes optimal m and k from your desired capacity and error_rate:
- Bit-array size:
m = -(n × ln p) / (ln 2)² - Hash count:
k = (m / n) × ln 2
| Capacity | Error Rate | Memory | Hashes |
|---|---|---|---|
| 10,000 | 1% | ~12 KB | 7 |
| 100,000 | 0.1% | ~180 KB | 10 |
| 1,000,000 | 0.01% | ~2.3 MB | 13 |
When to Use Which
| Use case | Filter |
|---|---|
| Token blacklisting, email dedup, IP blocklisting | BloomFilter |
| Session tracking, temporary bans (need removal) | CountingBloomFilter |
| Unbounded growth, don't know final size | ScalableBloomFilter |
⚠️ Production Warnings
Redis Persistence
ValBloom stores all filter state in Redis. If Redis restarts without persistence configured, the entire bloom filter disappears silently. For production systems — especially healthcare, finance, or security workloads — this means previously blocked lookups will suddenly miss, requiring all items to be re-added.
Ensure Redis persistence (RDB or AOF) is configured:
# redis.conf — enable AOF for durability
appendonly yes
appendfsync everysec
# Or use RDB snapshots
save 900 1
save 300 10
save 60 10000
[!CAUTION] Without persistence, a Redis restart causes total filter loss. In a healthcare system this means previously de-duplicated lookups will hit your primary database again, potentially causing duplicate processing or degraded performance.
Key Expiration & TTL
When using set_ttl(), both the bitmap (or hash) and the counter key receive the same TTL. For ScalableBloomFilter, newly created slices automatically inherit the remaining TTL from the parent metadata key.
If you need filters to persist indefinitely, do not call set_ttl() — keys default to no expiration.
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests (uses fakeredis, no real Redis needed)
pytest tests/ -v
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file valbloom-1.0.1.tar.gz.
File metadata
- Download URL: valbloom-1.0.1.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
185606facfb35a6c8b533f0183a3f874a4900b2997b3c9fbfe4aa439d3d6573b
|
|
| MD5 |
1e4fed6c55153ae7ee1e55c5ee2ef7d0
|
|
| BLAKE2b-256 |
cdd8e1a1ed309be09d38a8df91d5a187e4f9a15b624c9a677b2041ad1fb4267b
|
File details
Details for the file valbloom-1.0.1-py3-none-any.whl.
File metadata
- Download URL: valbloom-1.0.1-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a8f93c960a30baf7713e07219541f8744ffe0185b5df02f39397dc0895a3d94b
|
|
| MD5 |
ae5a19751a877b2435ef5bb137b0273e
|
|
| BLAKE2b-256 |
062ffe3f13a0c1b546d491667bc8356a807b8bd1e84335cc92e0b5dc6d5fac18
|