Probabilistic data structures: Bloom filters, time-decay filters, and more
probabilistic-rs
Probabilistic data structures in Rust with Python bindings and HTTP API.
Features
- Bloom Filter — fast membership testing with bulk ops and optional Fjall persistence
- Expiring Bloom Filter — auto-expires elements via sliding time windows
- HTTP API — REST server with Swagger UI, managing multiple named filters
- Python Bindings — native wheels via PyO3/maturin
- CLI + TUI — interactive terminal interface
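For readers new to the structure, the membership test behind the first feature can be sketched in a few lines of pure Python. This is an illustration only, not the probabilistic-rs implementation: the class name `TinyBloom`, the use of SHA-256, and the double-hashing scheme are all assumptions made for the example.

```python
import hashlib


class TinyBloom:
    """Illustrative Bloom filter; hypothetical, not the library's code."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> list:
        # Double hashing: derive every bit index from two 64-bit halves of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd, hence non-zero, step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, item: bytes) -> None:
        # Set one bit per derived position.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item: bytes) -> bool:
        # Present only if every derived bit is set; false positives are possible.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = TinyBloom(num_bits=9586, num_hashes=7)
bf.insert(b"item1")
print(bf.contains(b"item1"))  # True: an inserted item is never reported absent
```

The sketch shows the two properties that matter in practice: an inserted item is always found, and an item that was never inserted is usually, but not always, rejected.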
Installation
Rust:
cargo add probabilistic-rs
Python:
pip install probabilistic-rs
Quick Start
Rust — Bloom Filter
use probabilistic_rs::bloom::{BloomFilter, BloomFilterConfigBuilder, BloomFilterOps};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Size the filter for 10,000 elements at a 1% target false-positive rate.
    let config = BloomFilterConfigBuilder::default()
        .capacity(10000)
        .false_positive_rate(0.01)
        .build()?;

    // Creation is async; insert and contains are synchronous and fallible.
    let filter = BloomFilter::create(config).await?;
    filter.insert(b"item1")?;
    assert!(filter.contains(b"item1")?);
    Ok(())
}
Rust — Expiring Bloom Filter
use probabilistic_rs::ebloom::{
    ExpiringBloomFilter, ExpiringFilterConfigBuilder, ExpiringBloomFilterOps,
};
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Three levels, each covering a 60-second window.
    let config = ExpiringFilterConfigBuilder::default()
        .capacity(1000)
        .false_positive_rate(0.01)
        .level_duration(Duration::from_secs(60))
        .max_levels(3)
        .build()?;

    let mut filter = ExpiringBloomFilter::new(config)?;
    filter.insert(b"test_item")?;
    assert!(filter.query(b"test_item")?);
    Ok(())
}
Python
from probabilistic_rs import BloomFilter, ExpiringBloomFilter
bf = BloomFilter(capacity=10000, false_positive_rate=0.01)
bf.insert(b"item1")
assert bf.contains(b"item1")
ebf = ExpiringBloomFilter(capacity=1000, false_positive_rate=0.01, ttl_seconds=60)
ebf.insert(b"temp_item")
assert ebf.query(b"temp_item")
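To make the sliding-window expiry more concrete, here is a pure-Python sketch of one way a multi-level expiring filter can work. It is an assumption-laden illustration, not the library's algorithm: `SlidingLevels` is a hypothetical name, plain sets stand in for the per-level Bloom filters, the clock is passed in explicitly so the behaviour is deterministic, and the parameter names are borrowed from the Rust config (`level_duration`, `max_levels`). How the Python binding's `ttl_seconds` maps onto levels is not stated here.

```python
class SlidingLevels:
    """Sketch of time-windowed expiry; sets stand in for per-level Bloom filters."""

    def __init__(self, level_duration: float, max_levels: int, now: float = 0.0) -> None:
        self.level_duration = level_duration
        self.levels = [set() for _ in range(max_levels)]
        self.current = 0            # index of the level that receives inserts
        self.level_started = now    # start time of the current level's window

    def _rotate(self, now: float) -> None:
        # Advance one level per fully elapsed window, clearing each level as it is reused.
        elapsed_levels = int((now - self.level_started) // self.level_duration)
        if elapsed_levels <= 0:
            return
        for _ in range(min(elapsed_levels, len(self.levels))):
            self.current = (self.current + 1) % len(self.levels)
            self.levels[self.current].clear()
        self.level_started += elapsed_levels * self.level_duration

    def insert(self, item: bytes, now: float) -> None:
        self._rotate(now)
        self.levels[self.current].add(item)

    def query(self, item: bytes, now: float) -> bool:
        # An item is visible while any level still holds it.
        self._rotate(now)
        return any(item in level for level in self.levels)


ebf = SlidingLevels(level_duration=60, max_levels=3)
ebf.insert(b"temp_item", now=0)
print(ebf.query(b"temp_item", now=179))  # True
print(ebf.query(b"temp_item", now=180))  # False
```

In this sketch an item survives between `(max_levels - 1) * level_duration` and `max_levels * level_duration` seconds, depending on how late in its window it was inserted. Expiry costs one level clear per rotation rather than per-item bookkeeping.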
HTTP API
# Start server (default: localhost:3000)
probabilistic-server

# Create a filter, insert, query
curl -X POST http://localhost:3000/api/v1/bloom/create \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","capacity":10000,"false_positive_rate":0.01}'

curl -X POST http://localhost:3000/api/v1/bloom/insert \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","item":"hello"}'

curl -X POST http://localhost:3000/api/v1/bloom/contains \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","item":"hello"}'
Endpoints: create, delete, insert, contains, bulk_insert, bulk_contains, clear, stats, list — available for both /api/v1/bloom and /api/v1/ebloom. Swagger UI at /swagger-ui.
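The same three calls can be issued from Python with only the standard library. The sketch below mirrors the curl commands above; `build_request` and `call` are hypothetical helper names, and because the response format is not documented here, `call` simply returns the raw response text. Only the `create`, `insert`, and `contains` payloads shown above are used; the request shapes of the other endpoints are not assumed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000/api/v1"  # default server address from the curl examples


def build_request(family: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build, without sending, a JSON POST such as /api/v1/bloom/insert."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{family}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(req: urllib.request.Request) -> str:
    """Send the request and return the raw response body (format not assumed)."""
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Requests equivalent to the three curl commands; nothing is sent until call() is used.
create = build_request(
    "bloom", "create",
    {"name": "my-filter", "capacity": 10000, "false_positive_rate": 0.01},
)
insert = build_request("bloom", "insert", {"name": "my-filter", "item": "hello"})
contains = build_request("bloom", "contains", {"name": "my-filter", "item": "hello"})
```

With the server running, `call(create)`, `call(insert)`, and `call(contains)` would send them in order. Keeping request construction separate from sending makes the payloads easy to inspect or unit-test without a live server.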
CLI
# Create filter
expblf create --db-path myfilter.fjall --capacity 10000 --fpr 0.01
# Operations
expblf load --db-path myfilter.fjall insert --element "key"
expblf load --db-path myfilter.fjall check --element "key"
# Interactive TUI
expblf tui --db-path myfilter.fjall
Benchmarks
Measured on Apple M-series via cargo bench (criterion, 100 samples). Each cell is the total time for N operations, where N is the column's element count.
Bloom Filter (in-memory)
| Operation | 1K elements | 100K elements | 1M elements |
|---|---|---|---|
| Insert | 60.2 µs | 6.15 ms | 64.1 ms |
| Query | 61.3 µs | 6.17 ms | 63.1 ms |
Expiring Bloom Filter (in-memory, 3 levels — 5 levels nearly identical)
| Operation | 1K elements | 100K elements | 1M elements |
|---|---|---|---|
| Insert | 63.8 µs | 6.61 ms | 68.0 ms |
| Query | 63.4 µs | 6.52 ms | 67.0 ms |
| Bulk insert | 59.7 µs | 6.16 ms | 63.8 ms |
| Bulk query | 62.3 µs | 6.41 ms | 65.8 ms |
| Level rotate | 224 µs | 255 µs | — |
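Both filters sustain roughly 15–17M ops/s (about 60–68 ns/op) across all dataset sizes. In these tables the expiring filter is about 3–8% slower than the plain Bloom filter on single-item inserts and queries, which reflects its multi-level bookkeeping. Bulk operations match or slightly outperform single-item operations. Level rotation (TTL expiry) takes roughly 220–260 µs and grows little between 1K and 100K elements; no 1M figure is reported.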
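The per-operation figures follow directly from the table totals. The short script below redoes that arithmetic for the single-item rows; the numbers are copied from the tables above, and the function names are just for illustration.

```python
# Total times from the benchmark tables, in seconds, keyed by element count N.
TOTALS = {
    ("bloom", "insert"): {1_000: 60.2e-6, 100_000: 6.15e-3, 1_000_000: 64.1e-3},
    ("bloom", "query"): {1_000: 61.3e-6, 100_000: 6.17e-3, 1_000_000: 63.1e-3},
    ("expiring", "insert"): {1_000: 63.8e-6, 100_000: 6.61e-3, 1_000_000: 68.0e-3},
    ("expiring", "query"): {1_000: 63.4e-6, 100_000: 6.52e-3, 1_000_000: 67.0e-3},
}


def ns_per_op(total_seconds: float, n: int) -> float:
    """Average latency of one operation in nanoseconds."""
    return total_seconds / n * 1e9


def ops_per_second(total_seconds: float, n: int) -> float:
    """Sustained throughput in operations per second."""
    return n / total_seconds


def overhead_percent(expiring_seconds: float, bloom_seconds: float) -> float:
    """Relative slowdown of the expiring filter versus the plain one."""
    return (expiring_seconds / bloom_seconds - 1.0) * 100.0


for (name, op), row in TOTALS.items():
    for n, total in row.items():
        print(
            f"{name:8} {op:6} N={n:>9,}: "
            f"{ns_per_op(total, n):5.1f} ns/op, "
            f"{ops_per_second(total, n) / 1e6:5.2f} M ops/s"
        )

for op in ("insert", "query"):
    for n in (1_000, 100_000, 1_000_000):
        pct = overhead_percent(TOTALS[("expiring", op)][n], TOTALS[("bloom", op)][n])
        print(f"expiring vs bloom, {op:6} N={n:>9,}: {pct:4.1f}% slower")
```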
Configuration
| Parameter | Description | Default |
|---|---|---|
| `capacity` | Max elements | 1,000,000 |
| `false_positive_rate` | Desired FPR | 0.01 |
| `level_duration` | TTL per level (expiring) | 60s |
| `max_levels` | Filter levels (expiring) | 3 |
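To get a feel for what `capacity` and `false_positive_rate` imply for memory, the textbook Bloom filter sizing formulas can be applied to the defaults. This is a sketch under an assumption: the source does not say how probabilistic-rs sizes its bit array, so the function below (a hypothetical `bloom_parameters`) reflects the standard optimum, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, rather than the library's actual behaviour.

```python
import math


def bloom_parameters(capacity: int, false_positive_rate: float) -> tuple:
    """Textbook optimal bit count and hash count for a classic Bloom filter."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    bits = math.ceil(-capacity * math.log(false_positive_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest integer and at least 1.
    hashes = max(1, round(bits / capacity * math.log(2)))
    return bits, hashes


bits, hashes = bloom_parameters(capacity=1_000_000, false_positive_rate=0.01)
print(f"{bits:,} bits (~{bits / 8 / 2**20:.2f} MiB), {hashes} hash functions")
```

For the defaults this works out to roughly 9.6 million bits, a little over 1 MiB, with 7 hash functions. An expiring filter that keeps one such array per level would need about `max_levels` times that, if it is implemented that way.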
License
MIT