Probabilistic data structures: Bloom filters, time-decay filters, and more

probabilistic-rs

Probabilistic data structures in Rust, with Python bindings and an HTTP API.

Features

  • Bloom Filter — fast membership testing with bulk ops and optional Fjall persistence
  • Expiring Bloom Filter — auto-expires elements via sliding time windows
  • HTTP API — REST server with Swagger UI, managing multiple named filters
  • Python Bindings — native wheels via PyO3/maturin
  • CLI + TUI — interactive terminal interface
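
For readers new to the idea, the membership test behind the Bloom Filter feature can be sketched in a few lines of pure Python. This is an illustration of the general technique only, not this crate's implementation: the class name `TinyBloom`, its parameters, and the choice of BLAKE2b with double hashing are all assumptions made for the example.

```python
import hashlib


class TinyBloom:
    """Illustrative Bloom filter (hypothetical; not the crate's code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at zero

    def _positions(self, item: bytes) -> list[int]:
        # Double hashing: derive every bit position from two 64-bit halves
        # of a single digest instead of running num_hashes separate hashes.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = TinyBloom(num_bits=96_000, num_hashes=7)
bf.insert(b"item1")
print(bf.contains(b"item1"))  # True: an inserted item is never reported missing
```

A filter built this way can return a false positive for an item that was never inserted, but never a false negative, which is the trade-off that keeps it compact.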

Installation

Rust:

cargo add probabilistic-rs

Python:

pip install probabilistic-rs

Quick Start

Rust — Bloom Filter

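use probabilistic_rs::bloom::{BloomFilter, BloomFilterConfigBuilder, BloomFilterOps};

// `BloomFilter::create` is async, so this example runs on the tokio runtime
// (tokio must be a dependency with the `macros` and a runtime feature enabled).
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Size the filter for 10,000 elements at a 1% target false-positive rate.
    let config = BloomFilterConfigBuilder::default()
        .capacity(10000)
        .false_positive_rate(0.01)
        .build()?;

    let filter = BloomFilter::create(config).await?;
    filter.insert(b"item1")?;
    // An inserted item is always reported as present.
    assert!(filter.contains(b"item1")?);
    Ok(())
}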

Rust — Expiring Bloom Filter

use probabilistic_rs::ebloom::{
    ExpiringBloomFilter, ExpiringFilterConfigBuilder, ExpiringBloomFilterOps,
};
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ExpiringFilterConfigBuilder::default()
        .capacity(1000)
        .false_positive_rate(0.01)
        .level_duration(Duration::from_secs(60))
        .max_levels(3)
        .build()?;

    let mut filter = ExpiringBloomFilter::new(config)?;
    filter.insert(b"test_item")?;
    assert!(filter.query(b"test_item")?);
    Ok(())
}

Python

from probabilistic_rs import BloomFilter, ExpiringBloomFilter

bf = BloomFilter(capacity=10000, false_positive_rate=0.01)
bf.insert(b"item1")
assert bf.contains(b"item1")

ebf = ExpiringBloomFilter(capacity=1000, false_positive_rate=0.01, ttl_seconds=60)
ebf.insert(b"temp_item")
assert ebf.query(b"temp_item")
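
As a rough mental model of sliding-window expiry, the sketch below keeps a fixed number of time-bucketed levels and drops the oldest one on each rotation. It is an assumption-laden illustration, not the library's algorithm: `SlidingLevels` is a hypothetical class, plain Python sets stand in for the per-level Bloom filters, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class SlidingLevels:
    """Illustrative time-windowed membership (hypothetical; sets replace Bloom levels)."""

    def __init__(self, level_duration: float, max_levels: int, now: float = 0.0) -> None:
        self.level_duration = level_duration
        # Index 0 is the newest level; maxlen makes appendleft drop the oldest.
        self.levels = deque([set()], maxlen=max_levels)
        self.level_started = now

    def _rotate(self, now: float) -> None:
        # Open one fresh level for every full level_duration that has elapsed.
        while now - self.level_started >= self.level_duration:
            self.levels.appendleft(set())
            self.level_started += self.level_duration

    def insert(self, item: bytes, now: float) -> None:
        self._rotate(now)
        self.levels[0].add(item)

    def query(self, item: bytes, now: float) -> bool:
        self._rotate(now)
        return any(item in level for level in self.levels)


ebf = SlidingLevels(level_duration=60, max_levels=3)
ebf.insert(b"temp_item", now=0)
print(ebf.query(b"temp_item", now=100))  # True: the item's level is still retained
print(ebf.query(b"temp_item", now=185))  # False: three newer levels have replaced it
```

In this sketch an item survives for between `(max_levels - 1) * level_duration` and `max_levels * level_duration` seconds, depending on how late in its level it was inserted. Whether the library gives the same bounds is not stated in this README.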

HTTP API

# Start server (default: localhost:3000)
probabilistic-server

# Create a filter, insert, query
curl -X POST http://localhost:3000/api/v1/bloom/create \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","capacity":10000,"false_positive_rate":0.01}'

curl -X POST http://localhost:3000/api/v1/bloom/insert \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","item":"hello"}'

curl -X POST http://localhost:3000/api/v1/bloom/contains \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","item":"hello"}'

Endpoints: create, delete, insert, contains, bulk_insert, bulk_contains, clear, stats, list. Each is available under both /api/v1/bloom and /api/v1/ebloom. Swagger UI is served at /swagger-ui.
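
The same calls can be scripted from Python with only the standard library. The sketch below mirrors the curl examples above; the helper names `build_request` and `send` are hypothetical, and using POST with a JSON body for every endpoint is an assumption carried over from the three calls shown (the HTTP method for endpoints such as stats or list is not documented here, so check Swagger UI). The response body is returned as raw text because its schema is not described in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000/api/v1"  # default address from the curl examples


def build_request(kind: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for /api/v1/{kind}/{endpoint}."""
    if kind not in ("bloom", "ebloom"):
        raise ValueError(f"unknown filter kind: {kind!r}")
    return urllib.request.Request(
        f"{BASE_URL}/{kind}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")


req = build_request("bloom", "insert", {"name": "my-filter", "item": "hello"})
print(req.get_method(), req.full_url)
# With probabilistic-server running: print(send(req))
```

Keeping request construction separate from sending makes the payloads easy to inspect or unit-test without a live server.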

CLI

# Create filter
expblf create --db-path myfilter.fjall --capacity 10000 --fpr 0.01

# Operations
expblf load --db-path myfilter.fjall insert --element "key"
expblf load --db-path myfilter.fjall check --element "key"

# Interactive TUI
expblf tui --db-path myfilter.fjall

Benchmarks

Measured on Apple M-series via cargo bench (criterion, 100 samples). Times are total for N operations.

Bloom Filter (in-memory)

| Operation | 1K elements | 100K elements | 1M elements |
| --- | --- | --- | --- |
| Insert | 60.2 µs | 6.15 ms | 64.1 ms |
| Query | 61.3 µs | 6.17 ms | 63.1 ms |

Expiring Bloom Filter (in-memory, 3 levels; results with 5 levels are nearly identical)

| Operation | 1K elements | 100K elements | 1M elements |
| --- | --- | --- | --- |
| Insert | 63.8 µs | 6.61 ms | 68.0 ms |
| Query | 63.4 µs | 6.52 ms | 67.0 ms |
| Bulk insert | 59.7 µs | 6.16 ms | 63.8 ms |
| Bulk query | 62.3 µs | 6.41 ms | 65.8 ms |

Level rotate: 224 µs, 255 µs

Both filters sustain ~15–17M ops/s (60–68 ns/op) across all dataset sizes. For single-item inserts and queries, the expiring filter adds roughly 3–8% overhead over the plain Bloom filter due to multi-level bookkeeping. Bulk operations match or slightly outperform single-item operations. Level rotation (TTL expiry) takes ~250 µs regardless of filter size.
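
The per-operation figures follow from dividing each table total by N. The short script below redoes that arithmetic for the single-item rows; the dictionary layout and function names are just for this example, and the numbers are copied from the tables above.

```python
# Totals in seconds, copied from the benchmark tables; inner keys are N.
TOTALS = {
    "bloom insert": {1_000: 60.2e-6, 100_000: 6.15e-3, 1_000_000: 64.1e-3},
    "bloom query": {1_000: 61.3e-6, 100_000: 6.17e-3, 1_000_000: 63.1e-3},
    "expiring insert": {1_000: 63.8e-6, 100_000: 6.61e-3, 1_000_000: 68.0e-3},
    "expiring query": {1_000: 63.4e-6, 100_000: 6.52e-3, 1_000_000: 67.0e-3},
}


def ns_per_op(total_seconds: float, n: int) -> float:
    """Average time per operation in nanoseconds."""
    return total_seconds / n * 1e9


def overhead_percent(expiring_seconds: float, plain_seconds: float) -> float:
    """Relative extra cost of the expiring filter over the plain one."""
    return (expiring_seconds / plain_seconds - 1.0) * 100.0


for name, row in TOTALS.items():
    for n, total in row.items():
        ns = ns_per_op(total, n)
        # 1e9 / ns gives ops/s, so 1e3 / ns gives millions of ops/s.
        print(f"{name:<16} N={n:>9,}  {ns:5.1f} ns/op  {1e3 / ns:4.1f}M ops/s")

for n in TOTALS["bloom insert"]:
    pct = overhead_percent(TOTALS["expiring insert"][n], TOTALS["bloom insert"][n])
    print(f"insert overhead at N={n:,}: {pct:.1f}%")
```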

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| capacity | Max elements | 1,000,000 |
| false_positive_rate | Desired FPR | 0.01 |
| level_duration | TTL per level (expiring) | 60s |
| max_levels | Filter levels (expiring) | 3 |
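
To get a feel for what capacity and false_positive_rate imply for memory, the textbook Bloom filter sizing formulas can be applied directly: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below assumes those standard formulas; the crate may round or size its bit arrays differently, and `bloom_dimensions` is a hypothetical helper, not part of its API.

```python
import math


def bloom_dimensions(capacity: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) from the standard Bloom filter formulas."""
    if capacity <= 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("capacity must be positive and the rate must be in (0, 1)")
    # Optimal bits per element: -ln(p) / (ln 2)^2.
    bits_per_element = -math.log(false_positive_rate) / math.log(2) ** 2
    num_bits = math.ceil(capacity * bits_per_element)
    # Optimal hash count: bits per element times ln 2, at least one.
    num_hashes = max(1, round(bits_per_element * math.log(2)))
    return num_bits, num_hashes


num_bits, num_hashes = bloom_dimensions(capacity=1_000_000, false_positive_rate=0.01)
print(f"{num_bits:,} bits, {num_hashes} hashes, {num_bits / 8 / 2**20:.2f} MiB")
```

With the defaults from the table this works out to about 9.6 bits per element, or a little over 1 MiB for a single filter, with 7 hash functions.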

License

MIT

Download files

Source distribution:

  • probabilistic_rs-0.6.0.tar.gz (150.5 kB)

Built distributions (CPython 3.11):

  • probabilistic_rs-0.6.0-cp311-cp311-win_amd64.whl (1.0 MB): Windows x86-64
  • probabilistic_rs-0.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): manylinux, glibc 2.17+, x86-64
  • probabilistic_rs-0.6.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.3 MB): manylinux, glibc 2.17+, ARM64
  • probabilistic_rs-0.6.0-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB): macOS 11.0+, ARM64
  • probabilistic_rs-0.6.0-cp311-cp311-macosx_10_12_x86_64.whl (1.2 MB): macOS 10.12+, x86-64