Skip to main content

Some fast bloom filter implemented by Rust for Python and Rust! 10x faster than pybloom!

Project description

fastbloom

OSCS Status docs.rs Test Rust Test Python Benchmark Crates Latest Release PyPI Latest Release Sonatype Nexus (Snapshots)

A fast bloom filter | counting bloom filter implemented by Rust for Rust and Python!

Language: 简体中文

setup

Python

requirements

Python >= 3.7

install

Install the latest fastbloom version with:

pip install fastbloom-rs

Rust

fastbloom-rs = "{latest}"

Java

maven

<dependency>
    <groupId>io.github.yankun1992</groupId>
    <artifactId>fastbloom</artifactId>
    <version>{latest-version}</version>
</dependency>

Examples

BloomFilter

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.

Reference: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. Full text article

Python

basic usage

from fastbloom_rs import BloomFilter

bloom = BloomFilter(100_000_000, 0.01)

bloom.add_str('hello')
bloom.add_bytes(b'world')
bloom.add_int(9527)

assert bloom.contains('hello')
assert bloom.contains(b'world')
assert bloom.contains(9527)

assert not bloom.contains('hello world')

build bloom filter from bytes or list

from fastbloom_rs import BloomFilter

bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
assert bloom.contains('hello')

bloom2 = BloomFilter.from_bytes(bloom.get_bytes(), bloom.hashes())
assert bloom2.contains('hello')

bloom3 = BloomFilter.from_int_array(bloom.get_int_array(), bloom.hashes())
assert bloom3.contains('hello')

there are some bulk api for python to reduce ffi cost between python and rust

bloom = BloomFilter(100_000_000, 0.01)
inserts = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100]
checks = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100, 190, 290, 390]
results = [True, True, True, True, True, True, True, True, True, True, True, True, False, False, False]

bloom.add_int_batch(inserts)
contains = bloom.contains_int_batch(checks)
assert contains == results

bloom.add_str_batch(list(map(lambda x: str(x), inserts)))
assert bloom.contains_str_batch(list(map(lambda x: str(x), checks))) == results

bloom.add_bytes_batch(list(map(lambda x: bytes(x), inserts)))
assert bloom.contains_bytes_batch(list(map(lambda x: bytes(x), checks))) == results

more examples at py_tests.

Rust

use fastbloom_rs::{BloomFilter, FilterBuilder};

let mut bloom = FilterBuilder::new(100_000_000, 0.01).build_bloom_filter();

bloom.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
assert_eq!(bloom.contains(b"helloworld!"), false);

more examples at docs.rs

CountingBloomFilter

A Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is able to keep track of insertions and deletions. In a counting Bloom filter, each entry in the Bloom filter is a small counter associated with a basic Bloom filter bit.

Reference: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006

Python

from fastbloom_rs import CountingBloomFilter

cbf = CountingBloomFilter(1000_000, 0.01)
cbf.add('hello')
cbf.add('hello')
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' in cbf  # because 'hello' added twice. 
# If add same element larger than 15 times, then remove 15 times the filter will not contain the element.
cbf.remove('hello')
assert 'hello' not in cbf

A CountingBloomFilter has a four bits counter to save hash index, so when insert an element repeatedly, the counter will spill over quickly. So, you can set enable_repeat_insert to False to check whether the element has added. if it has added, it will not add again. enable_repeat_insert default set to True.

from fastbloom_rs import CountingBloomFilter

cbf = CountingBloomFilter(1000_000, 0.01, False)
cbf.add('hello')
cbf.add('hello')  # because enable_repeat_insert=False, this addition will not take effect. 
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' not in cbf 

more examples at py_tests.

Rust

use fastbloom_rs::{CountingBloomFilter, FilterBuilder};

let mut builder = FilterBuilder::new(100_000, 0.01);
let mut cbf = builder.build_counting_bloom_filter();
cbf.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);

benchmark

computer info

CPU Memory OS
AMD Ryzen 7 5800U with Radeon Graphics 16G Windows 10

add one str to bloom filter

Benchmark insert one str to bloom filter:

bloom_add_test          time:   [41.168 ns 41.199 ns 41.233 ns]
                        change: [-0.4891% -0.0259% +0.3417%] (p = 0.91 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

add one million to bloom filter

Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() }) to bloom filter:

bloom_add_all_test      time:   [236.24 ms 236.86 ms 237.55 ms]
                        change: [-3.4346% -2.9050% -2.3524%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

check one contains in bloom filter

bloom_contains_test     time:   [42.065 ns 42.102 ns 42.156 ns]
                        change: [-0.7830% -0.5901% -0.4029%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  9 (9.00%) high severe

check one not contains in bloom filter

bloom_not_contains_test time:   [22.695 ns 22.727 ns 22.773 ns]
                        change: [-3.1948% -2.9695% -2.7268%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

add one str to counting bloom filter

counting_bloom_add_test time:   [60.822 ns 60.861 ns 60.912 ns]
                        change: [+0.2427% +0.3772% +0.5579%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

add one million to counting bloom filter

Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() }) to counting bloom filter:

counting_bloom_add_million_test
                        time:   [272.48 ms 272.58 ms 272.68 ms]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

fastbloom_rs-0.5.9-cp37-abi3-win_amd64.whl (177.5 kB view details)

Uploaded CPython 3.7+ Windows x86-64

fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (282.1 kB view details)

Uploaded CPython 3.7+ manylinux: glibc 2.17+ x86-64

fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (379.7 kB view details)

Uploaded CPython 3.7+ manylinux: glibc 2.17+ ppc64le

fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl (399.4 kB view details)

Uploaded CPython 3.7+ manylinux: glibc 2.17+ ppc64

fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (269.1 kB view details)

Uploaded CPython 3.7+ manylinux: glibc 2.17+ ARMv7l

fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (270.3 kB view details)

Uploaded CPython 3.7+ manylinux: glibc 2.17+ ARM64

fastbloom_rs-0.5.9-cp37-abi3-macosx_11_0_arm64.whl (238.2 kB view details)

Uploaded CPython 3.7+ macOS 11.0+ ARM64

fastbloom_rs-0.5.9-cp37-abi3-macosx_10_7_x86_64.whl (243.0 kB view details)

Uploaded CPython 3.7+ macOS 10.7+ x86-64

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 677e926cfa216baf6bd9da296d0cb9b438febeedd3fc3cbf358a06f41fb38fef
MD5 c96eb01e28aa62a04c93e4110a5e5e4d
BLAKE2b-256 c3ff75a48cc21620952f7b4cb29d039d526bafdd4a8e7b22d4821c4401149022

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5634cdd378eedd308e5df33444ea9464e0f44947d700324014f4dd5b13d0f296
MD5 b57dd7ab962702de2e29ed891a30bfbf
BLAKE2b-256 ff86544dda7e064cf55c49423a0a84044435930d0e5b650ca55d6ba17d8d01ca

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 92f56cce343195ad54e5b386475e8f5745810f8cea7c799091e6ad2fd94706c2
MD5 6e5fcbde6caa9fd140b8a251dd6b093e
BLAKE2b-256 92266ccc3ed9b37d0b02a478d226a666e6a5639082015aaea35c8fcce4e4bffa

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl
Algorithm Hash digest
SHA256 3c45a2be931a7339a20303d4a1a83d9b6885d181710a87e8192aa2cf16743b7b
MD5 e0a1b384a6dde3520224e6a0c124f2ab
BLAKE2b-256 ae4d6b2c4586067aec9d71f28a9fa7b49e43b9ee7930305778e84de5f0ced1a9

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 e935f03cb4662ec3d1291958ddd9e69060505a1b6cc7ad48db27b43acff6da07
MD5 084b3158816afe1bcbd4616079530ebc
BLAKE2b-256 06ad0085fe8a7fe6297c6f17860822046b7575f004236637a138b2bcaa74548f

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 4264ba929141be9796bb3d2b0366e2e969be60395b483ac536d44cdbdf08024f
MD5 4b47d0f674dd6de6ec8e2303b2c43118
BLAKE2b-256 190badb5913c8ce603c676ad9ef523450b55f3187d42692f41447a8ec75842b2

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 540182b4202cab79d1c62f682a865ee4b5c93b2570807f6bdeaa08a9961691cc
MD5 1809e1da5f9093f2001ccf4e884f2211
BLAKE2b-256 97b364d280e228913a3b126411243b8d2a792ae4bb32a3d6e21d24a0f180a05a

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.9-cp37-abi3-macosx_10_7_x86_64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.9-cp37-abi3-macosx_10_7_x86_64.whl
Algorithm Hash digest
SHA256 9c3b3f8a94cac8f02b0e12c9806124d9acbcf55155bf4bbad8ca6be107186b7d
MD5 c6cd771ecd09c414c5b02f88790b99fb
BLAKE2b-256 8d852ea6e1a7c0bd79939ad2a22b69af3184482118285adb8f32f05ce87bb754

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page