Some fast bloom filter implemented by Rust for Python and Rust! 10x faster than pybloom!
Project description
fastbloom
A fast bloom filter | counting bloom filter implemented by Rust for Rust and Python!
Language: 简体中文
setup
Python
requirements
Python >= 3.7
install
Install the latest fastbloom version with:
pip install fastbloom-rs
Rust
fastbloom-rs = "{latest}"
Java
maven
<dependency>
<groupId>io.github.yankun1992</groupId>
<artifactId>fastbloom</artifactId>
<version>{latest-version}</version>
</dependency>
Examples
BloomFilter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.
Reference: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. Full text article
Python
basic usage
from fastbloom_rs import BloomFilter
bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
bloom.add_bytes(b'world')
bloom.add_int(9527)
assert bloom.contains('hello')
assert bloom.contains(b'world')
assert bloom.contains(9527)
assert not bloom.contains('hello world')
build bloom filter from bytes or list
from fastbloom_rs import BloomFilter
bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
assert bloom.contains('hello')
bloom2 = BloomFilter.from_bytes(bloom.get_bytes(), bloom.hashes())
assert bloom2.contains('hello')
bloom3 = BloomFilter.from_int_array(bloom.get_int_array(), bloom.hashes())
assert bloom3.contains('hello')
there are some bulk api for python to reduce ffi cost between python and rust
bloom = BloomFilter(100_000_000, 0.01)
inserts = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100]
checks = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100, 190, 290, 390]
results = [True, True, True, True, True, True, True, True, True, True, True, True, False, False, False]
bloom.add_int_batch(inserts)
contains = bloom.contains_int_batch(checks)
assert contains == results
bloom.add_str_batch(list(map(lambda x: str(x), inserts)))
assert bloom.contains_str_batch(list(map(lambda x: str(x), checks))) == results
bloom.add_bytes_batch(list(map(lambda x: bytes(x), inserts)))
assert bloom.contains_bytes_batch(list(map(lambda x: bytes(x), checks))) == results
more examples at py_tests.
Rust
use fastbloom_rs::{BloomFilter, FilterBuilder};
let mut bloom = FilterBuilder::new(100_000_000, 0.01).build_bloom_filter();
bloom.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
assert_eq!(bloom.contains(b"helloworld!"), false);
more examples at docs.rs
CountingBloomFilter
A Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is able to keep track of insertions and deletions. In a counting Bloom filter, each entry in the Bloom filter is a small counter associated with a basic Bloom filter bit.
Reference: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006
Python
from fastbloom_rs import CountingBloomFilter
cbf = CountingBloomFilter(1000_000, 0.01)
cbf.add('hello')
cbf.add('hello')
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' in cbf # because 'hello' added twice.
# If add same element larger than 15 times, then remove 15 times the filter will not contain the element.
cbf.remove('hello')
assert 'hello' not in cbf
A CountingBloomFilter has a four bits counter to save hash index, so when insert an
element repeatedly, the counter will spill over quickly. So, you can set
enable_repeat_insert
to False
to check whether the element has added.
if it has added, it will not add again. enable_repeat_insert
default set to True
.
from fastbloom_rs import CountingBloomFilter
cbf = CountingBloomFilter(1000_000, 0.01, False)
cbf.add('hello')
cbf.add('hello') # because enable_repeat_insert=False, this addition will not take effect.
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' not in cbf
more examples at py_tests.
Rust
use fastbloom_rs::{CountingBloomFilter, FilterBuilder};
let mut builder = FilterBuilder::new(100_000, 0.01);
let mut cbf = builder.build_counting_bloom_filter();
cbf.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
benchmark
computer info
CPU | Memory | OS |
---|---|---|
AMD Ryzen 7 5800U with Radeon Graphics | 16G | Windows 10 |
add one str to bloom filter
Benchmark insert one str to bloom filter:
bloom_add_test time: [41.168 ns 41.199 ns 41.233 ns]
change: [-0.4891% -0.0259% +0.3417%] (p = 0.91 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) high mild
12 (12.00%) high severe
add one million to bloom filter
Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() })
to bloom filter:
bloom_add_all_test time: [236.24 ms 236.86 ms 237.55 ms]
change: [-3.4346% -2.9050% -2.3524%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
check one contains in bloom filter
bloom_contains_test time: [42.065 ns 42.102 ns 42.156 ns]
change: [-0.7830% -0.5901% -0.4029%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
5 (5.00%) high mild
9 (9.00%) high severe
check one not contains in bloom filter
bloom_not_contains_test time: [22.695 ns 22.727 ns 22.773 ns]
change: [-3.1948% -2.9695% -2.7268%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
add one str to counting bloom filter
counting_bloom_add_test time: [60.822 ns 60.861 ns 60.912 ns]
change: [+0.2427% +0.3772% +0.5579%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
add one million to counting bloom filter
Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() })
to counting bloom filter:
counting_bloom_add_million_test
time: [272.48 ms 272.58 ms 272.68 ms]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-win_amd64.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-win_amd64.whl
- Upload date:
- Size: 177.5 kB
- Tags: CPython 3.7+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 677e926cfa216baf6bd9da296d0cb9b438febeedd3fc3cbf358a06f41fb38fef |
|
MD5 | c96eb01e28aa62a04c93e4110a5e5e4d |
|
BLAKE2b-256 | c3ff75a48cc21620952f7b4cb29d039d526bafdd4a8e7b22d4821c4401149022 |
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 282.1 kB
- Tags: CPython 3.7+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.20
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5634cdd378eedd308e5df33444ea9464e0f44947d700324014f4dd5b13d0f296 |
|
MD5 | b57dd7ab962702de2e29ed891a30bfbf |
|
BLAKE2b-256 | ff86544dda7e064cf55c49423a0a84044435930d0e5b650ca55d6ba17d8d01ca |
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
- Upload date:
- Size: 379.7 kB
- Tags: CPython 3.7+, manylinux: glibc 2.17+ ppc64le
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.20
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 92f56cce343195ad54e5b386475e8f5745810f8cea7c799091e6ad2fd94706c2 |
|
MD5 | 6e5fcbde6caa9fd140b8a251dd6b093e |
|
BLAKE2b-256 | 92266ccc3ed9b37d0b02a478d226a666e6a5639082015aaea35c8fcce4e4bffa |
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl
- Upload date:
- Size: 399.4 kB
- Tags: CPython 3.7+, manylinux: glibc 2.17+ ppc64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.20
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3c45a2be931a7339a20303d4a1a83d9b6885d181710a87e8192aa2cf16743b7b |
|
MD5 | e0a1b384a6dde3520224e6a0c124f2ab |
|
BLAKE2b-256 | ae4d6b2c4586067aec9d71f28a9fa7b49e43b9ee7930305778e84de5f0ced1a9 |
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
- Upload date:
- Size: 269.1 kB
- Tags: CPython 3.7+, manylinux: glibc 2.17+ ARMv7l
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.20
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e935f03cb4662ec3d1291958ddd9e69060505a1b6cc7ad48db27b43acff6da07 |
|
MD5 | 084b3158816afe1bcbd4616079530ebc |
|
BLAKE2b-256 | 06ad0085fe8a7fe6297c6f17860822046b7575f004236637a138b2bcaa74548f |
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 270.3 kB
- Tags: CPython 3.7+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.20
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4264ba929141be9796bb3d2b0366e2e969be60395b483ac536d44cdbdf08024f |
|
MD5 | 4b47d0f674dd6de6ec8e2303b2c43118 |
|
BLAKE2b-256 | 190badb5913c8ce603c676ad9ef523450b55f3187d42692f41447a8ec75842b2 |
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-macosx_11_0_arm64.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 238.2 kB
- Tags: CPython 3.7+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 540182b4202cab79d1c62f682a865ee4b5c93b2570807f6bdeaa08a9961691cc |
|
MD5 | 1809e1da5f9093f2001ccf4e884f2211 |
|
BLAKE2b-256 | 97b364d280e228913a3b126411243b8d2a792ae4bb32a3d6e21d24a0f180a05a |
File details
Details for the file fastbloom_rs-0.5.9-cp37-abi3-macosx_10_7_x86_64.whl
.
File metadata
- Download URL: fastbloom_rs-0.5.9-cp37-abi3-macosx_10_7_x86_64.whl
- Upload date:
- Size: 243.0 kB
- Tags: CPython 3.7+, macOS 10.7+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9c3b3f8a94cac8f02b0e12c9806124d9acbcf55155bf4bbad8ca6be107186b7d |
|
MD5 | c6cd771ecd09c414c5b02f88790b99fb |
|
BLAKE2b-256 | 8d852ea6e1a7c0bd79939ad2a22b69af3184482118285adb8f32f05ce87bb754 |