Some fast bloom filter implemented by Rust for Python and Rust! 10x faster than pybloom!
Project description
fastbloom
A fast bloom filter | counting bloom filter implemented by Rust for Rust and Python!
Language: 简体中文
setup
Python
requirements
Python >= 3.7
install
Install the latest fastbloom version with:
pip install fastbloom-rs
Rust
fastbloom-rs = "{latest}"
Java
maven
<dependency>
<groupId>io.github.yankun1992</groupId>
<artifactId>fastbloom</artifactId>
<version>{latest-version}</version>
</dependency>
Examples
BloomFilter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.
Reference: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. Full text article
Python
basic usage
from fastbloom_rs import BloomFilter
bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
bloom.add_bytes(b'world')
bloom.add_int(9527)
assert bloom.contains('hello')
assert bloom.contains(b'world')
assert bloom.contains(9527)
assert not bloom.contains('hello world')
build bloom filter from bytes or list
from fastbloom_rs import BloomFilter
bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
assert bloom.contains('hello')
bloom2 = BloomFilter.from_bytes(bloom.get_bytes(), bloom.hashes())
assert bloom2.contains('hello')
bloom3 = BloomFilter.from_int_array(bloom.get_int_array(), bloom.hashes())
assert bloom3.contains('hello')
there are some bulk api for python to reduce ffi cost between python and rust
bloom = BloomFilter(100_000_000, 0.01)
inserts = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100]
checks = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100, 190, 290, 390]
results = [True, True, True, True, True, True, True, True, True, True, True, True, False, False, False]
bloom.add_int_batch(inserts)
contains = bloom.contains_int_batch(checks)
assert contains == results
bloom.add_str_batch(list(map(lambda x: str(x), inserts)))
assert bloom.contains_str_batch(list(map(lambda x: str(x), checks))) == results
bloom.add_bytes_batch(list(map(lambda x: bytes(x), inserts)))
assert bloom.contains_bytes_batch(list(map(lambda x: bytes(x), checks))) == results
more examples at py_tests.
Rust
use fastbloom_rs::{BloomFilter, FilterBuilder};
let mut bloom = FilterBuilder::new(100_000_000, 0.01).build_bloom_filter();
bloom.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
assert_eq!(bloom.contains(b"helloworld!"), false);
more examples at docs.rs
CountingBloomFilter
A Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is able to keep track of insertions and deletions. In a counting Bloom filter, each entry in the Bloom filter is a small counter associated with a basic Bloom filter bit.
Reference: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006
Python
from fastbloom_rs import CountingBloomFilter
cbf = CountingBloomFilter(1000_000, 0.01)
cbf.add('hello')
cbf.add('hello')
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' in cbf # because 'hello' added twice.
# If add same element larger than 15 times, then remove 15 times the filter will not contain the element.
cbf.remove('hello')
assert 'hello' not in cbf
A CountingBloomFilter has a four bits counter to save hash index, so when insert an
element repeatedly, the counter will spill over quickly. So, you can set
enable_repeat_insert
to False
to check whether the element has added.
if it has added, it will not add again. enable_repeat_insert
default set to True
.
from fastbloom_rs import CountingBloomFilter
cbf = CountingBloomFilter(1000_000, 0.01, False)
cbf.add('hello')
cbf.add('hello') # because enable_repeat_insert=False, this addition will not take effect.
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' not in cbf
more examples at py_tests.
Rust
use fastbloom_rs::{CountingBloomFilter, FilterBuilder};
let mut builder = FilterBuilder::new(100_000, 0.01);
let mut cbf = builder.build_counting_bloom_filter();
cbf.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
benchmark
computer info
CPU | Memory | OS |
---|---|---|
AMD Ryzen 7 5800U with Radeon Graphics | 16G | Windows 10 |
add one str to bloom filter
Benchmark insert one str to bloom filter:
bloom_add_test time: [41.168 ns 41.199 ns 41.233 ns]
change: [-0.4891% -0.0259% +0.3417%] (p = 0.91 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) high mild
12 (12.00%) high severe
add one million to bloom filter
Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() })
to bloom filter:
bloom_add_all_test time: [236.24 ms 236.86 ms 237.55 ms]
change: [-3.4346% -2.9050% -2.3524%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
check one contains in bloom filter
bloom_contains_test time: [42.065 ns 42.102 ns 42.156 ns]
change: [-0.7830% -0.5901% -0.4029%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
5 (5.00%) high mild
9 (9.00%) high severe
check one not contains in bloom filter
bloom_not_contains_test time: [22.695 ns 22.727 ns 22.773 ns]
change: [-3.1948% -2.9695% -2.7268%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
add one str to counting bloom filter
counting_bloom_add_test time: [60.822 ns 60.861 ns 60.912 ns]
change: [+0.2427% +0.3772% +0.5579%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
add one million to counting bloom filter
Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() })
to counting bloom filter:
counting_bloom_add_million_test
time: [272.48 ms 272.58 ms 272.68 ms]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Hashes for fastbloom_rs-0.5.8-cp37-abi3-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2f74e53fad6fb0d3a32bb332ebccec89acdf333fc3efe1f1b6e7d4d8d3e65cc6 |
|
MD5 | 19f95d3db77d9ed90a5541339c621a76 |
|
BLAKE2b-256 | 99f17b28ce2a23b9b91423e0e97b4ded8b2b74031b7b30e6bfd6ca1ca0df8c35 |
Hashes for fastbloom_rs-0.5.8-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e1a93dfcee1db85b43c572db740a1bb7f7fecfbabe2f432eacdb6a511e1576e1 |
|
MD5 | 5de77145c8e99cbb1fd983d04f52391c |
|
BLAKE2b-256 | 40125c80e2bb7b4cc17446ea67cc0222f4c16f29ed0689f24d4851053709583d |
Hashes for fastbloom_rs-0.5.8-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b02ca269ad8d4ff31fd05b1c6c13aad9278b675ed21435abd237269e5ff8b1bb |
|
MD5 | 48d5682eee6e5df33c27d7743338de15 |
|
BLAKE2b-256 | fed5e2f75222e497296b1cf419303f3fc2093d395ce5daf71874246fad15146c |
Hashes for fastbloom_rs-0.5.8-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1a8b8a1c81a1cc4850a7025a210432784da04c98d735436ed381338c05925f15 |
|
MD5 | dc9a54317a67b610e89bfec463bd2a00 |
|
BLAKE2b-256 | bc99a5b8051b2e96f927784ea2f416a8bbaea17cd2d63d53a33bae1de75796e1 |
Hashes for fastbloom_rs-0.5.8-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4ea53ae0e84f741529daa8fc7fbd6d5a9b77da3915bd5ba594dae0b1c990a548 |
|
MD5 | 1be3eb85e8018bb74242ce85997ea050 |
|
BLAKE2b-256 | e6b628e53ed19f1eab5bcaea318b30d0cfb8e8d7da3767681e79a0adef09c165 |
Hashes for fastbloom_rs-0.5.8-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | bcc88c1f95487bd1bb7951ef0816f1046d95c21b94b821622aa2035e6df35c85 |
|
MD5 | 1275d049b5b82a0e7ac68cb2b2cbbb7b |
|
BLAKE2b-256 | da7927ab4ddaa37f7042273c0fc7ac6df7c7c8fcb5769d2601ed62dac62ed3c9 |
Hashes for fastbloom_rs-0.5.8-cp37-abi3-macosx_11_0_arm64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ce3ac782bfef84c8570df6446a2c60b8eafd189dec87d0e9e9bea81c63f38519 |
|
MD5 | a6c524893bd399ccbeaa4c8607615f2b |
|
BLAKE2b-256 | 610e8315308550df4e091237592cd85fae317386b97e26915079dea7dd71b2ce |
Hashes for fastbloom_rs-0.5.8-cp37-abi3-macosx_10_7_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 663b22f1ade7cbc513cbb62745a7ffc9841da9e1a7b4efd4b7f98977a975ef6e |
|
MD5 | 929d6dfa3a08a42fbed3e4188bbcdde2 |
|
BLAKE2b-256 | b31d069d3e9061e2f742b55279257593758fc46fe515dd02ff73f94789935a68 |