Some fast bloom filter implemented by Rust for Python and Rust! 10x faster than pybloom!
Project description
fastbloom
A fast bloom filter | counting bloom filter implemented by Rust for Rust and Python!
Language: 简体中文
setup
Python
requirements
Python >= 3.7
install
Install the latest fastbloom version with:
pip install fastbloom-rs
Rust
fastbloom-rs = "{latest}"
Java
maven
<dependency>
<groupId>io.github.yankun1992</groupId>
<artifactId>fastbloom</artifactId>
<version>{latest-version}</version>
</dependency>
Examples
BloomFilter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.
Reference: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. Full text article
Python
basic usage
from fastbloom_rs import BloomFilter
bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
bloom.add_bytes(b'world')
bloom.add_int(9527)
assert bloom.contains('hello')
assert bloom.contains(b'world')
assert bloom.contains(9527)
assert not bloom.contains('hello world')
build bloom filter from bytes or list
from fastbloom_rs import BloomFilter
bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
assert bloom.contains('hello')
bloom2 = BloomFilter.from_bytes(bloom.get_bytes(), bloom.hashes())
assert bloom2.contains('hello')
bloom3 = BloomFilter.from_int_array(bloom.get_int_array(), bloom.hashes())
assert bloom3.contains('hello')
there are some bulk api for python to reduce ffi cost between python and rust
bloom = BloomFilter(100_000_000, 0.01)
inserts = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100]
checks = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100, 190, 290, 390]
results = [True, True, True, True, True, True, True, True, True, True, True, True, False, False, False]
bloom.add_int_batch(inserts)
contains = bloom.contains_int_batch(checks)
assert contains == results
bloom.add_str_batch(list(map(lambda x: str(x), inserts)))
assert bloom.contains_str_batch(list(map(lambda x: str(x), checks))) == results
bloom.add_bytes_batch(list(map(lambda x: bytes(x), inserts)))
assert bloom.contains_bytes_batch(list(map(lambda x: bytes(x), checks))) == results
more examples at py_tests.
Rust
use fastbloom_rs::{BloomFilter, FilterBuilder};
let mut bloom = FilterBuilder::new(100_000_000, 0.01).build_bloom_filter();
bloom.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
assert_eq!(bloom.contains(b"helloworld!"), false);
more examples at docs.rs
CountingBloomFilter
A Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is able to keep track of insertions and deletions. In a counting Bloom filter, each entry in the Bloom filter is a small counter associated with a basic Bloom filter bit.
Reference: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006
Python
from fastbloom_rs import CountingBloomFilter
cbf = CountingBloomFilter(1000_000, 0.01)
cbf.add('hello')
cbf.add('hello')
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' in cbf # because 'hello' added twice.
# If add same element larger than 15 times, then remove 15 times the filter will not contain the element.
cbf.remove('hello')
assert 'hello' not in cbf
A CountingBloomFilter has a four bits counter to save hash index, so when insert an
element repeatedly, the counter will spill over quickly. So, you can set
enable_repeat_insert
to False
to check whether the element has added.
if it has added, it will not add again. enable_repeat_insert
default set to True
.
from fastbloom_rs import CountingBloomFilter
cbf = CountingBloomFilter(1000_000, 0.01, False)
cbf.add('hello')
cbf.add('hello') # because enable_repeat_insert=False, this addition will not take effect.
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' not in cbf
more examples at py_tests.
Rust
use fastbloom_rs::{CountingBloomFilter, FilterBuilder};
let mut builder = FilterBuilder::new(100_000, 0.01);
let mut cbf = builder.build_counting_bloom_filter();
cbf.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
benchmark
computer info
CPU | Memory | OS |
---|---|---|
AMD Ryzen 7 5800U with Radeon Graphics | 16G | Windows 10 |
add one str to bloom filter
Benchmark insert one str to bloom filter:
bloom_add_test time: [41.168 ns 41.199 ns 41.233 ns]
change: [-0.4891% -0.0259% +0.3417%] (p = 0.91 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) high mild
12 (12.00%) high severe
add one million to bloom filter
Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() })
to bloom filter:
bloom_add_all_test time: [236.24 ms 236.86 ms 237.55 ms]
change: [-3.4346% -2.9050% -2.3524%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
check one contains in bloom filter
bloom_contains_test time: [42.065 ns 42.102 ns 42.156 ns]
change: [-0.7830% -0.5901% -0.4029%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
5 (5.00%) high mild
9 (9.00%) high severe
check one not contains in bloom filter
bloom_not_contains_test time: [22.695 ns 22.727 ns 22.773 ns]
change: [-3.1948% -2.9695% -2.7268%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
add one str to counting bloom filter
counting_bloom_add_test time: [60.822 ns 60.861 ns 60.912 ns]
change: [+0.2427% +0.3772% +0.5579%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
add one million to counting bloom filter
Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() })
to counting bloom filter:
counting_bloom_add_million_test
time: [272.48 ms 272.58 ms 272.68 ms]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Hashes for fastbloom_rs-0.5.6-cp37-abi3-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e2f9e3b6af48624da1962c741507262ed49e46e1621d980fefbc67f38f7840c2 |
|
MD5 | 194bf4ce7d85cc85b76c62e401620f9d |
|
BLAKE2b-256 | 2c080a67bd0b490f627b95403d5d482609d37b677cfac1a4bf074881a50be72f |
Hashes for fastbloom_rs-0.5.6-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 57ea564e242cc20127cb6ee8e2c6e76f0c9444ec2d646b6fd54f4b4b13dda9fd |
|
MD5 | 092bd658c53bfdca73afad9d349ed1b1 |
|
BLAKE2b-256 | a6572703962869f35ab9be80725573f7b73765ea89f901d06bf86116ede0563f |
Hashes for fastbloom_rs-0.5.6-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7d0249ce6f3d7cbccddafac19a1d534b74cd004467c91b78b04a3e9819e65ab7 |
|
MD5 | 1581ac06f1185bfc2c9e3c8d8f500b2e |
|
BLAKE2b-256 | 43dd77c14c183c4686654578c44e2cc1d9bea504c27a5ed87bec5df55b642f99 |
Hashes for fastbloom_rs-0.5.6-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9fc0327c49a7d0314e4d3540610519878707f8ff431b08edf5e7276397743847 |
|
MD5 | 96b57e8b38f33e4266768d1234764b2e |
|
BLAKE2b-256 | ca030122c8a3deb9d1dfadc256b05a919e9a1cc4482e5cfa11c3808a0a539ae5 |
Hashes for fastbloom_rs-0.5.6-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | cd72b2c607d6555442065fdfa970446e3b18083fb82ab0a5089d40dad694857d |
|
MD5 | 0390186c738055d541e52b1f7e56a761 |
|
BLAKE2b-256 | 8ef68f0806ea7d1bc3a9c48943ebd427fb629e4cfd9d351bf8989c4abc309b19 |
Hashes for fastbloom_rs-0.5.6-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a913891e2ea4ba3839032dc23b6b03bebf30afe1beaa60ee904903aaff886c6e |
|
MD5 | 3e5f9457d9662beedf322a5cc8f2eb29 |
|
BLAKE2b-256 | 186979a914b96cc39cb11eec876d91049f27a2efbe2b2fc51064c4c702f8cdfa |
Hashes for fastbloom_rs-0.5.6-cp37-abi3-macosx_11_0_arm64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 54daec91b514bda2e9e883eab2660990e6e1781a80d46a3ae1b77b2640cffae9 |
|
MD5 | b45248a4248802ae36cf7e8f454ebcac |
|
BLAKE2b-256 | dee03283d6da7b400609612b4ba6e6bc4075ff81c00afcd8ae98147c4dfa134f |
Hashes for fastbloom_rs-0.5.6-cp37-abi3-macosx_10_7_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 45cd2297a1fc9d9c735d55adf6ba78e2974c355036262f0813b5f9271f6e940c |
|
MD5 | c3037b543566eb064c41c7bb7d2a0af8 |
|
BLAKE2b-256 | d03fe17d420d9082cb90e2b4da50dc579701cc2d2826d73cdeb8877b010fb298 |