Skip to main content

Some fast bloom filter implemented by Rust for Python and Rust! 10x faster than pybloom!

Project description

fastbloom

OSCS Status docs.rs Test Rust Test Python Benchmark Crates Latest Release PyPI Latest Release Sonatype Nexus (Snapshots)

A fast bloom filter | counting bloom filter implemented by Rust for Rust and Python!

Language: 简体中文

setup

Python

requirements

Python >= 3.7

install

Install the latest fastbloom version with:

pip install fastbloom-rs

Rust

fastbloom-rs = "{latest}"

Java

maven

<dependency>
    <groupId>io.github.yankun1992</groupId>
    <artifactId>fastbloom</artifactId>
    <version>{latest-version}</version>
</dependency>

Examples

BloomFilter

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.

Reference: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. Full text article

Python

basic usage

from fastbloom_rs import BloomFilter

bloom = BloomFilter(100_000_000, 0.01)

bloom.add_str('hello')
bloom.add_bytes(b'world')
bloom.add_int(9527)

assert bloom.contains('hello')
assert bloom.contains(b'world')
assert bloom.contains(9527)

assert not bloom.contains('hello world')

build bloom filter from bytes or list

from fastbloom_rs import BloomFilter

bloom = BloomFilter(100_000_000, 0.01)
bloom.add_str('hello')
assert bloom.contains('hello')

bloom2 = BloomFilter.from_bytes(bloom.get_bytes(), bloom.hashes())
assert bloom2.contains('hello')

bloom3 = BloomFilter.from_int_array(bloom.get_int_array(), bloom.hashes())
assert bloom3.contains('hello')

there are some bulk api for python to reduce ffi cost between python and rust

bloom = BloomFilter(100_000_000, 0.01)
inserts = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100]
checks = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100, 190, 290, 390]
results = [True, True, True, True, True, True, True, True, True, True, True, True, False, False, False]

bloom.add_int_batch(inserts)
contains = bloom.contains_int_batch(checks)
assert contains == results

bloom.add_str_batch(list(map(lambda x: str(x), inserts)))
assert bloom.contains_str_batch(list(map(lambda x: str(x), checks))) == results

bloom.add_bytes_batch(list(map(lambda x: bytes(x), inserts)))
assert bloom.contains_bytes_batch(list(map(lambda x: bytes(x), checks))) == results

more examples at py_tests.

Rust

use fastbloom_rs::{BloomFilter, FilterBuilder};

let mut bloom = FilterBuilder::new(100_000_000, 0.01).build_bloom_filter();

bloom.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);
assert_eq!(bloom.contains(b"helloworld!"), false);

more examples at docs.rs

CountingBloomFilter

A Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is able to keep track of insertions and deletions. In a counting Bloom filter, each entry in the Bloom filter is a small counter associated with a basic Bloom filter bit.

Reference: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006

Python

from fastbloom_rs import CountingBloomFilter

cbf = CountingBloomFilter(1000_000, 0.01)
cbf.add('hello')
cbf.add('hello')
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' in cbf  # because 'hello' added twice. 
# If add same element larger than 15 times, then remove 15 times the filter will not contain the element.
cbf.remove('hello')
assert 'hello' not in cbf

A CountingBloomFilter has a four bits counter to save hash index, so when insert an element repeatedly, the counter will spill over quickly. So, you can set enable_repeat_insert to False to check whether the element has added. if it has added, it will not add again. enable_repeat_insert default set to True.

from fastbloom_rs import CountingBloomFilter

cbf = CountingBloomFilter(1000_000, 0.01, False)
cbf.add('hello')
cbf.add('hello')  # because enable_repeat_insert=False, this addition will not take effect. 
assert 'hello' in cbf
cbf.remove('hello')
assert 'hello' not in cbf 

more examples at py_tests.

Rust

use fastbloom_rs::{CountingBloomFilter, FilterBuilder};

let mut builder = FilterBuilder::new(100_000, 0.01);
let mut cbf = builder.build_counting_bloom_filter();
cbf.add(b"helloworld");
assert_eq!(bloom.contains(b"helloworld"), true);

benchmark

computer info

CPU Memory OS
AMD Ryzen 7 5800U with Radeon Graphics 16G Windows 10

add one str to bloom filter

Benchmark insert one str to bloom filter:

bloom_add_test          time:   [41.168 ns 41.199 ns 41.233 ns]
                        change: [-0.4891% -0.0259% +0.3417%] (p = 0.91 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

add one million to bloom filter

Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() }) to bloom filter:

bloom_add_all_test      time:   [236.24 ms 236.86 ms 237.55 ms]
                        change: [-3.4346% -2.9050% -2.3524%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

check one contains in bloom filter

bloom_contains_test     time:   [42.065 ns 42.102 ns 42.156 ns]
                        change: [-0.7830% -0.5901% -0.4029%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  9 (9.00%) high severe

check one not contains in bloom filter

bloom_not_contains_test time:   [22.695 ns 22.727 ns 22.773 ns]
                        change: [-3.1948% -2.9695% -2.7268%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

add one str to counting bloom filter

counting_bloom_add_test time:   [60.822 ns 60.861 ns 60.912 ns]
                        change: [+0.2427% +0.3772% +0.5579%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

add one million to counting bloom filter

Benchmark loop insert (1..1_000_000).map(|n| { n.to_string() }) to counting bloom filter:

counting_bloom_add_million_test
                        time:   [272.48 ms 272.58 ms 272.68 ms]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

fastbloom_rs-0.5.10-cp37-abi3-win_amd64.whl (189.9 kB view details)

Uploaded CPython 3.7+Windows x86-64

fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_34_x86_64.whl (282.1 kB view details)

Uploaded CPython 3.7+manylinux: glibc 2.34+ x86-64

fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (408.1 kB view details)

Uploaded CPython 3.7+manylinux: glibc 2.17+ ppc64le

fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl (418.6 kB view details)

Uploaded CPython 3.7+manylinux: glibc 2.17+ ppc64

fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (283.4 kB view details)

Uploaded CPython 3.7+manylinux: glibc 2.17+ ARMv7l

fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (269.4 kB view details)

Uploaded CPython 3.7+manylinux: glibc 2.17+ ARM64

fastbloom_rs-0.5.10-cp37-abi3-macosx_11_0_arm64.whl (251.5 kB view details)

Uploaded CPython 3.7+macOS 11.0+ ARM64

fastbloom_rs-0.5.10-cp37-abi3-macosx_10_12_x86_64.whl (261.3 kB view details)

Uploaded CPython 3.7+macOS 10.12+ x86-64

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 dc18c0ed6bd800dbd54617b51acbc977869975d5f87577cedec5fb6b653f24d6
MD5 5b0707227ad40930ca5e36e51d652f3f
BLAKE2b-256 c892c949d09b1988321a40b915a8ef7cdd0b5dfba4d1510471239fccae8ecfbd

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1cd58d7eac2dceec8055583e719ceac6edceae80c3abcb53110391b76e4cff47
MD5 f87559cb8b4e930f0eeff65e07355f8d
BLAKE2b-256 e0981224a402f89cb89cd6a0918e55d68d5c1aff8f75e93d64d00990f600d2c6

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 3737d3bfe98f2eb6009292a09f7329fa2f360f196d3e4bdafd595b97b9834610
MD5 9abaa1c152a3f7f408268582a2b539cc
BLAKE2b-256 c5b1dc24bb80680278458a227e70eff75ed2c0faec9a84c8bbbbb46fda1cd42a

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl
Algorithm Hash digest
SHA256 8656de9a5589b9e3700eb576deaa2d4e6190386debf2dc79c598749bbc72c119
MD5 0f42a62dcfbf149cce24b377882337d9
BLAKE2b-256 b20beec1e563f14e379bec48719c2cb70a4849a2bdf04c39be0fb09d29f57066

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 fe695abd15e772f6323602fd2c1b52ddcb13e3f56d2ec766085149adcdd030d2
MD5 682cdc408a68dc163759e9ea4628bafc
BLAKE2b-256 deb7dcd455ab8c1678f83415bdd8dbd9b62cca681390468075b78a480211ddbf

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 7568bc4c2be81f908a20ae467dfb47456535bf90036c93bb87a7a89949f6b4cf
MD5 8b177a6f74016adaedd01d92d2136583
BLAKE2b-256 ffafaac0dd1f503dbdc8f9b9801b4b1aede04b31282b6351aa46029d52e639bd

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a9026c4e1cf496f0c23bbfb782881b8a7afde2c81d7159ecc39fc4c5017223c9
MD5 6b230281ab5e27bb0d8de6bceab8f9a3
BLAKE2b-256 37d6adf8328ad7618217b45200987f5d09ec2c703848f5174673a4000afcb81a

See more details on using hashes here.

File details

Details for the file fastbloom_rs-0.5.10-cp37-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for fastbloom_rs-0.5.10-cp37-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 723b7e96df0af56c9e0c57a3de8402db520d6379fac8588c2c31b0426e579791
MD5 448a0bcd6343ce4b6554d76c31148e50
BLAKE2b-256 bfa102f68de112876e230fa359bdc7f5c4d00457730f07df6937cb28038fc790

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page