Skip to main content

Fast Bloom Filter

Project description

Less Hash Bloom Filter

Build Status

Less Hash Bloom Filter is fast bloom filter suitable for Big Data.

The computation of hash functions and checking the existence of an element is a major computation overhead. Also, bloom filter requires multiple independent hash functions, and well-designed hash functions are computation-intensive like MD5, SHA-1 [1].

In this implementation, we use a different technique to generate the k hash functions from only two. Therefore, the bloom filter is fast.

Install

Install Less Hash Bloom Filter with pip as follows:

$ pip install LessHash-BloomFilter

Usage

LHBF needs to know the size of bloom filter m and number of hash functions k.

Note: You should use high m to avoid the collision of hash functions. The probability of two random strings colliding is ~ 1/m

from lhbf import BloomFilter

# Create a bloom filter 
bf = BloomFilter(m=200, k=2)

# Add an element
bf.add("a")

# Check if element exists
bf.might_contain("a")

# Estimate flase positive probability 
bf.estimate_fpp()

# Combine two bloom filters
bf2 = BloomFilter(m=200, k=2)
bf.combine(bf2)

Details

  • Hash functions used:

    • For integer, we use Knuth multiplicative hash [2]
    • For string, we use polynomial rolling hash function [3]
  • k hash functions:

    Using two hash functions, we calculate the k hash functions as follows:

    gi(x) = h1(x) + i x h2(x) mod m, where 0 ≤ i ≤ k-1

    It has been proved that using this method does not increase the asymptotic false positive probability [4].

Contributing

You're welcome to submit pull requests with any changes for this repository at any time. I'll be very glad to see any contributions.

References

  • [1] Luo, Lailong, et al. Optimizing bloom filter: challenges, solutions, and comparisons. IEEE Communications Surveys & Tutorials (2018).
  • [2] Knuth, Donald Ervin. The art of computer programming: sorting and searching. Vol. 3. Pearson Education, 1997.
  • [3] Karp, Richard M., and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM journal of research and development 31.2 (1987): 249-260.
  • [4] Kirsch, Adam, and Michael Mitzenmacher. Less hashing, same performance: building a better bloom filter. European Symposium on Algorithms. Springer, Berlin, Heidelberg, 2006.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

LessHash-BloomFilter-0.0.5.tar.gz (4.2 kB view details)

Uploaded Source

File details

Details for the file LessHash-BloomFilter-0.0.5.tar.gz.

File metadata

  • Download URL: LessHash-BloomFilter-0.0.5.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.8

File hashes

Hashes for LessHash-BloomFilter-0.0.5.tar.gz
Algorithm Hash digest
SHA256 a2ff6906ec78dbe3106e803ba2f2a724cb80e578cf3aaaf94ee8362ab59cefeb
MD5 0416ad2e322e36317a6c800967ed6813
BLAKE2b-256 078cc47a53b548b4ea3099ccc951faead48f00f54224c4350630ae208f80afea

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page