Fast Bloom Filter

## Less Hash Bloom Filter Less Hash Bloom Filter is fast bloom filter suitable for Big Data.

The computation of hash functions and checking the existence of an element is a major computation overhead. Also, bloom filter requires multiple independent hash functions, and well-designed hash functions are computation-intensive like MD5, SHA-1 .

In this implementation, we use a different technique to generate the k hash functions from only two. Therefore, the bloom filter is fast.

### Install

Install Less Hash Bloom Filter with pip as follows:

``````\$ pip install LessHash-BloomFilter
``````

### Usage

LHBF needs to know the size of bloom filter `m` and number of hash functions `k`.

Note: You should use high `m` to avoid the collision of hash functions. The probability of two random strings colliding is ~ 1/m

```from lhbf import BloomFilter

# Create a bloom filter
bf = BloomFilter(m=200, k=2)

# Check if element exists
bf.might_contain("a")

# Estimate flase positive probability
bf.estimate_fpp()

# Combine two bloom filters
bf2 = BloomFilter(m=200, k=2)
bf.combine(bf2)
```

### Details

• Hash functions used:

• For integer, we use Knuth multiplicative hash 
• For string, we use polynomial rolling hash function 
• k hash functions:

Using two hash functions, we calculate the k hash functions as follows:

gi(x) = h1(x) + i x h2(x) mod m, where 0 ≤ i ≤ k-1

It has been proved that using this method does not increase the asymptotic false positive probability .

### Contributing

You're welcome to submit pull requests with any changes for this repository at any time. I'll be very glad to see any contributions.

### References

•  Luo, Lailong, et al. Optimizing bloom filter: challenges, solutions, and comparisons. IEEE Communications Surveys & Tutorials (2018).
•  Knuth, Donald Ervin. The art of computer programming: sorting and searching. Vol. 3. Pearson Education, 1997.
•  Karp, Richard M., and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM journal of research and development 31.2 (1987): 249-260.
•  Kirsch, Adam, and Michael Mitzenmacher. Less hashing, same performance: building a better bloom filter. European Symposium on Algorithms. Springer, Berlin, Heidelberg, 2006.

## Project details

This version 0.0.5 0.0.4 0.0.3 0.0.2