A space-efficient probabilistic data structure for membership testing.
Project description
Bloom Filter in Python
Table of Contents
Introduction
A Bloom Filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It allows false positives but never false negatives. This makes it ideal for applications where memory efficiency is crucial, such as caching, networking, and databases.
This implementation is optimized for configurability and performance, allowing users to specify the expected number of elements and the desired false positive probability.
Theory
A Bloom Filter consists of a bit array of size m and uses k different hash functions. When an element is inserted, all k hash functions generate indices in the bit array, and the corresponding bits are set to 1.
To check if an element is present:
- Compute its
khash values. - If all bits at those positions are 1, the element may be present (with a certain probability of false positives).
- If at least one bit is 0, the element is definitely not present.
Mathematical Formulas
-
Optimal bit array size:
$$m = - \frac{n \log p}{(\log 2)^2}$$
where ( n ) is the number of expected elements and ( p ) is the false positive rate. -
Optimal number of hash functions:
$$k = \frac{m}{n} \log 2$$
Installation
To install the Bloom Filter package, run:
pip install bloomfilter-lite
Installation from source
-
Clone this repository:
git clone https://github.com/lorenzomaiuri-dev/bloomfilter-py.git cd bloomfilter-py
-
Create a virtual environment (optional but recommended):
python -m venv venv source venv/bin/activate # On Linux/Mac venv\Scripts\activate # On Windows
-
Install the dependencies
pip install -r requirements.txt
Usage
Basic Example
from bloomfilter_lite import BloomFilter
# Create a Bloom Filter for 1000 expected elements with a 1% false positive rate
bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
# Add elements
bf.add("hello")
bf.add("world")
# Check for membership
print(bf.check("hello")) # True (probably)
print(bf.check("python")) # False (definitely not present)
Benchmark
Performance testing for different dataset sizes:
| Elements | False Positive Rate | Memory (bits) | Time per Insert (ms) | Time per Lookup (ms) |
|---|---|---|---|---|
| 1,000 | 1% | ~9.6 KB | 0.01 | 0.008 |
| 10,000 | 1% | ~96 KB | 0.015 | 0.010 |
| 100,000 | 1% | ~960 KB | 0.020 | 0.012 |
Reproducing Benchmarks
To verify the benchmarks, run the following script:
python benchmarks/run_benchmark.py
This script tests insertions and lookups for varying dataset sizes and prints the execution time and memory usage.
Running Tests
To run the unit tests using pytest:
pytest tests/
Contributing
Contributions are welcome! If you'd like to contribute to this project, please follow these steps:
- Fork the repository
- Create a new branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -am 'Add new feature')
- Push the branch (git push origin feature/your-feature)
- Open a Pull Request
Please ensure all pull requests pass the existing tests and include new tests for any added functionality
License
This project is licensed under the MIT License. See the LICENSE file for more details
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bloomfilter_lite-1.0.7.tar.gz.
File metadata
- Download URL: bloomfilter_lite-1.0.7.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aa17b5eb6edec103fcb738b718a6f6fb22c6a521e73f370e534e75fed064e0b3
|
|
| MD5 |
9a76645780371b0ccdfeb5db38091503
|
|
| BLAKE2b-256 |
74eddf2ef53853b783a722ea54e1c05cf25356863762cbfa605499ee65b79313
|
File details
Details for the file bloomfilter_lite-1.0.7-py3-none-any.whl.
File metadata
- Download URL: bloomfilter_lite-1.0.7-py3-none-any.whl
- Upload date:
- Size: 4.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
764c460308b154609e1add0deefe53facbd7ae4ddd8130a72760a3ee8ccd2973
|
|
| MD5 |
fb00857c3617be8c3a2685d633d191c8
|
|
| BLAKE2b-256 |
8679b0b686d256f8937eea1022b3915deb92aaa3623a57b85fa24835f41c9398
|