A Python package for Learned Bloom Filters

Maintainers

dariomalchiodi

Project description

learnedbf

learnedbf is a Python package for Learned Bloom Filters (LBFs), that is, Bloom filters learned from data, as originally proposed by Kraska et al., 2018.
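
To make the idea concrete, here is a minimal sketch of the general learned Bloom filter scheme: a scoring function with a threshold answers most queries, and a small classical Bloom filter stores the keys the scorer would miss, so that no key is ever reported as absent. This is not the implementation used by learnedbf; `BloomFilter`, `ToyLearnedBloomFilter` and `toy_score` are hypothetical names introduced only for illustration, and the scoring function is a hand-written stand-in for a trained model.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter over strings, using double hashing on a SHA-256 digest."""

    def __init__(self, n_items, fpr):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2.
        n_items = max(1, n_items)
        self.m = max(1, math.ceil(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


class ToyLearnedBloomFilter:
    """Score + threshold, with a backup Bloom filter for the keys the scorer misses."""

    def __init__(self, score, threshold, keys, backup_fpr=0.01):
        self.score = score
        self.threshold = threshold
        # Keys scored at or below the threshold would be false negatives,
        # so they are stored in the backup filter.
        missed = [k for k in keys if score(k) <= threshold]
        self.backup = BloomFilter(len(missed), backup_fpr)
        for k in missed:
            self.backup.add(k)

    def __contains__(self, item):
        return self.score(item) > self.threshold or item in self.backup


def toy_score(item):
    # Hypothetical stand-in for a trained model: most keys start with "user-".
    return 0.9 if item.startswith("user-") else 0.1


keys = [f"user-{i}" for i in range(100)] + ["admin", "root"]
lbf_sketch = ToyLearnedBloomFilter(toy_score, threshold=0.5, keys=keys)
print(all(k in lbf_sketch for k in keys))  # every key is found
print("user-999" in lbf_sketch)            # a non-key accepted by the scorer
```

The second query illustrates where false positives come from: a non-key that the scorer rates above the threshold is accepted without consulting the backup filter.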

This page provides a quick start guide. For more comprehensive information, please refer to the documentation.

Installation

pip install learnedbf

Usage

Library import

The following code imports all libraries used in the subsequent snippets.

>>> import numpy as np
>>> 
>>> from sklearn.datasets import make_classification
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.model_selection import train_test_split 
>>> 
>>> import learnedbf as lbf
>>> from learnedbf.classifiers import ScoredLinearSVC, ScoredMLP
>>> from learnedbf import complexity_measures as cpl

Evaluating the complexity of a dataset

The following code generates datasets of decreasing complexity using the make_classification function available in scikit-learn, and evaluates the F1v measure for each of them.

>>> f1v = cpl.F1v()
>>>
>>> sep = np.linspace(0.001, 1.5, 10)
>>> for s in sep:
...     X, y = make_classification(n_samples=20000, n_features=2, n_redundant=0,
...                                class_sep=s)
...     c = f1v.compute(X, y)
...     print(f'class separation {s:.2f}, F1V {c:.2f}')
...
class separation 0.00, F1V 1.00
class separation 0.17, F1V 0.86
class separation 0.33, F1V 0.59
class separation 0.50, F1V 0.33
class separation 0.67, F1V 0.27
class separation 0.83, F1V 0.20
class separation 1.00, F1V 0.11
class separation 1.17, F1V 0.11
class separation 1.33, F1V 0.10
class separation 1.50, F1V 0.08
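
For intuition about what such a measure captures, the sketch below computes a directional-vector Fisher discriminant ratio for a binary problem with NumPy, following the definition commonly given in the data-complexity literature. It is an assumption that `cpl.F1v` follows the same formula; the function name `f1v_sketch` is hypothetical and the values it returns may differ from those of the library.

```python
import numpy as np


def f1v_sketch(X, y):
    """Directional-vector Fisher discriminant ratio for a binary problem.

    Values close to 1 indicate heavily overlapping classes,
    values close to 0 indicate classes that are easy to separate linearly.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    X0, X1 = X[~y], X[y]
    p0 = len(X0) / len(X)
    diff = (X0.mean(axis=0) - X1.mean(axis=0)).reshape(-1, 1)
    # Within-class scatter W (prior-weighted covariances), between-class scatter B.
    W = np.atleast_2d(p0 * np.cov(X0, rowvar=False)
                      + (1 - p0) * np.cov(X1, rowvar=False))
    B = diff @ diff.T
    d = np.linalg.pinv(W) @ diff  # direction of maximal separation
    denom = (d.T @ W @ d).item()
    if denom == 0.0:
        return 1.0  # identical class means: no separating direction
    dF = (d.T @ B @ d).item() / denom
    return 1.0 / (1.0 + dF)


# Synthetic Gaussian classes whose means are pulled apart by `shift`.
rng = np.random.default_rng(0)
y_demo = np.repeat([False, True], 1000)
for shift in (0.0, 1.0, 5.0):
    X_demo = rng.normal(size=(2000, 2))
    X_demo[y_demo] += shift
    print(f"shift {shift:.1f}: F1v sketch {f1v_sketch(X_demo, y_demo):.2f}")
```

As in the output above, the value should shrink as the two classes move apart.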

Training classifiers

The following code generates the dataset used in the rest of the examples and divides it into three splits. The first two are used to train a classifier and evaluate its performance; the third is used afterwards to estimate the false positive rate (FPR) of the built filters.

>>> X, y = make_classification(n_samples=20000, n_features=2, n_redundant=0,
...                            class_sep=0.5)
>>> y = y.astype(bool)
>>> X_build, X_evaluate, y_build, y_evaluate = train_test_split(X, y,
...                                                             test_size=0.1)
>>> X_train, X_test, y_train, y_test = train_test_split(X_build, y_build,
...                                                     test_size=0.1)
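
The nested split above can be pictured with the following standard-library sketch, which works on indices only. The helper `three_way_split` is hypothetical and does not reproduce scikit-learn's shuffling; it only shows how the two 10% hold-outs compose and that the three parts are disjoint.

```python
import random


def three_way_split(n_samples, holdout_frac=0.1, test_frac=0.1, seed=0):
    """Return disjoint (train, test, evaluate) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # First hold-out: set aside the evaluation part, keep the rest for building.
    n_eval = round(n_samples * holdout_frac)
    evaluate, build = indices[:n_eval], indices[n_eval:]
    # Second hold-out: split the build part into train and test.
    n_test = round(len(build) * test_frac)
    test, train = build[:n_test], build[n_test:]
    return train, test, evaluate


train_idx, test_idx, evaluate_idx = three_way_split(20000)
print(len(train_idx), len(test_idx), len(evaluate_idx))  # 16200 1800 2000
```

Keeping the evaluation part out of both training and threshold selection is what makes the FPR estimated on it later an honest one.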

The following code trains a linear SVC and a multi-layer perceptron, comparing their performance on the test set using accuracy.

>>> svc = ScoredLinearSVC()
>>> svc.fit(X_train, y_train)
>>> 
>>> mlp = ScoredMLP()
>>> mlp.fit(X_train, y_train)
>>> 
>>> threshold = 0.65
>>> 
>>> svc_pred = (svc.predict_score(X_test) > threshold).astype(int)
>>> mlp_pred = (mlp.predict_score(X_test) > threshold).astype(int)
>>> 
>>> svc_score = accuracy_score(y_test, svc_pred)
>>> mlp_score = accuracy_score(y_test, mlp_pred)
>>> 
>>> print(f'SVC score = {svc_score:.2f}, MLP score = {mlp_score:.2f}')
SVC score = 0.56, MLP score = 0.85
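
Accuracy at a single threshold hides the trade-off that matters for a filter. The sketch below, on made-up scores, computes the false positive and false negative rates at several thresholds; `rates_at` is a hypothetical helper, not part of learnedbf.

```python
def rates_at(scores, labels, threshold):
    """Return (false positive rate, false negative rate) at a score threshold."""
    predicted = [s > threshold for s in scores]
    on_negatives = [p for p, is_key in zip(predicted, labels) if not is_key]
    on_positives = [p for p, is_key in zip(predicted, labels) if is_key]
    fpr = sum(on_negatives) / len(on_negatives)
    fnr = sum(not p for p in on_positives) / len(on_positives)
    return fpr, fnr


# Made-up scores for illustration: keys (True) tend to score higher.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.75, 0.55, 0.30, 0.20, 0.10]
labels = [True, True, True, True, True, False, False, False, False, False]

for t in (0.25, 0.5, 0.65, 0.9):
    fpr, fnr = rates_at(scores, labels, t)
    print(f"threshold {t:.2f}: FPR {fpr:.1f}, FNR {fnr:.1f}")
```

Raising the threshold lowers the false positive rate but leaves more keys below it; in a learned Bloom filter those keys have to be stored in the backup filter, which therefore grows.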

Building a learned Bloom filter

The following code builds an LBF using the previously trained MLP and estimates its empirical FPR.

>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp, threshold_test_size = 0.2)
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), threshold=0.7520010828581488)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009
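
The fitted filter reports a threshold chosen during `fit`. How learnedbf selects it is not described on this page; one simple strategy, sketched below under that caveat, is to take the scores of held-out non-keys and pick the smallest observed score that leaves at most a target fraction of them above it. The helper `threshold_for_fpr` is hypothetical. A real LBF would also have to account for the false positives of its backup filter, so the fraction targeted here would be lower than the overall epsilon.

```python
import math


def threshold_for_fpr(non_key_scores, target_fpr):
    """Smallest observed score t such that at most target_fpr of the scores exceed t."""
    ordered = sorted(non_key_scores)
    n = len(ordered)
    allowed = math.floor(target_fpr * n)  # how many non-keys may score above t
    if allowed >= n:
        return float("-inf")  # every non-key may be accepted
    # The (n - allowed)-th smallest score has at most `allowed` scores above it.
    return ordered[n - allowed - 1]


# Made-up held-out scores: 0.00, 0.01, ..., 0.99.
held_out_scores = [i / 100 for i in range(100)]
t = threshold_for_fpr(held_out_scores, target_fpr=0.05)
observed = sum(s > t for s in held_out_scores) / len(held_out_scores)
print(f"threshold {t:.2f}, fraction of non-keys above it {observed:.2f}")
```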

The following code builds an LBF backed by a multi-layer perceptron, this time training the classifier on the provided data.

>>> mlp = ScoredMLP()
>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp, threshold_test_size=0.2)
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), threshold=0.7239171235297011)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009

The following code repeats the previous operation, now also performing model selection on the learning rate of the multi-layer perceptron.

>>> mlp = ScoredMLP()
>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp,
...                  threshold_test_size=0.2,
...                  hyperparameters={
...                      'learning_rate_init':[0.01, 0.005, 0.001, 0.0005]})
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), hyperparameters={'learning_rate_init': [0.01, 0.005, 0.001, 0.0005]}, threshold=0.741263017250332)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009
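
The idea behind selecting a learning rate from a list of candidates can be sketched without any library: train one model per candidate on part of the data and keep the candidate that performs best on a held-out part. The search strategy and metric used by learnedbf are not described on this page, so everything below is an assumption; the tiny logistic regression merely stands in for the multi-layer perceptron, and all function names are hypothetical.

```python
import math
import random


def train_logistic(xs, ys, lr, epochs=50):
    """Fit a 1-D logistic regression with plain stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, w * x + b))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b


def accuracy(model, xs, ys):
    w, b = model
    return sum((w * x + b > 0) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def select_learning_rate(xs, ys, candidates, val_frac=0.2, seed=0):
    """Hold out part of the data and keep the learning rate scoring best on it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, round(len(idx) * val_frac))
    val, train = idx[:n_val], idx[n_val:]
    results = {}
    for lr in candidates:
        model = train_logistic([xs[i] for i in train], [ys[i] for i in train], lr)
        results[lr] = accuracy(model, [xs[i] for i in val], [ys[i] for i in val])
    best = max(results, key=results.get)
    return best, results


# Synthetic 1-D data: class 1 is centred at +1, class 0 at -1.
data_rng = random.Random(1)
ys_demo = [i % 2 for i in range(400)]
xs_demo = [data_rng.gauss(1.0 if y else -1.0, 1.0) for y in ys_demo]

candidates = [0.01, 0.005, 0.001, 0.0005]
best_lr, scores_by_lr = select_learning_rate(xs_demo, ys_demo, candidates)
print(best_lr, scores_by_lr)
```

Ties are resolved in favour of the first candidate in the list, which is a property of this sketch rather than of the library.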

License

The library is distributed under the Apache 2.0 license. This project includes portions of code from FastPLBF (https://github.com/atsukisato/FastPLBF), under the MIT License. See THIRD_PARTY_LICENSES for details.

Authors

learnedbf has been designed and implemented by D. Malchiodi, M. Frasca, N. Rinaldi and R. Giancarlo.

Release history

- 1.0.0 (this version): Mar 6, 2026
- 0.6.2: May 28, 2025
- 0.6.1: May 28, 2025
- 0.6: May 28, 2025

Download files

Source Distribution

learnedbf-1.0.0.tar.gz (38.3 kB)

Uploaded Mar 6, 2026 Source

Built Distribution

learnedbf-1.0.0-py3-none-any.whl (43.9 kB)

Uploaded Mar 6, 2026 Python 3

File details

Details for the file learnedbf-1.0.0.tar.gz.

File metadata

Download URL: learnedbf-1.0.0.tar.gz
Upload date: Mar 6, 2026
Size: 38.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for learnedbf-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`c78d2e8bde9ab743ac266f8570939879b2da1aef585e517871557a06eabf21b0`
MD5	`f74d9dffe9963d7e3b718ced7956fd68`
BLAKE2b-256	`d42cb8eb4d1b3405bea61a445918eade3b1cdb837c53401834bb80382e9390e1`
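
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below assumes the archive has been saved in the current directory under its original file name; `sha256_of` and `matches_published_hash` are hypothetical helper names.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


archive = "learnedbf-1.0.0.tar.gz"  # assumed location of the downloaded file
published = "c78d2e8bde9ab743ac266f8570939879b2da1aef585e517871557a06eabf21b0"
if os.path.exists(archive):
    print(matches_published_hash(archive, published))
```

pip can perform the same check automatically when requirements are pinned with hashes.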

Provenance

The following attestation bundles were made for learnedbf-1.0.0.tar.gz:

Publisher: python-publish.yml on SLIMlaboratory/learnedbf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: learnedbf-1.0.0.tar.gz
- Subject digest: c78d2e8bde9ab743ac266f8570939879b2da1aef585e517871557a06eabf21b0
- Sigstore transparency entry: 1051096808
- Sigstore integration time: Mar 6, 2026
Source repository:
- Permalink: SLIMlaboratory/learnedbf@db6a8b794999b20f4048253c320ed6c7cd5c39de
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SLIMlaboratory
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@db6a8b794999b20f4048253c320ed6c7cd5c39de
- Trigger Event: workflow_dispatch

File details

Details for the file learnedbf-1.0.0-py3-none-any.whl.

File metadata

Download URL: learnedbf-1.0.0-py3-none-any.whl
Upload date: Mar 6, 2026
Size: 43.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for learnedbf-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`68dec3c652478d84b4dadb4532675dee7278e5a76a9eb1f0ed3f05a30bfa4d92`
MD5	`01fb186b116630edeba8e376544c8d1f`
BLAKE2b-256	`aafac166801bfce5848b1306fb59e356f440c8fc979b66728f1d0efbdaa4c487`

Provenance

The following attestation bundles were made for learnedbf-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on SLIMlaboratory/learnedbf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: learnedbf-1.0.0-py3-none-any.whl
- Subject digest: 68dec3c652478d84b4dadb4532675dee7278e5a76a9eb1f0ed3f05a30bfa4d92
- Sigstore transparency entry: 1051096984
- Sigstore integration time: Mar 6, 2026
Source repository:
- Permalink: SLIMlaboratory/learnedbf@db6a8b794999b20f4048253c320ed6c7cd5c39de
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SLIMlaboratory
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@db6a8b794999b20f4048253c320ed6c7cd5c39de
- Trigger Event: workflow_dispatch
