A Python package for Learned Bloom Filters
learnedbf
learnedbf is a Python package for Learned Bloom Filters (LBF), that is,
Bloom filters learned from data, as originally proposed by
Kraska et al., 2018.
This page provides a quick start guide. For more comprehensive information, please refer to the documentation.
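To make the idea concrete, here is a minimal sketch of a learned Bloom filter in plain Python. It is not how learnedbf is implemented: the classes TinyBloomFilter and TinyLearnedBloomFilter and the function toy_score are hypothetical names introduced only for illustration. The sketch assumes the usual construction, in which a score function and a threshold answer most queries, and a small classic Bloom filter stores the keys the score function would miss, so that no key is ever rejected.

```python
import hashlib


class TinyBloomFilter:
    """Classic Bloom filter over strings, using salted SHA-256 hashes."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = [False] * n_bits

    def _positions(self, item):
        # One bit position per hash function, obtained by salting the item.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class TinyLearnedBloomFilter:
    """Score function plus threshold, with a backup filter for missed keys."""

    def __init__(self, score, threshold, keys, n_bits=256, n_hashes=3):
        self.score = score
        self.threshold = threshold
        self.backup = TinyBloomFilter(n_bits, n_hashes)
        # Keys scored at or below the threshold would be false negatives,
        # so they are stored in the backup filter.
        for key in keys:
            if score(key) <= threshold:
                self.backup.add(key)

    def __contains__(self, item):
        return self.score(item) > self.threshold or item in self.backup


def toy_score(word):
    """Hypothetical stand-in for a trained classifier: longer words score higher."""
    return min(len(word) / 10, 1.0)


keys = ["bloom", "filter", "classification", "perceptron", "svc"]
lbf_sketch = TinyLearnedBloomFilter(toy_score, threshold=0.65, keys=keys)

# Every key is reported as a member, either by the score or by the backup.
assert all(key in lbf_sketch for key in keys)
```

In this sketch, false positives can come from two places: non-keys whose score exceeds the threshold, and non-keys that collide in the backup filter.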
Installation
pip install learnedbf
Usage
Library import
The following code imports all libraries used in the subsequent snippets.
>>> import numpy as np
>>>
>>> from sklearn.datasets import make_classification
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.model_selection import train_test_split
>>>
>>> import learnedbf as lbf
>>> from learnedbf.classifiers import ScoredLinearSVC, ScoredMLP
>>> from learnedbf import complexity_measures as cpl
Evaluating the complexity of a dataset
The following code generates datasets of decreasing complexity using the
make_classification function available in scikit-learn, evaluating the
F1v measure for each of them.
>>> f1v = cpl.F1v()
>>>
>>> sep = np.linspace(0.001, 1.5, 10)
>>> for s in sep:
...     X, y = make_classification(n_samples=20000, n_features=2, n_redundant=0,
...                                class_sep=s)
...     c = f1v.compute(X, y)
...     print(f'class separation {s:.2f}, F1V {c:.2f}')
...
class separation 0.00, F1V 1.00
class separation 0.17, F1V 0.86
class separation 0.33, F1V 0.59
class separation 0.50, F1V 0.33
class separation 0.67, F1V 0.27
class separation 0.83, F1V 0.20
class separation 1.00, F1V 0.11
class separation 1.17, F1V 0.11
class separation 1.33, F1V 0.10
class separation 1.50, F1V 0.08
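For intuition about why the measure drops as the classes move apart, the following sketch computes a directional-vector Fisher discriminant ratio for two classes with two features in plain Python. It assumes that F1v follows the common definition 1 / (1 + dF), with dF = (d^T B d) / (d^T W d), d = W^-1 (mu0 - mu1), B the between-class scatter and W the prior-weighted within-class scatter; whether learnedbf uses exactly this formula is an assumption. The helper names (mean_vector, covariance_2d, f1v_2d, blob) are hypothetical, and the population covariance (division by n) is an arbitrary choice.

```python
import random


def mean_vector(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(2)]


def covariance_2d(points, mu):
    """Population covariance matrix of 2-D points (divides by n)."""
    n = len(points)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dx, dy = p[0] - mu[0], p[1] - mu[1]
        cov[0][0] += dx * dx / n
        cov[0][1] += dx * dy / n
        cov[1][0] += dx * dy / n
        cov[1][1] += dy * dy / n
    return cov


def f1v_2d(class0, class1):
    """Return 1 / (1 + dF); values near 1 mean heavily overlapping classes."""
    n = len(class0) + len(class1)
    p0, p1 = len(class0) / n, len(class1) / n
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = covariance_2d(class0, mu0), covariance_2d(class1, mu1)
    # Within-class scatter W = p0 * S0 + p1 * S1.
    w = [[p0 * s0[i][j] + p1 * s1[i][j] for j in range(2)] for i in range(2)]
    diff = [mu0[0] - mu1[0], mu0[1] - mu1[1]]
    # Direction d = W^-1 * diff, using the closed-form inverse of a 2x2 matrix.
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    d = [(w[1][1] * diff[0] - w[0][1] * diff[1]) / det,
         (-w[1][0] * diff[0] + w[0][0] * diff[1]) / det]
    # d^T B d with B = diff * diff^T reduces to (d . diff)^2.
    between = (d[0] * diff[0] + d[1] * diff[1]) ** 2
    within = sum(d[i] * w[i][j] * d[j] for i in range(2) for j in range(2))
    if within == 0.0:
        return 1.0  # identical class means: no separating direction
    return 1.0 / (1.0 + between / within)


rng = random.Random(0)


def blob(center, n=500):
    """Gaussian cloud with unit variance, shifted along the first feature."""
    return [(rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


overlapping = f1v_2d(blob(0.0), blob(0.1))
separated = f1v_2d(blob(0.0), blob(4.0))
assert separated < overlapping
```

The two synthetic clouds play the role of the class_sep parameter above: the further apart the class means, the larger dF and the smaller the resulting value.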
Training classifiers
The following code generates the dataset used in the rest of the examples and divides it into three splits. The first two are used to train a classifier and evaluate its performance; the third is used afterwards to estimate the FPR of the resulting filters.
>>> X, y = make_classification(n_samples=20000, n_features=2, n_redundant=0,
...                            class_sep=0.5)
>>> y = y.astype(bool)
>>> X_build, X_evaluate, y_build, y_evaluate = train_test_split(X, y,
...                                                             test_size=0.1)
>>> X_train, X_test, y_train, y_test = train_test_split(X_build, y_build,
...                                                     test_size=0.1)
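The nesting of the two splits is easy to lose track of, so the following sketch reproduces it on plain index lists to show the resulting sizes. The helper split_indices is hypothetical and only mimics the idea of holding out a fraction of the data; it is not scikit-learn's implementation.

```python
import random


def split_indices(indices, test_fraction, rng):
    """Shuffle the indices and hold out a fraction of them as the second part."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_held_out = round(len(shuffled) * test_fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]


rng = random.Random(42)
all_idx = range(20000)

# First split: 10% of the data is set aside for the final FPR estimate.
build_idx, evaluate_idx = split_indices(all_idx, 0.1, rng)
# Second split: the build part is divided again into train and test.
train_idx, test_idx = split_indices(build_idx, 0.1, rng)

print(len(train_idx), len(test_idx), len(evaluate_idx))
```

This prints 16200 1800 2000. The train and test parts together make up the build part, while the evaluate part is disjoint from both.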
The following code trains a linear SVC and a multi-layer perceptron, comparing their performance on the test set using accuracy.
>>> svc = ScoredLinearSVC()
>>> svc.fit(X_train, y_train)
>>>
>>> mlp = ScoredMLP()
>>> mlp.fit(X_train, y_train)
>>>
>>> threshold = 0.65
>>>
>>> svc_pred = (svc.predict_score(X_test) > threshold).astype(int)
>>> mlp_pred = (mlp.predict_score(X_test) > threshold).astype(int)
>>>
>>> svc_score = accuracy_score(y_test, svc_pred)
>>> mlp_score = accuracy_score(y_test, mlp_pred)
>>>
>>> print(f'SVC score = {svc_score:.2f}, MLP score = {mlp_score:.2f}')
SVC score = 0.56, MLP score = 0.85
Building a learned Bloom filter
The following code builds an LBF using the previously trained MLP and estimates its empirical FPR.
>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp, threshold_test_size = 0.2)
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), threshold=0.7520010828581488)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009
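Note that estimate_FPR receives only the elements labelled as non-keys (y_evaluate==0). The quantity being estimated is presumably the fraction of those non-keys that the filter wrongly reports as members. The sketch below illustrates that computation on a toy membership test; empirical_fpr and toy_filter are hypothetical names, and this is not the library's implementation.

```python
def empirical_fpr(is_member, non_keys):
    """Fraction of known non-keys that the filter reports as members."""
    non_keys = list(non_keys)
    if not non_keys:
        raise ValueError("need at least one non-key to estimate the FPR")
    false_positives = sum(1 for item in non_keys if is_member(item))
    return false_positives / len(non_keys)


def toy_filter(item):
    """Hypothetical filter that reports every multiple of 100 as a member."""
    return item % 100 == 0


# In this toy setup none of the integers 1..1000 is a true key,
# so every positive answer counts as a false positive.
non_keys = range(1, 1001)
fpr = empirical_fpr(toy_filter, non_keys)
print(f"FPR:{fpr:.3f}")
```

This prints FPR:0.010, since 10 of the 1000 non-keys are multiples of 100. The estimate is only meaningful when the non-keys were not used to build the filter, which is why the examples keep a separate evaluate split.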
The following code builds an LBF backed by a multi-layer perceptron, this time training the classifier on the provided data.
>>> mlp = ScoredMLP()
>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp, threshold_test_size=0.2)
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), threshold=0.7239171235297011)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009
The following code repeats the previous operation, now also performing a model selection on the learning rate of the multi-layer perceptron.
>>> mlp = ScoredMLP()
>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp,
...                  threshold_test_size=0.2,
...                  hyperparameters={
...                      'learning_rate_init': [0.01, 0.005, 0.001, 0.0005]})
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), hyperparameters={'learning_rate_init': [0.01, 0.005, 0.001, 0.0005]}, threshold=0.741263017250332)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009
License
The library is distributed under the Apache 2.0 license. This project includes portions of code from FastPLBF (https://github.com/atsukisato/FastPLBF), under the MIT License. See THIRD_PARTY_LICENSES for details.
Authors
learnedbf has been designed and implemented by D. Malchiodi, M. Frasca, N. Rinaldi and R. Giancarlo.