A python package for Learned Bloom Filters

Project description

learnedbf

learnedbf is a Python package for Learned Bloom Filters (LBFs), i.e., Bloom filters learned from data, as originally proposed by Kraska et al. (2018).

This page provides a quick start guide. For more comprehensive information, please refer to the documentation.
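The idea behind an LBF can be illustrated with a small, self-contained sketch of the construction by Kraska et al.: a scoring function (the learned classifier) accepts a query when its score exceeds a threshold, and every key that the classifier would reject is stored in a classical backup Bloom filter, so that the structure as a whole never yields false negatives. This is not how learnedbf implements LBF; the classes ToyBloomFilter and ToyLBF, the salted SHA-256 hashing and the toy_score function are hypothetical choices made only for illustration.

```python
import hashlib
import math


class ToyBloomFilter:
    """Classical Bloom filter sized for n keys and a target FPR epsilon (hypothetical)."""

    def __init__(self, n, epsilon):
        n = max(n, 1)
        # Standard sizing: m = -n ln(eps) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = math.ceil(-n * math.log(epsilon) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray(self.m)  # one byte per bit, for simplicity

    def _positions(self, key):
        # k positions from SHA-256 of the key prefixed by a per-hash salt.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyLBF:
    """Score function + threshold + backup Bloom filter (hypothetical)."""

    def __init__(self, score, threshold, keys, backup_epsilon):
        self.score = score
        self.threshold = threshold
        # Keys the classifier rejects (its false negatives) go into the backup
        # filter, so no key is ever reported as absent.
        rejected = [x for x in keys if score(x) <= threshold]
        self.backup = ToyBloomFilter(len(rejected), backup_epsilon)
        for x in rejected:
            self.backup.add(x)

    def __contains__(self, x):
        # Accept if the classifier is confident, otherwise ask the backup filter.
        return self.score(x) > self.threshold or x in self.backup


def toy_score(x):
    # Hypothetical "classifier": confident only on multiples of 4.
    return 0.9 if x % 4 == 0 else 0.1


keys = list(range(0, 1000, 2))  # even numbers play the role of keys
toy = ToyLBF(toy_score, threshold=0.5, keys=keys, backup_epsilon=0.01)
print(all(k in toy for k in keys))  # True: no false negatives
```

The backup filter only has to hold the keys the classifier misses, which is where an LBF can save space over a classical Bloom filter when the classifier is accurate.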

Installation

pip install learnedbf

Usage

Library import

The following code imports all libraries used in the subsequent snippets.

>>> import numpy as np
>>> 
>>> from sklearn.datasets import make_classification
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.model_selection import train_test_split 
>>> 
>>> import learnedbf as lbf
>>> from learnedbf.classifiers import ScoredLinearSVC, ScoredMLP
>>> from learnedbf import complexity_measures as cpl

Evaluating the complexity of a dataset

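The following code uses the make_classification function available in Scikit-learn to generate datasets of decreasing complexity (increasing class separation), and computes the F1v complexity measure for each of them. Lower F1v values indicate an easier classification problem; since no random seed is fixed, the exact values may vary slightly from run to run.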

>>> f1v = cpl.F1v()
>>>
>>> sep = np.linspace(0.001, 1.5, 10)
>>> for s in sep:
...     X, y = make_classification(n_samples=20000, n_features=2, n_redundant=0,
...                                class_sep=s)
...     c = f1v.compute(X, y)
...     print(f'class separation {s:.2f}, F1V {c:.2f}')
...
class separation 0.00, F1V 1.00
class separation 0.17, F1V 0.86
class separation 0.33, F1V 0.59
class separation 0.50, F1V 0.33
class separation 0.67, F1V 0.27
class separation 0.83, F1V 0.20
class separation 1.00, F1V 0.11
class separation 1.17, F1V 0.11
class separation 1.33, F1V 0.10
class separation 1.50, F1V 0.08
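For reference, the following sketch computes F1v for a binary problem using the definition commonly found in the data-complexity literature: with class means mu0, mu1, class proportions p0, p1 and class covariance matrices S0, S1, let W = p0*S0 + p1*S1, B = (mu1 - mu0)(mu1 - mu0)^T and d = W^-1 (mu1 - mu0); then F1v = 1 / (1 + (d^T B d) / (d^T W d)). It is an independent NumPy sketch: the function name f1v_binary is hypothetical, and the F1v class in learnedbf.complexity_measures may differ in details such as normalization or the handling of more than two classes.

```python
import numpy as np


def f1v_binary(X, y):
    """F1v for a two-class problem; lower values mean an easier problem."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    X0, X1 = X[~y], X[y]
    p0, p1 = len(X0) / len(X), len(X1) / len(X)
    mu_diff = X1.mean(axis=0) - X0.mean(axis=0)
    # Within-class scatter: class covariances weighted by class proportions.
    W = (p0 * np.atleast_2d(np.cov(X0, rowvar=False))
         + p1 * np.atleast_2d(np.cov(X1, rowvar=False)))
    # Between-class scatter for two classes.
    B = np.outer(mu_diff, mu_diff)
    # Fisher's discriminant direction (pseudo-inverse in case W is singular).
    d = np.linalg.pinv(W) @ mu_diff
    denom = d @ W @ d
    if denom == 0:
        return 1.0  # identical class means: no separating direction
    d_f = (d @ B @ d) / denom
    return 1.0 / (1.0 + d_f)


rng = np.random.default_rng(0)
X_hard = rng.normal(size=(2000, 2))
y_toy = rng.random(2000) < 0.5
X_easy = X_hard + np.outer(y_toy, [3.0, 3.0])  # shift class 1 away from class 0
print(f"overlapping: {f1v_binary(X_hard, y_toy):.2f}, "
      f"separated: {f1v_binary(X_easy, y_toy):.2f}")
```

As in the output above, heavily overlapping classes give F1v close to 1, while well-separated classes push it towards 0.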

Training classifiers

The following code generates the dataset used in the rest of the examples and splits it into three parts. The first two are used to train a classifier and to evaluate its performance (and, together, to build the filters later on); the third is held out and used afterwards to estimate the empirical FPR of the resulting filters.

>>> X, y = make_classification(n_samples=20000, n_features=2, n_redundant=0,
...                            class_sep=0.5)
>>> y = y.astype(bool)
>>> X_build, X_evaluate, y_build, y_evaluate = train_test_split(X, y,
...                                                             test_size=0.1)
>>> X_train, X_test, y_train, y_test = train_test_split(X_build, y_build,
...                                                     test_size=0.1)

The following code trains a linear SVC and a multi-layer perceptron, then compares their accuracy on the test set after turning each classifier's scores into binary predictions with a fixed threshold of 0.65.

>>> svc = ScoredLinearSVC()
>>> svc.fit(X_train, y_train)
>>> 
>>> mlp = ScoredMLP()
>>> mlp.fit(X_train, y_train)
>>> 
>>> threshold = 0.65
>>> 
>>> svc_pred = (svc.predict_score(X_test) > threshold).astype(int)
>>> mlp_pred = (mlp.predict_score(X_test) > threshold).astype(int)
>>> 
>>> svc_score = accuracy_score(y_test, svc_pred)
>>> mlp_score = accuracy_score(y_test, mlp_pred)
>>> 
>>> print(f'SVC score = {svc_score:.2f}, MLP score = {mlp_score:.2f}')
SVC score = 0.56, MLP score = 0.85
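The accuracies above depend on the threshold of 0.65 used to binarize the scores. A minimal sketch for checking how accuracy changes with the threshold, given an array of scores (for instance the output of mlp.predict_score(X_test)) and the corresponding labels; the helper accuracy_at_thresholds is hypothetical, and toy scores are used so that the snippet runs on its own.

```python
import numpy as np


def accuracy_at_thresholds(scores, y_true, thresholds):
    """Return {threshold: accuracy of the rule `score > threshold`}."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true).astype(bool)
    return {float(t): float(np.mean((scores > t) == y_true)) for t in thresholds}


# Toy scores and labels, standing in for e.g. mlp.predict_score(X_test) and y_test.
scores = np.array([0.1, 0.4, 0.6, 0.7, 0.9, 0.2])
labels = np.array([False, False, True, True, True, False])
for t, acc in accuracy_at_thresholds(scores, labels, [0.3, 0.5, 0.65]).items():
    print(f"threshold {t:.2f}: accuracy {acc:.2f}")
# threshold 0.30: accuracy 0.83
# threshold 0.50: accuracy 1.00
# threshold 0.65: accuracy 0.83
```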

Building a learned Bloom filter

The following code builds an LBF on top of the previously trained MLP and estimates its empirical FPR on the negative examples of the held-out evaluation split.

>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp, threshold_test_size=0.2)
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), threshold=0.7520010828581488)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009
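The epsilon parameter appears to act as the target FPR of the whole filter (the estimated FPR of 0.009 is close to epsilon=0.01). In the construction of Kraska et al., a negative query is a false positive either when the classifier accepts it (rate F_c at the chosen threshold) or when the classifier rejects it and the backup filter wrongly accepts it (rate F_b), so the overall rate is F_c + (1 - F_c) * F_b. The sketch below encodes this relation together with the empirical FPR of a thresholded classifier; the function names are hypothetical, and how learnedbf.LBF actually splits epsilon between classifier and backup filter is not described on this page.

```python
import numpy as np


def overall_fpr(fpr_classifier, fpr_backup):
    # A negative is a false positive if the classifier accepts it, or if the
    # classifier rejects it and the backup filter still (wrongly) accepts it.
    return fpr_classifier + (1 - fpr_classifier) * fpr_backup


def backup_target(epsilon, fpr_classifier):
    # Solve epsilon = F_c + (1 - F_c) * F_b for F_b; feasible only if F_c < epsilon.
    if fpr_classifier >= epsilon:
        raise ValueError("the classifier FPR alone already reaches epsilon")
    return (epsilon - fpr_classifier) / (1 - fpr_classifier)


def classifier_fpr(neg_scores, threshold):
    # Empirical FPR of the rule `score > threshold` on negative examples.
    return float(np.mean(np.asarray(neg_scores) > threshold))


f_c = 0.005                     # hypothetical classifier FPR at the chosen threshold
f_b = backup_target(0.01, f_c)  # FPR the backup filter must reach
print(f"backup target {f_b:.6f}, overall {overall_fpr(f_c, f_b):.4f}")
# backup target 0.005025, overall 0.0100
```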

The following code builds an LBF backed by a new, untrained multi-layer perceptron, which is trained on the provided data when the filter is fitted.

>>> mlp = ScoredMLP()
>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp, threshold_test_size=0.2)
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), threshold=0.7239171235297011)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009

The following code repeats the previous operation, additionally performing model selection over the initial learning rate (learning_rate_init) of the multi-layer perceptron.

>>> mlp = ScoredMLP()
>>> filter = lbf.LBF(epsilon=0.01, classifier=mlp,
...                  threshold_test_size=0.2,
...                  hyperparameters={
...                      'learning_rate_init':[0.01, 0.005, 0.001, 0.0005]})
>>> filter.fit(X_build, y_build)
LBF(epsilon=0.01, classifier=ScoredMLP(), hyperparameters={'learning_rate_init': [0.01, 0.005, 0.001, 0.0005]}, threshold=0.741263017250332)
>>> print(f"FPR:{filter.estimate_FPR(X_evaluate[y_evaluate==0]):.3f}")
FPR:0.009
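For comparison, a similar model selection over learning_rate_init can be run by hand with Scikit-learn's GridSearchCV and a plain MLPClassifier. This is only an analogy: the hyperparameter name suggests that ScoredMLP wraps Scikit-learn's MLPClassifier, but this page does not say how LBF performs model selection internally (for instance, which validation scheme or score it uses). The dataset here is a smaller, seeded variant of the one above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# A smaller, seeded dataset with the same shape as the one used above.
X_ms, y_ms = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                                 class_sep=0.5, random_state=0)

param_grid = {"learning_rate_init": [0.01, 0.005, 0.001, 0.0005]}
# 3-fold cross-validated accuracy for each candidate learning rate;
# the best candidate is then refitted on the whole dataset.
search = GridSearchCV(MLPClassifier(max_iter=300, random_state=0),
                      param_grid, cv=3)
search.fit(X_ms, y_ms)
print(search.best_params_, f"CV accuracy {search.best_score_:.2f}")
```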

License

The library is distributed under the Apache 2.0 license. This project includes portions of code from FastPLBF (https://github.com/atsukisato/FastPLBF), released under the MIT License. See THIRD_PARTY_LICENSES for details.

Authors

learnedbf has been designed and implemented by D. Malchiodi, M. Frasca, N. Rinaldi and R. Giancarlo.


Download files

Download the file for your platform.

Source Distribution

learnedbf-1.0.0.tar.gz (38.3 kB)

Uploaded Source

Built Distribution

learnedbf-1.0.0-py3-none-any.whl (43.9 kB)

Uploaded Python 3

File details

Details for the file learnedbf-1.0.0.tar.gz.

File metadata

  • Download URL: learnedbf-1.0.0.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for learnedbf-1.0.0.tar.gz
Algorithm Hash digest
SHA256 c78d2e8bde9ab743ac266f8570939879b2da1aef585e517871557a06eabf21b0
MD5 f74d9dffe9963d7e3b718ced7956fd68
BLAKE2b-256 d42cb8eb4d1b3405bea61a445918eade3b1cdb837c53401834bb80382e9390e1

Provenance

The following attestation bundles were made for learnedbf-1.0.0.tar.gz:

Publisher: python-publish.yml on SLIMlaboratory/learnedbf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file learnedbf-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: learnedbf-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 43.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for learnedbf-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 68dec3c652478d84b4dadb4532675dee7278e5a76a9eb1f0ed3f05a30bfa4d92
MD5 01fb186b116630edeba8e376544c8d1f
BLAKE2b-256 aafac166801bfce5848b1306fb59e356f440c8fc979b66728f1d0efbdaa4c487

Provenance

The following attestation bundles were made for learnedbf-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on SLIMlaboratory/learnedbf
