Active Learning Pipelines Benchmark

ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data

ALPBench is a Python package for the specification, execution, and performance monitoring of active learning pipelines (ALPs), each consisting of a learning algorithm and a query strategy, on real-world tabular classification tasks. It has built-in measures to keep evaluations reproducible: the exact dataset splits and the hyperparameter settings of the algorithms used are saved. In total, ALPBench comprises 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. The benchmark is also easy to extend: you can implement your own learning algorithm and/or query strategy and benchmark it against existing approaches.
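
To make the terminology concrete, here is a minimal sketch of a pool-based active learning loop written directly with scikit-learn and NumPy. It does not use ALPBench's API and is not how ALPBench is implemented; the function name `run_active_learning`, the synthetic dataset, and the budget parameters are assumptions chosen for illustration. The learner is a random forest, and the query strategy picks the pool points whose highest predicted class probability is lowest (least-confidence sampling).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_active_learning(n_initial=20, n_iterations=5, n_queries=10, seed=0):
    """Illustrative pool-based loop (hypothetical helper, not ALPBench code)."""
    rng = np.random.default_rng(seed)
    X, y = make_classification(
        n_samples=600, n_features=10, n_informative=5, random_state=seed
    )
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )

    # Start from a small random labeled set; the rest of the pool is unlabeled.
    labeled = rng.choice(len(X_pool), size=n_initial, replace=False)
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)

    accuracies = []
    for iteration in range(n_iterations + 1):
        # Learning algorithm: retrain on everything labeled so far.
        learner = RandomForestClassifier(n_estimators=50, random_state=seed)
        learner.fit(X_pool[labeled], y_pool[labeled])
        accuracies.append(accuracy_score(y_test, learner.predict(X_test)))

        if iteration == n_iterations:
            break  # budget exhausted, no further queries

        # Query strategy: least confidence (lowest top-class probability).
        confidence = learner.predict_proba(X_pool[unlabeled]).max(axis=1)
        picked = unlabeled[np.argsort(confidence)[:n_queries]]

        # "Label" the picked points by moving them into the labeled set.
        labeled = np.concatenate([labeled, picked])
        unlabeled = np.setdiff1d(unlabeled, picked)

    return accuracies, labeled


accuracies, labeled = run_active_learning()
print("labeled points:", len(labeled))
print("test accuracy per iteration:", [round(a, 3) for a in accuracies])
```

Fixing `seed` makes both the split and the learner deterministic, which is the same idea as storing exact splits and hyperparameters for reproducibility.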

🛠️ Install

ALPBench is intended to work with Python 3.10 and above.

# The base package can be installed via pip:
pip install alpbench

# Alternatively, you can install the full package via pip
# (the quotes keep shells such as zsh from interpreting the brackets):
pip install "alpbench[full]"

# Or you can install the package from source:
git clone https://github.com/ValentinMargraf/ActiveLearningPipelines.git
cd ActiveLearningPipelines
conda create --name alpbench python=3.10
conda activate alpbench

# Install for usage (without TabNet and TabPFN)
pip install -r requirements.txt

# Install for usage (with TabNet and TabPFN)
pip install -r requirements_full.txt
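
After installing, a quick sanity check can confirm that the interpreter meets the Python 3.10 requirement and that the package is visible to it. The sketch below uses only the standard library; the helper name `check_environment` is hypothetical and not part of ALPBench.

```python
import sys
from importlib import metadata


def check_environment(package="alpbench", min_python=(3, 10)):
    """Return (python_ok, installed_version_or_None) for the given package."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    return python_ok, version


python_ok, version = check_environment()
print("Python >= 3.10:", python_ok)
print("alpbench version:", version or "not installed")
```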

Documentation at https://activelearningpipelines.readthedocs.io/en/latest/

⭐ Quickstart

ALPBench can be used in several ways. Many learners and query strategies are already implemented and can be selected by name, as shown in the minimal example below. You can also implement your own query strategies in the alpbench.pipeline module.
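
As an illustration of what a query strategy computes, the following sketch implements margin sampling on a matrix of predicted class probabilities: it selects the instances with the smallest gap between the two most probable classes. The function `margin_query` is a hypothetical standalone helper and does not follow ALPBench's interface for query strategies; see the documentation for how strategies are actually registered.

```python
import numpy as np


def margin_query(probabilities, n_queries):
    """Return indices of the n_queries rows with the smallest top-2 margin.

    Assumes at least two classes (columns) per row.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    # Sort each row ascending, so the last two columns hold the two largest values.
    sorted_proba = np.sort(probabilities, axis=1)
    margins = sorted_proba[:, -1] - sorted_proba[:, -2]
    # Smallest margin first; a stable sort breaks ties by original index.
    return np.argsort(margins, kind="stable")[:n_queries]


proba = np.array([
    [0.90, 0.05, 0.05],  # margin 0.85: confident prediction
    [0.40, 0.35, 0.25],  # margin 0.05
    [0.50, 0.30, 0.20],  # margin 0.20
    [0.34, 0.33, 0.33],  # margin 0.01: most ambiguous
])
print(margin_query(proba, n_queries=2))  # → [3 1]
```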

📈 Fit an Active Learning Pipeline

Fit an ALP on the dataset with OpenML ID 31, using a random forest as the learner and margin sampling as the query strategy. Similar example snippets can be found in examples/.

from sklearn.metrics import accuracy_score

from alpbench.benchmark.BenchmarkConnector import DataFileBenchmarkConnector
from alpbench.evaluation.experimenter.DefaultSetup import ensure_default_setup
from alpbench.pipeline.ALPEvaluator import ALPEvaluator

# create benchmark connector and establish database connection
benchmark_connector = DataFileBenchmarkConnector()

# load some default settings and algorithm choices
ensure_default_setup(benchmark_connector)

# configure the pipeline: setting, dataset, query strategy, and learner are selected by name
evaluator = ALPEvaluator(
    benchmark_connector=benchmark_connector,
    setting_name="small",
    openml_id=31,
    query_strategy_name="margin",
    learner_name="rf_gini",
)

# fit the active learning pipeline
alp = evaluator.fit()

# predict on the test data and evaluate the predictions
X_test, y_test = evaluator.get_test_data()
y_hat = alp.predict(X=X_test)
print("final test acc", accuracy_score(y_test, y_hat))

>> final test acc 0.7181818181818181

Changelog

v0.1.0 (2024-06-13)

Initial release

  • pipeline can be used to combine learning algorithms and query strategies into active learning pipelines
  • evaluation provides tools to evaluate active learning pipelines
  • benchmark monitors the performance of active learning pipelines over time and stores results in a database

v0.1.1 (2024-06-14)

  • extra code for tabnet no longer needs to be included from the repo

v0.1.2 (2024-07-10)

  • added plotting functionality to evaluation to generate budget curves
  • notebook in docs explains how to extract results from previously run experiments and plot budget curves
