Streaming AutoML Benchmark — fast C++ base algorithms + unified AutoML frameworks
Project description
A unified benchmark framework for evaluating AutoML systems on data streams with fast C++ base algorithms and rigorous prequential evaluation.
Why SAMLB?
Streaming AutoML methods are hard to compare fairly. Different papers use different datasets, evaluation protocols, and algorithm pools. SAMLB solves this by providing:
- Fast C++ base algorithms with River-compatible Python interfaces (Naive Bayes, Hoeffding Trees, KNN, Perceptron, Logistic Regression, and more)
- Framework-agnostic benchmarking -- plug in any streaming AutoML method with just 3 methods
- Standardized prequential evaluation (test-then-train) with windowed metric snapshots for learning curves
- 30 curated datasets (15 classification + 15 regression) spanning real-world and synthetic drift scenarios
- Parallel execution for large-scale experiments across multiple seeds
Installation
From PyPI
pip install samlb
From source
git clone https://github.com/TechyNilesh/samlb.git
cd samlb
pip install -e ".[dev]"
Optional: Vowpal Wabbit support (for ChaCha regressor)
pip install "samlb[vw]"
Requirements: Python >= 3.9, a C++ compiler (for the native extension), CMake
Quick Start
Python API
from samlb.benchmark import BenchmarkSuite
from samlb.framework.classification.asml import AutoStreamClassifier
from samlb.framework.classification.eaml import EvolutionaryBaggingClassifier
suite = BenchmarkSuite(
models={
"ASML": AutoStreamClassifier(seed=42),
"EvoAutoML": EvolutionaryBaggingClassifier(seed=42),
},
datasets=["electricity", "covertype"],
task="classification",
n_runs=10,
window_size=1000,
)
suite.run()
suite.print_table()
suite.to_csv("results/classification.csv")
Dataset Streaming
from samlb.datasets import stream, list_datasets
# See all available datasets
print(list_datasets("classification"))
print(list_datasets("regression"))
# Stream instance by instance
for x, y in stream("electricity", task="classification"):
pred = model.predict_one(x)
model.learn_one(x, y)
CLI
# Full classification benchmark (4 frameworks x 15 datasets x 10 runs)
python examples/run_benchmark.py
# Custom subset
python examples/run_benchmark.py --n_runs 5 --max_samples 50000 --datasets electricity covertype
# Parallel execution across CPU cores
python examples/run_benchmark.py --n_runs 100 --parallel --cpu_utilization 0.8
# Regression benchmark
python examples/run_regression.py
python examples/run_regression.py --n_runs 5 --datasets bike california_housing
Included Frameworks
Classification
| Framework | Strategy | Key Features |
|---|---|---|
| ASML | Adaptive Random Drift Nearby Search | ADWIN drift detection, recency-weighted ensemble, adaptive budget |
| AutoClass | Genetic Algorithm + Meta-Regressor | Fitness-proportionate selection, ARF surrogate for HP mutation |
| EvoAutoML | Evolutionary Bagging | Population-based, tournament selection, Poisson(6) sampling |
| OAML | Drift-triggered Random Search | EDDM drift detector, warm-up phase, random search |
Regression
| Framework | Strategy | Key Features |
|---|---|---|
| ASML | Adaptive Random Drift Nearby Search | Online target normalization (Welford), prediction clipping |
| ChaCha | FLAML AutoVW | Vowpal Wabbit online HPO, progressive validation loss |
| EvoAutoML | Evolutionary Bagging | Population-based ensemble, mutation-driven search |
C++ Base Algorithms
All base learners are implemented in C++ for speed and wrapped with River-compatible interfaces:
Classification: Naive Bayes, Perceptron, Logistic Regression, Passive Aggressive, Softmax Regression, KNN, Hoeffding Tree, EFDT, SGT
Regression: Linear Regression, Bayesian Linear Regression, Passive Aggressive, Hoeffding Tree, KNN
Preprocessing (via River): MinMaxScaler, StandardScaler, MaxAbsScaler, VarianceThreshold, SelectKBest
Evaluation Methodology
SAMLB uses prequential evaluation (test-then-train):
- For each instance in the stream:
- Predict -- get the model's prediction before seeing the label
- Evaluate -- score the prediction against the true label
- Learn -- update the model with the labelled instance
- Metrics are captured at configurable window intervals for learning curve analysis
- Runtime is sampled per-instance for performance profiling
Classification metrics: Accuracy, Macro-F1, Macro-Precision, Macro-Recall
Regression metrics: MAE, RMSE, R^2
Datasets
Classification (15 datasets -- 2.5M+ total instances)
| Dataset | Samples | Features | Classes | Type | Description |
|---|---|---|---|---|---|
adult |
48,842 | 14 | 4 | Real | Income prediction (Census) |
covertype |
100,000 | 54 | 7 | Real | Forest cover type (cartographic) |
credit_card |
284,807 | 30 | 2 | Real | Credit card fraud detection |
electricity |
45,312 | 8 | 2 | Real | Electricity price direction (NSW, Australia) |
insects |
52,848 | 33 | 6 | Real | Insect species with concept drift |
new_airlines |
539,383 | 7 | 2 | Real | Flight delay prediction |
nomao |
34,465 | 118 | 2 | Real | Nomao place deduplication |
poker_hand |
1,025,009 | 10 | 10 | Real | Poker hand classification |
shuttle |
58,000 | 9 | 7 | Real | NASA Space Shuttle radiator |
vehicle_sensIT |
98,528 | 100 | 3 | Real | Vehicle type from seismic sensors |
movingRBF |
200,000 | 10 | 5 | Synthetic | Moving radial basis functions |
moving_squares |
200,000 | 2 | 4 | Synthetic | Moving class boundaries |
sea_high_abrupt_drift |
500,000 | 3 | 2 | Synthetic | SEA generator with abrupt drift |
synth_RandomRBFDrift |
100,000 | 4 | 4 | Synthetic | RBF generator with gradual drift |
synth_agrawal |
100,000 | 9 | 2 | Synthetic | Agrawal generator |
Regression (15 datasets -- 1M+ total instances)
| Dataset | Samples | Features | Type | Description |
|---|---|---|---|---|
ailerons |
13,750 | 40 | Real | Aircraft control surface deflection |
bike |
17,379 | 12 | Real | Bike sharing hourly demand |
california_housing |
20,640 | 8 | Real | California median house values |
cps88wages |
28,155 | 6 | Real | Wage prediction (CPS 1988) |
diamonds |
53,940 | 9 | Real | Diamond price prediction |
elevators |
16,599 | 18 | Real | Aircraft elevator control |
fifa |
19,178 | 28 | Real | FIFA player overall rating |
House8L |
22,784 | 8 | Real | House price (8-feature variant) |
kings_county |
21,613 | 21 | Real | King County house sales price |
MetroTraffic |
48,204 | 7 | Real | Interstate traffic volume (Minneapolis) |
superconductivity |
21,263 | 81 | Real | Superconductor critical temperature |
wave_energy |
72,000 | 48 | Real | Wave energy converter power output |
fried |
40,768 | 10 | Synthetic | Friedman function |
FriedmanGra |
100,000 | 10 | Synthetic | Friedman with gradual drift |
hyperA |
500,000 | 10 | Synthetic | Hyperplane with drift |
Output Formats
results/
classification_10runs.csv # Flat CSV: one row per (framework x dataset x run)
aggregate.json # Aggregated mean +/- std across runs
ASML_electricity_seed0.json # Per-run JSON with full learning curves
Project Structure
.
├── pyproject.toml # Package metadata & build config
├── CMakeLists.txt # C++ build configuration
├── LICENSE # MIT License
├── README.md # This file
├── _cpp/ # C++ source (9 classifiers, 5 regressors)
│ ├── classification/
│ ├── regression/
│ ├── core/ # Shared headers
│ └── bindings/ # PyBind11 module
├── samlb/ # Python package
│ ├── __init__.py # Version: 0.1.0
│ ├── algorithms/ # C++ algorithm Python bindings
│ ├── benchmark/ # BenchmarkSuite orchestrator
│ ├── evaluation/ # PrequentialEvaluator, metrics, results
│ ├── datasets/ # 30 datasets (15 clf + 15 reg NPZ files)
│ └── framework/ # AutoML framework implementations
│ ├── base/ # BaseStreamFramework + C++ wrappers
│ ├── classification/ # ASML, AutoClass, EvoAutoML, OAML
│ └── regression/ # ASML, ChaCha, EvoAutoML
├── tests/ # Test suite
└── examples/ # Benchmark runner scripts
├── run_benchmark.py # Classification benchmark CLI
└── run_regression.py # Regression benchmark CLI
Contributing
We welcome contributions! Whether you are adding a new AutoML framework, new datasets, or fixing bugs.
Development Setup
git clone https://github.com/TechyNilesh/samlb.git
cd samlb
pip install -e ".[dev]"
Running Tests
pytest tests/
Code Style
ruff check samlb/
ruff format samlb/
Adding a New Streaming AutoML Framework
This is the primary way to contribute. Every framework in SAMLB implements the same 3-method interface, making it easy to add your own.
Step 1 -- Create your framework directory
samlb/framework/classification/my_method/ # (or regression/)
__init__.py
model.py
config.py # optional: search space / hyperparameter config
Step 2 -- Implement BaseStreamFramework
# samlb/framework/classification/my_method/model.py
from __future__ import annotations
from typing import Any, Dict
from samlb.framework.base import BaseStreamFramework
class MyStreamingAutoML(BaseStreamFramework):
"""My new streaming AutoML method."""
def __init__(self, seed: int = 42, exploration_window: int = 1000, budget: int = 10):
self.seed = seed
self.exploration_window = exploration_window
self.budget = budget
self._init_state()
def predict_one(self, x: Dict[str, float]) -> Any:
"""
Return prediction for one instance BEFORE learning.
x : dict mapping feature_name -> float value
Returns: class label (int) for classification, value (float) for regression
"""
return self._current_model_predict(x)
def learn_one(self, x: Dict[str, float], y: Any) -> None:
"""
Update the model with one labelled instance.
This is where your AutoML logic lives:
- Update base learners
- Evaluate pipeline candidates
- Detect drift and adapt
- Explore new configurations
"""
self._update(x, y)
def reset(self) -> None:
"""Reset to initial untrained state (called before each run)."""
self._init_state()
Step 3 -- Register in __init__.py
# samlb/framework/classification/__init__.py
from .my_method.model import MyStreamingAutoML
__all__ = [
"AutoStreamClassifier",
"AutoClass",
"EvolutionaryBaggingClassifier",
"OAMLClassifier",
"MyStreamingAutoML", # <-- add here
]
Step 4 -- Use available building blocks
SAMLB provides fast C++ algorithms and River's full ecosystem as building blocks:
# C++ algorithms (fast, River-compatible)
from samlb.framework.base import (
CppNaiveBayes,
CppPerceptron,
CppLogisticRegression,
CppHoeffdingTreeClassifier,
CppKNNClassifier,
CppSGTClassifier,
)
# River preprocessing & drift detection
from river.preprocessing import MinMaxScaler, StandardScaler
from river.feature_selection import VarianceThreshold
from river.drift import ADWIN
# Compose a pipeline using River's | operator
pipeline = MinMaxScaler() | CppHoeffdingTreeClassifier(grace_period=200)
pipeline.predict_one(x)
pipeline.learn_one(x, y)
Step 5 -- Run it in the benchmark
from samlb.benchmark import BenchmarkSuite
from samlb.framework.classification.my_method import MyStreamingAutoML
suite = BenchmarkSuite(
models={
"MyMethod": MyStreamingAutoML(seed=42),
},
datasets=["electricity", "covertype", "insects"],
task="classification",
n_runs=10,
)
suite.run()
suite.print_table()
Step 6 -- Add tests
# tests/test_my_method.py
from samlb.framework.classification.my_method import MyStreamingAutoML
from samlb.datasets import stream
def test_predict_and_learn():
model = MyStreamingAutoML(seed=42)
for x, y in stream("electricity", task="classification", max_samples=500):
pred = model.predict_one(x)
model.learn_one(x, y)
assert pred is not None
def test_reset():
model = MyStreamingAutoML(seed=42)
for x, y in stream("electricity", task="classification", max_samples=100):
model.learn_one(x, y)
model.reset()
# Should be back to untrained state
Adding a New Dataset
- Prepare your data as a NumPy NPZ file with this schema:
X--float32array of shape(n_samples, n_features)y--int32(classification) orfloat32(regression) array of shape(n_samples,)feature_names-- string array of shape(n_features,)target_name-- string scalar
- Place the
.npzfile insamlb/datasets/classification/orsamlb/datasets/regression/ - It will be automatically discovered by
list_datasets()andload()
PR Checklist
- Code passes
ruff check samlb/ - Tests pass with
pytest tests/ - New framework implements all 3 methods of
BaseStreamFramework - Include a brief description of the AutoML strategy
- Reference any papers if applicable
- Include benchmark results on at least 3 datasets
Citation
If you use SAMLB in your research, please cite:
@software{samlb2024,
title = {SAMLB: Streaming AutoML Benchmark},
author = {Verma, Nilesh and Bifet, Albert and Pfahringer, Bernhard and Bahri, Maroua},
year = {2026},
url = {https://github.com/TechyNilesh/samlb}
}
License
MIT License. See LICENSE for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file samlb-0.1.0.tar.gz.
File metadata
- Download URL: samlb-0.1.0.tar.gz
- Upload date:
- Size: 744.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
41a5959ca72cfb19ab9f63ed4a10118764a03dbf92d1e23cd94da0090b6daa0a
|
|
| MD5 |
57cc23ca6186708db69e08bc674d45ff
|
|
| BLAKE2b-256 |
7100a4e09d888774b40c5d6f260ac2df9e2c691c6b25097d8a0d69cfc0774ed5
|
File details
Details for the file samlb-0.1.0-cp39-cp39-macosx_26_0_arm64.whl.
File metadata
- Download URL: samlb-0.1.0-cp39-cp39-macosx_26_0_arm64.whl
- Upload date:
- Size: 192.0 kB
- Tags: CPython 3.9, macOS 26.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8aedc012aa4bd78bde9dbd28f4d3a9cbd7cbab5822e1142876e3283e5c02c13b
|
|
| MD5 |
3d740eb571bb0c5ff2ab9eaaf3bb1579
|
|
| BLAKE2b-256 |
167271596026ac159556353107448052dbe7e443f483a433e4e831fdb04edd14
|