ML that works. One import, the Hastie workflow in code.
mlw 
A grammar of machine learning workflows for Python. Four verbs prevent data leakage by construction. 16 algorithms, 11 Rust-native.
Paper · R · Rust engine · epagogy.ai
Install
pip install mlw # core (11 Rust-native algorithms)
pip install "mlw[xgboost]" # + XGBoost
pip install "mlw[all]" # everything
Requires Python 3.10+. Other extras: lightgbm, catboost, plots, optuna, dev.
Quickstart
import ml
data = ml.dataset("churn")
s = ml.split(data, "churn", seed=42)
lb = ml.screen(s, "churn", seed=42) # rank all algorithms
model = ml.fit(s.train, "churn", seed=42)
ml.evaluate(model, s.valid) # iterate freely
final = ml.fit(s.dev, "churn", seed=42) # retrain on train+valid
ml.assess(final, test=s.test) # once — second call errors
Why ml
The evaluate/assess boundary. evaluate runs on validation data —
call it as often as you like. assess runs on held-out test data and
locks after one use. No discipline required; the API makes leakage
inexpressible. This encodes the protocol from Hastie, Tibshirani &
Friedman (ESL, Ch. 7).
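To make the boundary concrete, here is a minimal pure-Python sketch of a one-shot lock. It is not how ml implements it: `OneShotAssessor`, `AssessmentLockError`, and the toy `accuracy` function are hypothetical names, and the assumption is simply that the lock is a flag set on the first `assess` call.

```python
class AssessmentLockError(RuntimeError):
    """Raised when held-out test data is scored more than once."""


class OneShotAssessor:
    """Hypothetical guard (not the ml API): test data can be scored once."""

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self._used = False

    def evaluate(self, model, valid):
        # Validation scoring is unrestricted.
        return self._score_fn(model, valid)

    def assess(self, model, test):
        if self._used:
            raise AssessmentLockError("test set already assessed")
        # Lock before scoring, so even a failed attempt uses up the one call.
        self._used = True
        return self._score_fn(model, test)


def accuracy(model, rows):
    """Fraction of (x, y) pairs where model(x) == y."""
    return sum(model(x) == y for x, y in rows) / len(rows)


guard = OneShotAssessor(accuracy)
model = lambda x: x > 0  # toy classifier
valid = [(1, True), (-1, False), (2, False)]
test = [(3, True), (-2, False)]

guard.evaluate(model, valid)
guard.evaluate(model, valid)  # repeatable
test_score = guard.assess(model, test)
try:
    guard.assess(model, test)
    second_call_blocked = False
except AssessmentLockError:
    second_call_blocked = True
```

Raising an exception rather than warning is the point of the design: a second look at the test set becomes a crash, not a habit.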
Three-way split with .dev. Train (60%), valid (20%), test (20%).
s.dev = train + valid combined for the final refit before assessment.
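The 60/20/20 layout can be sketched with the standard library alone. This is an illustration, not ml.split itself: `three_way_split` is a hypothetical helper, and it assumes a plain seeded shuffle with no stratification, which the real function may handle differently.

```python
import random


def three_way_split(rows, seed, valid_frac=0.2, test_frac=0.2):
    """Shuffle rows once, then cut into train / valid / test.

    The returned dict also carries "dev" = train + valid.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_test = round(n * test_frac)
    n_valid = round(n * valid_frac)
    n_train = n - n_valid - n_test

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return {"train": train, "valid": valid, "test": test, "dev": train + valid}


parts = three_way_split(range(100), seed=42)
```

Because dev is built only from train and valid, refitting on it never touches a test row.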
47 verbs, one import. From check_data and split through tune,
stack, explain, drift, and shelf. Everything returns plain
objects you can inspect, compare, or serialize.
168 datasets. tips and flights are bundled. The rest download
from OpenML on first use and cache locally.
Highlights
Tune. Random, Bayesian (mlw[optuna]), or grid search.
tuned = ml.tune(s.train, "churn", algorithm="xgboost", seed=42, n_trials=50)
ml.evaluate(tuned, s.valid)
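For intuition about the random strategy, a minimal sketch of seeded random search follows. It is not ml.tune: `random_search`, `toy_score`, and the `learning_rate` range are hypothetical, and the toy objective stands in for whatever internal score the library optimizes.

```python
import random


def random_search(score_fn, space, n_trials, seed):
    """Sample n_trials configurations uniformly and keep the best scorer.

    space maps each parameter name to a (low, high) range of floats.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for a model score; peaks at learning_rate = 0.1."""
    return -(params["learning_rate"] - 0.1) ** 2


best_params, best_score = random_search(
    toy_score, {"learning_rate": (0.01, 0.5)}, n_trials=50, seed=42
)
```

Fixing the seed makes the whole search reproducible, which is why every call in this README passes one.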
Ship gate. Hard pass/fail contracts before deployment.
ml.validate(final, test=s.test, rules={"accuracy": ">0.85"})
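A rule string such as ">0.85" can be checked with a few lines of standard-library Python. The sketch below is an assumption about how such a gate could work, not the ml.validate implementation: `check_rules`, `gate`, the set of supported operators, and the `auc` metric are all hypothetical.

```python
import operator

# Longest symbols first so ">=" is not mistaken for ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
]


def check_rules(metrics, rules):
    """Return {metric: passed} for rules like {"accuracy": ">0.85"}."""
    results = {}
    for name, rule in rules.items():
        rule = rule.strip()
        for symbol, compare in _OPS:
            if rule.startswith(symbol):
                threshold = float(rule[len(symbol):])
                results[name] = compare(metrics[name], threshold)
                break
        else:
            raise ValueError(f"unsupported rule: {rule!r}")
    return results


def gate(metrics, rules):
    """Raise if any rule fails, so a failing model cannot ship silently."""
    failed = [name for name, ok in check_rules(metrics, rules).items() if not ok]
    if failed:
        raise AssertionError(f"ship gate failed: {failed}")
    return True


metrics = {"accuracy": 0.91, "auc": 0.80}  # made-up numbers for illustration
report = check_rules(metrics, {"accuracy": ">0.85", "auc": ">=0.90"})
# report == {"accuracy": True, "auc": False}
```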
Drift. Catch distribution shift before users notice.
ml.drift(reference=s.train, new=live_data).shifted
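The README does not say which test ml.drift runs. As one common way to flag a shifted numeric column, here is a sketch of the two-sample Kolmogorov-Smirnov statistic in pure Python. `ks_statistic`, `drifted_columns`, the 0.2 threshold, and the column names are hypothetical choices for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, new):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref, cur = sorted(reference), sorted(new)
    gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def drifted_columns(reference, new, threshold=0.2):
    """Names of columns whose KS statistic exceeds the threshold."""
    return [
        name for name in reference
        if ks_statistic(reference[name], new[name]) > threshold
    ]


reference = {"age": list(range(20, 40)), "income": list(range(20))}
live = {"age": list(range(21, 41)), "income": list(range(100, 120))}
shifted = drifted_columns(reference, live)
```

Here age moves by one unit and stays under the threshold, while income lands in a disjoint range and is flagged.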
Algorithms
16 families. engine="auto" picks Rust when available. engine="sklearn"
forces scikit-learn fallback.
| Algorithm | String | Engine | Clf | Reg |
|---|---|---|---|---|
| Random Forest | "random_forest" | Rust | Y | Y |
| Extra Trees | "extra_trees" | Rust | Y | Y |
| Gradient Boosting | "gradient_boosting" | Rust | Y | Y |
| Hist. Gradient Boosting | "histgradient" | Rust | Y | Y |
| Decision Tree | "decision_tree" | Rust | Y | Y |
| Ridge | "linear" | Rust | · | Y |
| Logistic | "logistic" | Rust | Y | · |
| Elastic Net | "elastic_net" | Rust | · | Y |
| KNN | "knn" | Rust | Y | Y |
| Naive Bayes | "naive_bayes" | Rust | Y | · |
| AdaBoost | "adaboost" | Rust | Y | · |
| SVM | "svm" | Rust | Y | Y |
| XGBoost | "xgboost" | optional | Y | Y |
| LightGBM | "lightgbm" | optional | Y | Y |
| CatBoost | "catboost" | optional | Y | Y |
| TabPFN | "tabpfn" | optional | Y | · |
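The engine="auto" rule described above the table amounts to "use the native backend if it can be imported, otherwise fall back". A minimal sketch of that pattern, assuming a probe via `importlib.util.find_spec`: `resolve_engine` and the module name `_hypothetical_rust_backend` are placeholders, not the library's real internals.

```python
import importlib.util


def resolve_engine(engine="auto", native_module="_hypothetical_rust_backend"):
    """Pick "rust" or "sklearn" following the auto/forced rule.

    native_module is a placeholder name, not the real extension module.
    """
    if engine == "sklearn":
        return "sklearn"
    if engine != "auto":
        raise ValueError(f"unknown engine: {engine!r}")
    # find_spec returns None when a top-level module cannot be found.
    has_native = importlib.util.find_spec(native_module) is not None
    return "rust" if has_native else "sklearn"


forced = resolve_engine("sklearn")
fallback = resolve_engine("auto")  # placeholder module is absent, so this falls back
```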
Citation
Roth, S. (2026). A Grammar of Machine Learning Workflows.
doi:10.5281/zenodo.19023838
License
MIT. Simon Roth, 2026.