ML that works. One import, the Hastie workflow in code.

A grammar of machine learning workflows for Python. Four verbs prevent data leakage by construction. 16 algorithms, 11 Rust-native.

Paper · R · Rust engine · epagogy.ai

Install

pip install mlw                       # core (11 Rust-native algorithms)
pip install "mlw[xgboost]"            # + XGBoost
pip install "mlw[all]"                # everything

Requires Python 3.10+. Other extras: lightgbm, catboost, plots, optuna, dev.

Quickstart

import ml

data = ml.dataset("churn")
s = ml.split(data, "churn", seed=42)

lb = ml.screen(s, "churn", seed=42)          # rank all algorithms
model = ml.fit(s.train, "churn", seed=42)
ml.evaluate(model, s.valid)                   # iterate freely

final = ml.fit(s.dev, "churn", seed=42)       # retrain on train+valid
ml.assess(final, test=s.test)                 # once — second call errors

Why ml

The evaluate/assess boundary. evaluate runs on validation data, so you can call it as often as you like. assess runs on held-out test data and locks after one use: a second call raises an error. Test-set reuse is blocked by the API rather than left to discipline. This encodes the model assessment protocol from Hastie, Tibshirani & Friedman (ESL, Ch. 7).
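To make the lock concrete, here is a minimal, self-contained sketch of the pattern. It is not how ml implements assess; OneShotTestSet, AssessLockedError, and the toy parity "model" are hypothetical names invented for illustration.

```python
class AssessLockedError(RuntimeError):
    """Raised when a held-out test set is scored more than once."""


class OneShotTestSet:
    """Hypothetical wrapper that hands out its rows exactly once."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._used = False

    def consume(self):
        if self._used:
            raise AssessLockedError("test set already assessed")
        self._used = True
        return self._rows


def evaluate(predict, rows):
    """Accuracy on validation rows; safe to call repeatedly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def assess(predict, held_out):
    """Accuracy on the held-out test set; a second call raises."""
    return evaluate(predict, held_out.consume())


def predict(x):
    return x % 2  # toy "model": parity of the input


valid = [(0, 0), (1, 1), (2, 0), (3, 1)]
held_out = OneShotTestSet([(4, 0), (5, 1)])

print(evaluate(predict, valid))    # repeatable
print(evaluate(predict, valid))
print(assess(predict, held_out))   # allowed once
try:
    assess(predict, held_out)
except AssessLockedError as err:
    print("locked:", err)
```

The point of the design is that the second look at test data fails loudly instead of silently inflating the reported score.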

Three-way split with .dev. Train (60%), valid (20%), test (20%). s.dev = train + valid combined for the final refit before assessment.
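The 60/20/20 layout and the .dev view can be sketched in plain Python. This is an illustration under assumptions, not the library's code: three_way_split and the Split dataclass are hypothetical, and the real ml.split may shuffle or stratify differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Split:
    train: list
    valid: list
    test: list

    @property
    def dev(self):
        # train + valid combined, used only for the final refit
        return self.train + self.valid


def three_way_split(rows, seed, train_pct=60, valid_pct=20):
    """Shuffle with a fixed seed, then cut into train/valid/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * train_pct // 100
    n_valid = len(rows) * valid_pct // 100
    return Split(
        train=rows[:n_train],
        valid=rows[n_train:n_train + n_valid],
        test=rows[n_train + n_valid:],
    )


s = three_way_split(range(100), seed=42)
print(len(s.train), len(s.valid), len(s.test), len(s.dev))
```

Because dev never includes test rows, refitting on it before the single assess call does not leak test information into the model.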

47 verbs, one import. From check_data and split through tune, stack, explain, drift, and shelf. Everything returns plain objects you can inspect, compare, or serialize.

168 datasets. tips and flights are bundled. The rest download from OpenML on first use and cache locally.

Highlights

Tune. Random, Bayesian (mlw[optuna]), or grid search.

tuned = ml.tune(s.train, "churn", algorithm="xgboost", seed=42, n_trials=50)
ml.evaluate(tuned, s.valid)
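For readers unfamiliar with random search, the simplest of the three strategies, here is a rough sketch of the idea. The search space, the toy scoring function, and random_search itself are made up for illustration; they say nothing about the parameters ml.tune actually explores.

```python
import random


def random_search(score, space, n_trials, seed):
    """Sample n_trials configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical search space; in practice score() would fit on train
# and measure on validation data.
space = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}


def toy_score(params):
    # Stand-in for a validation metric, highest at depth 4 and rate 0.1.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


best, value = random_search(toy_score, space, n_trials=50, seed=42)
print(best, value)
```

Fixing the seed makes the sampled configurations, and therefore the selected one, reproducible across runs.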

Ship gate. Hard pass/fail contracts before deployment.

ml.validate(final, test=s.test, rules={"accuracy": ">0.85"})
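A rule such as {"accuracy": ">0.85"} is a comparison operator plus a threshold. The sketch below shows one way such rules could be parsed and enforced. It is an assumption-laden illustration: gate, check_rules, and GateFailed are hypothetical names, and the set of operators ml.validate accepts is not stated in the source.

```python
import operator
import re

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_RULE = re.compile(r"^\s*(>=|<=|>|<)\s*([0-9]*\.?[0-9]+)\s*$")


class GateFailed(Exception):
    """Raised when at least one metric violates its rule."""


def check_rules(metrics, rules):
    """Return {metric: passed} for rules such as {"accuracy": ">0.85"}."""
    results = {}
    for name, rule in rules.items():
        match = _RULE.match(rule)
        if match is None:
            raise ValueError(f"unrecognized rule {rule!r} for {name!r}")
        op, threshold = match.groups()
        results[name] = _OPS[op](metrics[name], float(threshold))
    return results


def gate(metrics, rules):
    """Hard pass/fail: return True or raise GateFailed."""
    failed = [name for name, ok in check_rules(metrics, rules).items() if not ok]
    if failed:
        raise GateFailed("ship gate failed: " + ", ".join(failed))
    return True


print(gate({"accuracy": 0.91}, {"accuracy": ">0.85"}))
```

Raising instead of returning False keeps a failed contract from being ignored by a deployment script.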

Drift. Catch distribution shift before users notice.

ml.drift(reference=s.train, new=live_data).shifted
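The source does not say which statistical test backs .shifted. As one common choice, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for a single numeric column. The 0.3 cutoff, the function names, and the sample ages are illustrative assumptions, not calibrated values or library behavior.

```python
from bisect import bisect_right


def ks_statistic(reference, new):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(new)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def shifted(reference, new, threshold=0.3):
    # threshold is an illustrative cutoff, not a significance level
    return ks_statistic(reference, new) > threshold


train_ages = [22, 25, 31, 34, 38, 41, 45, 52]
live_same = [23, 27, 30, 36, 39, 40, 47, 50]
live_older = [48, 51, 55, 58, 61, 63, 67, 70]

print(ks_statistic(train_ages, live_same), shifted(train_ages, live_same))
print(ks_statistic(train_ages, live_older), shifted(train_ages, live_older))
```

A real drift check would run per feature and handle categorical columns separately; this sketch covers only the numeric case.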

Algorithms

16 families. engine="auto" picks Rust when available. engine="sklearn" forces scikit-learn fallback.

Algorithm                 String                Engine     Clf  Reg
Random Forest             "random_forest"       Rust       Y    Y
Extra Trees               "extra_trees"         Rust       Y    Y
Gradient Boosting         "gradient_boosting"   Rust       Y    Y
Hist. Gradient Boosting   "histgradient"        Rust       Y    Y
Decision Tree             "decision_tree"       Rust       Y    Y
Ridge                     "linear"              Rust       ·    Y
Logistic                  "logistic"            Rust       Y    ·
Elastic Net               "elastic_net"         Rust       ·    Y
KNN                       "knn"                 Rust       Y    Y
Naive Bayes               "naive_bayes"         Rust       Y    ·
AdaBoost                  "adaboost"            Rust       Y    ·
SVM                       "svm"                 Rust       Y    Y
XGBoost                   "xgboost"             optional   Y    Y
LightGBM                  "lightgbm"            optional   Y    Y
CatBoost                  "catboost"            optional   Y    Y
TabPFN                    "tabpfn"              optional   Y    ·
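The table above can be read as a lookup: an algorithm string maps to the tasks it supports, and the engine argument decides which backend runs it. The sketch below transcribes the table into a dict and resolves a backend. The resolve function, the rust_available flag, and the rule that optional algorithms run on their own package are assumptions for illustration, not documented behavior.

```python
# Task support transcribed from the table (clf = classification, reg = regression).
SUPPORT = {
    "random_forest": ("clf", "reg"),
    "extra_trees": ("clf", "reg"),
    "gradient_boosting": ("clf", "reg"),
    "histgradient": ("clf", "reg"),
    "decision_tree": ("clf", "reg"),
    "linear": ("reg",),
    "logistic": ("clf",),
    "elastic_net": ("reg",),
    "knn": ("clf", "reg"),
    "naive_bayes": ("clf",),
    "adaboost": ("clf",),
    "svm": ("clf", "reg"),
    "xgboost": ("clf", "reg"),
    "lightgbm": ("clf", "reg"),
    "catboost": ("clf", "reg"),
    "tabpfn": ("clf",),
}
OPTIONAL = {"xgboost", "lightgbm", "catboost", "tabpfn"}


def resolve(algorithm, task, engine="auto", rust_available=True):
    """Pick a backend name, rejecting unsupported algorithm/task pairs."""
    if algorithm not in SUPPORT:
        raise ValueError(f"unknown algorithm {algorithm!r}")
    if task not in SUPPORT[algorithm]:
        raise ValueError(f"{algorithm!r} does not support task {task!r}")
    if algorithm in OPTIONAL:
        return algorithm  # assumption: backed by its own optional package
    if engine == "sklearn":
        return "sklearn"
    if engine == "auto":
        return "rust" if rust_available else "sklearn"
    raise ValueError(f"unknown engine {engine!r}")


print(resolve("random_forest", "clf"))
print(resolve("knn", "reg", engine="sklearn"))
```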

Citation

Roth, S. (2026). A Grammar of Machine Learning Workflows.
doi:10.5281/zenodo.18905073

License

MIT. Simon Roth, 2026.
