ML that works. One import, the Hastie workflow in code.

A grammar of machine learning workflows for Python. Four verbs prevent data leakage by construction. 16 algorithms, 11 Rust-native.

Paper · R · Rust engine · epagogy.ai

Install

pip install mlw                       # core (11 Rust-native algorithms)
pip install "mlw[xgboost]"            # + XGBoost
pip install "mlw[all]"                # everything

Requires Python 3.10+. Other optional extras: lightgbm, catboost, plots, optuna, dev.

Quickstart

import ml

data = ml.dataset("churn")
s = ml.split(data, "churn", seed=42)

lb = ml.screen(s, "churn", seed=42)          # rank all algorithms
model = ml.fit(s.train, "churn", seed=42)
ml.evaluate(model, s.valid)                   # iterate freely

final = ml.fit(s.dev, "churn", seed=42)       # retrain on train+valid
ml.assess(final, test=s.test)                 # once — second call errors

Why ml

The evaluate/assess boundary. evaluate runs on validation data, so you can call it as often as you like. assess runs on held-out test data and locks after one use; a second call raises an error. The API enforces this boundary instead of relying on discipline, which encodes the model-assessment protocol from Hastie, Tibshirani & Friedman (ESL, Ch. 7).
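To make the boundary concrete, here is a minimal pure-Python sketch of a one-shot lock. It is not how mlw implements assess internally; the names OneShotTestSet, AssessLockedError, and accuracy are hypothetical and exist only for this illustration.

```python
class AssessLockedError(RuntimeError):
    """Raised when the held-out test set is scored a second time."""


class OneShotTestSet:
    """Hypothetical wrapper that hands out its rows exactly once."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._used = False

    def take(self):
        if self._used:
            raise AssessLockedError("test set already assessed; it is locked")
        self._used = True
        return self._rows


def accuracy(predict, rows):
    """Fraction of (feature, label) rows that predict() gets right."""
    hits = sum(1 for x, y in rows if predict(x) == y)
    return hits / len(rows)


def evaluate(predict, valid_rows):
    # Validation data: safe to call repeatedly while iterating.
    return accuracy(predict, valid_rows)


def assess(predict, test):
    # Test data: take() raises on the second call.
    return accuracy(predict, test.take())


def predict(x):
    # Toy classifier: positive inputs are labelled True.
    return x > 0


valid = [(1, True), (-1, False), (2, False)]
test = OneShotTestSet([(3, True), (-2, False)])

print(evaluate(predict, valid))  # repeatable
print(assess(predict, test))     # first and only allowed call
try:
    assess(predict, test)
except AssessLockedError as err:
    print("locked:", err)
```

The point of the design is that the second assess call fails loudly, so a test score cannot quietly turn into a tuning signal.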

Three-way split with .dev. Train (60%), valid (20%), test (20%). s.dev = train + valid combined for the final refit before assessment.
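The split proportions and the .dev view can be sketched with the standard library alone. This is an illustration under assumptions, not mlw's implementation: the Split class and three_way_split function are hypothetical, and the sketch shuffles uniformly without any stratification on the target.

```python
import random
from dataclasses import dataclass


@dataclass
class Split:
    train: list
    valid: list
    test: list

    @property
    def dev(self):
        # train + valid combined, for the final refit before assessment
        return self.train + self.valid


def three_way_split(rows, seed, ratios=(0.6, 0.2, 0.2)):
    """Shuffle rows with a fixed seed and cut them into train/valid/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * ratios[0])
    n_valid = round(len(rows) * ratios[1])
    return Split(
        train=rows[:n_train],
        valid=rows[n_train:n_train + n_valid],
        test=rows[n_train + n_valid:],
    )


s = three_way_split(range(100), seed=42)
print(len(s.train), len(s.valid), len(s.test), len(s.dev))  # 60 20 20 80
```

Because the shuffle is seeded, the same seed always reproduces the same partition, and the test rows never appear in s.dev.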

47 verbs, one import. From check_data and split through tune, stack, explain, drift, and shelf. Everything returns plain objects you can inspect, compare, or serialize.

168 datasets. tips and flights are bundled. The rest download from OpenML on first use and cache locally.
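The download-once, cache-locally behaviour can be illustrated with a small file cache. This is a sketch under assumptions: load_cached and fake_fetch are hypothetical names, the JSON file layout is invented for the example, and a real loader would fetch from OpenML rather than return a hard-coded row.

```python
import json
import tempfile
from pathlib import Path


def load_cached(name, fetch, cache_dir):
    """Return rows for `name`, calling fetch(name) only when no cached copy exists."""
    path = Path(cache_dir) / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text())
    rows = fetch(name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))
    return rows


calls = []


def fake_fetch(name):
    # Stand-in for a network download; records each call so reuse is visible.
    calls.append(name)
    return [{"x": 1, "y": 0}]


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_cached("example", fake_fetch, cache_dir)
    second = load_cached("example", fake_fetch, cache_dir)

print(first == second, len(calls))  # True 1
```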

Highlights

Tune. Random, Bayesian (mlw[optuna]), or grid search.

tuned = ml.tune(s.train, "churn", algorithm="xgboost", seed=42, n_trials=50)
ml.evaluate(tuned, s.valid)
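As a rough picture of what random search does, the sketch below samples hyperparameters uniformly and keeps the best-scoring configuration. It does not call mlw: random_search, toy_score, and the search space are hypothetical, and the toy objective stands in for whatever score ml.tune optimizes internally.

```python
import random


def random_search(score, space, n_trials, seed):
    """Sample n_trials configurations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        value = score(params)
        if value > best_score:
            best_params, best_score = params, value
    return best_params, best_score


def toy_score(p):
    # Stand-in objective that peaks at learning_rate=0.1, subsample=0.8.
    return -((p["learning_rate"] - 0.1) ** 2) - (p["subsample"] - 0.8) ** 2


space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best, value = random_search(toy_score, space, n_trials=50, seed=42)
print(best, value)
```

Fixing the seed makes the search reproducible, which is why seed appears alongside n_trials in the call above.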

Ship gate. Hard pass/fail contracts before deployment.

ml.validate(final, test=s.test, rules={"accuracy": ">0.85"})
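A rule string such as ">0.85" can be read as a comparison operator plus a threshold. The sketch below shows one way such a gate might be checked against already computed metrics; check_rules and GateFailed are hypothetical names, and the rule grammar accepted by ml.validate may differ.

```python
import operator
import re

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_RULE = re.compile(r"^\s*(>=|<=|>|<)\s*([0-9]*\.?[0-9]+)\s*$")


class GateFailed(Exception):
    """Raised when at least one metric violates its rule."""


def check_rules(metrics, rules):
    """Return True if every metric satisfies its rule, else raise GateFailed."""
    failures = []
    for name, rule in rules.items():
        match = _RULE.match(rule)
        if match is None:
            raise ValueError(f"cannot parse rule {rule!r}")
        op, threshold = match.group(1), float(match.group(2))
        if not _OPS[op](metrics[name], threshold):
            failures.append(f"{name}={metrics[name]} violates {rule}")
    if failures:
        raise GateFailed("; ".join(failures))
    return True


print(check_rules({"accuracy": 0.90}, {"accuracy": ">0.85"}))  # True
try:
    check_rules({"accuracy": 0.80}, {"accuracy": ">0.85"})
except GateFailed as err:
    print("blocked:", err)
```

Raising an exception rather than returning False means a deployment script stops at the gate unless it explicitly handles the failure.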

Drift. Catch distribution shift before users notice.

ml.drift(reference=s.train, new=live_data).shifted
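One common way to detect distribution shift in a numeric column is the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical CDFs. The sketch below uses it for illustration only: the source does not say which test ml.drift applies, and ks_statistic, shifted_columns, and the 0.2 threshold are assumptions.

```python
from bisect import bisect_right


def ks_statistic(reference, new):
    """Largest gap between the empirical CDFs of two samples."""
    ref, cur = sorted(reference), sorted(new)
    gap = 0.0
    # The supremum of the gap is attained at one of the observed values.
    for x in set(ref) | set(cur):
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def shifted_columns(reference, new, threshold=0.2):
    """Names of columns whose KS statistic exceeds an arbitrary threshold."""
    return [
        name for name in reference
        if ks_statistic(reference[name], new[name]) > threshold
    ]


reference = {"age": list(range(100)), "tenure": list(range(100))}
live = {"age": [x + 50 for x in range(100)], "tenure": list(range(100))}

print(ks_statistic(reference["age"], live["age"]))  # 0.5
print(shifted_columns(reference, live))             # ['age']
```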

Algorithms

16 families. engine="auto" picks the Rust implementation when available; engine="sklearn" forces the scikit-learn fallback. The Clf and Reg columns mark classification and regression support.

Algorithm                String                Engine     Clf  Reg
Random Forest            "random_forest"       Rust       Y    Y
Extra Trees              "extra_trees"         Rust       Y    Y
Gradient Boosting        "gradient_boosting"   Rust       Y    Y
Hist. Gradient Boosting  "histgradient"        Rust       Y    Y
Decision Tree            "decision_tree"       Rust       Y    Y
Ridge                    "linear"              Rust       ·    Y
Logistic                 "logistic"            Rust       Y    ·
Elastic Net              "elastic_net"         Rust       ·    Y
KNN                      "knn"                 Rust       Y    Y
Naive Bayes              "naive_bayes"         Rust       Y    ·
AdaBoost                 "adaboost"            Rust       Y    ·
SVM                      "svm"                 Rust       Y    Y
XGBoost                  "xgboost"             optional   Y    Y
LightGBM                 "lightgbm"            optional   Y    Y
CatBoost                 "catboost"            optional   Y    Y
TabPFN                   "tabpfn"              optional   Y    ·
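The engine="auto" behaviour described above amounts to choosing a native backend when one can be imported and falling back otherwise. Here is a minimal sketch of that pattern; resolve_engine and the module name _hypothetical_rust_backend are invented for the example and are not part of mlw's API.

```python
import importlib.util


def resolve_engine(engine="auto", native_module="_hypothetical_rust_backend"):
    """Return "rust" or "sklearn" depending on the request and what is importable."""
    if engine not in {"auto", "sklearn"}:
        raise ValueError(f"unknown engine {engine!r}")
    if engine == "sklearn":
        return "sklearn"
    # find_spec returns None when a top-level module cannot be found.
    available = importlib.util.find_spec(native_module) is not None
    return "rust" if available else "sklearn"


print(resolve_engine("sklearn"))
print(resolve_engine("auto"))  # "sklearn" unless the hypothetical module exists
```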

Citation

Roth, S. (2026). A Grammar of Machine Learning Workflows.
doi:10.5281/zenodo.18905073

License

MIT. Simon Roth, 2026.
