
ML that works. One import, the Hastie workflow in code.


A grammar of machine learning workflows for Python. Four verbs prevent data leakage by construction. 16 algorithms, 11 Rust-native.


Paper · R · Rust engine · epagogy.ai

Install

pip install mlw                       # core (11 Rust-native algorithms)
pip install "mlw[xgboost]"            # + XGBoost
pip install "mlw[all]"                # everything

Requires Python 3.10+. Other optional extras: lightgbm, catboost, plots, optuna, dev.

Quickstart

import ml

data = ml.dataset("churn")
s = ml.split(data, "churn", seed=42)

lb = ml.screen(s, "churn", seed=42)          # rank all algorithms
model = ml.fit(s.train, "churn", seed=42)
ml.evaluate(model, s.valid)                   # iterate freely

final = ml.fit(s.dev, "churn", seed=42)       # retrain on train+valid
ml.assess(final, test=s.test)                 # once — second call errors

Why ml

The evaluate/assess boundary. evaluate runs on validation data, so call it as often as you like. assess runs on held-out test data and locks after one use: a second call raises an error. The API enforces this rather than relying on discipline, so repeated peeking at the test set cannot be expressed. This encodes the protocol from Hastie, Tibshirani & Friedman (ESL, Ch. 7).
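To make the locking idea concrete, here is a minimal sketch in plain Python. It is not ml's implementation: OneShotModel, AssessLockedError, and the toy accuracy helper are hypothetical names used only for illustration. The only behavior taken from the text above is that evaluate is repeatable and a second assess raises.

```python
class AssessLockedError(RuntimeError):
    """Raised when a one-shot assessment is attempted a second time."""


class OneShotModel:
    """Hypothetical wrapper: unlimited evaluate(), exactly one assess()."""

    def __init__(self, predict):
        self._predict = predict      # any callable: feature -> label
        self._assessed = False       # flips to True after the first assess()

    def _accuracy(self, rows):
        hits = sum(self._predict(x) == y for x, y in rows)
        return hits / len(rows)

    def evaluate(self, valid_rows):
        # Validation scoring is repeatable by design.
        return self._accuracy(valid_rows)

    def assess(self, test_rows):
        # Test scoring is allowed once; any later call raises.
        if self._assessed:
            raise AssessLockedError("test set already used for this model")
        self._assessed = True
        return self._accuracy(test_rows)


model = OneShotModel(lambda x: x > 0)
valid = [(1, True), (-1, False), (2, False)]
test = [(3, True), (-2, False)]

valid_score_1 = model.evaluate(valid)
valid_score_2 = model.evaluate(valid)   # fine: no lock on validation data
test_score = model.assess(test)

try:
    model.assess(test)
    second_call_failed = False
except AssessLockedError:
    second_call_failed = True
```

Keeping the lock on the object rather than in a convention means the rule travels with the model wherever it is passed.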

Three-way split with .dev. Train (60%), valid (20%), test (20%). s.dev = train + valid combined for the final refit before assessment.
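The 60/20/20 split and the combined dev partition can be sketched with the standard library alone. This is an illustration under stated assumptions, not how ml.split works internally: ThreeWaySplit and three_way_split are hypothetical names, and a plain shuffle stands in for whatever stratification the library may apply.

```python
import random
from dataclasses import dataclass


@dataclass
class ThreeWaySplit:
    train: list
    valid: list
    test: list

    @property
    def dev(self):
        # train + valid combined, used for the final refit before assessment
        return self.train + self.valid


def three_way_split(rows, seed, ratios=(0.6, 0.2, 0.2)):
    """Shuffle a copy of rows and cut it into train/valid/test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is reproducible
    n_train = round(len(shuffled) * ratios[0])
    n_valid = round(len(shuffled) * ratios[1])
    return ThreeWaySplit(
        train=shuffled[:n_train],
        valid=shuffled[n_train:n_train + n_valid],
        test=shuffled[n_train + n_valid:],   # whatever remains
    )


s = three_way_split(range(100), seed=42)
sizes = (len(s.train), len(s.valid), len(s.test), len(s.dev))
```

With 100 rows this gives partitions of 60, 20, and 20 rows, and a dev set of 80 rows that never overlaps the test partition.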

47 verbs, one import. From check_data and split through tune, stack, explain, drift, and shelf. Everything returns plain objects you can inspect, compare, or serialize.

168 datasets. tips and flights are bundled. The rest download from OpenML on first use and cache locally.

Highlights

Tune. Random, Bayesian (mlw[optuna]), or grid search.

tuned = ml.tune(s.train, "churn", algorithm="xgboost", seed=42, n_trials=50)
ml.evaluate(tuned, s.valid)
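For readers unfamiliar with random search, the loop behind it can be sketched in a few lines. This is a generic illustration, not ml.tune itself: random_search, the search space, and the toy_score function are all hypothetical, and a real score would come from fitting and validating a model.

```python
import random


def random_search(score, space, n_trials, seed):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical search space and a toy score that peaks at depth 4, rate 0.1.
space = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}


def toy_score(params):
    return -(abs(params["max_depth"] - 4) + abs(params["learning_rate"] - 0.1))


best_params, best_score = random_search(toy_score, space, n_trials=50, seed=42)
```

Fixing the seed makes the sampled configurations, and therefore the result, reproducible across runs.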

Ship gate. Hard pass/fail contracts before deployment.

ml.validate(final, test=s.test, rules={"accuracy": ">0.85"})
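A rule string such as ">0.85" can be read as a comparison operator plus a threshold. The sketch below shows one way such a gate could be checked against already-computed metrics. It is an assumption about the idea, not ml.validate's code: check_rules, GateFailed, and the set of supported operators are hypothetical.

```python
import operator

# Operators a rule string may start with. Two-character symbols come first
# so that ">=" is matched before ">".
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


class GateFailed(Exception):
    """Raised when at least one metric violates its rule."""


def check_rules(metrics, rules):
    """Return True if every metric satisfies its rule, else raise GateFailed."""
    failures = []
    for name, rule in rules.items():
        symbol = next((s for s in _OPS if rule.startswith(s)), None)
        if symbol is None:
            raise ValueError(f"unsupported rule: {rule!r}")
        threshold = float(rule[len(symbol):])
        if not _OPS[symbol](metrics[name], threshold):
            failures.append(f"{name}={metrics[name]} violates {rule}")
    if failures:
        raise GateFailed("; ".join(failures))
    return True


passed = check_rules({"accuracy": 0.91}, {"accuracy": ">0.85"})

try:
    check_rules({"accuracy": 0.80}, {"accuracy": ">0.85"})
    blocked = False
except GateFailed:
    blocked = True
```

Raising instead of returning False makes the gate hard to ignore in a deployment script: an unmet contract stops the run.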

Drift. Catch distribution shift before users notice.

ml.drift(reference=s.train, new=live_data).shifted
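One common way to detect a shift in a single numeric feature is the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical CDFs. The sketch below uses it for illustration only: the source does not say which test ml.drift uses, and ks_statistic, shifted, and the 0.3 cutoff are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, new):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(new)
    gap = 0.0
    # The CDFs only change at observed values, so checking those is enough.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def shifted(reference, new, threshold=0.3):
    # The threshold is an illustrative cutoff, not a calibrated p-value.
    return ks_statistic(reference, new) > threshold


reference = list(range(10))             # values 0..9
moved = [x + 5 for x in reference]      # same shape, shifted by 5

same_gap = ks_statistic(reference, reference)
moved_gap = ks_statistic(reference, moved)
```

A production check would typically run per feature and use a significance test rather than a fixed cutoff.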

Algorithms

16 families. engine="auto" picks Rust when available. engine="sklearn" forces scikit-learn fallback.

| Algorithm | String | Engine | Classification | Regression |
| --- | --- | --- | --- | --- |
| Random Forest | "random_forest" | Rust | Y | Y |
| Extra Trees | "extra_trees" | Rust | Y | Y |
| Gradient Boosting | "gradient_boosting" | Rust | Y | Y |
| Hist. Gradient Boosting | "histgradient" | Rust | Y | Y |
| Decision Tree | "decision_tree" | Rust | Y | Y |
| Ridge | "linear" | Rust | · | Y |
| Logistic | "logistic" | Rust | Y | · |
| Elastic Net | "elastic_net" | Rust | · | Y |
| KNN | "knn" | Rust | Y | Y |
| Naive Bayes | "naive_bayes" | Rust | Y | · |
| AdaBoost | "adaboost" | Rust | Y | · |
| SVM | "svm" | Rust | Y | Y |
| XGBoost | "xgboost" | optional | Y | Y |
| LightGBM | "lightgbm" | optional | Y | Y |
| CatBoost | "catboost" | optional | Y | Y |
| TabPFN | "tabpfn" | optional | Y | · |

Citation

Roth, S. (2026). A Grammar of Machine Learning Workflows.
doi:10.5281/zenodo.19023838

License

MIT. Simon Roth, 2026.
