Config-driven ML analysis library for regression and classification
LizyML
Config-driven ML library that unifies tune / fit / predict / evaluate / export for regression, binary classification, and multiclass classification.
Key Features
- One config, full pipeline -- A single dict/YAML/JSON drives splitting, training, tuning, evaluation, and export. No boilerplate orchestration code.
- Reproducibility by default -- Seed, split indices, params, library versions, and data fingerprint are captured automatically in every run.
- Leakage-aware CV and calibration -- OOF predictions never see their own training rows. Calibration uses cross-fit on the same outer splits. Time and group constraints propagate to inner validation.
- 8 CV strategies -- KFold, Stratified, Group, StratifiedGroup, TimeSeries, Purged TimeSeries, Group TimeSeries, and 2-axis Blocked Group KFold.
- Stable result contracts -- FitResult, PredictionResult, and artifact formats have fixed schemas. Downstream code never breaks on shape changes.
- Codegen export -- Generate standalone train.py + predict.py that run without LizyML installed.
- Optional extras -- Tuning (Optuna), SHAP explanations, Plotly visualizations, and Beta calibration (scipy) are all opt-in.
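To make the group constraint in the CV strategies concrete, here is a small plain-Python sketch of a group-aware k-fold: whole groups are dealt into folds, so no group ever appears on both the training and validation side of a split. This is a toy under stated assumptions, not LizyML's splitter; the name group_kfold_indices and the balancing rule are hypothetical.

```python
from collections import defaultdict


def group_kfold_indices(groups, n_splits):
    """Yield (train_idx, valid_idx) pairs; each group lands in exactly one validation fold."""
    rows_by_group = defaultdict(list)
    for row, group in enumerate(groups):
        rows_by_group[group].append(row)

    # Place the largest groups first, always into the currently smallest fold,
    # so fold sizes stay roughly balanced.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(rows_by_group, key=lambda g: (-len(rows_by_group[g]), str(g))):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(rows_by_group[group])

    all_rows = set(range(len(groups)))
    for valid in folds:
        valid_idx = sorted(valid)
        train_idx = sorted(all_rows - set(valid))
        yield train_idx, valid_idx


groups = ["a", "a", "b", "b", "b", "c", "d", "d"]
splits = list(group_kfold_indices(groups, n_splits=2))
for train_idx, valid_idx in splits:
    train_groups = sorted({groups[i] for i in train_idx})
    valid_groups = sorted({groups[i] for i in valid_idx})
    print("train groups:", train_groups, "valid groups:", valid_groups)
```

The same idea extends to time-ordered data by additionally requiring that validation rows come after training rows.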
Installation
pip install lizyml
Extras
pip install 'lizyml[tuning]' # Optuna hyperparameter search
pip install 'lizyml[explain]' # SHAP explanations
pip install 'lizyml[plots]' # Plotly visualizations
pip install 'lizyml[calibration]' # Beta calibrator (scipy)
pip install 'lizyml[tuning,explain,plots,calibration]' # all extras
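A quick way to see which extras are usable in the current environment is to probe for the third-party modules they pull in. The sketch below uses only the standard library; the mapping from extra name to module name is an assumption based on the comments above, and available_extras is a hypothetical helper, not part of LizyML.

```python
from importlib.util import find_spec

# Extra name -> top-level module the extra is expected to provide (assumed mapping).
EXTRA_MODULES = {
    "tuning": "optuna",
    "explain": "shap",
    "plots": "plotly",
    "calibration": "scipy",
}


def available_extras():
    """Report which optional dependencies can be imported in this environment."""
    return {extra: find_spec(module) is not None for extra, module in EXTRA_MODULES.items()}


for extra, installed in available_extras().items():
    status = "installed" if installed else f"missing, try: pip install 'lizyml[{extra}]'"
    print(f"{extra}: {status}")
```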
Development install
git clone https://github.com/nbx-liz/LizyML.git
cd LizyML
uv sync --group dev
Quick Start
import numpy as np
import pandas as pd
from lizyml import Model
# --- Synthetic data ---
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
    "cat_col": rng.choice(["x", "y", "z"], size=n),
    "target": rng.normal(size=n),
})
# --- Config ---
config = {
    "config_version": 1,
    "task": "regression",
    "data": {"target": "target"},
    "features": {"categorical": ["cat_col"]},
    "split": {"method": "kfold", "n_splits": 5},
    "model": {"name": "lgbm"},
    "evaluation": {"metrics": ["rmse", "mae"]},
}
# --- Train, evaluate, predict ---
model = Model(config=config)
fit_result = model.fit(data=df)
metrics = model.evaluate()
print(metrics) # {"raw": {"oof": {"rmse": ..., "mae": ...}, ...}}
pred = model.predict(df.drop(columns=["target"]))
print(pred.pred[:5])
# --- Save and reload ---
model.export("my_model")
loaded = Model.load("my_model")
loaded.predict(df.drop(columns=["target"]))
Configuration
LizyML accepts configs as Python dicts, JSON files, or YAML files. Environment variables override any key using the LIZYML__ prefix (e.g., LIZYML__training__seed=99).
See Config Reference for all keys, defaults, split method guides, and tuning space definitions.
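The LIZYML__ convention reads naturally as "prefix, then a double-underscore-separated path into the config". The sketch below shows one way such a mapping could work. It is not LizyML's implementation: apply_env_overrides is a hypothetical helper, and parsing values as JSON (so "99" becomes the integer 99) is an assumption; the Config Reference is the authority on the real rules.

```python
import json


def apply_env_overrides(config, environ, prefix="LIZYML__"):
    """Return a copy of config with prefixed variables merged in as nested keys."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON-like configs
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated environment variable
        path = name[len(prefix):].split("__")
        try:
            value = json.loads(raw)  # "99" -> 99, "true" -> True
        except json.JSONDecodeError:
            value = raw  # keep plain strings as-is
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return merged


config = {"task": "regression", "training": {"seed": 42}}
env = {"LIZYML__training__seed": "99", "PATH": "/usr/bin"}
overridden = apply_env_overrides(config, env)
print(overridden)
```

In real use the second argument would be os.environ; a plain dict is passed here so the example does not depend on the shell.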
Codegen Export
Generate LizyML-free scripts for production deployment:
model.export_code("deploy/my_model")
Output:
- train.py -- retrain on new data with python train.py data.csv
- predict.py -- run inference with python predict.py test.csv -o out.csv
- config.json -- all hyperparameters and feature definitions
- test_equivalence.py -- verify codegen matches Model.predict()
- artifacts/ -- model files in human-readable formats
Dependencies: only lightgbm, numpy, pandas, scikit-learn.
Architecture
LizyML uses a 5-layer architecture where dependencies flow strictly downward:
Layer 4  Facade       Model (orchestration only, no logic)
   |
Layer 3  Optional     explain / plots / persistence / codegen
   |
Layer 2  Composition  training / evaluation / tuning
   |
Layer 1  Leaf         config / data / splitters / features / estimators / metrics / calibration
   |
Layer 0  Foundation   types (FitResult, PredictionResult, ...) / exceptions / logging
Key rules:
- Downward-only dependencies (no circular imports)
- Layer 2 references Layer 1 through abstract interfaces only
- Only the Facade (Layer 4) assembles concrete classes
See ARCHITECTURE.md for full diagrams and module layout.
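The three rules can be illustrated with a toy example that compresses the layers into one file. All names here (Estimator, MeanEstimator, train, ToyModel) are hypothetical and are not LizyML's classes; the point is only where the concrete class gets chosen.

```python
from typing import Protocol, Sequence


# Layer 1: an abstract interface plus one concrete leaf implementation.
class Estimator(Protocol):
    def fit(self, x: Sequence[float], y: Sequence[float]) -> None: ...
    def predict(self, x: Sequence[float]) -> list[float]: ...


class MeanEstimator:
    """Toy leaf estimator that always predicts the training mean."""

    def fit(self, x, y):
        self.mean_ = sum(y) / len(y)

    def predict(self, x):
        return [self.mean_] * len(x)


# Layer 2: composition code sees only the Estimator interface.
def train(estimator: Estimator, x, y) -> Estimator:
    estimator.fit(x, y)
    return estimator


# Layer 4: the facade is the only place that picks a concrete class.
class ToyModel:
    def __init__(self, config):
        registry = {"mean": MeanEstimator}
        self._estimator = registry[config["model"]["name"]]()

    def fit(self, x, y):
        train(self._estimator, x, y)
        return self

    def predict(self, x):
        return self._estimator.predict(x)


model = ToyModel({"model": {"name": "mean"}}).fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
predictions = model.predict([10.0, 20.0])
print(predictions)
```

Because train only depends on the interface, swapping in another estimator means touching the registry in the facade and nothing else.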
Design Priorities
Reproducibility -- Same config + seed = same splits, same OOF predictions, same metrics. Every run captures seed, split indices, params, library versions, and a data fingerprint.
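As a rough sketch of what capturing a run could involve, the example below hashes the data and derives seeded fold assignments, then checks that a rerun yields identical metadata. It uses only the standard library; the helper names and the keys of the run_meta dict are illustrative assumptions, not LizyML's actual schema.

```python
import hashlib
import random


def data_fingerprint(rows):
    """Hash the rows in order so any change to the data changes the digest."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def kfold_indices(n_rows, n_splits, seed):
    """Shuffle row indices with a fixed seed and deal them into folds."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    return [sorted(order[i::n_splits]) for i in range(n_splits)]


def capture_run(rows, n_splits, seed):
    """Collect the pieces needed to reproduce the splits for this data."""
    return {
        "seed": seed,
        "data_fingerprint": data_fingerprint(rows),
        "split_indices": kfold_indices(len(rows), n_splits, seed),
    }


rows = [(0.1, "x", 1.0), (0.2, "y", 2.0), (0.3, "z", 3.0), (0.4, "x", 4.0)]
run_meta = capture_run(rows, n_splits=2, seed=42)
rerun = capture_run(rows, n_splits=2, seed=42)

# Same data and seed reproduce the same metadata; changed data changes the fingerprint.
print(run_meta == rerun)
print(capture_run(rows[:-1], n_splits=2, seed=42)["data_fingerprint"] == run_meta["data_fingerprint"])
```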
Leakage prevention -- OOF rows are never seen during training. Calibration cross-fit reuses outer CV splits. Time and group constraints propagate to inner validation (early stopping) and calibration.
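The out-of-fold guarantee is easiest to see with a deliberately trivial model. In this sketch the "model" is just the mean of the training targets, and each row's prediction comes from a fit that excluded that row's fold. The function oof_predictions is a hypothetical illustration, not LizyML code.

```python
def oof_predictions(y, folds):
    """Predict each row with a model fitted only on the other folds.

    The "model" is the training-fold mean, which keeps the sketch small.
    """
    oof = [None] * len(y)
    for valid_idx in folds:
        held_out = set(valid_idx)
        train_y = [value for i, value in enumerate(y) if i not in held_out]
        fold_mean = sum(train_y) / len(train_y)  # fit on training rows only
        for i in valid_idx:
            oof[i] = fold_mean  # row i never contributed to its own prediction
    return oof


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = [[0, 1], [2, 3], [4, 5]]
oof = oof_predictions(y, folds)
print(oof)
```

Cross-fit calibration follows the same pattern: the calibrator applied to a row is fitted on predictions from the other folds of the same outer split.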
Contract stability -- FitResult, PredictionResult, and artifact formats have fixed schemas. Breaking changes require a format_version bump and migration path.
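One common way to honor a format_version contract is to upgrade old payloads through explicit migrations and reject anything newer than the loader understands. The sketch below assumes a made-up schema: ArtifactHeader, the version numbers, and the problem_type field are all hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass

SUPPORTED_FORMAT_VERSION = 1  # hypothetical value for this sketch


@dataclass(frozen=True)
class ArtifactHeader:
    format_version: int
    task: str


def migrate_v0_to_v1(payload):
    """Hypothetical migration: version 0 stored the task under 'problem_type'."""
    return {"format_version": 1, "task": payload["problem_type"]}


MIGRATIONS = {0: migrate_v0_to_v1}


def load_header(payload):
    """Upgrade old payloads step by step, then validate against the fixed schema."""
    version = payload.get("format_version", 0)
    while version < SUPPORTED_FORMAT_VERSION:
        payload = MIGRATIONS[version](payload)
        version = payload["format_version"]
    if version != SUPPORTED_FORMAT_VERSION:
        raise ValueError(f"unsupported format_version: {version}")
    return ArtifactHeader(format_version=version, task=payload["task"])


old = {"problem_type": "regression"}
new = {"format_version": 1, "task": "regression"}
print(load_header(old) == load_header(new))
```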
Result Objects
| Object | Key fields |
|---|---|
| FitResult | oof_pred, if_pred_per_fold, metrics, models, splits, run_meta |
| PredictionResult | pred, proba (binary), shap_values (optional), warnings |
| Model Artifact | Trained models, pipeline state, calibrator, config, format_version |
model.evaluate() returns structured metrics:
{
    "raw": {
        "oof": {"rmse": 0.42, "mae": 0.33},
        "if_mean": {"rmse": 0.40, "mae": 0.31},
        "if_per_fold": [...],
        "oof_coverage": 1.0,
    },
    "calibrated": {"oof": {"logloss": 0.35}},  # binary only
}
See BLUEPRINT.md for full schemas and invariants.
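Since the metrics come back as a plain nested dict, downstream checks can be written without any LizyML imports. The sketch below compares OOF and in-fold scores, assuming the if_ prefix stands for in-fold; the numbers are copied from the example structure above and are illustrative, not real results. generalization_gap is a hypothetical helper.

```python
def generalization_gap(metrics, name):
    """Return OOF minus in-fold mean for one metric in the "raw" section."""
    raw = metrics["raw"]
    return raw["oof"][name] - raw["if_mean"][name]


# Illustrative values taken from the example structure, not real results.
metrics = {
    "raw": {
        "oof": {"rmse": 0.42, "mae": 0.33},
        "if_mean": {"rmse": 0.40, "mae": 0.31},
        "oof_coverage": 1.0,
    },
}

for name in ("rmse", "mae"):
    print(name, round(generalization_gap(metrics, name), 4))

# "calibrated" is only present for binary tasks, so read it defensively.
calibrated_logloss = metrics.get("calibrated", {}).get("oof", {}).get("logloss")
print(calibrated_logloss)
```

A large positive gap on an error metric suggests the model fits its training folds noticeably better than unseen rows.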
Roadmap
- Broader scikit-learn estimator support
- DNN backend (PyTorch)
- Multiclass calibration
- Ranking tasks
- Additional export formats (ONNX, TorchScript)
Documentation
- Config Reference -- all config keys, defaults, and split guides
- BLUEPRINT.md -- implementation specification (source of truth)
- ARCHITECTURE.md -- 5-layer architecture diagrams
- CHANGELOG.md -- release history
- HISTORY.md -- proposal and decision records
Contributing
- Fork the repo and create a branch from develop
- Run quality gates: uv run ruff check . && uv run mypy lizyml/ && uv run pytest
- Open a PR against develop
License