
General tabular AutoML library (classification + regression), production-grade.


honestml


Tabular AutoML where the leaderboard doesn't lie. Most AutoML frameworks ship the model with the best validation score — but that number is optimistic, because you selected for it. honestml is built so that the score you see is the score you can expect in production.

It covers binary / multiclass classification and regression behind a clean, extensible core. The honesty is in how it selects: out-of-fold scoring on a shared CV split; a bootstrap equivalence band that, among the statistically indistinguishable best candidates, ships the simplest one; leakage-controlled feature engineering and selection; an optional untouched outer holdout scored exactly once; and reproducible, fingerprinted runs.
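To make the selection rule concrete, here is a minimal sketch of a bootstrap equivalence band. It is not honestml's implementation: it assumes per-row OOF losses for each candidate on the same rows, a paired percentile bootstrap of the mean loss difference against the best candidate, and a hand-assigned complexity rank. The names `equivalence_band` and `pick_simplest`, the 95% level, and the synthetic losses are all illustrative.

```python
import random


def equivalence_band(oof_losses, n_boot=500, alpha=0.05, seed=0):
    """Return (best_id, band) where band holds candidates tied with the best.

    oof_losses maps candidate id -> per-row OOF losses (lower is better),
    all computed on the same rows so that differences can be paired.
    """
    rng = random.Random(seed)  # seeded, so the band is reproducible
    means = {k: sum(v) / len(v) for k, v in oof_losses.items()}
    best = min(means, key=means.get)
    n = len(oof_losses[best])
    # Draw the resampled row indices once and reuse them for every candidate.
    resamples = [[rng.randrange(n) for _ in range(n)] for _ in range(n_boot)]
    band = [best]
    for cand, losses in oof_losses.items():
        if cand == best:
            continue
        diffs = [c - b for c, b in zip(losses, oof_losses[best])]
        boot = sorted(sum(diffs[i] for i in idx) / n for idx in resamples)
        lower = boot[int(alpha / 2 * n_boot)]
        # A lower bound at or below zero means the data cannot rule out
        # that this candidate is as good as the best one.
        if lower <= 0:
            band.append(cand)
    return best, band


def pick_simplest(band, complexity):
    """Among tied candidates, ship the one with the lowest complexity rank."""
    return min(band, key=lambda k: complexity[k])


# Synthetic per-row losses: "linear" trails "gbm" by a negligible margin,
# "dummy" is clearly worse on every row.
gbm = [0.30 + 0.01 * (i % 10) for i in range(100)]
linear = [g + (0.05 if i % 2 == 0 else -0.048) for i, g in enumerate(gbm)]
dummy = [g + 0.5 for g in gbm]

best, band = equivalence_band({"gbm": gbm, "linear": linear, "dummy": dummy})
shipped = pick_simplest(band, {"dummy": 0, "linear": 1, "gbm": 3})
```

In this toy setup the candidate with the lowest mean loss is not the one that ships: the simpler model sits inside the band, so it wins the tie, while the clearly worse candidate is excluded.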

from honestml import AutoML

model = AutoML(task="binary").fit(X, y)
proba = model.predict_proba(X_new)
print(model.best_model_id_, model.leaderboard_)

The library is silent by default (a NullHandler on the honestml logger); enable progress with logging.getLogger("honestml").setLevel(logging.INFO) plus logging.basicConfig().
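Spelled out, the logging setup described above might look like the following. It uses only the standard `logging` module; the logger name `honestml` comes from the text, while the format string and the variable name are arbitrary choices.

```python
import logging

# Attach a handler to the root logger so records have somewhere to go.
# The format string is illustrative.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Raise verbosity for the library's logger only; other loggers keep their levels.
honestml_logger = logging.getLogger("honestml")
honestml_logger.setLevel(logging.INFO)
```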

Install

pip install honestml                 # lightweight core (baseline/linear models)
pip install "honestml[boosting]"     # core + catboost, lightgbm, xgboost
pip install "honestml[all]"          # boosting + optuna (HPO), mlflow (tracking), onnx, shap, report and the rest
pip install "honestml[inference]"    # slim serving runtime (load_artifact + predict only)

Requires Python >= 3.10. Heavy dependencies are optional extras and imported lazily — import honestml stays light, and a missing extra fails fast with the exact pip install honestml[...] hint.
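The lazy-import pattern described above can be sketched with `importlib` alone. This is an assumption about the general approach, not the library's code: `require_extra` is a hypothetical helper name, and the error wording is illustrative.

```python
import importlib


def require_extra(module_name: str, extra: str):
    """Import an optional dependency on first use, or fail with an install hint.

    Hypothetical helper: the name and message format are not taken from honestml.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f'Install it with: pip install "honestml[{extra}]"'
        ) from exc


# A module that is present is simply returned.
json_module = require_extra("json", "all")

# A missing module fails immediately with the install hint in the message.
try:
    require_extra("module_that_is_not_installed_xyz", "boosting")
    hint = ""
except ImportError as err:
    hint = str(err)
```

Calling such a helper inside the function that needs the dependency, rather than at module top level, is what keeps the package import itself light.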

What you get

| Capability | How |
| --- | --- |
| Honest model selection | OOF scoring on a shared CV split; a seeded bootstrap equivalence band (significance="bootstrap", the default) collects candidates statistically indistinguishable from the best, and the simplest member of the band wins; ties are disclosed, not hidden |
| CV schemes | stratified / kfold / group / holdout / timeseries (purge+embargo, value-based time order) / timeseries_period (calendar or Δt period folds, wall-clock gaps, optional per-period weighting, rolling train window); fit(..., time=, label_time=, groups=) |
| Outer holdout + finalize | cv=CVConfig(outer_holdout=0.2): selection sees only DEV, the holdout is scored once; the shipped model is refit on all data after scoring (finalize=True) |
| Presets | AutoML(preset="fast" / "balanced" / "best"): declarative, data-driven partial configs; an explicit argument always wins; honesty parameters are not presettable |
| Budget + resume | budget=600 (seconds) or BudgetConfig(...) with graceful degradation; cache="runs/" resumes by run fingerprint |
| Feature engineering / selection | OOF-honest target (binary-only) / frequency encoding, datetime deltas, intersections; importance / null-importance / random-probe / sequential / SHAP selection with honest arbitration |
| HPO + ensembling | hpo=HPOConfig(...) (Optuna, per-model search before the honest selection); ensemble=EnsembleConfig(): a Caruana/weighted blend ships only if significantly better |
| Run report | model.run_report_ (versioned JSON, tracker-independent); save_run_report and render_report produce markdown or self-contained HTML (charts via the report extra) |
| Experiment tracking | tracker="mlflow" or TrackerConfig(...): post-fit, fail-soft, no global mlflow state; custom backends via the ExperimentTracker port |
| Artifacts + serving | save_artifact / load_artifact: versioned, integrity-checked artifact directory (see Standalone inference below) |
| ONNX export | honestml.export_onnx(model, dir, sample=...): parity-gated, export-only bundle for external runtimes |
| Plugins | third-party models via honestml.models entry points (docs/plugin-contract.md) |
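As an illustration of what "OOF-honest" target encoding means, the sketch below encodes each row using target statistics from the other folds only, so a row's own label never leaks into its feature. It is a simplified stand-in, not the library's encoder: the interleaved fold assignment, the smoothing constant, and the function name `oof_target_encode` are assumptions made for the example.

```python
from collections import defaultdict


def oof_target_encode(categories, targets, n_folds=5, smoothing=10.0):
    """Encode each row with target statistics computed on the other folds only."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Interleaved folds for brevity; a real pipeline would reuse the shared CV split.
        train = [i for i in range(n) if i % n_folds != fold]
        # The prior also comes from the training folds, never from held-out rows.
        prior = sum(targets[i] for i in train) / len(train)
        sums, counts = defaultdict(float), defaultdict(int)
        for i in train:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        for i in range(fold, n, n_folds):
            c = categories[i]
            # Shrink toward the prior so rare or unseen categories are not overfit.
            encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded


cats = ["a", "b", "a", "b", "a", "b", "a", "b", "a", "rare"]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
enc = oof_target_encode(cats, labels)

# Flipping the label of the single "rare" row must not change its own encoding.
enc_flipped = oof_target_encode(cats, labels[:9] + [0])
```

The last two lines show the property that matters: a category seen only once falls back to the training-fold prior, and its encoding is unaffected by its own label.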

Standalone inference

from honestml import load_artifact

model = load_artifact("artifact_dir/")   # integrity-checked against the sha256 manifest
predictions = model.predict(X_new)

The artifact directory is self-contained — manifest, preprocessing schema and the model body — and loads under the slim honestml[inference] install: no training stack is imported. Boosting models can be saved with structural native bodies (model_format="native"). Trust model: the default body is joblib/pickle — load only artifacts you trust; native bodies contain no pickle (a non-boosting estimator and the optional calibrator still ship as joblib).
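The integrity check can be pictured as follows. This is a sketch under stated assumptions, not honestml's artifact format: the file name `manifest.json`, its `{"files": {name: sha256}}` layout, and the helper names are hypothetical. The point it illustrates is that hashes are verified before any model body is deserialized.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large model bodies do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(artifact_dir: Path) -> None:
    """Record a sha256 digest for every file in the directory (hypothetical layout)."""
    files = {
        p.name: sha256_of(p)
        for p in sorted(artifact_dir.iterdir())
        if p.is_file() and p.name != "manifest.json"
    }
    (artifact_dir / "manifest.json").write_text(json.dumps({"files": files}, indent=2))


def verify_manifest(artifact_dir: Path) -> None:
    """Raise ValueError if any listed file no longer matches its recorded digest."""
    manifest = json.loads((artifact_dir / "manifest.json").read_text())
    for name, expected in manifest["files"].items():
        if sha256_of(artifact_dir / name) != expected:
            raise ValueError(f"integrity check failed for {name}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model.bin").write_bytes(b"weights")
    write_manifest(root)
    verify_manifest(root)  # passes on an untouched artifact
    (root / "model.bin").write_bytes(b"tampered")
    try:
        verify_manifest(root)
        tamper_detected = False
    except ValueError:
        tamper_detected = True
```

A hash check of this kind detects corruption or accidental modification; it does not make an untrusted pickle safe to load, which is why the trust model above still applies.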

Reproducibility

Every run computes a fingerprint over the resolved config, data signature, estimator set and library versions; the run report carries it together with the full provenance (leaderboard, band, budget outcome, FS/HPO/ensemble decisions, timings). Same inputs → same selection.
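One way to picture such a fingerprint is a hash over a canonical serialization of the inputs. The sketch below is an assumption about the general technique rather than the library's scheme: the function name `run_fingerprint`, the payload keys, and the string used as a data signature are all hypothetical.

```python
import hashlib
import json


def run_fingerprint(config: dict, data_signature: str,
                    estimators: list[str], versions: dict) -> str:
    """Hash the run inputs in a canonical form so equal inputs give equal digests."""
    payload = {
        "config": config,
        "data": data_signature,
        "estimators": sorted(estimators),  # order of registration should not matter
        "versions": versions,
    }
    # sort_keys and fixed separators make the JSON text independent of dict order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


versions = {"honestml": "1.0.0"}
fp_a = run_fingerprint({"task": "binary", "seed": 0}, "rows=1000;cols=20",
                       ["linear", "gbm"], versions)
# Same inputs, different key and list order: the fingerprint is unchanged.
fp_b = run_fingerprint({"seed": 0, "task": "binary"}, "rows=1000;cols=20",
                       ["gbm", "linear"], versions)
# Any change to the resolved config produces a different fingerprint.
fp_c = run_fingerprint({"task": "binary", "seed": 1}, "rows=1000;cols=20",
                       ["linear", "gbm"], versions)
```

A digest like this is also what makes resume-by-fingerprint workable: a cache keyed on it is reused only when config, data, estimator set and versions all match.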

Documentation

Documentation lives in docs/ — quickstart, API reference, correctness guide and the plugin contract; build it locally with mkdocs serve. Source and issue tracker: https://github.com/sukhov-is/HonestML.

Development

uv sync --extra dev --extra boosting --extra shap --extra pyarrow --extra mlflow
uv run pytest                 # full suite (onnx export tests also need `--extra onnx`, Python >=3.11)
uv run ruff check src tests; uv run mypy src/honestml; uv run lint-imports

The layered architecture (core ← adapters ← application ← composition) is enforced by import-linter. See docs/releasing.md for the release pipeline and benchmarks/ for the honesty benchmark suite.

License

MIT (see LICENSE).

