
Lint for ML training pipelines: catch silent bugs (leakage, drift, schema mismatch) before they ruin your model.


dash_mlguard

Lint for ML training pipelines. One import, one call, one PDF report — catch the silent bugs that ruin models in production before you ship them.

pip install dash-mlguard            # core (pandas + numpy)
pip install "dash-mlguard[pdf]"     # adds PDF report support (fpdf2)

import dash_mlguard

report = dash_mlguard.check(X_train, y_train, X_test=X_test, y_test=y_test)
print(report)

if not report.ok():
    raise SystemExit("Fix the critical issues before training.")

That's the core API. Pandas DataFrames, NumPy arrays, dicts, and lists are all accepted as inputs. check() does not train any model: it's deterministic, runs in seconds, and the core install depends only on pandas + numpy (PDF output is an optional extra).


Why this exists

Every ML pipeline has small mistakes that go unnoticed: a column derived from the label sneaks in, the test set was sampled before the split was made, two columns are byte-identical, the same user appears in train and test. Each one looks fine in code review and silently inflates your accuracy. Then production happens.

dash_mlguard catches those mistakes before they break your pipeline. It's a static-analysis layer for training data, much as ESLint is for JavaScript.

It's deliberately scoped: only training-data and pipeline integrity. It doesn't train models, tune hyperparameters, or visualize distributions — pandas, sklearn, and ydata-profiling already do those things well.


What it catches

Code   Severity                    What it catches
TL001  critical / warning          Exact-duplicate rows leaking from train into test
TL002  warning                     Near-duplicate rows (numeric round-off contamination)
TL003  critical / warning / info   Target leakage: feature ↔ label association, tiered (≥0.98 / ≥0.85 / ≥0.70)
TL004  warning                     Constant or near-constant features
TL005  warning                     Duplicate feature columns
TL006  warning                     Train/test distribution drift (KS for numeric, PSI for categorical)
TL007  critical / warning          Severe class imbalance
TL008  warning                     Missingness rate differs between train and test
TL009  critical                    Schema mismatch (columns or dtypes differ)
TL010  warning                     ID-like features (cardinality ≈ row count)
TL011  critical / warning          Temporal leakage: test rows at or before the latest train timestamp
TL012  critical / warning          Group leakage: same group ID (user / session / patient) in train and test
TL013  critical                    Preprocessing leakage: pipeline state depends on data outside the train split
TL014  warning                     Target-aware encoder without cross-validation wrapping

Each finding tells you the affected column(s), the severity, and how to fix it — not just that something is wrong.
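For a sense of what the PSI half of TL006 measures: the Population Stability Index is commonly defined as PSI = Σ_i (p_test,i − p_train,i) · ln(p_test,i / p_train,i), summed over categories i. The sketch below implements that textbook formula with pandas + numpy. The function name and the eps floor (so a category missing from one split does not divide by zero) are assumptions for illustration; dash_mlguard's own smoothing and thresholds are not documented here, so its numbers may differ.

```python
import numpy as np
import pandas as pd


def categorical_psi(train: pd.Series, test: pd.Series, eps: float = 1e-6) -> float:
    """Population Stability Index between two categorical samples (illustrative sketch).

    PSI = sum over categories of (p_test - p_train) * ln(p_test / p_train).
    eps floors each proportion so categories absent from one side stay finite.
    """
    # Union of categories seen in either split, so both distributions line up.
    categories = pd.Index(train.unique()).union(pd.Index(test.unique()))
    p_train = train.value_counts(normalize=True).reindex(categories, fill_value=0.0)
    p_test = test.value_counts(normalize=True).reindex(categories, fill_value=0.0)
    p_train = np.clip(p_train.to_numpy(dtype=float), eps, None)
    p_test = np.clip(p_test.to_numpy(dtype=float), eps, None)
    return float(np.sum((p_test - p_train) * np.log(p_test / p_train)))


same = categorical_psi(pd.Series(list("aabb")), pd.Series(list("abab")))
shifted = categorical_psi(pd.Series(list("aaab")), pd.Series(list("abbb")))
print(same, shifted)
```

Identical category mixes give a PSI of zero; the larger the shift between splits, the larger the value.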


Why it actually helps

The costliest bugs in production ML are rarely algorithm bugs. They're data-hygiene bugs that pass code review:

  • A feature derived from the label sneaks in. The model gets 99% accuracy. Production gets 60%.
  • The same user's rows end up in train and test. Cross-validation looks great. Production looks terrible.
  • A timestamp column is fed in as a feature. The model overfits to row identity.
  • The test set was shuffled across time. Your "evaluation" measures interpolation between neighboring points in time, not the ability to predict the future.
  • StandardScaler.fit_transform(X) was called before the train/test split. Test statistics leaked into training.

dash_mlguard.check(...) is a single call that catches these before training, with concrete fixes.
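To make the last bullet concrete, here is a minimal sketch of the leaky pattern next to the standard fix (split first, fit the scaler on training rows only, then reuse it on the test rows). It uses scikit-learn directly on synthetic data; none of it is dash_mlguard code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(1000, 4))
y = (X[:, 0] + rng.normal(size=1000) > 10).astype(int)

# Leaky: fitting on all of X means the mean/std include the future test rows.
leaky_scaler = StandardScaler().fit(X)

# Honest: split first, fit on the training rows only,
# then apply the already-fitted scaler to the test rows.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)
```

Wrapping the scaler and model in a sklearn Pipeline and calling fit only on the training split gives the same guarantee with less bookkeeping.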


Demo: with vs without dash_mlguard

The repo ships examples/demo.py — a synthetic fraud-detection dataset (8 000 transactions, 600 users, 90-day window) with three mistakes baked into the naive pipeline:

  1. Shuffled split instead of chronological → temporal leakage
  2. Row-level split that puts the same users in train and test → group leakage
  3. StandardScaler.fit_transform(X) before splitting → preprocessing leakage

Run it:

cd examples
pip install -r requirements.txt
pip install "dash-mlguard[pdf]"
python demo.py

You get this verdict:

Metric    Naive (3 bugs)   Honest (dash_mlguard-cleaned)   Inflation
accuracy  0.8717           0.8495                          +0.0222
f1        0.6805           0.6569                          +0.0236
roc_auc   0.9065           0.8959                          +0.0106

The naive numbers look fine, but they are the score of a model that is secretly cheating. dash_mlguard flags all three bugs as critical, so report.ok() returns False for the naive run.
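The Inflation column is simply the naive score minus the honest score for each metric. Using the figures from the table above:

```python
naive = {"accuracy": 0.8717, "f1": 0.6805, "roc_auc": 0.9065}
honest = {"accuracy": 0.8495, "f1": 0.6569, "roc_auc": 0.8959}

# Positive inflation = how much the leaky pipeline overstates each metric.
inflation = {metric: round(naive[metric] - honest[metric], 4) for metric in naive}
print(inflation)
```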

The demo also writes a single audit document — see examples/sample_report.pdf and examples/sample_report.html for what the output looks like.
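The fixes for the first two demo bugs (chronological instead of shuffled split, no user on both sides) can be sketched in pandas. This is one possible approach, not necessarily what demo.py does: the function name honest_split and its parameters are hypothetical. It expects a single DataFrame holding features, label, timestamp, and group ID, so X and y stay aligned.

```python
import pandas as pd


def honest_split(df, time_col, group_col, test_frac=0.3):
    """Chronological split, then drop test rows whose group also appears in train.

    Every kept test row is strictly later than every train row (no temporal
    leakage), and no group ID is shared between the splits (no group leakage).
    """
    ordered = df.sort_values(time_col, kind="stable")
    n_test = max(1, round(len(ordered) * test_frac))
    cutoff = ordered[time_col].iloc[len(ordered) - n_test]
    # Rows tied with the cutoff timestamp all go to test, so train ends strictly before it.
    train = ordered[ordered[time_col] < cutoff]
    test = ordered[ordered[time_col] >= cutoff]
    # Remove test rows belonging to any group already seen in train.
    test = test[~test[group_col].isin(train[group_col])]
    return train, test


df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=10, freq="D"),
    "user_id": [1, 2, 1, 3, 2, 4, 5, 4, 6, 1],
})
train, test = honest_split(df, "timestamp", "user_id", test_frac=0.3)
```

Dropping overlapping groups can discard many test rows when the same users recur across the whole window; splitting by group first (for example with sklearn's GroupShuffleSplit) and then enforcing a time cutoff is another option.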


Generate a PDF / HTML audit report

import dash_mlguard

report = dash_mlguard.check(
    X_train, y_train, X_test, y_test,
    time_col="timestamp",        # enables TL011 (temporal leakage)
    group_key="user_id",         # enables TL012 (group leakage)
)

before = {"accuracy": 0.8717, "f1": 0.6805, "roc_auc": 0.9065}   # naive pipeline
after = {"accuracy": 0.8495, "f1": 0.6569, "roc_auc": 0.8959}    # cleaned pipeline

report.to_pdf(
    "audit.pdf",
    title="dash_mlguard audit -- fraud model v3",
    dataset_name="transactions Q1 2024",
    metrics_before=before,
    metrics_after=after,
)

# Or, for embedding in a notebook / dashboard:
html = report.to_html(title="dash_mlguard audit -- fraud model v3",
                      metrics_before=before, metrics_after=after)

The report contains a pass/fail banner, summary cards, a before/after performance comparison with deltas, and every finding with its description, details, fix, and affected columns. It is designed to be printed or shared with stakeholders.


Audit a sklearn pipeline

dash_mlguard.check() looks at data. dash_mlguard.audit_pipeline() looks at code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
import dash_mlguard

candidate = Pipeline([
    ("scale", StandardScaler()),
    ("clf",   GradientBoostingClassifier(random_state=42)),
])

report = dash_mlguard.audit_pipeline(candidate, X, y)   # raw, unsplit X, y
print(report)

It clones the pipeline twice, fits one on the train split and one on the full dataset, and compares transform(X_test) outputs. If they diverge, the pipeline has data-dependent state (scaler stats, imputer means, encoder maps) that would leak when fit on full data — flagged as TL013 critical.

It also flags target-aware encoders (TargetEncoder, CatBoostEncoder, etc.) as TL014 if they appear without explicit CV wrapping.
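The divergence check described above can be approximated in a few lines of scikit-learn. This is a sketch of the idea, not dash_mlguard's implementation: preprocessing_leaks is a hypothetical name, and comparing only the transformer steps (pipeline[:-1]) is an assumption, since a final classifier has no transform(). The test_size, random_state, and atol defaults mirror those in the audit_pipeline signature.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def preprocessing_leaks(pipeline, X, y, test_size=0.30, random_state=42, atol=1e-6):
    """True if the preprocessing output on the held-out rows depends on whether
    the pipeline was fit on the train split or on all rows."""
    X_train, X_test, y_train, _ = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    fit_on_train = clone(pipeline).fit(X_train, y_train)
    fit_on_full = clone(pipeline).fit(X, y)
    # Slicing keeps the fitted transformer steps and drops the final estimator.
    out_train = fit_on_train[:-1].transform(X_test)
    out_full = fit_on_full[:-1].transform(X_test)
    return not np.allclose(out_train, out_full, atol=atol)


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = (X[:, 0] > 5.0).astype(int)

scaled = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
stateless = Pipeline([("noop", FunctionTransformer()), ("clf", LogisticRegression())])
print(preprocessing_leaks(scaled, X, y))     # scaler statistics depend on which rows it saw
print(preprocessing_leaks(stateless, X, y))  # identity transform has no learned state
```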


API reference

dash_mlguard.check(
    X_train, y_train,
    X_test=None, y_test=None,
    *,
    task="auto",                      # "auto" | "classification" | "regression"
    time_col=None,                    # column name in X_train/X_test for TL011
    group_key=None,                   # column name OR Series for TL012
    group_key_test=None,              # defaults to group_key when it's a string
) -> Report

dash_mlguard.audit_pipeline(
    pipeline, X, y,
    *,
    task="auto",
    test_size=0.30,
    random_state=42,
    atol=1e-6,
) -> Report

Report:

  • report.ok() returns True if there are no critical findings.
  • report.findings, report.critical, report.warnings, report.infos — lists of Finding.
  • print(report) — human-readable terminal summary.
  • report.to_dict() — JSON-serializable dict (good for CI logs / artifacts).
  • report.to_html(...) — single-page self-contained HTML.
  • report.to_pdf(path, ...) — single audit document. Requires dash_mlguard[pdf].

Each Finding has: code, severity (critical / warning / info), message, fix, columns, details.
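Because report.findings is a plain list of Finding objects, post-processing is ordinary Python. The two helpers below are hypothetical (not part of dash_mlguard) and assume that severity compares equal to the strings "critical" / "warning" / "info" and that columns is a list of column names. SimpleNamespace stand-ins replace a real report so the sketch runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_findings(report):
    """Count findings per (severity, code) using only the documented attributes."""
    return Counter((f.severity, f.code) for f in report.findings)


def columns_flagged(report, severity="critical"):
    """Sorted column names mentioned by findings of the given severity."""
    cols = set()
    for f in report.findings:
        if f.severity == severity:
            cols.update(f.columns or [])
    return sorted(cols)


# Stand-in objects for illustration only; a real Report comes from dash_mlguard.check().
fake = SimpleNamespace(findings=[
    SimpleNamespace(code="TL012", severity="critical", columns=["user_id"]),
    SimpleNamespace(code="TL006", severity="warning", columns=["amount"]),
    SimpleNamespace(code="TL006", severity="warning", columns=["hour"]),
])
print(summarize_findings(fake))
print(columns_flagged(fake))
```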


Use it in CI

import sys

import dash_mlguard

report = dash_mlguard.check(X_train, y_train, X_test, y_test,
                            time_col="timestamp", group_key="user_id")
report.to_pdf("audit.pdf", title="CI audit")   # optional artifact; requires dash_mlguard[pdf]
sys.exit(0 if report.ok() else 1)

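If report.ok() is False, the script exits with status 1, which fails the CI job and blocks the merge when the job is a required check. The PDF / HTML report can be uploaded as a CI artifact for review.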
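Teams that want a stricter gate than report.ok() can apply their own policy on top of the documented report.critical, report.warnings, and report.to_dict(). The function names and the max_warnings threshold below are hypothetical, a sketch rather than part of dash_mlguard.

```python
import json


def ci_exit_code(report, max_warnings=None):
    """0 = pass, 1 = fail. Any critical finding fails (the same rule as report.ok());
    optionally, more than max_warnings warnings also fails."""
    if report.critical:
        return 1
    if max_warnings is not None and len(report.warnings) > max_warnings:
        return 1
    return 0


def write_json_artifact(report, path="audit.json"):
    """Save the JSON-serializable report dict for upload as a CI artifact."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report.to_dict(), fh, indent=2)


# In a CI script (after `import sys`):
#   write_json_artifact(report)
#   sys.exit(ci_exit_code(report, max_warnings=5))
```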


Scope, on purpose

dash_mlguard is only a linter for training-data and pipeline-integrity bugs. It doesn't:

  • train models (use sklearn / lightning / xgboost),
  • tune hyperparameters (use Optuna / Ray Tune),
  • track experiments (use MLflow / W&B),
  • profile data (use ydata-profiling / sweetviz),
  • explain predictions (use SHAP / lime).

Doing one thing well is the point. If dash_mlguard.check() returns clean, your data has passed every check listed above, and that's all it claims to do.


Development

git clone https://github.com/<your-username>/dash_mlguard
cd dash_mlguard
pip install -e ".[dev]"
pytest                        # 29 tests, ~3 seconds

License

MIT — see LICENSE.



Download files


Source Distribution

dash_mlguard-0.3.0.tar.gz (28.1 kB)

Built Distribution


dash_mlguard-0.3.0-py3-none-any.whl (24.8 kB)

File details

Details for the file dash_mlguard-0.3.0.tar.gz.

File metadata

  • Download URL: dash_mlguard-0.3.0.tar.gz
  • Size: 28.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for dash_mlguard-0.3.0.tar.gz
Algorithm     Hash digest
SHA256        c82de1f46b45e597c4b98d26e8b360427fe937ff48533ac711ea139663f2a948
MD5           fe6473d9b9ac0b4786b9f700235e297d
BLAKE2b-256   f9a32c776da1f906d4a930b0cdc1303996291a6a11ce465dcb87e22e5051f045


File details

Details for the file dash_mlguard-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: dash_mlguard-0.3.0-py3-none-any.whl
  • Size: 24.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for dash_mlguard-0.3.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        f5e0b0ab35fb0d5051cd83c0d9d5c6b9043f99bb55f0eb3411f897eab8bbeb89
MD5           ef7daa707563cbca7ac587df7dee9c88
BLAKE2b-256   fe93ed7bf8e0453fcb5feede3f88c2c614aed8d1a44b94871781fdab247268bb

