Static CLI linter for silent methodological errors in scikit-learn workflows

MLGuard

A lightweight static analyzer for silent methodological errors in scikit-learn ML workflows — data leakage, invalid evaluation, and unsound train/test splits that run without raising an error but quietly inflate your results.

Zero runtime dependencies (Python standard library only). MLGuard is a heuristic surfacing tool, not a prover: every diagnostic carries a confidence (high / medium / low) and is meant as a code-review prompt, not a proof of a bug.

Install

pip install mlguard-lint

That's it — same command on Linux, macOS, and Windows. Requires Python 3.9+.

Use it

Point it at a single file or a whole folder:

mlguard-lint notebook.ipynb        # scan one notebook or .py file
mlguard-lint src/                  # scan a directory (recursive)

By default you get one clean line per issue:

mlguard-lint — notebook.ipynb

notebook.ipynb
  ✗ line 5    MLG001  Transformer fitted before split
  ⚠ line 6    MLG006  Missing random_state
  ⚠ line 6    MLG005  Classification split without stratify

3 issue(s) in 1 of 1 file(s) · 1 critical, 2 warnings
Tip: add --explain for why each matters and how to fix.
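
To make the first finding in the sample output concrete, here is a minimal, standard-library-only sketch of why fitting a transformer before the split matters. It is not MLGuard code and does not use scikit-learn; the toy data, the hand-made 75/25 split, and the `scale` helper are assumptions made for illustration. The point is only that statistics computed on all rows already contain information from the test rows.

```python
import statistics

# Toy feature column (hypothetical data): the values 1.0 .. 20.0.
values = [float(v) for v in range(1, 21)]

# A hand-made 75/25 split standing in for a real train/test split.
train, test = values[:15], values[15:]

# Leaky: scaling statistics computed on ALL rows, test rows included.
leaky_mean = statistics.fmean(values)
leaky_std = statistics.pstdev(values)

# Sound: statistics computed on the training rows only, then reused for test.
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)


def scale(xs, mean, std):
    """Standardize xs with the given mean and standard deviation."""
    return [(x - mean) / std for x in xs]


test_leaky = scale(test, leaky_mean, leaky_std)
test_sound = scale(test, train_mean, train_std)

print(f"mean over all rows: {leaky_mean}, mean over train rows: {train_mean}")
# → mean over all rows: 10.5, mean over train rows: 8.0
```

The two scaled versions of the test set differ, and nothing raises an error in either case, which is what makes this class of mistake silent.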

Options

mlguard-lint notebook.ipynb --explain    # add the code, why it matters, and how to fix it
mlguard-lint src/ --summary              # one line per file (handy for large folders)
mlguard-lint . --fail-on critical        # exit code 2 on any critical finding (CI gate)
mlguard-lint . --json out.json           # machine-readable output
mlguard-lint notebook.ipynb --no-color   # plain text (colors auto-off when piped)
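
As a rough sketch of how the `--fail-on critical` exit code could be consumed from a Python-based CI script, the helper below runs a command and classifies its return code. Only exit code 2 is documented above; treating 0 as clean and lumping every other code into "other" are assumptions, as are the `run_gate` name and the idea that the module form accepts the same flags as the console script.

```python
import subprocess
import sys

CRITICAL_EXIT = 2  # documented above: --fail-on critical exits with code 2


def run_gate(command):
    """Run a lint command and classify its exit code.

    Returns "critical" for exit code 2, "clean" for 0, and "other" for
    anything else (the meaning of other codes is not documented here).
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout, end="")
    if result.returncode == CRITICAL_EXIT:
        return "critical"
    if result.returncode == 0:
        return "clean"
    return "other"


# Example call (assumes mlguard-lint is installed in this interpreter):
# status = run_gate([sys.executable, "-m", "mlguard_lint", ".", "--fail-on", "critical"])
# sys.exit(2 if status == "critical" else 0)
```

In most CI systems the plain shell command is enough, since a non-zero exit already fails the job; a wrapper like this is only useful when the pipeline needs to react differently to critical findings than to other failures.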

If the mlguard-lint command isn't on your PATH, the module form always works:

python -m mlguard_lint notebook.ipynb

Windows: use Windows Terminal or PowerShell so the ✗ ⚠ symbols and colors render correctly. On the legacy cmd.exe console, pass --no-color (or run chcp 65001 once for UTF-8).

Rules

MLG001 — Transformer fitted before split
MLG002 — Preprocessing outside cross-validation
MLG003 — Model evaluated on training data
MLG004 — Resampling before split/CV or outside an imblearn Pipeline
MLG005 — Classification split without stratify
MLG006 — Missing random_state
MLG007 — Possible group/entity leakage
MLG008 — GridSearchCV best_score_ reported as final performance
MLG009 — Ordinal/label encoding of a nominal feature
MLG010 — Test set reused multiple times
MLG011 — Possible target/mean encoding without cross-fitting
MLG012 — No independent test set
MLG013 — Resampling applied to the test/validation set
MLG014 — Random split/CV on time-ordered data
MLG015 — Target column included in features
MLG016 — Transformer re-fit on the test set
MLG017 — Probability metric given hard predictions
MLG018 — Feature built from dataset-wide statistics before split
MLG019 — Misleading micro-average on imbalanced multiclass
MLG020 — ID/source column used as a feature
MLG021 — Rows duplicated/upsampled before split
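
To give a feel for what a rule of this kind involves, here is a hypothetical check in the spirit of MLG006, written with the standard `ast` module. It is an illustrative sketch, not MLGuard's implementation: the function name is invented, and the real rule may cover more call sites (estimators, CV splitters) and apply different heuristics.

```python
import ast


def find_splits_without_random_state(source):
    """Return line numbers of train_test_split calls that pass no random_state."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both train_test_split(...) and module.train_test_split(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "train_test_split":
            continue
        keywords = {kw.arg for kw in node.keywords}
        # A **kwargs expansion appears as arg=None; it might carry
        # random_state, so this sketch stays silent in that case.
        if "random_state" not in keywords and None not in keywords:
            hits.append(node.lineno)
    return sorted(hits)


# The sample is only parsed, never executed, so X and y need not exist.
sample = """\
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.2, random_state=42)
"""
print(find_splits_without_random_state(sample))  # → [2]
```

Staying silent on `**kwargs` trades recall for fewer false positives, which is the kind of judgment call a confidence level on each diagnostic is meant to communicate.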

How it works

The scanner concatenates a notebook's code cells into a single module and parses it with the standard ast module. A single walk collects calls, assignments, model fit arguments, and id-like string constants; ordered rule blocks then run over what was collected and emit diagnostics. Because the whole notebook is analyzed as one program (no per-cell execution semantics), cross-cell dataflow is approximate, and reported line numbers are notebook-global: they count lines across the concatenated cells rather than within a single cell.
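
The pipeline described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not MLGuard's source: the function names are hypothetical, only calls are collected (not assignments or constants), and blanking out IPython magic lines is one possible way to keep the joined code parseable, not necessarily what the tool does.

```python
import ast
import json


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON on disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def notebook_to_source(nb):
    """Join the code cells of a parsed notebook dict into one module string."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # Notebook JSON stores source as one string or a list of lines.
        text = src if isinstance(src, str) else "".join(src)
        # Blank out magics and shell escapes (assumption) so the rest still
        # parses and line numbers are preserved.
        lines = ["" if ln.lstrip().startswith(("%", "!")) else ln
                 for ln in text.splitlines()]
        chunks.append("\n".join(lines))
    return "\n".join(chunks)


def collect_calls(source):
    """Walk the AST once and return sorted (line, callee name) pairs."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                calls.append((node.lineno, func.id))
            elif isinstance(func, ast.Attribute):
                calls.append((node.lineno, func.attr))
    return sorted(calls)


# In-memory stand-in for a notebook file (hypothetical content).
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["%matplotlib inline\n",
                                         "scaler = StandardScaler()\n",
                                         "X = scaler.fit_transform(X)"]},
        {"cell_type": "code", "source": "X_tr, X_te = train_test_split(X)"},
    ]
}
source = notebook_to_source(demo)
print(collect_calls(source))
# → [(2, 'StandardScaler'), (3, 'fit_transform'), (4, 'train_test_split')]
```

The split on line 4 lives in the second code cell, yet it is numbered as part of one continuous module. A rule that sees `fit_transform` on an earlier line than `train_test_split` has the raw material for a finding like MLG001, with no knowledge of the order in which the cells were actually run.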

Limitation

This is a heuristic static analyzer. It is useful for surfacing risks, not for proving that every warning is a real bug. Treat diagnostics as prompts for a closer look during code review.

Development

git clone <repo> && cd mlguard
pip install -e ".[dev]"          # editable install + pytest/build/twine
python -m pytest tests/test_rules.py -q

Each rule has a synthetic fixture under tests/notebooks/ plus clean controls that must stay silent.

License

MIT — see LICENSE.

The methodology behind the rules is documented in docs/Silent-Methodological-Errors-in-scikit-learn-Workflows.pdf.

Release history

This version: 0.1.1 (Jun 21, 2026)

Download files

Source Distribution

mlguard_lint-0.1.1.tar.gz (19.2 kB view details)

Uploaded Jun 21, 2026 Source

Built Distribution

mlguard_lint-0.1.1-py3-none-any.whl (17.8 kB view details)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file mlguard_lint-0.1.1.tar.gz.

File metadata

Download URL: mlguard_lint-0.1.1.tar.gz
Upload date: Jun 21, 2026
Size: 19.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for mlguard_lint-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`cb348f7f0644a6a0be842bcfcddbe6d521f5954472e9f809053e9c385eb50e58`
MD5	`e4e7dba1d925bd5f3a58c19a499ce4bb`
BLAKE2b-256	`42eb7ec27037139c5f8d662decaf8df7a24957c01959bc5c39212dc95040d4cd`

File details

Details for the file mlguard_lint-0.1.1-py3-none-any.whl.

File metadata

Download URL: mlguard_lint-0.1.1-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 17.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for mlguard_lint-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`55524ebed6f674763aacd7de7582edba33004c5a3591492d5629d94c6be5e290`
MD5	`8028094b2c6d61cd891a2b27fb4ae04c`
BLAKE2b-256	`d1cd29e9b65414373e92d33f3fe4c928093c214c0499d1b8a9a282750f5de6fa`
