Security and compliance scanner for ML pipelines

Maintainers

flint55

Project description

ML Guard

Security & compliance scanner for ML pipelines — docker scan for the ML world.

ML Guard scans the artifacts your team ships — model weights, configs, dependency manifests, notebooks — and flags problems before they reach production: malicious pickle code, embedded executables in safetensors, ONNX models with custom plugins, leaked API keys, vulnerable PyPI dependencies, malicious packages.

It runs offline. It produces SARIF for native GitHub Code Scanning, CycloneDX SBOMs for audit, and PDF compliance reports for EU AI Act, NIST AI RMF, ISO 27001, and SOC 2.

Status

v0.1.0 — first public release. All five scanners and the compliance reporter are production-ready; 152 tests cover the codepaths.

Scanner	Status	What it catches
`pickle`	✓ shipped	RCE globals, suspicious modules, PyTorch ZIP, proto≥4
`safetensors`	✓ shipped	trailing payloads, malformed offsets, embedded URIs
`onnx`	✓ shipped	custom domain ops, suspicious external_data, shells
`secrets`	✓ shipped	AWS/GitHub/OpenAI keys, JWTs, PEM keys, generic entropy
`cve`	✓ shipped	OSV cross-check of `requirements.txt` (offline DB)
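To make the "trailing payloads" and "malformed offsets" checks in the `safetensors` row concrete, here is a minimal sketch based on the public safetensors layout (an 8-byte little-endian header length, a JSON header, then the tensor data). The function name `trailing_bytes` is hypothetical and this is not ML Guard's actual scanner code.

```python
import json
import struct

def trailing_bytes(blob: bytes) -> int:
    """Return how many bytes follow the last tensor declared in the header.

    Hypothetical helper for illustration; a result above 0 means extra data
    was appended after the tensor buffer.
    """
    if len(blob) < 8:
        raise ValueError("too short for a safetensors header")
    (header_len,) = struct.unpack("<Q", blob[:8])  # u64, little-endian
    header_end = 8 + header_len
    if header_end > len(blob):
        raise ValueError("header length exceeds file size")
    header = json.loads(blob[8:header_end])

    data_end = 0
    for name, entry in header.items():
        if name == "__metadata__":  # free-form string map, not a tensor
            continue
        begin, end = entry["data_offsets"]  # relative to the end of the header
        if not 0 <= begin <= end:
            raise ValueError(f"malformed offsets for {name}")
        data_end = max(data_end, end)

    if header_end + data_end > len(blob):
        raise ValueError("tensor offsets point past the end of the file")
    return len(blob) - (header_end + data_end)

# Build a tiny, valid file in memory: one 4-byte tensor named "w".
header_json = json.dumps(
    {"w": {"dtype": "U8", "shape": [4], "data_offsets": [0, 4]}}
).encode()
clean = struct.pack("<Q", len(header_json)) + header_json + b"\x00" * 4

print(trailing_bytes(clean))                   # 0
print(trailing_bytes(clean + b"MZ\x90\x00"))   # 4
```

A real scanner would presumably also inspect what the trailing bytes contain; this sketch only measures their length.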

Install

# Pure-Python, works everywhere; ~640 KB wheel including bundled OSV DB.
pip install mlsupplychain

The wheel ships with a curated mini OSV database covering ~150 popular ML packages, so pip install mlsupplychain && ml-guard scan finds real vulnerabilities out of the box — no setup. For full CVE coverage across all PyPI:

wget https://osv-vulnerabilities.storage.googleapis.com/PyPI/all.zip
ml-guard cve-update all.zip

Note on naming: the package on PyPI is mlsupplychain (because mlguard was already taken by an unrelated project). The CLI command is still ml-guard for everyday use. Think of it like pip install scikit-learn giving you import sklearn.
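If you want to check the naming split from Python, the standard library can look up the installed distribution by its PyPI name. This is a small sketch: `installed_version` is a hypothetical helper, and the import package name `ml_guard` is taken from the Architecture tree below rather than confirmed here.

```python
from importlib import metadata
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# The distribution (what pip installs) is "mlsupplychain";
# the importable package is assumed to be "ml_guard".
print(installed_version("mlsupplychain"))
```

This prints the version string when the package is installed and `None` otherwise.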

Quick start

ml-guard scan ./my-project

ML Guard — scan report
========================================
Files scanned: 5    Time: 0.04s
Summary:       6 critical, 12 high, 21 medium, 3 low

✗ CRITICAL  model.pkl  [offset 0x2a1]
            Dangerous global imported: os.system (known RCE primitive)
✗ CRITICAL  requirements.txt  [package ascii2text==1.0]
            Malicious package detected (advisory MAL-2022-7421).
✗ CRITICAL  requirements.txt  [package transformers==4.30.0]
            CVE-2023-6730: Deserialization of Untrusted Data vulnerability
! HIGH      .env  [line 1]
            GitHub Personal Access Token detected
            snippet: ghp_…6789 (len=40)
...

Exit code is 1 if any finding meets --fail-on (default: critical).
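The exit-code rule can be sketched as a simple threshold check. This sketch assumes that "meets" means "at or above" the `--fail-on` severity, which the text does not spell out; the `Severity` enum and `exit_code` function are hypothetical names, not ML Guard's API.

```python
from enum import IntEnum

class Severity(IntEnum):
    # Ordering is an assumption: higher value means more severe.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

def exit_code(findings: list[Severity],
              fail_on: Severity = Severity.CRITICAL) -> int:
    """Return 1 if any finding is at or above the threshold, else 0."""
    return 1 if any(sev >= fail_on for sev in findings) else 0

print(exit_code([Severity.HIGH, Severity.LOW]))                         # 0
print(exit_code([Severity.HIGH, Severity.LOW], fail_on=Severity.HIGH))  # 1
```

Under that assumption, a scan with only high findings passes with the default threshold and fails once the threshold is lowered to high.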

CI integration

- uses: ml-guard/scan-action@v1
  with:
    path: ./models
    fail-on: critical
    format: sarif
    output: ml-guard.sarif
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: ml-guard.sarif

The SARIF report appears in Security → Code scanning in your repo.
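If you also want to post-process the SARIF file yourself, the structure is plain JSON. The sketch below counts results per level using only fields defined by SARIF 2.1.0 (`runs`, `results`, `level`); the sample document, rule ids, and the way ML Guard maps severities to SARIF levels are assumptions made for illustration.

```python
from collections import Counter

def summarize_sarif(sarif: dict) -> Counter:
    """Count SARIF results per level across all runs."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results") or []:
            # Simplification: a result without an explicit level is
            # counted as "warning".
            counts[result.get("level", "warning")] += 1
    return counts

# Hand-written sample; rule ids are illustrative, not ML Guard's real ones.
sample = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "ml-guard"}},
        "results": [
            {"ruleId": "example-pickle-rule", "level": "error",
             "message": {"text": "Dangerous global imported: os.system"}},
            {"ruleId": "example-secret-rule", "level": "warning",
             "message": {"text": "GitHub Personal Access Token detected"}},
            {"ruleId": "example-other-rule",
             "message": {"text": "No explicit level"}},
        ],
    }],
}

print(dict(summarize_sarif(sample)))  # {'error': 1, 'warning': 2}
```

For a real report, load the file with `json.load` and pass the resulting dict to `summarize_sarif`.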

Compliance reports

ML Guard produces machine-readable evidence for four standards:

Standard	ID	What we cover
EU AI Act	`eu-ai-act`	Articles 9, 10, 11, 12, 13, 15: risk management, record-keeping, technical documentation, cybersecurity
NIST AI RMF	`nist-ai-rmf`	MEASURE 2.7, 2.10; MANAGE 4.1
ISO/IEC 27001	`iso-27001`	Annex A: 5.23, 5.34, 8.4, 8.7, 8.8, 8.25, 8.28
SOC 2	`soc2`	Common Criteria: CC6.1, 6.6, 6.7, 6.8, 7.1, 7.2

Generate a PDF for an audit:

ml-guard compliance ./models --standard iso-27001 --output report.pdf

The PDF includes a verdict, control-by-control evidence with file and line references, a full findings appendix, and a SHA-256 integrity hash.

Important caveat for auditors: these reports are machine-readable technical evidence, not conformity declarations. Determination of regulatory compliance requires assessment by a qualified person (notified body, DPO, CPA firm).

SBOM

ml-guard sbom ./models -o ml-bom.json

Produces a CycloneDX 1.5 JSON document listing every artifact (SHA-256 hashed), dependency manifest entries, and findings encoded as vulnerabilities linked to their components by bom-ref. It drops directly into Dependency-Track, DefectDojo, sbom-utility, and similar tools.
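To illustrate how the bom-ref links tie vulnerabilities to components, here is a short sketch that walks a CycloneDX document using the standard `components[].bom-ref` and `vulnerabilities[].affects[].ref` fields. The sample BOM is hand-written, and the bom-ref format shown is an assumption rather than what ML Guard emits.

```python
def vulnerable_refs(bom: dict) -> dict[str, list[str]]:
    """Map each affected component bom-ref to the vulnerability ids pointing at it."""
    known = {c.get("bom-ref") for c in bom.get("components", [])}
    linked: dict[str, list[str]] = {}
    for vuln in bom.get("vulnerabilities", []):
        for affected in vuln.get("affects", []):
            ref = affected.get("ref")
            if ref in known:  # ignore links that resolve to no component
                linked.setdefault(ref, []).append(vuln.get("id", "?"))
    return linked

# Hand-written sample; the bom-ref value format is an assumption.
sample_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "transformers", "version": "4.30.0",
         "bom-ref": "pkg:pypi/transformers@4.30.0"},
    ],
    "vulnerabilities": [
        {"id": "CVE-2023-6730",
         "affects": [{"ref": "pkg:pypi/transformers@4.30.0"}]},
        {"id": "EXAMPLE-DANGLING", "affects": [{"ref": "missing-ref"}]},
    ],
}

print(vulnerable_refs(sample_bom))
# {'pkg:pypi/transformers@4.30.0': ['CVE-2023-6730']}
```

The dangling entry is dropped because its `ref` matches no component, which is the kind of broken link that consumers of the SBOM would otherwise trip over.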

Configuration

Drop a .ml-guard.yml in your project root:

fail_on: high                 # CI-only override (default: critical)
include:
  - 'models/*.pkl'
  - 'configs/*.yaml'
exclude:
  - 'tests/fixtures/**'
scanners:
  - pickle
  - secrets
rules:
  pickle-unusual-module:
    severity: low             # downgrade
  secret-stripe-test:
    disabled: true            # silence entirely

CLI flags always override config; config provides defaults.
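The precedence rule can be pictured as a three-layer merge: built-in defaults, then the config file, then CLI flags. The sketch below is an illustration only; `effective_settings` and the `DEFAULTS` dict are hypothetical and do not reflect how `config.py` is actually written.

```python
# Built-in defaults named in this README (fail-on: critical, format: text).
DEFAULTS = {"fail_on": "critical", "format": "text"}

def effective_settings(config: dict, cli: dict) -> dict:
    """Merge defaults, config values, and CLI flags, in increasing priority."""
    merged = dict(DEFAULTS)
    merged.update(config)
    # Assumption: a CLI flag that was not passed is represented as None
    # and must not clobber the config value.
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged

config = {"fail_on": "high"}               # from .ml-guard.yml
cli = {"fail_on": None, "format": "sarif"}  # only --format was passed

print(effective_settings(config, cli))
# {'fail_on': 'high', 'format': 'sarif'}
```

Here the config file sets the threshold, the CLI sets the output format, and nothing falls back to the defaults.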

Output formats

Format	Flag	Use case
`text`	`--format text`	humans (default, colorized)
`json`	`--format json`	scripts, custom dashboards
`sarif`	`--format sarif`	GitHub Code Scanning, GitLab SAST, IDE plugins

Why pickle is the #1 priority

pickle.load() can execute arbitrary Python code by design, and so can torch.load() whenever it falls back to full unpickling (that is, without weights_only=True). A 200-byte .pkl file can drop a reverse shell when a data scientist opens it. ML Guard parses the pickle bytecode statically, without ever executing it, and flags every callable the stream would resolve, before any deserialization happens. See docs/pickle-threat-model.md for the full attack surface.
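The idea of static pickle inspection can be sketched with the standard library alone. `pickletools.genops` decodes opcodes without running them, so the imports a pickle would perform can be listed safely. This is a simplified illustration, not ML Guard's scanner: `find_globals` is a hypothetical name, and the STACK_GLOBAL handling is a heuristic that only looks at the two most recent string pushes.

```python
import io
import pickletools

def find_globals(data: bytes) -> list[tuple[int, str, str]]:
    """Return (offset, module, name) for each import the pickle would perform.

    Nothing is executed: genops only decodes the opcode stream.
    """
    found = []
    strings = []  # recent string arguments, used to resolve STACK_GLOBAL
    for opcode, arg, pos in pickletools.genops(io.BytesIO(data)):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            found.append((pos, module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+: module and name were pushed as two strings.
            # Heuristic only; memo lookups (BINGET) are not followed.
            found.append((pos, strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found

# Classic hand-written payload: calls os.system("echo hi") when unpickled.
payload = b"cos\nsystem\n(S'echo hi'\ntR."

print(find_globals(payload))  # [(0, 'os', 'system')]
```

The payload is never passed to `pickle.loads`, which is the whole point: the dangerous global is reported from the bytecode alone.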

Architecture

ml_guard/
├── findings.py              # Finding/Severity dataclasses
├── runner.py                # walks paths, dispatches scanners
├── cli.py                   # click entrypoint
├── config.py                # .ml-guard.yml loader
├── compliance.py            # EU AI Act / NIST AI RMF / ISO 27001 / SOC 2
├── sbom.py                  # CycloneDX 1.5 generator
├── cve_db.py                # SQLite OSV index
├── _pdf.py                  # in-tree PDF 1.4 writer (no reportlab dep)
├── _protobuf.py             # in-tree protobuf reader (no onnx dep)
├── data/
│   └── osv-mini.sqlite      # bundled mini OSV DB (~530 KB compressed)
├── scanners/
│   ├── pickle_scanner.py
│   ├── safetensors_scanner.py
│   ├── onnx_scanner.py
│   ├── secret_scanner.py
│   └── cve_scanner.py
└── output/
    ├── text.py
    ├── json_fmt.py
    └── sarif.py
rust_engine/                  # optional native acceleration via PyO3

The Rust engine is opt-in via pip install mlsupplychain[native]. Without it, every scanner runs on pure Python with the same correctness guarantees — just slower on multi-gigabyte artifacts.

Documentation

docs/rules.md — full catalog of rules, severities, and override examples.
docs/pickle-threat-model.md — what we cover and what we don't, with attack patterns explained.
docs/cve-database.md — OSV update workflow.
docs/performance.md — real benchmark numbers.
docs/releasing.md — for maintainers.

Contributing

See CONTRIBUTING.md. Security policy: SECURITY.md.

License

Apache 2.0. See LICENSE.

Project details

Release history

0.1.0 (this version), released May 10, 2026

Download files

Source Distribution

mlsupplychain-0.1.0.tar.gz (695.2 kB)

Uploaded May 10, 2026 Source

Built Distribution

mlsupplychain-0.1.0-py3-none-any.whl (655.4 kB)

Uploaded May 10, 2026 Python 3

File details

Details for the file mlsupplychain-0.1.0.tar.gz.

File metadata

Download URL: mlsupplychain-0.1.0.tar.gz
Upload date: May 10, 2026
Size: 695.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlsupplychain-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b2d637c1651cfd8d357b747635d257df5abc93b4ee16342f88d884dc13546575`
MD5	`985534ef5f44b9da7ea55eb77687fdee`
BLAKE2b-256	`8a60e857294e60c686b69f9e394bcc61c2996e08133de9d545fe4c0a05b62871`
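
A downloaded file can be checked against the SHA256 digest above with a few lines of standard-library Python. This is a generic sketch; `sha256_of` and `matches` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest listed above for mlsupplychain-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "b2d637c1651cfd8d357b747635d257df5abc93b4ee16342f88d884dc13546575"
)

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `matches("mlsupplychain-0.1.0.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an unmodified download in the current directory.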

Provenance

The following attestation bundles were made for mlsupplychain-0.1.0.tar.gz:

Publisher: release.yml on ml-guard/ml-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlsupplychain-0.1.0.tar.gz
- Subject digest: b2d637c1651cfd8d357b747635d257df5abc93b4ee16342f88d884dc13546575
- Sigstore transparency entry: 1499311278
- Sigstore integration time: May 10, 2026
Source repository:
- Permalink: ml-guard/ml-guard@a711d80b00fa50811394b99dc6ba5bfd40825256
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ml-guard
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a711d80b00fa50811394b99dc6ba5bfd40825256
- Trigger Event: push

File details

Details for the file mlsupplychain-0.1.0-py3-none-any.whl.

File metadata

Download URL: mlsupplychain-0.1.0-py3-none-any.whl
Upload date: May 10, 2026
Size: 655.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlsupplychain-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0f50fa52c1424c77c2e07f2980a8e6903ec113d9e398f36a1ca0188f8ace3373`
MD5	`86bcab3531092e132a801c7cd3acd0d8`
BLAKE2b-256	`02eb36d7fda665a4186403bf5c37031dcb237e9d8e8e5fc7876151235ded03c2`

Provenance

The following attestation bundles were made for mlsupplychain-0.1.0-py3-none-any.whl:

Publisher: release.yml on ml-guard/ml-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlsupplychain-0.1.0-py3-none-any.whl
- Subject digest: 0f50fa52c1424c77c2e07f2980a8e6903ec113d9e398f36a1ca0188f8ace3373
- Sigstore transparency entry: 1499311423
- Sigstore integration time: May 10, 2026
Source repository:
- Permalink: ml-guard/ml-guard@a711d80b00fa50811394b99dc6ba5bfd40825256
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ml-guard
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a711d80b00fa50811394b99dc6ba5bfd40825256
- Trigger Event: push
