Standalone pickle security scanner extracted from ModelAudit
modelaudit-picklescan
Rust-backed, bounded, static pickle security scanner. Inspects Python pickle streams and PyTorch ZIP checkpoints without unpickling them, and returns a typed report you can feed into CI, SARIF exporters, or custom policy engines.
Why this package
Pickle deserialization is the most common supply-chain attack vector in ML checkpoints, and existing Python-only scanners either unpickle the payload (unsafe), scan string literals only (imprecise), or fail open on large/malformed inputs (dangerous in CI). This package is a direct response:
- Rust scanner engine. Opcode walker, string analyzer, and nested-payload decoder are all native code.
- Fail-closed semantics. Every scan returns both a status (complete / inconclusive / error) and a verdict (clean / suspicious / malicious / unknown). Truncation, timeouts, budget exhaustion, and parser errors downgrade the verdict instead of silently returning clean.
- Bounded by construction. Opcode count, wall-clock timeout, string-literal bytes, nested-payload bytes, and recursion depth are all configurable caps with safe defaults. A malicious producer cannot force unbounded memory or CPU.
- Zero Python runtime dependencies. The wheel is self-contained: pip install modelaudit-picklescan and nothing else.
- Attested provenance. Release wheels are published to PyPI with sigstore attestations via GitHub Actions trusted publishing.
- Typed, immutable reports. PickleReport, Finding, Notice, and ScanError are frozen dataclasses with to_dict() for serialization. The package ships py.typed for mypy / pyright.
Install
pip install modelaudit-picklescan
Pre-built abi3 wheels ship for Python 3.10–3.13 on five targets: Linux x86_64, Linux aarch64, macOS arm64, macOS x86_64, and Windows x64. Other platforms install from the sdist and require a Rust toolchain (see Building from source).
Quickstart
from modelaudit_picklescan import scan_file

report = scan_file("suspicious_model.pt")  # raw pickle or PyTorch ZIP checkpoint
print(f"status={report.status.value} verdict={report.verdict.value}")
for finding in report.findings:
    print(f"  [{finding.severity.value}] {finding.rule_code}: {finding.message}")
    if finding.location:
        print(f"      at {finding.location}")
Example output on a PyTorch ZIP whose inner pickle reduces on os.system:
status=complete verdict=malicious
[critical] DANGEROUS_CALL: Found REDUCE opcode invoking os.system
at suspicious_model.pt:archive/data.pkl (pos 42)
Example output on a truncated or oversized pickle where analysis is incomplete:
status=inconclusive verdict=unknown
(no findings — scan was truncated, inspect report.notices and report.coverage)
The finding.location string follows the format {source} (pos {byte_offset}). The source on PyTorch ZIP members is {archive_path}:{member_name}.
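Downstream tools that need the byte offset or the archive member separately can split the location string. The following is a minimal sketch that assumes only the two formats described above; the names ParsedLocation and parse_location are hypothetical helpers, not part of the package.

```python
import re
from typing import NamedTuple, Optional


class ParsedLocation(NamedTuple):
    source: str            # file path, or archive path for ZIP members
    member: Optional[str]  # ZIP member name, if the source had one
    byte_offset: int


# Matches "{source} (pos {byte_offset})" as described in the text above.
_LOCATION_RE = re.compile(r"^(?P<source>.*) \(pos (?P<pos>\d+)\)$")


def parse_location(location: str) -> Optional[ParsedLocation]:
    """Split a location string; return None if it does not match the format."""
    match = _LOCATION_RE.match(location)
    if match is None:
        return None
    source = match.group("source")
    # Assumption: the last ':' separates archive path from member name.
    # This is ambiguous for plain Windows paths such as "C:\\x.pkl".
    archive, sep, member = source.rpartition(":")
    if not sep:
        return ParsedLocation(source, None, int(match.group("pos")))
    return ParsedLocation(archive, member, int(match.group("pos")))
```

Returning None on a mismatch, rather than raising, lets callers fall back to showing the raw string if the format ever changes.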
What it detects
Each finding carries a rule_code so downstream tooling can allowlist, suppress, or route alerts:
| Rule code | What it flags |
|---|---|
| DANGEROUS_CALL | REDUCE/NEWOBJ/NEWOBJ_EX opcodes invoking a callable known to execute code |
| DANGEROUS_GLOBAL | Imports of modules or classes that enable code execution when the pickle is loaded |
| EXTENSION_REF | copyreg.extension / EXT1/EXT2/EXT4 opcodes that resolve through process state |
| MALFORMED_STACK_GLOBAL | STACK_GLOBAL operands crafted to bypass naive string-matching scanners |
| PERSISTENT_ID | PERSID / BINPERSID references that delegate object construction to the loader |
| PICKLE_EXPANSION | Oversized or amplified pickle structures consistent with zip-bomb-style payloads |
| POST_BUDGET_GLOBAL | Dangerous globals observed after the opcode budget, surfaced conservatively |
| STRUCTURAL_TAMPER | Opcode sequences that do not correspond to any legitimate pickle producer |
| SUSPICIOUS_STRING | High-signal string literals (shell metacharacters, import payloads, URLs) |
| S203 | Non-allowlisted __main__ global reference (requires manual review before loading) |
| S213 | Raw (unencoded) nested pickle payload inside a byte field |
| S601 | Base64-encoded nested pickle payload inside a string literal |
| S602 | Hex-encoded nested pickle payload inside a string literal |
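As one way to act on rule codes, the sketch below sorts them into block, review, and suppressed buckets. The policy sets and the route function are hypothetical examples chosen for illustration; the package does not prescribe a policy.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical policy: which codes block a pipeline and which are muted.
BLOCK = {"DANGEROUS_CALL", "DANGEROUS_GLOBAL"}
SUPPRESS = {"SUSPICIOUS_STRING"}


def route(rule_codes: Iterable[str]) -> Dict[str, List[str]]:
    """Group rule codes into 'block', 'review', and 'suppressed' buckets."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for code in rule_codes:
        if code in SUPPRESS:
            buckets["suppressed"].append(code)
        elif code in BLOCK:
            buckets["block"].append(code)
        else:
            # Unknown or unlisted codes default to manual review.
            buckets["review"].append(code)
    return dict(buckets)
```

With a real report, the input could be built as route(f.rule_code for f in report.findings). Defaulting unlisted codes to review keeps the policy safe when new codes appear.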
The scanner covers pickle protocols 0 through 5, recognizes short and extended opcodes, and reconstructs module.class targets for STACK_GLOBAL without executing them.
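To illustrate what static inspection means here, the following sketch uses the standard library's pickletools.genops to list import targets without unpickling anything. It is not the package's Rust engine: the function name list_global_targets is hypothetical, and the STACK_GLOBAL handling is deliberately naive (it pairs the two most recent string pushes and ignores memo lookups), which is the kind of shortcut that crafted operands can defeat.

```python
import collections
import pickle
import pickletools
from typing import List, Tuple

# Opcodes that push a string which STACK_GLOBAL may later consume.
_STRING_OPS = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}


def list_global_targets(data: bytes) -> List[Tuple[int, str]]:
    """Return (byte_offset, 'module.name') for each import-like opcode."""
    targets: List[Tuple[int, str]] = []
    recent_strings: List[str] = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in _STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name in ("GLOBAL", "INST"):
            # genops reports the operand as "module name".
            targets.append((pos, arg.replace(" ", ".", 1)))
        elif opcode.name == "STACK_GLOBAL":
            if len(recent_strings) >= 2:
                module, name = recent_strings[-2:]
                targets.append((pos, f"{module}.{name}"))
            else:
                targets.append((pos, "<unresolved>"))
    return targets


# Hand-written protocol 0 payload; it is only parsed, never loaded.
HANDCRAFTED = b"cos\nsystem\n(S'echo hi'\ntR."
# Benign protocol 4 payload that uses STACK_GLOBAL.
BENIGN = pickle.dumps(collections.OrderedDict, protocol=4)

print(list_global_targets(HANDCRAFTED))
print([name for _, name in list_global_targets(BENIGN)])
```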
When to use this vs. modelaudit
Use modelaudit-picklescan if you want a single-purpose library to embed in another tool: a linter, a model registry gate, a custom CI step, or a server-side scanner. It does pickle analysis and nothing else.
Use modelaudit if you want the full static scanner CLI: 40+ model/archive format scanners, SARIF and JSON output, remote-source scanning (Hugging Face, S3, GCS, JFrog, MLflow, DVC), license and secret detection, caching, progress reporting, and CI recipes. modelaudit uses this package internally for its pickle scanner.
API overview
from modelaudit_picklescan import (
PickleScanner, ScanOptions,
scan_file, scan_bytes, scan_stream,
PickleReport, Finding, Notice, ScanError,
Severity, ScanStatus, SafetyVerdict, CoverageSummary,
)
Three convenience entry points, each returning a PickleReport:
- scan_file(path, *, options=None): scan a .pkl / .pickle file or a PyTorch ZIP checkpoint (detects the container, enumerates pickle members, combines reports).
- scan_bytes(data, *, source="<bytes>", options=None): scan an in-memory payload.
- scan_stream(stream, *, source="<stream>", size=None, options=None): scan a binary file-like object; falls back to bounded spooling when size is unknown.
For long-running services, construct PickleScanner(options=...) once and reuse it across calls.
Resource controls — ScanOptions
All fields have safe defaults; override only what you need.
| Field | Default | Meaning |
|---|---|---|
| timeout_s | 3600.0 | Per-scan wall clock, capped at 86_400 seconds |
| max_opcodes | 1_000_000 | Opcode budget before the scanner downgrades to partial |
| post_budget_scan_bytes | 100 MiB | Bytes to keep scanning for globals after the budget |
| max_known_stream_read_bytes | 100 MiB | Cap on streams with a known size |
| max_unbounded_stream_read_bytes | 8 MiB | Cap on streams without a known size |
| max_string_literal_scan_chars | 8 MiB | Cap on bytes inspected for SUSPICIOUS_STRING |
| max_nested_pickle_bytes | 2 MiB | Cap on each decoded nested-payload inspection |
| max_nested_depth | 2 | Recursion depth for base64/hex-encoded pickles |
Construction validates every field; pass invalid values and you'll get a ValueError immediately instead of a misleading scan result.
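The validate-at-construction pattern can be sketched with a stand-in frozen dataclass. SketchOptions below is hypothetical and covers only three of the fields; it is not the package's ScanOptions, and it assumes that a timeout above the cap is rejected rather than clamped, which the text above does not specify.

```python
from dataclasses import dataclass

MAX_TIMEOUT_S = 86_400.0  # cap taken from the table above


@dataclass(frozen=True)
class SketchOptions:
    """Stand-in for illustration only; not the real ScanOptions."""

    timeout_s: float = 3600.0
    max_opcodes: int = 1_000_000
    max_nested_depth: int = 2

    def __post_init__(self) -> None:
        # Fail at construction so a bad value never reaches a scan.
        if not 0 < self.timeout_s <= MAX_TIMEOUT_S:
            raise ValueError(f"timeout_s must be in (0, {MAX_TIMEOUT_S}]")
        if self.max_opcodes <= 0:
            raise ValueError("max_opcodes must be positive")
        if self.max_nested_depth < 0:
            raise ValueError("max_nested_depth must be non-negative")
```

Rejecting bad values up front means a misconfigured limit surfaces as an exception at startup instead of as an unexpectedly inconclusive report later.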
Report contract — PickleReport
- status (ScanStatus): complete, inconclusive, or error.
- verdict (SafetyVerdict): clean, suspicious, malicious, or unknown. clean requires status=complete with no findings.
- findings (tuple[Finding, ...]): WARNING or CRITICAL security results.
- notices (tuple[Notice, ...]): DEBUG/INFO explainability and coverage notes (budget hits, truncation, unsupported members).
- errors (tuple[ScanError, ...]): operational failures (short reads, malformed containers, engine errors).
- coverage (CoverageSummary): bytes_scanned, bytes_total, opcode_count, and per-phase completion flags.
- metadata (Mapping[str, Any]): container info (e.g. container_type="pytorch_zip", archive size, pickle members).
- duration_s (float): scan wall clock.
Convenience accessors: report.has_security_findings, report.is_clean, report.to_dict().
Reports and all nested models are frozen — call to_dict() if you need a mutable payload for serialization. For aggregation, treat findings at warning/critical as security alerts; group notices by code rather than showing every INFO row as actionable.
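A CI gate built on this contract might look like the sketch below. The function exit_code_for and the specific exit codes are hypothetical choices; only the status and verdict values come from the contract above.

```python
def exit_code_for(status: str, verdict: str) -> int:
    """Map a (status, verdict) pair to a process exit code, failing closed."""
    # Pass only when the scan finished and found nothing.
    if status == "complete" and verdict == "clean":
        return 0
    # Security findings get their own code so CI can label them.
    if verdict in ("suspicious", "malicious"):
        return 1
    # Inconclusive, error, or unknown: treat as a failure, not a pass.
    return 2
```

With a real report this could be called as sys.exit(exit_code_for(report.status.value, report.verdict.value)). The final branch also catches a clean verdict paired with an incomplete status, which the contract says should not occur, so the gate stays safe even if that invariant is ever violated.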
PyTorch ZIP checkpoints
scan_file auto-detects PyTorch ZIP containers from PyTorch metadata plus pickle members, including hidden members, and combines per-member reports into a single container-level report with metadata.container_type="pytorch_zip". Archive member count is capped at 10,000 entries; per-member pickles are capped at 512 MiB. Exceeding either limit is reported through a structured notice rather than a silent skip.
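The idea of capped enumeration with explicit notices can be sketched with the standard library's zipfile module. This is not the package's detection logic: list_pickle_members is a hypothetical helper, it selects members by file suffix only (the real scanner also uses PyTorch metadata and finds hidden members), and it trusts the size declared in the ZIP header.

```python
import io
import zipfile
from typing import List, Tuple

MAX_MEMBERS = 10_000                   # cap from the text above
MAX_MEMBER_BYTES = 512 * 1024 * 1024   # 512 MiB, from the text above


def list_pickle_members(
    archive,
    max_members: int = MAX_MEMBERS,
    max_member_bytes: int = MAX_MEMBER_BYTES,
) -> Tuple[List[str], List[str]]:
    """Return (member_names, notices); every skipped entry yields a notice."""
    members: List[str] = []
    notices: List[str] = []
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
        if len(infos) > max_members:
            notices.append(
                f"archive has {len(infos)} entries; only the first "
                f"{max_members} were considered"
            )
            infos = infos[:max_members]
        for info in infos:
            # Assumption: suffix matching stands in for real detection.
            if not info.filename.endswith((".pkl", ".pickle")):
                continue
            # file_size is the declared uncompressed size from the header.
            if info.file_size > max_member_bytes:
                notices.append(f"{info.filename}: exceeds per-member cap")
                continue
            members.append(info.filename)
    return members, notices


def demo_archive() -> io.BytesIO:
    """Build a tiny in-memory ZIP shaped like a PyTorch checkpoint."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.pkl", b"\x80\x02N.")  # pickle of None
        zf.writestr("archive/version", b"3\n")
    return buf
```

Returning notices alongside the member list mirrors the principle that a limit being hit is always visible to the caller.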
Building from source
Wheels cover five targets; any other platform or a custom Python ABI requires building from source:
# Requires Rust 1.83+ and a working C toolchain
pip install modelaudit-picklescan --no-binary modelaudit-picklescan
From a checkout:
pip install packages/modelaudit-picklescan
# or, for development, build the Rust extension and install it into the active virtualenv:
maturin develop --release -m packages/modelaudit-picklescan/Cargo.toml
Stability and versioning
modelaudit-picklescan follows semantic versioning. 0.x should be read as pre-1.0 — expect small adjustments as the API settles. The working intent, reflected in the current code, is:
- Resource-control defaults (ScanOptions) are tuned conservatively; changes that relax a default will be called out in the changelog.
- Public report models (PickleReport, Finding, Notice, ScanError) and their field names are the supported surface for serialization and downstream tooling.
- Rule codes are intended to be additive (new codes rather than renames) so that downstream allowlists and suppressions remain stable.
- Verdict semantics: SafetyVerdict.CLEAN is returned only when ScanStatus.COMPLETE holds and there are no findings; truncation, timeouts, and engine errors never produce CLEAN. This is enforced in _combine_verdict / _with_*_notice in api.py.
Any change to the items above will be announced in CHANGELOG.md and the GitHub release notes.
Security and reporting
Please do not open public GitHub issues for suspected vulnerabilities. See the project security policy for coordinated disclosure.
Links
- Changelog: https://github.com/promptfoo/modelaudit/blob/main/packages/modelaudit-picklescan/CHANGELOG.md
- Repository: https://github.com/promptfoo/modelaudit
- Issues: https://github.com/promptfoo/modelaudit/issues
- Parent package: https://pypi.org/project/modelaudit/
- Security model (docs): https://github.com/promptfoo/modelaudit/blob/main/docs/user/security-model.md
License
MIT. See LICENSE.