Standalone pickle security scanner extracted from ModelAudit

modelaudit-picklescan

Rust-backed, bounded, static pickle security scanner. Inspects Python pickle streams and PyTorch ZIP checkpoints without unpickling them, and returns a typed report you can feed into CI, SARIF exporters, or custom policy engines.

Why this package

Pickle deserialization is a common supply-chain attack vector in ML checkpoints, and existing Python-only scanners either unpickle the payload (unsafe), scan string literals only (imprecise), or fail open on large or malformed inputs (dangerous in CI). This package addresses each of those gaps:

  • Rust scanner engine. Opcode walker, string analyzer, and nested-payload decoder are all native code.
  • Fail-closed semantics. Every scan returns both a status (complete / inconclusive / error) and a verdict (clean / suspicious / malicious / unknown). Truncation, timeouts, budget exhaustion, and parser errors downgrade the verdict instead of silently returning clean.
  • Bounded by construction. Opcode count, wall-clock timeout, string-literal bytes, nested-payload bytes, and recursion depth are all configurable caps with safe defaults. A malicious producer cannot force unbounded memory or CPU.
  • Zero Python runtime dependencies. The wheel is self-contained — pip install modelaudit-picklescan and nothing else.
  • Attested provenance. Release wheels are published to PyPI with sigstore attestations via GitHub Actions trusted publishing.
  • Typed, immutable reports. PickleReport, Finding, Notice, and ScanError are frozen dataclasses with to_dict() for serialization. The package ships py.typed for mypy / pyright.
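
A downstream CI gate can mirror the fail-closed behaviour listed above by passing only when status and verdict agree. The following is a minimal sketch and not part of the package: gate_exit_code and stub_report are hypothetical helpers, the exit codes 1 and 2 are arbitrary choices, and the only thing assumed about a report is the status.value / verdict.value shape used in the Quickstart. Stub objects stand in for a real report so the snippet runs without the package installed.

```python
from types import SimpleNamespace


def gate_exit_code(report) -> int:
    """Return 0 only for a complete, clean scan; everything else fails the gate."""
    status = report.status.value
    verdict = report.verdict.value
    if status == "complete" and verdict == "clean":
        return 0
    if verdict in ("suspicious", "malicious"):
        return 2  # security finding: block the artifact (exit code is an arbitrary choice)
    return 1  # inconclusive, unknown, or error: fail closed rather than pass


def stub_report(status: str, verdict: str):
    """Hypothetical stand-in with the same .status.value / .verdict.value shape."""
    return SimpleNamespace(
        status=SimpleNamespace(value=status),
        verdict=SimpleNamespace(value=verdict),
    )


if __name__ == "__main__":
    cases = [("complete", "clean"), ("complete", "malicious"), ("inconclusive", "unknown")]
    for status, verdict in cases:
        print(status, verdict, "->", gate_exit_code(stub_report(status, verdict)))
```

With a real report, the same function could be called on the result of scan_file and its return value passed to sys.exit.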

Install

pip install modelaudit-picklescan

Pre-built abi3 wheels ship for Python 3.10–3.13 on five targets: Linux x86_64, Linux aarch64, macOS arm64, macOS x86_64, and Windows x64. Other platforms install from the sdist and require a Rust toolchain (see Building from source).

Quickstart

from modelaudit_picklescan import scan_file

report = scan_file("suspicious_model.pt")  # raw pickle or PyTorch ZIP checkpoint

print(f"status={report.status.value} verdict={report.verdict.value}")
for finding in report.findings:
    print(f"  [{finding.severity.value}] {finding.rule_code}: {finding.message}")
    if finding.location:
        print(f"    at {finding.location}")

Example output on a PyTorch ZIP whose inner pickle invokes os.system through a REDUCE opcode:

status=complete verdict=malicious
  [critical] DANGEROUS_CALL: Found REDUCE opcode invoking os.system
    at suspicious_model.pt:archive/data.pkl (pos 42)

Example output on a truncated or oversized pickle where analysis is incomplete:

status=inconclusive verdict=unknown
  (no findings — scan was truncated, inspect report.notices and report.coverage)

The finding.location string follows the format {source} (pos {byte_offset}). The source on PyTorch ZIP members is {archive_path}:{member_name}.
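
Tooling that routes alerts may want the byte offset and archive member as separate fields. The sketch below parses the location format described above with the standard library only. parse_location and split_archive_member are hypothetical helpers, not package APIs, and the member split is ambiguous for plain Windows paths that contain a drive-letter colon, so it should only be applied to archive sources.

```python
import re
from typing import NamedTuple, Optional, Tuple

# "{source} (pos {byte_offset})", as described in this README.
_LOCATION_RE = re.compile(r"(?P<source>.*) \(pos (?P<offset>\d+)\)")


class ParsedLocation(NamedTuple):
    source: str
    offset: int


def parse_location(location: str) -> Optional[ParsedLocation]:
    """Split a location string into its source and byte offset, or return None."""
    match = _LOCATION_RE.fullmatch(location)
    if match is None:
        return None
    return ParsedLocation(match.group("source"), int(match.group("offset")))


def split_archive_member(source: str) -> Tuple[str, Optional[str]]:
    """Split "{archive_path}:{member_name}" on the last colon.

    Only meaningful for archive sources; a plain path such as C:\\model.pkl
    would be split incorrectly.
    """
    archive, sep, member = source.rpartition(":")
    return (archive, member) if sep else (source, None)


if __name__ == "__main__":
    parsed = parse_location("suspicious_model.pt:archive/data.pkl (pos 42)")
    print(parsed)
    print(split_archive_member(parsed.source))
```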

What it detects

Each finding carries a rule_code so downstream tooling can allowlist, suppress, or route alerts:

Rule code               What it flags
DANGEROUS_CALL          REDUCE/NEWOBJ/NEWOBJ_EX opcodes invoking a callable known to execute code
DANGEROUS_GLOBAL        Imports of modules or classes that enable code execution when the pickle is loaded
EXTENSION_REF           copyreg.extension / EXT1/EXT2/EXT4 opcodes that resolve through process state
MALFORMED_STACK_GLOBAL  STACK_GLOBAL operands crafted to bypass naive string-matching scanners
PERSISTENT_ID           PERSID / BINPERSID references that delegate object construction to the loader
PICKLE_EXPANSION        Oversized or amplified pickle structures consistent with zip-bomb-style payloads
POST_BUDGET_GLOBAL      Dangerous globals observed after the opcode budget, surfaced conservatively
STRUCTURAL_TAMPER       Opcode sequences that do not correspond to any legitimate pickle producer
SUSPICIOUS_STRING       High-signal string literals (shell metacharacters, import payloads, URLs)
S203                    Non-allowlisted __main__ global reference (requires manual review before loading)
S213                    Raw (unencoded) nested pickle payload inside a byte field
S601                    Base64-encoded nested pickle payload inside a string literal
S602                    Hex-encoded nested pickle payload inside a string literal

The scanner covers pickle protocols 0 through 5, recognizes short and extended opcodes, and reconstructs module.class targets for STACK_GLOBAL without executing them.
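
To illustrate what reconstructing a target without executing it means, here is a rough sketch built on the standard library's pickletools.genops. It is not the package's Rust engine. reduce_targets is a hypothetical helper, and it deliberately ignores memo lookups (BINGET and friends), INST/OBJ, and malformed operands, all of which a real scanner has to handle. The two byte strings are hand-written pickles that would call os.system if loaded; they are only walked here, never passed to pickle.loads.

```python
import pickletools

# Hand-written payloads: protocol 0 uses GLOBAL, protocol 4 uses STACK_GLOBAL.
PROTO0 = b"cos\nsystem\n(S'echo hi'\ntR."
PROTO4 = b"\x80\x04\x8c\x02os\x8c\x06system\x93\x8c\x07echo hi\x85R."


def reduce_targets(data: bytes) -> list[str]:
    """List the module.name global most recently seen before each REDUCE opcode."""
    strings: list[str] = []  # recent string pushes, used to resolve STACK_GLOBAL
    last_global = None
    targets: list[str] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # genops reports the operand as "module name" joined by a space.
            module, _, name = arg.partition(" ")
            last_global = f"{module}.{name}"
        elif opcode.name == "STACK_GLOBAL":
            # Simplification: assume the two most recent string pushes are
            # the module and the attribute name.
            if len(strings) >= 2:
                last_global = f"{strings[-2]}.{strings[-1]}"
            else:
                last_global = None
        elif opcode.name == "REDUCE":
            targets.append(last_global or "<unresolved>")
        elif isinstance(arg, str):
            strings.append(arg)
    return targets


if __name__ == "__main__":
    print(reduce_targets(PROTO0))
    print(reduce_targets(PROTO4))
```

Both payloads resolve to the same target even though one never contains the literal text "os system", which is why string matching alone is not enough.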

When to use this vs. modelaudit

Use modelaudit-picklescan if you want a single-purpose library to embed in another tool: a linter, a model registry gate, a custom CI step, or a server-side scanner. It does pickle analysis and nothing else.

Use modelaudit if you want the full static scanner CLI: 40+ model/archive format scanners, SARIF and JSON output, remote-source scanning (Hugging Face, S3, GCS, JFrog, MLflow, DVC), license and secret detection, caching, progress reporting, and CI recipes. modelaudit uses this package internally for its pickle scanner.

API overview

from modelaudit_picklescan import (
    PickleScanner, ScanOptions,
    scan_file, scan_bytes, scan_stream,
    PickleReport, Finding, Notice, ScanError,
    Severity, ScanStatus, SafetyVerdict, CoverageSummary,
)

Three convenience entry points, each returning a PickleReport:

  • scan_file(path, *, options=None) — scan a .pkl / .pickle or a PyTorch ZIP checkpoint (detects the container, enumerates pickle members, combines reports).
  • scan_bytes(data, *, source="<bytes>", options=None) — scan an in-memory payload.
  • scan_stream(stream, *, source="<stream>", size=None, options=None) — scan a binary file-like object; falls back to bounded spooling when size is unknown.

For long-running services, construct PickleScanner(options=...) once and reuse it across calls.
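
The idea behind bounding a stream of unknown size can be sketched with the standard library. This is an in-memory simplification and not the package's spooling implementation: read_bounded is a hypothetical helper, and the cap would correspond to something like max_unbounded_stream_read_bytes. It reads one byte past the cap so that truncation is detected rather than guessed.

```python
import io
from typing import BinaryIO


def read_bounded(stream: BinaryIO, cap: int, chunk_size: int = 64 * 1024) -> tuple[bytes, bool]:
    """Read at most `cap` bytes; the flag is True when the stream held more than that."""
    buffer = bytearray()
    while len(buffer) <= cap:
        # Never request more than one byte beyond the cap.
        chunk = stream.read(min(chunk_size, cap + 1 - len(buffer)))
        if not chunk:
            return bytes(buffer), False  # reached EOF within the cap
        buffer.extend(chunk)
    return bytes(buffer[:cap]), True  # at least one byte past the cap was available


if __name__ == "__main__":
    data, truncated = read_bounded(io.BytesIO(b"abcdefghij"), cap=4)
    print(data, truncated)
```

A caller that sees the truncation flag set should treat the scan as incomplete rather than report the prefix as clean.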

Resource controls — ScanOptions

All fields have safe defaults; override only what you need.

Field                            Default    Meaning
timeout_s                        3600.0     Per-scan wall clock, capped at 86_400 seconds
max_opcodes                      1_000_000  Opcode budget before the scanner downgrades to partial
post_budget_scan_bytes           100 MiB    Bytes to keep scanning for globals after the budget
max_known_stream_read_bytes      100 MiB    Cap on streams with a known size
max_unbounded_stream_read_bytes  8 MiB      Cap on streams without a known size
max_string_literal_scan_chars    8 MiB      Cap on bytes inspected for SUSPICIOUS_STRING
max_nested_pickle_bytes          2 MiB      Cap on each decoded nested-payload inspection
max_nested_depth                 2          Recursion depth for base64/hex-encoded pickles

Construction validates every field; pass invalid values and you'll get a ValueError immediately instead of a misleading scan result.
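
The validate-at-construction pattern can be sketched with a frozen dataclass. SketchOptions below is a hypothetical stand-in, not the real ScanOptions: it models only three of the documented fields, reuses their documented defaults, and assumes that out-of-range values are rejected (the real class may clamp or validate differently).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical stand-in for ScanOptions covering three documented fields."""

    timeout_s: float = 3600.0
    max_opcodes: int = 1_000_000
    max_nested_depth: int = 2

    def __post_init__(self) -> None:
        # ASSUMPTION: invalid values raise instead of being clamped.
        if not 0 < self.timeout_s <= 86_400:
            raise ValueError("timeout_s must be in (0, 86400]")
        if self.max_opcodes <= 0:
            raise ValueError("max_opcodes must be positive")
        if self.max_nested_depth < 0:
            raise ValueError("max_nested_depth must be non-negative")


if __name__ == "__main__":
    print(SketchOptions(max_opcodes=50_000))
    try:
        SketchOptions(timeout_s=-1.0)
    except ValueError as exc:
        print("rejected:", exc)
```

Failing at construction keeps a typo in a limit from surfacing later as a scan that looks inconclusive for no obvious reason.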

Report contract — PickleReport

  • status: ScanStatus, one of complete, inconclusive, or error.
  • verdict: SafetyVerdict, one of clean, suspicious, malicious, or unknown. clean requires status=complete with no findings.
  • findings: tuple[Finding, ...] — WARNING or CRITICAL security results.
  • notices: tuple[Notice, ...] — DEBUG/INFO explainability and coverage notes (budget hits, truncation, unsupported members).
  • errors: tuple[ScanError, ...] — operational failures (short reads, malformed containers, engine errors).
  • coverage: CoverageSummary with bytes_scanned, bytes_total, opcode_count, and per-phase completion flags.
  • metadata: Mapping[str, Any] — container info (e.g. container_type="pytorch_zip", archive size, pickle members).
  • duration_s: float — scan wall clock.

Convenience accessors: report.has_security_findings, report.is_clean, report.to_dict().

Reports and all nested models are frozen — call to_dict() if you need a mutable payload for serialization. For aggregation, treat findings at warning/critical as security alerts; group notices by code rather than showing every INFO row as actionable.
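
One way to follow that aggregation advice is sketched below. summarize is a hypothetical helper that works on any object with findings and notices attributes. It assumes severity.value is a lowercase string ("critical" appears in the Quickstart output; "warning" is assumed by analogy) and that each notice exposes a code attribute, which this README implies but does not spell out. The notice codes in the stub are invented placeholders.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(report) -> dict:
    """Split a report into actionable alerts and a per-code notice tally."""
    alerts = [
        (finding.rule_code, finding.severity.value)
        for finding in report.findings
        if finding.severity.value in ("warning", "critical")
    ]
    # ASSUMPTION: each notice exposes a `code` attribute.
    notices_by_code = Counter(notice.code for notice in report.notices)
    return {"alerts": alerts, "notices_by_code": dict(notices_by_code)}


# Hypothetical stub report; the notice codes are placeholders, not real codes.
demo_report = SimpleNamespace(
    findings=(
        SimpleNamespace(rule_code="DANGEROUS_CALL", severity=SimpleNamespace(value="critical")),
    ),
    notices=(
        SimpleNamespace(code="EXAMPLE_BUDGET_NOTICE"),
        SimpleNamespace(code="EXAMPLE_BUDGET_NOTICE"),
        SimpleNamespace(code="EXAMPLE_MEMBER_NOTICE"),
    ),
)

if __name__ == "__main__":
    print(summarize(demo_report))
```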

PyTorch ZIP checkpoints

scan_file auto-detects PyTorch ZIP containers from PyTorch metadata plus pickle members, including hidden members, and combines per-member reports into a single container-level report with metadata.container_type="pytorch_zip". Archive member count is capped at 10,000 entries, and each member pickle is capped at 512 MiB. When either cap is hit, the scanner emits a structured notice instead of silently skipping the member.
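
The member enumeration and the two caps can be approximated with the standard library's zipfile module. This is a simplified sketch, not the package's detection logic: list_pickle_members is a hypothetical helper, it recognises members only by a .pkl suffix (so it would miss the hidden members the real scanner looks for), and it trusts the size recorded in the ZIP directory. The cap values are the ones quoted above.

```python
import io
import zipfile

MAX_MEMBERS = 10_000                  # member-count cap quoted above
MAX_MEMBER_BYTES = 512 * 1024 * 1024  # per-member cap quoted above


def list_pickle_members(archive, max_members=MAX_MEMBERS, max_member_bytes=MAX_MEMBER_BYTES):
    """Return (members_to_scan, notices) without extracting or unpickling anything."""
    members, notices = [], []
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
        if len(infos) > max_members:
            notices.append(f"member cap hit: inspected {max_members} of {len(infos)} entries")
            infos = infos[:max_members]
        for info in infos:
            # Naive heuristic: only names ending in .pkl count as pickle members.
            if info.is_dir() or not info.filename.endswith(".pkl"):
                continue
            if info.file_size > max_member_bytes:
                notices.append(f"{info.filename}: {info.file_size} bytes exceeds the per-member cap")
                continue
            members.append(info.filename)
    return members, notices


def demo_archive() -> io.BytesIO:
    """Build a tiny in-memory ZIP shaped like a PyTorch checkpoint."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.pkl", b"\x80\x02}q\x00.")  # pickle of an empty dict
        zf.writestr("archive/version", b"3\n")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(list_pickle_members(demo_archive()))
    print(list_pickle_members(demo_archive(), max_member_bytes=1))
```

Recording a notice for every skipped member is what lets a caller tell a clean archive from one that was only partly inspected.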

Building from source

Wheels cover five targets; any other platform or a custom Python ABI requires building from source:

# Requires Rust 1.83+ and a working C toolchain
pip install modelaudit-picklescan --no-binary modelaudit-picklescan

From a checkout:

pip install packages/modelaudit-picklescan
# or, for development, build the Rust extension and install it into the active virtualenv:
maturin develop --release -m packages/modelaudit-picklescan/Cargo.toml

Stability and versioning

modelaudit-picklescan follows semantic versioning. 0.x should be read as pre-1.0 — expect small adjustments as the API settles. The working intent, reflected in the current code, is:

  • Resource-control defaults (ScanOptions) are tuned conservatively; changes that relax a default will be called out in the changelog.
  • Public report models (PickleReport, Finding, Notice, ScanError) and their field names are the supported surface for serialization and downstream tooling.
  • Rule codes are intended to be additive — new codes rather than renames — so that downstream allowlists and suppressions remain stable.
  • Verdict semantics: SafetyVerdict.CLEAN is only returned when ScanStatus.COMPLETE holds and there are no findings; truncation, timeouts, and engine errors never produce CLEAN. This is enforced in _combine_verdict / _with_*_notice in api.py.

Any change to the items above will be announced in CHANGELOG.md and the GitHub release notes.
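
The verdict rule can be expressed as a small pure function. This sketch is not the _combine_verdict code from api.py: the enums are local stand-ins, and the mapping of critical findings to malicious and warning findings to suspicious is an assumption inferred from the Quickstart output. Only the CLEAN rule (complete status and no findings) is taken directly from the text above.

```python
from enum import Enum
from typing import Iterable


class Status(Enum):
    COMPLETE = "complete"
    INCONCLUSIVE = "inconclusive"
    ERROR = "error"


class Verdict(Enum):
    CLEAN = "clean"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"
    UNKNOWN = "unknown"


def sketch_verdict(status: Status, severities: Iterable[str]) -> Verdict:
    """Derive a verdict from the scan status and the severities of its findings."""
    seen = set(severities)
    # ASSUMPTION: findings keep their weight even when the scan did not complete.
    if "critical" in seen:
        return Verdict.MALICIOUS
    if "warning" in seen:
        return Verdict.SUSPICIOUS
    # No findings: CLEAN is allowed only for a complete scan.
    if status is Status.COMPLETE:
        return Verdict.CLEAN
    return Verdict.UNKNOWN


if __name__ == "__main__":
    print(sketch_verdict(Status.COMPLETE, []))
    print(sketch_verdict(Status.INCONCLUSIVE, []))
    print(sketch_verdict(Status.INCONCLUSIVE, ["critical"]))
```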

Security and reporting

Please do not open public GitHub issues for suspected vulnerabilities. See the project security policy for coordinated disclosure.

License

MIT. See LICENSE.

Download files

Source Distribution

modelaudit_picklescan-0.1.4.tar.gz (203.9 kB)

Uploaded: Source

Built Distributions

modelaudit_picklescan-0.1.4-cp310-abi3-win_amd64.whl (472.3 kB)

Uploaded: CPython 3.10+, Windows x86-64

modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_x86_64.whl (616.3 kB)

Uploaded: CPython 3.10+, manylinux (glibc 2.28+), x86-64

modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_aarch64.whl (609.7 kB)

Uploaded: CPython 3.10+, manylinux (glibc 2.28+), ARM64

modelaudit_picklescan-0.1.4-cp310-abi3-macosx_11_0_arm64.whl (564.9 kB)

Uploaded: CPython 3.10+, macOS 11.0+, ARM64

modelaudit_picklescan-0.1.4-cp310-abi3-macosx_10_12_x86_64.whl (571.5 kB)

Uploaded: CPython 3.10+, macOS 10.12+, x86-64

File details

Details for the file modelaudit_picklescan-0.1.4.tar.gz.

File metadata

  • Download URL: modelaudit_picklescan-0.1.4.tar.gz
  • Upload date:
  • Size: 203.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for modelaudit_picklescan-0.1.4.tar.gz
Algorithm Hash digest
SHA256 20ae51eee2f8bcb37616d440acba1911a8bdc9fb09bfb4028441d88c9728a2a4
MD5 231445b3a13ce4118d2fabbcec5c5976
BLAKE2b-256 db77129c192d0a6d68fe1df7bb0a590f7cf854f97d406bcabf645bbcd23537d9

Provenance

The following attestation bundles were made for modelaudit_picklescan-0.1.4.tar.gz:

Publisher: release-please.yml on promptfoo/modelaudit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file modelaudit_picklescan-0.1.4-cp310-abi3-win_amd64.whl.

File hashes

Hashes for modelaudit_picklescan-0.1.4-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 1b3c165bdbe504afabf6a460d9584be57d1d9418d1e61a7497f65feff931e2b2
MD5 66fca457a12b5920d971056ee673bf5e
BLAKE2b-256 24d72e3958073577775c20ca59b775e692d708d23cf29c0c553e8d98c8f135e6

Provenance

The following attestation bundles were made for modelaudit_picklescan-0.1.4-cp310-abi3-win_amd64.whl:

Publisher: release-please.yml on promptfoo/modelaudit

File details

Details for the file modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1a8f3fde21633dcc6faefcbb65b978caa4cfda8283614e65f4df1219024813b4
MD5 5ba157dc1f413ed28557185227180136
BLAKE2b-256 b580accac9091705c8da3946dc7c96b8de3a6c6a1c8f4cf5d4213a640a2eb5e6

Provenance

The following attestation bundles were made for modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_x86_64.whl:

Publisher: release-please.yml on promptfoo/modelaudit

File details

Details for the file modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 1b0ceea10ada9b9ff92c6f0ddce9a8fc0cefca194c1c16ba9b7524bc4c5b87ee
MD5 9f2fd1d3520ce50662775008928ac981
BLAKE2b-256 43c3d06db85f0d324eac35a62b8523b57cc68e4b15b576fdd644ae36a8d8299e

Provenance

The following attestation bundles were made for modelaudit_picklescan-0.1.4-cp310-abi3-manylinux_2_28_aarch64.whl:

Publisher: release-please.yml on promptfoo/modelaudit

File details

Details for the file modelaudit_picklescan-0.1.4-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for modelaudit_picklescan-0.1.4-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1caa29f9db36e6f95a65c284def80f4186658f762bd7f227dbb7003aecc7dc64
MD5 7a3751c7729ec020a0d6c368f4f4a209
BLAKE2b-256 3cd52528c546d82008724b5fdf17a79b35bed8e00ba85fdae21b18e331b7254d

Provenance

The following attestation bundles were made for modelaudit_picklescan-0.1.4-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: release-please.yml on promptfoo/modelaudit

File details

Details for the file modelaudit_picklescan-0.1.4-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for modelaudit_picklescan-0.1.4-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9c323dd839d170ac60915632bcc3d7ba8b79323a5b15056d8860f91d25b3f4b2
MD5 152dc7335523517da171586b1662c98f
BLAKE2b-256 63c4639b8dfecb73d653dc31e3ea3f7a195de2c22e278669ab4eec68a4eb0615

Provenance

The following attestation bundles were made for modelaudit_picklescan-0.1.4-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: release-please.yml on promptfoo/modelaudit
