Standalone pickle security scanner extracted from ModelAudit

modelaudit-picklescan

Rust-backed, bounded, static pickle security scanner. Inspects Python pickle streams and PyTorch ZIP checkpoints without unpickling them, and returns a typed report you can feed into CI, SARIF exporters, or custom policy engines.

Why this package

Pickle deserialization is a well-known supply-chain attack vector for ML checkpoints, and existing Python-only scanners tend to fall short in one of three ways: they unpickle the payload (unsafe), scan only string literals (imprecise), or fail open on large or malformed inputs (dangerous in CI). This package is a direct response:

  • Rust scanner engine. Opcode walker, string analyzer, and nested-payload decoder are all native code.
  • Fail-closed semantics. Every scan returns both a status (complete / inconclusive / error) and a verdict (clean / suspicious / malicious / unknown). Truncation, timeouts, budget exhaustion, and parser errors downgrade the verdict instead of silently returning clean.
  • Bounded by construction. Opcode count, wall-clock timeout, string-literal bytes, nested-payload bytes, and recursion depth are all configurable caps with safe defaults. A malicious producer cannot force unbounded memory or CPU.
  • Zero Python runtime dependencies. The wheel is self-contained — pip install modelaudit-picklescan and nothing else.
  • Attested provenance. Release wheels are published to PyPI with sigstore attestations via GitHub Actions trusted publishing.
  • Typed, immutable reports. PickleReport, Finding, Notice, and ScanError are frozen dataclasses with to_dict() for serialization. The package ships py.typed for mypy / pyright.
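
The fail-closed and bounded behaviour listed above can be illustrated with the standard library alone. The sketch below is not the package's Rust engine; `bounded_walk` and its return shape are hypothetical names chosen for illustration. It walks opcodes with `pickletools.genops`, never unpickles, and reports anything short of a full walk as inconclusive or error rather than as a clean result.

```python
import pickletools


def bounded_walk(data: bytes, max_opcodes: int = 1_000_000) -> tuple[str, int]:
    """Count opcodes without unpickling; return (status, opcodes_seen)."""
    seen = 0
    try:
        for _opcode, _arg, _pos in pickletools.genops(data):
            if seen >= max_opcodes:
                # Budget exhausted before STOP: never report this as complete.
                return "inconclusive", seen
            seen += 1
    except ValueError:
        # Truncated or malformed stream: surface an error, not a clean result.
        return "error", seen
    return "complete", seen


benign = b"\x80\x04K\x01."  # PROTO 4, BININT1 1, STOP

print(bounded_walk(benign))                 # ('complete', 3)
print(bounded_walk(benign, max_opcodes=2))  # ('inconclusive', 2)
print(bounded_walk(benign[:-1]))            # ('error', 2), STOP is missing
```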

Install

pip install modelaudit-picklescan

Pre-built abi3 wheels ship for Python 3.10–3.13 on five targets: Linux x86_64, Linux aarch64, macOS arm64, macOS x86_64, and Windows x64. Other platforms install from the sdist and require a Rust toolchain (see Building from source).

Quickstart

from modelaudit_picklescan import scan_file

report = scan_file("suspicious_model.pt")  # raw pickle or PyTorch ZIP checkpoint

print(f"status={report.status.value} verdict={report.verdict.value}")
for finding in report.findings:
    print(f"  [{finding.severity.value}] {finding.rule_code}: {finding.message}")
    if finding.location:
        print(f"    at {finding.location}")

Example output on a PyTorch ZIP whose inner pickle reduces on os.system:

status=complete verdict=malicious
  [critical] DANGEROUS_CALL: Found REDUCE opcode invoking os.system
    at suspicious_model.pt:archive/data.pkl (pos 42)

Example output on a truncated or oversized pickle where analysis is incomplete:

status=inconclusive verdict=unknown
  (no findings — scan was truncated, inspect report.notices and report.coverage)

The finding.location string follows the format {source} (pos {byte_offset}). The source on PyTorch ZIP members is {archive_path}:{member_name}.
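
Downstream tooling that needs the byte offset or the ZIP member name can split the location string back into its parts. This is a minimal sketch based only on the format described above; `parse_location` and `member_name` are hypothetical helpers, not part of the package. The member name is recovered by stripping a known archive path rather than splitting on the first colon, because archive paths may themselves contain colons (for example Windows drive letters).

```python
import re
from typing import Optional

# "{source} (pos {byte_offset})", anchored on the trailing "(pos N)".
_LOCATION_RE = re.compile(r"^(?P<source>.*) \(pos (?P<pos>\d+)\)$")


def parse_location(location: str) -> tuple[str, int]:
    """Split a location string into (source, byte_offset)."""
    match = _LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    return match.group("source"), int(match.group("pos"))


def member_name(source: str, archive_path: str) -> Optional[str]:
    """Return the ZIP member name, or None if source is not inside archive_path."""
    prefix = archive_path + ":"
    return source[len(prefix):] if source.startswith(prefix) else None


source, offset = parse_location("suspicious_model.pt:archive/data.pkl (pos 42)")
print(source, offset)                              # suspicious_model.pt:archive/data.pkl 42
print(member_name(source, "suspicious_model.pt"))  # archive/data.pkl
```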

What it detects

Each finding carries a rule_code so downstream tooling can allowlist, suppress, or route alerts:

Rule code | What it flags
DANGEROUS_CALL | REDUCE/NEWOBJ/NEWOBJ_EX opcodes invoking a callable known to execute code
DANGEROUS_GLOBAL | Imports of modules or classes that enable code execution when the pickle is loaded
EXTENSION_REF | copyreg extension-registry references (EXT1/EXT2/EXT4 opcodes) that resolve through process state
MALFORMED_STACK_GLOBAL | STACK_GLOBAL operands crafted to bypass naive string-matching scanners
PERSISTENT_ID | PERSID / BINPERSID references that delegate object construction to the loader
PICKLE_EXPANSION | Oversized or amplified pickle structures consistent with zip-bomb-style payloads
POST_BUDGET_GLOBAL | Dangerous globals observed after the opcode budget is exhausted, surfaced conservatively
STRUCTURAL_TAMPER | Opcode sequences that do not correspond to any legitimate pickle producer
SUSPICIOUS_STRING | High-signal string literals (shell metacharacters, import payloads, URLs)
S203 | Non-allowlisted __main__ global reference (requires manual review before loading)
S213 | Raw (unencoded) nested pickle payload inside a byte field
S601 | Base64-encoded nested pickle payload inside a string literal
S602 | Hex-encoded nested pickle payload inside a string literal

The scanner covers pickle protocols 0 through 5, recognizes short and extended opcodes, and reconstructs module.class targets for STACK_GLOBAL without executing them.
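
To make "reconstructs module.class targets without executing them" concrete, here is a standard-library sketch of the idea. It is a simplified illustration, not the package's implementation: `referenced_globals` is a hypothetical name, and the STACK_GLOBAL handling naively pairs the two most recent string pushes, which is exactly the kind of shortcut that crafted operands can defeat.

```python
import pickletools

# Opcodes that push a text string onto the stack.
_STRING_OPS = {"UNICODE", "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8"}


def referenced_globals(data: bytes) -> list[tuple[str, int]]:
    """Return (dotted_name, byte_offset) for each global reference, statically."""
    found: list[tuple[str, int]] = []
    recent_strings: list[str] = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in _STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the operand as "module name".
            module, _, name = arg.partition(" ")
            found.append((f"{module}.{name}", pos))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Naive: assume the last two string pushes are module and name.
            module, name = recent_strings[-2:]
            found.append((f"{module}.{name}", pos))
    return found


# Hand-written payloads; they are only inspected, never loaded.
proto0 = b"cos\nsystem\n(S'echo hi'\ntR."
proto4 = b"\x80\x04\x8c\x02os\x8c\x06system\x93."

print(referenced_globals(proto0))  # [('os.system', 0)]
print(referenced_globals(proto4))  # [('os.system', 14)]
```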

When to use this vs. modelaudit

Use modelaudit-picklescan if you want a single-purpose library to embed in another tool: a linter, a model registry gate, a custom CI step, or a server-side scanner. It does pickle analysis and nothing else.

Use modelaudit if you want the full static scanner CLI: 40+ model/archive format scanners, SARIF and JSON output, remote-source scanning (Hugging Face, S3, GCS, JFrog, MLflow, DVC), license and secret detection, caching, progress reporting, and CI recipes. modelaudit uses this package internally for its pickle scanner.

API overview

from modelaudit_picklescan import (
    PickleScanner, ScanOptions,
    scan_file, scan_bytes, scan_stream,
    PickleReport, Finding, Notice, ScanError,
    Severity, ScanStatus, SafetyVerdict, CoverageSummary,
)

Three convenience entry points, each returning a PickleReport:

  • scan_file(path, *, options=None) — scan a .pkl / .pickle or a PyTorch ZIP checkpoint (detects the container, enumerates pickle members, combines reports).
  • scan_bytes(data, *, source="<bytes>", options=None) — scan an in-memory payload.
  • scan_stream(stream, *, source="<stream>", size=None, options=None) — scan a binary file-like object; falls back to bounded spooling when size is unknown.

For long-running services, construct PickleScanner(options=...) once and reuse it across calls.
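
As a sketch of how scan_bytes might sit behind an upload handler: `scan_upload` is a hypothetical wrapper, and the injectable `scan` argument exists only so the function can be exercised without the wheel installed. The call shape `scan_bytes(data, source=...)` and the `.value` strings follow the signatures and example output shown above.

```python
from typing import Any, Callable, Optional


def scan_upload(data: bytes, name: str,
                scan: Optional[Callable[..., Any]] = None) -> bool:
    """Return True only when the payload was fully scanned and judged clean."""
    if scan is None:
        # Imported lazily so this module loads even where the wheel is absent.
        from modelaudit_picklescan import scan_bytes as scan
    report = scan(data, source=name)
    # Fail closed: anything other than complete + clean is rejected.
    return report.status.value == "complete" and report.verdict.value == "clean"
```

In a service, a handler would call `scan_upload(body, filename)` and reject the upload whenever it returns False, including inconclusive scans.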

Resource controls — ScanOptions

All fields have safe defaults; override only what you need.

Field | Default | Meaning
timeout_s | 3600.0 | Per-scan wall-clock timeout in seconds, capped at 86_400
max_opcodes | 1_000_000 | Opcode budget; once it is exceeded the scan is reported as inconclusive rather than complete
post_budget_scan_bytes | 100 MiB | Bytes to keep scanning for globals after the opcode budget is exhausted
max_known_stream_read_bytes | 100 MiB | Cap on streams with a known size
max_unbounded_stream_read_bytes | 8 MiB | Cap on streams without a known size
max_string_literal_scan_chars | 8 MiB | Cap on string-literal data inspected for SUSPICIOUS_STRING
max_nested_pickle_bytes | 2 MiB | Cap on each decoded nested-payload inspection
max_nested_depth | 2 | Recursion depth for base64/hex-encoded pickles

Construction validates every field; pass invalid values and you'll get a ValueError immediately instead of a misleading scan result.
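
A sketch of tightening a few limits for an untrusted-upload path. Two things here are assumptions rather than documented behaviour: that ScanOptions accepts its fields as keyword arguments, and that the size fields take integer byte counts. The specific values and the `build_strict_options` helper are illustrative only; the `options_cls` parameter exists so the helper can be tried with a stand-in class.

```python
MIB = 1024 * 1024

# Illustrative values, tighter than the defaults in the table above.
STRICT_LIMITS = {
    "timeout_s": 30.0,
    "max_opcodes": 200_000,
    "max_unbounded_stream_read_bytes": 4 * MIB,  # assumed to be a byte count
    "max_nested_pickle_bytes": 1 * MIB,          # assumed to be a byte count
    "max_nested_depth": 1,
}


def build_strict_options(options_cls=None):
    """Construct options once at startup so bad limits fail fast."""
    if options_cls is None:
        # Assumption: ScanOptions takes its fields as keyword arguments.
        from modelaudit_picklescan import ScanOptions as options_cls
    try:
        return options_cls(**STRICT_LIMITS)
    except ValueError as exc:
        # Invalid limits are a configuration bug, not a scan result.
        raise SystemExit(f"invalid scan limits: {exc}") from exc
```

The result would then be passed as `scan_file(path, options=build_strict_options())`.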

Report contract — PickleReport

  • status: ScanStatus. One of complete, inconclusive, or error.
  • verdict: SafetyVerdict. One of clean, suspicious, malicious, or unknown; clean requires status=complete with no findings.
  • findings: tuple[Finding, ...] — WARNING or CRITICAL security results.
  • notices: tuple[Notice, ...] — DEBUG/INFO explainability and coverage notes (budget hits, truncation, unsupported members).
  • errors: tuple[ScanError, ...] — operational failures (short reads, malformed containers, engine errors).
  • coverage: CoverageSummary. Carries bytes_scanned, bytes_total, opcode_count, and per-phase completion flags.
  • metadata: Mapping[str, Any] — container info (e.g. container_type="pytorch_zip", archive size, pickle members).
  • duration_s: float — scan wall clock.

Convenience accessors: report.has_security_findings, report.is_clean, report.to_dict().
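
One way to turn the status and verdict pair into a CI exit code is sketched below. `ci_exit_code` and the exit-code numbering are hypothetical choices, not something the package defines; the string values match those printed by `report.status.value` and `report.verdict.value` in the Quickstart.

```python
def ci_exit_code(report) -> int:
    """Map a report to a process exit code, failing closed."""
    status = report.status.value
    verdict = report.verdict.value
    if status == "complete" and verdict == "clean":
        return 0  # fully scanned, nothing found
    if verdict in ("malicious", "suspicious"):
        return 1  # security findings: block the artifact
    return 2      # inconclusive, error, or unknown: needs a human
```

A CI step could then end with `sys.exit(ci_exit_code(scan_file(path)))`, so that a truncated or failed scan never passes the gate.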

Reports and all nested models are frozen — call to_dict() if you need a mutable payload for serialization. For aggregation, treat findings at warning/critical as security alerts; group notices by code rather than showing every INFO row as actionable.
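
Grouping notices by code could look like the sketch below. It assumes each Notice exposes a `code` attribute, which is inferred from the wording above rather than confirmed; `summarize_notices` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_notices(notices: Iterable[Any]) -> list[str]:
    """Collapse notices into one line per code instead of one row per notice."""
    # Assumption: every notice has a `code` attribute.
    counts = Counter(notice.code for notice in notices)
    return [f"{code} x{count}" for code, count in sorted(counts.items())]
```

Calling `summarize_notices(report.notices)` would yield a short, sorted summary suitable for a log line or a collapsed section in a dashboard.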

PyTorch ZIP checkpoints

scan_file auto-detects PyTorch ZIP containers from PyTorch metadata plus pickle members, including hidden members, and combines the per-member reports into a single container-level report with metadata.container_type="pytorch_zip". The archive member count is capped at 10,000 entries, and each member pickle is capped at 512 MiB. Hitting either limit is reported through a structured notice rather than a silent skip.
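
For intuition, member enumeration with a cap can be sketched with the standard library. This is a deliberately simplified stand-in: it matches members by the `.pkl` suffix only, whereas the package's detection also uses PyTorch metadata and looks for hidden members. `pickle_members` is a hypothetical helper.

```python
import io
import zipfile

MAX_MEMBERS = 10_000  # mirrors the documented archive member cap


def pickle_members(archive: bytes,
                   max_members: int = MAX_MEMBERS) -> tuple[list[str], bool]:
    """Return (.pkl member names, truncated flag) without extracting anything."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = zf.namelist()
    # Report truncation explicitly instead of silently skipping members.
    truncated = len(names) > max_members
    members = [name for name in names[:max_members] if name.endswith(".pkl")]
    return members, truncated


# Build a tiny checkpoint-shaped archive in memory for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("archive/data.pkl", b"\x80\x02.")
    zf.writestr("archive/data/0", b"\x00" * 4)
    zf.writestr("archive/version", b"3\n")

print(pickle_members(buffer.getvalue()))                 # (['archive/data.pkl'], False)
print(pickle_members(buffer.getvalue(), max_members=2))  # (['archive/data.pkl'], True)
```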

Building from source

Wheels cover five targets; any other platform or a custom Python ABI requires building from source:

# Requires Rust 1.83+ and a working C toolchain
pip install modelaudit-picklescan --no-binary modelaudit-picklescan

From a checkout:

pip install packages/modelaudit-picklescan
# or, for development, build the Rust extension and install it into the active virtualenv:
maturin develop --release -m packages/modelaudit-picklescan/Cargo.toml

Stability and versioning

modelaudit-picklescan follows semantic versioning. 0.x should be read as pre-1.0 — expect small adjustments as the API settles. The working intent, reflected in the current code, is:

  • Resource-control defaults (ScanOptions) are tuned conservatively; changes that relax a default will be called out in the changelog.
  • Public report models (PickleReport, Finding, Notice, ScanError) and their field names are the supported surface for serialization and downstream tooling.
  • Rule codes are intended to be additive — new codes rather than renames — so that downstream allowlists and suppressions remain stable.
  • Verdict semantics: SafetyVerdict.CLEAN is only returned when ScanStatus.COMPLETE holds and there are no findings; truncation, timeouts, and engine errors never produce CLEAN. This is enforced in _combine_verdict / _with_*_notice in api.py.

Any change to the items above will be announced in CHANGELOG.md and the GitHub release notes.
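
The verdict invariant above can be expressed as a small combination rule. The sketch below is one plausible reading, not the logic of _combine_verdict itself: the precedence order, the treatment of an empty input, and the `combine` name are all assumptions made for illustration.

```python
# Assumed precedence, most severe first.
_PRECEDENCE = ("malicious", "suspicious", "unknown", "clean")


def combine(members: list[tuple[str, str]]) -> tuple[str, str]:
    """Combine per-member (status, verdict) pairs into one pair."""
    if not members:
        # Assumption: nothing scanned is never reported as clean.
        return "inconclusive", "unknown"
    statuses = [status for status, _ in members]
    verdicts = [verdict for _, verdict in members]
    if "error" in statuses:
        status = "error"
    elif "inconclusive" in statuses:
        status = "inconclusive"
    else:
        status = "complete"
    verdict = min(verdicts, key=_PRECEDENCE.index)
    if verdict == "clean" and status != "complete":
        # Clean requires a complete scan; otherwise downgrade to unknown.
        verdict = "unknown"
    return status, verdict


print(combine([("complete", "clean"), ("complete", "malicious")]))   # ('complete', 'malicious')
print(combine([("complete", "clean"), ("inconclusive", "clean")]))   # ('inconclusive', 'unknown')
```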

Security and reporting

Please do not open public GitHub issues for suspected vulnerabilities. See the project security policy for coordinated disclosure.

License

MIT. See LICENSE.

