Standalone pickle security scanner extracted from ModelAudit

Project description

modelaudit-picklescan

Rust-backed, bounded, static pickle security scanner. Inspects Python pickle streams and PyTorch ZIP checkpoints without unpickling them, and returns a typed report you can feed into CI, SARIF exporters, or custom policy engines.

Why this package

Pickle deserialization is the most common supply-chain attack vector in ML checkpoints, and existing Python-only scanners either unpickle the payload (unsafe), scan string literals only (imprecise), or fail open on large/malformed inputs (dangerous in CI). This package is a direct response:

  • Rust scanner engine. Opcode walker, string analyzer, and nested-payload decoder are all native code.
  • Fail-closed semantics. Every scan returns both a status (complete / inconclusive / error) and a verdict (clean / suspicious / malicious / unknown). Truncation, timeouts, budget exhaustion, and parser errors downgrade the verdict instead of silently returning clean.
  • Bounded by construction. Opcode count, wall-clock timeout, string-literal bytes, nested-payload bytes, and recursion depth are all configurable caps with safe defaults. A malicious producer cannot force unbounded memory or CPU.
  • Zero Python runtime dependencies. The wheel is self-contained — pip install modelaudit-picklescan and nothing else.
  • Attested provenance. Release wheels are published to PyPI with sigstore attestations via GitHub Actions trusted publishing.
  • Typed, immutable reports. PickleReport, Finding, Notice, and ScanError are frozen dataclasses with to_dict() for serialization. The package ships py.typed for mypy / pyright.

Install

pip install modelaudit-picklescan

Pre-built abi3 wheels ship for Python 3.10–3.13 on five targets: Linux x86_64, Linux aarch64, macOS arm64, macOS x86_64, and Windows x64. Other platforms install from the sdist and require a Rust toolchain (see Building from source).

Quickstart

from modelaudit_picklescan import scan_file

report = scan_file("suspicious_model.pt")  # raw pickle or PyTorch ZIP checkpoint

print(f"status={report.status.value} verdict={report.verdict.value}")
for finding in report.findings:
    print(f"  [{finding.severity.value}] {finding.rule_code}: {finding.message}")
    if finding.location:
        print(f"    at {finding.location}")

Example output on a PyTorch ZIP whose inner pickle reduces on os.system:

status=complete verdict=malicious
  [critical] DANGEROUS_CALL: Found REDUCE opcode invoking os.system
    at suspicious_model.pt:archive/data.pkl (pos 42)

Example output on a truncated or oversized pickle where analysis is incomplete:

status=inconclusive verdict=unknown
  (no findings — scan was truncated, inspect report.notices and report.coverage)

The finding.location string follows the format {source} (pos {byte_offset}). The source on PyTorch ZIP members is {archive_path}:{member_name}.
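The location format described above can be split back into its parts for routing or display. The following is a minimal sketch, not part of the package: `parse_location` and `ParsedLocation` are hypothetical names, and the colon split assumes the plain source path itself contains no colon (a Windows drive letter would make it ambiguous).

```python
import re
from typing import NamedTuple, Optional


class ParsedLocation(NamedTuple):
    """Hypothetical container for the parts of a finding.location string."""
    source: str
    archive_path: Optional[str]
    member_name: Optional[str]
    byte_offset: int


# Matches "{source} (pos {byte_offset})" as documented above.
_LOCATION_RE = re.compile(r"^(?P<source>.*) \(pos (?P<offset>\d+)\)$")


def parse_location(location: str) -> ParsedLocation:
    match = _LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    source = match.group("source")
    # Assumption: a colon in the source marks "{archive_path}:{member_name}".
    # Splitting on the last colon is a heuristic, not the library's own rule.
    archive_path, sep, member_name = source.rpartition(":")
    if not sep:
        return ParsedLocation(source, None, None, int(match.group("offset")))
    return ParsedLocation(source, archive_path, member_name, int(match.group("offset")))


parsed = parse_location("suspicious_model.pt:archive/data.pkl (pos 42)")
print(parsed.archive_path, parsed.member_name, parsed.byte_offset)
```

Keeping the raw `source` alongside the split fields lets callers fall back to the unparsed value when the heuristic does not apply.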

What it detects

Each finding carries a rule_code so downstream tooling can allowlist, suppress, or route alerts:

Rule code               What it flags
DANGEROUS_CALL          REDUCE/NEWOBJ/NEWOBJ_EX opcodes invoking a callable known to execute code
DANGEROUS_GLOBAL        Imports of modules or classes that enable code execution when the pickle is loaded
EXTENSION_REF           copyreg.extension / EXT1/EXT2/EXT4 opcodes that resolve through process state
MALFORMED_STACK_GLOBAL  STACK_GLOBAL operands crafted to bypass naive string-matching scanners
PERSISTENT_ID           PERSID / BINPERSID references that delegate object construction to the loader
PICKLE_EXPANSION        Oversized or amplified pickle structures consistent with zip-bomb-style payloads
POST_BUDGET_GLOBAL      Dangerous globals observed after the opcode budget, surfaced conservatively
STRUCTURAL_TAMPER       Opcode sequences that do not correspond to any legitimate pickle producer
SUSPICIOUS_STRING       High-signal string literals (shell metacharacters, import payloads, URLs)
S203                    Non-allowlisted __main__ global reference (requires manual review before loading)
S213                    Raw (unencoded) nested pickle payload inside a byte field
S601                    Base64-encoded nested pickle payload inside a string literal
S602                    Hex-encoded nested pickle payload inside a string literal

The scanner covers pickle protocols 0 through 5, recognizes short and extended opcodes, and reconstructs module.class targets for STACK_GLOBAL without executing them.
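Because each finding carries a rule_code, downstream routing can be a small lookup. The sketch below is illustrative only: `StubFinding` is a stand-in exposing the `rule_code` and `message` attributes the quickstart reads, and the `BLOCKING` set and `route` helper are a hypothetical policy, not something the package ships. Unknown codes default to review so that newly added rules are not silently ignored.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class StubFinding:
    """Stand-in for Finding; only rule_code and message are modelled."""
    rule_code: str
    message: str


# Hypothetical policy: codes that should fail a pipeline outright.
BLOCKING: FrozenSet[str] = frozenset(
    {"DANGEROUS_CALL", "DANGEROUS_GLOBAL", "S213", "S601", "S602"}
)


def route(
    findings: Iterable[StubFinding],
    suppressed: FrozenSet[str] = frozenset(),
) -> Dict[str, List[StubFinding]]:
    """Split findings into 'block' and 'review', dropping suppressed codes."""
    routed: Dict[str, List[StubFinding]] = {"block": [], "review": []}
    for finding in findings:
        if finding.rule_code in suppressed:
            continue  # explicitly allowlisted by the operator
        bucket = "block" if finding.rule_code in BLOCKING else "review"
        routed[bucket].append(finding)
    return routed


sample = [
    StubFinding("DANGEROUS_CALL", "REDUCE invoking os.system"),
    StubFinding("S203", "__main__ global reference"),
    StubFinding("SUSPICIOUS_STRING", "URL in string literal"),
]
result = route(sample, suppressed=frozenset({"SUSPICIOUS_STRING"}))
print({bucket: [f.rule_code for f in items] for bucket, items in result.items()})
```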

When to use this vs. modelaudit

Use modelaudit-picklescan if you want a single-purpose library to embed in another tool: a linter, a model registry gate, a custom CI step, or a server-side scanner. It does pickle analysis and nothing else.

Use modelaudit if you want the full static scanner CLI: 40+ model/archive format scanners, SARIF and JSON output, remote-source scanning (Hugging Face, S3, GCS, JFrog, MLflow, DVC), license and secret detection, caching, progress reporting, and CI recipes. modelaudit uses this package internally for its pickle scanner.

API overview

from modelaudit_picklescan import (
    PickleScanner, ScanOptions,
    scan_file, scan_bytes, scan_stream,
    PickleReport, Finding, Notice, ScanError,
    Severity, ScanStatus, SafetyVerdict, CoverageSummary,
)

Three convenience entry points, each returning a PickleReport:

  • scan_file(path, *, options=None) — scan a .pkl / .pickle or a PyTorch ZIP checkpoint (detects the container, enumerates pickle members, combines reports).
  • scan_bytes(data, *, source="<bytes>", options=None) — scan an in-memory payload.
  • scan_stream(stream, *, source="<stream>", size=None, options=None) — scan a binary file-like object; falls back to bounded spooling when size is unknown.

For long-running services, construct PickleScanner(options=...) once and reuse it across calls.
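As a rough illustration of the in-memory entry point, the sketch below builds a demo payload with the standard library and passes it to scan_bytes using the signature listed above. The `Exploit` class and the `demo-payload` source label are made up for the example; the pickletools listing is a stdlib cross-check, not how the scanner works internally. The scan itself only runs if the package is installed.

```python
import os
import pickle
import pickletools


class Exploit:
    """Demo object whose pickle asks the loader to call os.system."""

    def __reduce__(self):
        return (os.system, ("echo pwned",))


# pickle.dumps records the callable by name; nothing is executed here.
payload = pickle.dumps(Exploit(), protocol=4)

# Stdlib cross-check: walk the opcodes without loading the payload.
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(payload)]
print(opcode_names)

try:
    from modelaudit_picklescan import scan_bytes
except ImportError:  # package not installed; skip the scan
    scan_bytes = None

if scan_bytes is not None:
    report = scan_bytes(payload, source="demo-payload")
    print(f"status={report.status.value} verdict={report.verdict.value}")
    for finding in report.findings:
        print(f"  {finding.rule_code}: {finding.message}")
```

Never call pickle.loads on such a payload; the point of the example is that both pickletools and the scanner inspect the bytes statically.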

Resource controls — ScanOptions

All fields have safe defaults; override only what you need.

Field                            Default    Meaning
timeout_s                        3600.0     Per-scan wall clock, capped at 86_400 seconds
max_opcodes                      1_000_000  Opcode budget before the scanner downgrades to partial
post_budget_scan_bytes           100 MiB    Bytes to keep scanning for globals after the budget
max_known_stream_read_bytes      100 MiB    Cap on streams with a known size
max_unbounded_stream_read_bytes  8 MiB      Cap on streams without a known size
max_string_literal_scan_chars    8 MiB      Cap on bytes inspected for SUSPICIOUS_STRING
max_nested_pickle_bytes          2 MiB      Cap on each decoded nested-payload inspection
max_nested_depth                 2          Recursion depth for base64/hex-encoded pickles

Construction validates every field; pass invalid values and you'll get a ValueError immediately instead of a misleading scan result.
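A tighter budget for a CI job might look like the sketch below. It assumes ScanOptions accepts the field names from the table as keyword arguments, which the table suggests but does not state; the specific override values are arbitrary examples, and the negative timeout is only presumed to be one of the values that validation rejects. The library calls run only if the package is installed.

```python
# Example overrides for a short-lived CI job (values are illustrative).
ci_overrides = {
    "timeout_s": 60.0,        # well under the documented 86_400 s cap
    "max_opcodes": 250_000,   # tighter than the 1_000_000 default
    "max_nested_depth": 1,    # inspect one level of encoded nesting
}

try:
    from modelaudit_picklescan import PickleScanner, ScanOptions
except ImportError:  # package not installed; the rest is skipped
    PickleScanner = ScanOptions = None

if ScanOptions is not None:
    # Assumption: fields are passed as keyword arguments.
    options = ScanOptions(**ci_overrides)
    scanner = PickleScanner(options=options)  # reuse across scans

    try:
        ScanOptions(timeout_s=-1.0)  # presumed invalid
    except ValueError as exc:
        print(f"rejected at construction: {exc}")
```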

Report contract — PickleReport

  • status: ScanStatus, one of complete, inconclusive, or error.
  • verdict: SafetyVerdict, one of clean, suspicious, malicious, or unknown. clean requires status=complete with no findings.
  • findings: tuple[Finding, ...] — WARNING or CRITICAL security results.
  • notices: tuple[Notice, ...] — DEBUG/INFO explainability and coverage notes (budget hits, truncation, unsupported members).
  • errors: tuple[ScanError, ...] — operational failures (short reads, malformed containers, engine errors).
  • coverage: CoverageSummary with bytes_scanned, bytes_total, opcode_count, and per-phase completion flags.
  • metadata: Mapping[str, Any] — container info (e.g. container_type="pytorch_zip", archive size, pickle members).
  • duration_s: float — scan wall clock.

Convenience accessors: report.has_security_findings, report.is_clean, report.to_dict().

Reports and all nested models are frozen — call to_dict() if you need a mutable payload for serialization. For aggregation, treat findings at warning/critical as security alerts; group notices by code rather than showing every INFO row as actionable.
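One way to turn the report contract into a CI gate is sketched below. It works on the string values of status and verdict (as printed via `.value` in the quickstart) so it runs without the package; `ci_exit_code`, the exit-code numbering, and the sample notice codes are hypothetical choices for illustration, not part of the library.

```python
from collections import Counter
from typing import Iterable


def ci_exit_code(status: str, verdict: str) -> int:
    """Fail closed: only a complete scan with a clean verdict passes."""
    if status == "complete" and verdict == "clean":
        return 0
    if verdict == "malicious":
        return 2  # hard failure
    return 1      # suspicious, unknown, inconclusive, or error


def summarize_notices(codes: Iterable[str]) -> Counter:
    """Group notices by code instead of listing every row."""
    return Counter(codes)


print(ci_exit_code("complete", "clean"))
print(ci_exit_code("inconclusive", "unknown"))
# The notice codes below are invented placeholders.
print(summarize_notices(["BUDGET_HIT", "BUDGET_HIT", "TRUNCATED"]).most_common())
```

With a real report, the inputs would be `report.status.value` and `report.verdict.value`.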

PyTorch ZIP checkpoints

scan_file auto-detects PyTorch ZIP containers (archives containing data.pkl plus version / byteorder metadata), enumerates pickle members (including hidden ones identified by content sniffing, not just extension), and combines per-member reports into a single container-level report with metadata.container_type="pytorch_zip". Archive member count is capped at 10,000 entries; per-member pickles are capped at 512 MiB. When either limit is hit, the scanner emits a structured notice rather than silently skipping content.
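To make the idea of content sniffing concrete, here is a simplified stdlib-only sketch. It is not the package's detection logic: `looks_like_pickle` and `find_pickle_members` are hypothetical helpers, and the check only recognizes streams that start with the PROTO opcode (protocol 2 and later), so protocol 0/1 pickles would be missed by this heuristic.

```python
import io
import pickle
import zipfile
from typing import List


def looks_like_pickle(head: bytes) -> bool:
    """True if the bytes start with PROTO (0x80) and a protocol number 2..5."""
    return len(head) >= 2 and head[0] == 0x80 and 2 <= head[1] <= 5


def find_pickle_members(archive_bytes: bytes) -> List[str]:
    """List members that are pickles by extension or by leading bytes."""
    names: List[str] = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            with zf.open(info) as member:
                head = member.read(2)  # bounded read; never unpickle
            if info.filename.endswith(".pkl") or looks_like_pickle(head):
                names.append(info.filename)
    return names


# Build a small in-memory archive shaped like a PyTorch checkpoint.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("archive/data.pkl", pickle.dumps({"w": 1}, protocol=2))
    zf.writestr("archive/version", b"3\n")
    # A pickle hiding behind a non-pickle extension.
    zf.writestr("archive/extra.bin", pickle.dumps([1, 2], protocol=4))

print(find_pickle_members(buffer.getvalue()))
```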

Building from source

Wheels cover five targets; any other platform or a custom Python ABI requires building from source:

# Requires Rust 1.83+ and a working C toolchain
pip install modelaudit-picklescan --no-binary modelaudit-picklescan

From a checkout:

pip install packages/modelaudit-picklescan
# or, for development, build the Rust extension and install it into the active virtualenv:
maturin develop --release -m packages/modelaudit-picklescan/Cargo.toml

Stability and versioning

modelaudit-picklescan follows semantic versioning. 0.x should be read as pre-1.0 — expect small adjustments as the API settles. The working intent, reflected in the current code, is:

  • Resource-control defaults (ScanOptions) are tuned conservatively; changes that relax a default will be called out in the changelog.
  • Public report models (PickleReport, Finding, Notice, ScanError) and their field names are the supported surface for serialization and downstream tooling.
  • Rule codes are intended to be additive — new codes rather than renames — so that downstream allowlists and suppressions remain stable.
  • Verdict semantics: SafetyVerdict.CLEAN is only returned when ScanStatus.COMPLETE holds and there are no findings; truncation, timeouts, and engine errors never produce CLEAN. This is enforced in _combine_verdict / _with_*_notice in api.py.

Any change to the items above will be announced in CHANGELOG.md and the GitHub release notes.

Security and reporting

Please do not open public GitHub issues for suspected vulnerabilities. See the project security policy for coordinated disclosure.

License

MIT. See LICENSE.

Download files

Source Distribution

modelaudit_picklescan-0.1.3.tar.gz (190.8 kB), Source

Built Distributions

modelaudit_picklescan-0.1.3-cp310-abi3-win_amd64.whl (450.8 kB), CPython 3.10+, Windows x86-64
modelaudit_picklescan-0.1.3-cp310-abi3-manylinux_2_28_x86_64.whl (593.6 kB), CPython 3.10+, manylinux: glibc 2.28+ x86-64
modelaudit_picklescan-0.1.3-cp310-abi3-manylinux_2_28_aarch64.whl (589.5 kB), CPython 3.10+, manylinux: glibc 2.28+ ARM64
modelaudit_picklescan-0.1.3-cp310-abi3-macosx_11_0_arm64.whl (546.4 kB), CPython 3.10+, macOS 11.0+ ARM64
modelaudit_picklescan-0.1.3-cp310-abi3-macosx_10_12_x86_64.whl (549.9 kB), CPython 3.10+, macOS 10.12+ x86-64

File details

Details for the file modelaudit_picklescan-0.1.3.tar.gz.

File metadata

  • Download URL: modelaudit_picklescan-0.1.3.tar.gz
  • Size: 190.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

modelaudit_picklescan-0.1.3.tar.gz
  SHA256       e7f2bff25765ec670b39d783bbf8f8f77537bf2cebb9f6d45349b093546a2615
  MD5          268c2ca9aba6c406b7d25e485fe57482
  BLAKE2b-256  ea5e93a439c5364e6ec38f768bdb0264be43280ff0d1e8c74bb08d1892719c9b

modelaudit_picklescan-0.1.3-cp310-abi3-win_amd64.whl
  SHA256       e3bc3eb371dfb5146e1325724f55c7776278c08d5acf7ffd117bfcc19874c0ff
  MD5          3e0ef8917135665ec88b74c65c25bfc6
  BLAKE2b-256  f352bf0efad8ee0fd9556b1dd21add20081886135aa1c8e8f58b8afbc3c8091c

modelaudit_picklescan-0.1.3-cp310-abi3-manylinux_2_28_x86_64.whl
  SHA256       a31fb74fec29eb356289b253de3d006a36a7bd1d11f46ac219a91e509212afa5
  MD5          48ce2fd2f69da3467e91b514f6641970
  BLAKE2b-256  29bf214f656cf2dfe3e7c5b76d0be5d55c1d71bff39fca7a3774bc913f33c9a6

modelaudit_picklescan-0.1.3-cp310-abi3-manylinux_2_28_aarch64.whl
  SHA256       a148750c58b04d3c039688955f74ed24b8a517d08db3e5dd8c867bc49ddfde9c
  MD5          aa962ef8871eea2cfe27bbc6a141b3d1
  BLAKE2b-256  eac88393faa7f9c7d5fa0080c07577e88f6f3f32233dd5cc443927a75e933837

modelaudit_picklescan-0.1.3-cp310-abi3-macosx_11_0_arm64.whl
  SHA256       f4d977b552db8bd23ef8e2436aaec4f47d8dffd7783d564266fbdd2ff50a60c5
  MD5          cfc66b8f7eba55867c46283d9be37174
  BLAKE2b-256  a6a07ae3370d555d50930ae51123b53e85e0aff9d5d69e51dc4d2b90d7fc0ba3

modelaudit_picklescan-0.1.3-cp310-abi3-macosx_10_12_x86_64.whl
  SHA256       fa22c9201da10f6493fb96834ecda0d8abe9aa3438581facf2c432c06740f361
  MD5          160aee0b923fc250b80fe9e2ba078de8
  BLAKE2b-256  092bbfeadc1d0c942870c6df8a81413d7301268ed838e66d8222729a193ef911

Provenance

Attestation bundles were made for each of the six files listed above.

Publisher: release-please.yml on promptfoo/modelaudit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
