Production-grade static analysis tool for detecting malicious Python pickle files

PickleGuard

Production-grade static analysis for detecting malicious Python pickle files. Built to protect ML pipelines from pickle-based attacks.

Why PickleGuard?

Python's pickle format is a known security risk: deserializing a pickle can execute arbitrary code. As ML models are increasingly shared in pickle-based formats (.pt, .pth, .pkl), attackers exploit this to distribute malware disguised as models.

PickleGuard detects these threats through deep opcode analysis, catching attacks that bypass existing tools.
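
To illustrate why deserialization is risky, here is a minimal sketch using only the standard library (it is not part of PickleGuard). The `Demo` class and `opcode_names` helper are hypothetical names chosen for this example. An object's `__reduce__` method tells pickle which callable to invoke on load; the stream can be inspected with `pickletools` without loading it.

```python
import pickle
import pickletools
from typing import List


class Demo:
    """Harmless stand-in for a malicious object (hypothetical example class)."""

    def __reduce__(self):
        # pickle records "call print('hi')" and would run it on load.
        # An attacker would return a dangerous callable here instead of print.
        return (print, ("hi",))


def opcode_names(data: bytes) -> List[str]:
    """List opcode names by parsing the stream; nothing is deserialized."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]


payload = pickle.dumps(Demo(), protocol=4)
names = opcode_names(payload)
print(names)
```

The printed list contains a `STACK_GLOBAL` opcode (which resolves `builtins.print`) followed by `REDUCE` (which calls it). That pairing of a resolved callable with a call opcode is the basic pattern a static scanner looks for.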

Benchmark Results

Evaluated on the PickleBall dataset (84 malicious samples) and 268 benign models from HuggingFace:

Tool         True Positive Rate  False Positive Rate
PickleGuard  96.4%               0.0%
Picklescan   92.9%               6.2%
ModelScan    90.5%               N/A

Installation

pip install pickleguard

Quick Start

# Scan a model file
pickleguard scan model.pt

# Scan directory recursively
pickleguard scan ./models/ -r

# JSON output for CI/CD
pickleguard scan model.pt -f json

# SARIF output for GitHub Code Scanning
pickleguard scan ./models/ -f sarif -o results.sarif

What It Detects

Dangerous Callables (200+)

  • Code Execution: os.system, subprocess.Popen, eval, exec
  • Import Attacks: __import__, importlib.import_module
  • Network Operations: socket.socket, urllib.request.urlopen
  • File Operations: open, shutil.rmtree, os.remove
  • Deserialization Chains: pickle.loads, marshal.loads, yaml.load

Obfuscation Techniques

Technique            Description
Nested Module Paths  e.g. torch.serialization.os.system
INST Opcode Bypass   Evades GLOBAL+REDUCE detection
STACK_GLOBAL         Dynamic name resolution
BUILD Injection      Setting __reduce__ via state
Encoded Payloads     Base64/hex obfuscated strings
Unicode Homoglyphs   Lookalike character substitution
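
As an illustration of the nested module path technique, the sketch below checks whether a dotted name reaches a dangerous module through a benign-looking parent. The `DANGEROUS_MODULES` set and the `dangerous_segments` function are assumptions made for this example; PickleGuard's own list and matching logic may differ.

```python
from typing import List

# Hypothetical, deliberately short denylist for illustration only.
DANGEROUS_MODULES = {"os", "subprocess", "importlib", "socket", "shutil"}


def dangerous_segments(dotted_name: str) -> List[str]:
    """Return dangerous module names nested inside a dotted path.

    The first segment is skipped: a top-level reference such as
    "os.system" is a direct reference, not a nested one.
    """
    segments = dotted_name.split(".")
    return [segment for segment in segments[1:] if segment in DANGEROUS_MODULES]


print(dangerous_segments("torch.serialization.os.system"))
print(dangerous_segments("collections.OrderedDict"))
```

The first call reports the `os` segment, because `torch.serialization` imports `os` and so exposes `os.system` under a name that starts with a trusted package. The second call reports nothing.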

Supported Formats

  • Raw pickle (protocol 0-5)
  • PyTorch containers (.pt, .pth, .bin)
  • NumPy files (.npy, .npz)
  • SafeTensors (marked safe)
  • ONNX models
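
For context on the PyTorch container format: files written by recent versions of torch.save are ZIP archives, and the pickled object graph is stored in a member named data.pkl. The sketch below shows one way to pull the pickle streams out of such a container using only the standard library. The function name `extract_pickle_members` is hypothetical and this is not PickleGuard's implementation.

```python
import io
import zipfile
from typing import Dict


def extract_pickle_members(source) -> Dict[str, bytes]:
    """Return {member name: raw bytes} for each .pkl entry in a ZIP container.

    `source` may be a path or a seekable binary file object.
    Non-ZIP inputs yield an empty dict.
    """
    if not zipfile.is_zipfile(source):
        return {}
    with zipfile.ZipFile(source) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if name.endswith(".pkl")
        }


# Build a tiny in-memory archive that mimics the container layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("archive/data.pkl", b"\x80\x04N.")  # pickle of None
    archive.writestr("archive/version", b"3\n")

members = extract_pickle_members(buffer)
print(sorted(members))
```

Each extracted byte string can then be handed to an opcode-level scanner without ever calling pickle.load on it.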

Output Example

============================================================
File: malicious_model.pt
Format: pytorch_zip

Risk Assessment:
  Level: CRITICAL
  Score: 100/100
  Confidence: 100%

Obfuscation Detected:
  - NESTED_MODULE_PATH

Findings (2):

  [CRITICAL] dangerous_callable_nested_module_attack
    Dangerous callable: torch.serialization.os.system
    Callable: torch.serialization.os.system
    Position: 2

  [HIGH] obfuscation_nested_module_path
    Name contains dangerous segment 'os'
    Position: 2

============================================================

Python API

from pickle_scanner import PickleScanner

# Statically scan a single model file
scanner = PickleScanner()
result = scanner.scan_file("model.pt")

# Report every finding when the file is rated CRITICAL
if result.report.risk_level.name == "CRITICAL":
    print(f"Threat detected: {result.report.risk_score}/100")
    for finding in result.report.findings:
        print(f"  [{finding.severity}] {finding.message}")

CI/CD Integration

GitHub Actions

- name: Scan ML Models
  run: |
    pip install pickleguard
    pickleguard scan ./models/ -r -f sarif -o results.sarif

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: pickleguard
        name: PickleGuard
        entry: pickleguard scan
        language: system
        files: \.(pt|pth|pkl|pickle)$

Custom Rules

Define custom detection rules in YAML:

rules:
  - name: "block_custom_module"
    severity: critical
    description: "Block imports from untrusted module"
    conditions:
      - opcode: [GLOBAL, INST]
        module: "untrusted_module"

Then pass the rules file to the scanner:

pickleguard scan model.pt --rules custom_rules.yaml
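
To make the intent of such a rule concrete, here is a rough sketch of how a rule of this shape could be evaluated. The semantics are an assumption: it treats a rule as matching when any one condition accepts both the opcode name and the exact module name. `RULE` mirrors the YAML as a plain dict and `rule_matches` is a hypothetical helper; PickleGuard's actual matcher may behave differently.

```python
from typing import Any, Dict

# The YAML rule above, written as a Python dict for illustration.
RULE: Dict[str, Any] = {
    "name": "block_custom_module",
    "severity": "critical",
    "description": "Block imports from untrusted module",
    "conditions": [
        {"opcode": ["GLOBAL", "INST"], "module": "untrusted_module"},
    ],
}


def rule_matches(rule: Dict[str, Any], opcode: str, module: str) -> bool:
    """Assumed semantics: any condition matching opcode and module triggers the rule."""
    return any(
        opcode in condition["opcode"] and module == condition["module"]
        for condition in rule["conditions"]
    )


print(rule_matches(RULE, "GLOBAL", "untrusted_module"))
print(rule_matches(RULE, "GLOBAL", "collections"))
```

Under these assumed semantics, the rule above would not fire for a STACK_GLOBAL reference to the same module, so a rule meant to cover every import path would need to list that opcode as well.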

How It Works

PickleGuard uses a multi-stage analysis pipeline:

  1. Format Detection: Identifies file type (pickle, PyTorch ZIP, NumPy, etc.)
  2. Opcode Parsing: Extracts all pickle opcodes from the stream
  3. Stack Simulation: Abstract interpretation without code execution
  4. Threat Analysis: Matches against 200+ dangerous callable patterns
  5. Obfuscation Detection: Identifies evasion techniques
  6. Risk Scoring: Multi-factor scoring with context awareness
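
Stages 2 and 3 can be sketched with the standard library's pickletools module, which parses opcodes without executing anything. The example below is a simplified illustration and not PickleGuard's code: `referenced_globals` is a hypothetical function, and instead of a full stack and memo simulation it only remembers recently pushed strings, which is enough to resolve the common STACK_GLOBAL case.

```python
import pickle
import pickletools
from typing import List, Tuple


def referenced_globals(data: bytes) -> List[Tuple[str, str, int]]:
    """Return (opcode, dotted name, byte position) for each import-like opcode."""
    found: List[Tuple[str, str, int]] = []
    strings: List[str] = []  # string constants seen so far, most recent last
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name" (space-joined).
            module, name = arg.split(" ", 1)
            found.append((opcode.name, f"{module}.{name}", pos))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name are usually the last two pushed strings.
            if len(strings) >= 2:
                found.append((opcode.name, f"{strings[-2]}.{strings[-1]}", pos))
            else:
                found.append((opcode.name, "<unresolved>", pos))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


# Hand-written protocol 0 streams; they are only parsed, never loaded.
GLOBAL_PAYLOAD = b"cos\nsystem\n(S'echo hi'\ntR."
INST_PAYLOAD = b"(S'echo hi'\nios\nsystem\n."

for payload in (GLOBAL_PAYLOAD, INST_PAYLOAD, pickle.dumps(len, protocol=4)):
    print(referenced_globals(payload))
```

The INST payload reaches os.system without any GLOBAL or REDUCE opcode, which is why a scanner that only looks for that pair misses it. A production simulator would also need to follow memo lookups (for example BINGET) and strings built at load time, which this sketch does not attempt.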

Risk Levels

Level     Score   Description
CRITICAL  85-100  Confirmed dangerous callable
HIGH      60-84   Dangerous pattern or obfuscation
MEDIUM    30-59   Unknown callable detected
LOW       1-29    Minor indicators
SAFE      0       Clean file
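
The score bands in the table translate directly into a lookup. The helper below is a small sketch for consumers who want to derive the level from a numeric score, for example in a CI gate; `risk_level` is a hypothetical function name, not part of the documented API.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 risk score to the level names from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 85:
        return "CRITICAL"
    if score >= 60:
        return "HIGH"
    if score >= 30:
        return "MEDIUM"
    if score >= 1:
        return "LOW"
    return "SAFE"


for example_score in (100, 84, 30, 1, 0):
    print(example_score, risk_level(example_score))
```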

Comparison with Alternatives

Feature                 PickleGuard  Picklescan  ModelScan  Fickling
TPR                     96.4%        92.9%       90.5%      -
FPR                     0.0%         6.2%        -          -
Nested Path Detection   Yes          No          No         No
INST Bypass Detection   Yes          No          No         No
PyTorch ZIP Support     Yes          Yes         Yes        No
Safe Builtin Whitelist  Yes          No          No         No
SARIF Output            Yes          No          Yes        No

CLI Reference

usage: pickleguard scan [-h] [-r] [-f {text,json,sarif}] [-o OUTPUT] [-v]
                        [--show-safe-patterns] [--rules RULES] path

positional arguments:
  path                  File or directory to scan

options:
  -r, --recursive       Scan directories recursively
  -f, --format          Output format (default: text)
  -o, --output          Write output to file
  -v, --verbose         Show detailed findings
  --show-safe-patterns  Include safe ML patterns in output
  --rules RULES         Custom rules YAML file

Contributing

Contributions welcome. Please ensure:

  1. New detection rules include test cases
  2. Changes maintain the 0% false positive rate on the benchmark's benign models
  3. Code passes ruff and mypy checks

# Development setup
pip install -e ".[dev]"
pytest
ruff check .
mypy pickle_scanner/

License

MIT License
