# PickleGuard
Production-grade static analysis for detecting malicious Python pickle files. Built to protect ML pipelines from pickle-based attacks.
## Why PickleGuard?
Python's pickle format is a known security risk: deserialization can execute arbitrary code. As ML models are increasingly shared in pickle-based formats (.pt, .pth, .pkl), attackers exploit this to distribute malware disguised as models.
PickleGuard detects these threats through deep opcode analysis, catching attacks that bypass existing tools.
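The attack surface is easy to demonstrate: any object can define `__reduce__`, and `pickle.loads` will call whatever callable it names. The sketch below (not PickleGuard code) uses a harmless payload, `len`, in place of something like `os.system`:

```python
import pickle

# A class whose __reduce__ tells pickle to call an arbitrary function
# on load. The payload here is harmless, but it could just as easily
# be (os.system, ("rm -rf ...",)).
class Payload:
    def __reduce__(self):
        return (len, ("pwned",))

data = pickle.dumps(Payload())
result = pickle.loads(data)  # executes len("pwned") during deserialization
print(result)  # → 5
```

Note that the malicious call happens inside `pickle.loads` itself; no attribute access or method call on the loaded object is needed, which is why loading must be preceded by static scanning.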
## Benchmark Results
Evaluated on the PickleBall dataset (84 malicious samples) and 268 benign models from HuggingFace:
| Tool | True Positive Rate | False Positive Rate |
|---|---|---|
| PickleGuard | 96.4% | 0.0% |
| Picklescan | 92.9% | 6.2% |
| ModelScan | 90.5% | N/A |
## Installation

```bash
pip install pickleguard
```
## Quick Start

```bash
# Scan a model file
pickleguard scan model.pt

# Scan a directory recursively
pickleguard scan ./models/ -r

# JSON output for CI/CD
pickleguard scan model.pt -f json

# SARIF output for GitHub Code Scanning
pickleguard scan ./models/ -f sarif -o results.sarif
```
## What It Detects

### Dangerous Callables (200+)

- **Code Execution:** `os.system`, `subprocess.Popen`, `eval`, `exec`
- **Import Attacks:** `__import__`, `importlib.import_module`
- **Network Operations:** `socket.socket`, `urllib.request.urlopen`
- **File Operations:** `open`, `shutil.rmtree`, `os.remove`
- **Deserialization Chains:** `pickle.loads`, `marshal.loads`, `yaml.load`
### Obfuscation Techniques

| Technique | Description |
|---|---|
| Nested Module Paths | `torch.serialization.os.system` |
| INST Opcode Bypass | Evades GLOBAL+REDUCE detection |
| STACK_GLOBAL | Dynamic name resolution |
| BUILD Injection | Setting `__reduce__` via state |
| Encoded Payloads | Base64/hex obfuscated strings |
| Unicode Homoglyphs | Lookalike character substitution |
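The nested-path technique is visible directly in the opcode stream. Below is a hand-crafted protocol-0 pickle (an illustration, not PickleGuard output) whose GLOBAL opcode names the nested path `torch.serialization.os`; a scanner that only checks the top-level module name would see "torch" and pass it. `pickletools.dis` disassembles statically, so running the snippet executes nothing:

```python
import pickletools

# Hand-crafted protocol-0 pickle: GLOBAL pushes the callable named by
# the nested module path, TUPLE collects the argument, REDUCE would
# call it at load time. pickletools.dis only parses; it never executes.
payload = (b'ctorch.serialization.os\nsystem\n'   # GLOBAL torch.serialization.os system
           b'(S"echo pwned"\nt'                   # MARK, STRING, TUPLE
           b'R'                                   # REDUCE: call callable(args)
           b'.')                                  # STOP

pickletools.dis(payload)  # prints the opcode listing, nothing runs
```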
## Supported Formats
- Raw pickle (protocol 0-5)
- PyTorch containers (.pt, .pth, .bin)
- NumPy files (.npy, .npz)
- SafeTensors (marked safe)
- ONNX models
## Output Example

```text
============================================================
File: malicious_model.pt
Format: pytorch_zip

Risk Assessment:
  Level: CRITICAL
  Score: 100/100
  Confidence: 100%

Obfuscation Detected:
  - NESTED_MODULE_PATH

Findings (2):
  [CRITICAL] dangerous_callable_nested_module_attack
    Dangerous callable: torch.serialization.os.system
    Callable: torch.serialization.os.system
    Position: 2
  [HIGH] obfuscation_nested_module_path
    Name contains dangerous segment 'os'
    Position: 2
============================================================
```
## Python API

```python
from pickle_scanner import PickleScanner

scanner = PickleScanner()
result = scanner.scan_file("model.pt")

if result.report.risk_level.name == "CRITICAL":
    print(f"Threat detected: {result.report.risk_score}/100")
    for finding in result.report.findings:
        print(f"  [{finding.severity}] {finding.message}")
```
## CI/CD Integration

### GitHub Actions

```yaml
- name: Scan ML Models
  run: |
    pip install pickleguard
    pickleguard scan ./models/ -r -f sarif -o results.sarif

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: results.sarif
```
### Pre-commit Hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: pickleguard
        name: PickleGuard
        entry: pickleguard scan
        language: system
        files: \.(pt|pth|pkl|pickle)$
```
## Custom Rules

Define custom detection rules in YAML:

```yaml
rules:
  - name: "block_custom_module"
    severity: critical
    description: "Block imports from untrusted module"
    conditions:
      - opcode: [GLOBAL, INST]
        module: "untrusted_module"
```

```bash
pickleguard scan model.pt --rules custom_rules.yaml
```
## How It Works

PickleGuard uses a multi-stage analysis pipeline:

1. **Format Detection:** Identifies the file type (pickle, PyTorch ZIP, NumPy, etc.)
2. **Opcode Parsing:** Extracts all pickle opcodes from the stream
3. **Stack Simulation:** Abstract interpretation without code execution
4. **Threat Analysis:** Matches against 200+ dangerous callable patterns
5. **Obfuscation Detection:** Identifies evasion techniques
6. **Risk Scoring:** Multi-factor scoring with context awareness
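The parsing and threat-analysis stages can be sketched with the standard library's `pickletools` module. This is an illustrative simplification, not PickleGuard's implementation: the `DENYLIST` below covers three of the 200+ patterns, and the STACK_GLOBAL handling is deliberately naive compared to full stack simulation.

```python
import pickle
import pickletools

# Toy denylist; PickleGuard matches 200+ patterns with context awareness.
DENYLIST = {("os", "system"), ("builtins", "eval"), ("subprocess", "Popen")}

def flag_dangerous_globals(data: bytes) -> list[tuple[str, str]]:
    """Walk every opcode statically and flag denylisted callables."""
    hits = []
    strings = []  # crude tracking so STACK_GLOBAL args can be recovered
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)
        elif opcode.name in ("GLOBAL", "INST"):
            # arg is "module name" as a single space-joined string
            module, _, name = arg.rpartition(" ")
            if (module, name) in DENYLIST:
                hits.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            module, name = strings[-2], strings[-1]
            if (module, name) in DENYLIST:
                hits.append((module, name))
    return hits

evil = b'cos\nsystem\n(S"id"\ntR.'   # hand-crafted GLOBAL os.system pickle
benign = pickle.dumps([1, 2, 3])
print(flag_dangerous_globals(evil))    # → [('os', 'system')]
print(flag_dangerous_globals(benign))  # → []
```

Because `pickletools.genops` only parses bytes, this whole analysis is safe to run on untrusted files, which is the property the real pipeline relies on.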
## Risk Levels
| Level | Score | Description |
|---|---|---|
| CRITICAL | 85-100 | Confirmed dangerous callable |
| HIGH | 60-84 | Dangerous pattern or obfuscation |
| MEDIUM | 30-59 | Unknown callable detected |
| LOW | 1-29 | Minor indicators |
| SAFE | 0 | Clean file |
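As a quick sketch, the thresholds in the table map to levels like this (the real scorer is multi-factor; this only restates the published ranges):

```python
def risk_level(score: int) -> str:
    """Map a 0-100 risk score to the level bands in the table above."""
    if score >= 85:
        return "CRITICAL"
    if score >= 60:
        return "HIGH"
    if score >= 30:
        return "MEDIUM"
    if score >= 1:
        return "LOW"
    return "SAFE"

print(risk_level(100), risk_level(45), risk_level(0))  # → CRITICAL MEDIUM SAFE
```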
## Comparison with Alternatives
| Feature | PickleGuard | Picklescan | ModelScan | Fickling |
|---|---|---|---|---|
| TPR | 96.4% | 92.9% | 90.5% | - |
| FPR | 0.0% | 6.2% | - | - |
| Nested Path Detection | Yes | No | No | No |
| INST Bypass Detection | Yes | No | No | No |
| PyTorch ZIP Support | Yes | Yes | Yes | No |
| Safe Builtin Whitelist | Yes | No | No | No |
| SARIF Output | Yes | No | Yes | No |
## CLI Reference

```text
usage: pickleguard scan [-h] [-r] [-f {text,json,sarif}] [-o OUTPUT] [-v]
                        [--show-safe-patterns] [--rules RULES] path

positional arguments:
  path                  File or directory to scan

options:
  -r, --recursive       Scan directories recursively
  -f, --format          Output format (default: text)
  -o, --output          Write output to file
  -v, --verbose         Show detailed findings
  --show-safe-patterns  Include safe ML patterns in output
  --rules RULES         Custom rules YAML file
```
## Contributing

Contributions are welcome. Please ensure:

- New detection rules include test cases
- Changes maintain the 0% false positive rate
- Code passes `ruff` and `mypy` checks

```bash
# Development setup
pip install -e ".[dev]"
pytest
ruff check .
mypy pickle_scanner/
```
## License
MIT License
## Acknowledgments

- PickleBall - the malicious pickle dataset used for benchmarking
- Fickling - pickle security research and tooling
- ProtectAI - ModelScan