Security scanner detecting Python Pickle files performing suspicious actions

Python Pickle Malware Scanner

Getting started

Scan a malicious model on Hugging Face:

pip install picklescan
picklescan --huggingface ykilcher/totally-harmless-model

The scanner reports that the Pickle is calling eval() to execute arbitrary code:

https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin:archive/data.pkl: global import '__builtin__ eval' FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Dangerous globals: 1
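To make the report above concrete, here is a minimal sketch of how a dangerous global import can be spotted without ever unpickling the data. It relies only on the standard library's pickletools module and is not picklescan's actual implementation; `Payload` and `list_global_imports` are hypothetical names introduced for illustration, and the STACK_GLOBAL handling is deliberately simplified.

```python
import pickle
import pickletools
from typing import List, Tuple


class Payload:
    """Hypothetical malicious object: unpickling it would call eval()."""

    def __reduce__(self):
        # Tells pickle to rebuild this object by calling eval("1 + 1").
        return (eval, ("1 + 1",))


def list_global_imports(data: bytes) -> List[Tuple[str, str]]:
    """Return (module, name) pairs imported by a pickle, without loading it."""
    found = []
    strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # The argument is "module name" separated by a single space.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Simplification: assume the two most recent strings are the
            # module and the name. A real scanner must track the stack and memo.
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


if __name__ == "__main__":
    # Protocol 2 emits the GLOBAL opcode; protocol 4 emits STACK_GLOBAL.
    for protocol in (2, 4):
        data = pickle.dumps(Payload(), protocol=protocol)
        print(protocol, list_global_imports(data))
```

The key design point is that `pickletools.genops` only decodes opcodes, so the payload is never executed while it is being inspected.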

The scanner can also load Pickles from local files, directories, URLs, and zip archives (à la PyTorch):

picklescan --path downloads/pytorch_model.bin
picklescan --path downloads
picklescan --url https://huggingface.co/sshleifer/tiny-distilbert-base-cased-distilled-squad/resolve/main/pytorch_model.bin
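As a rough illustration of the zip case, the sketch below locates Pickle members inside a PyTorch-style archive using the standard zipfile module. It is an assumption-laden example, not picklescan's own logic: `find_pickle_members`, `make_demo_archive`, and the choice to match only the `.pkl` suffix are hypothetical, and the demo archive merely mimics the `archive/data.pkl` layout seen in the scan report.

```python
import io
import zipfile
from typing import List

# Assumption for illustration: only members ending in .pkl are treated as Pickles.
PICKLE_SUFFIXES = (".pkl",)


def find_pickle_members(archive_bytes: bytes) -> List[str]:
    """Return the names of zip members that look like Pickle files."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [name for name in archive.namelist() if name.endswith(PICKLE_SUFFIXES)]


def make_demo_archive() -> bytes:
    """Build a tiny in-memory zip that mimics a PyTorch checkpoint layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("archive/data.pkl", b"\x80\x02N.")  # pickle of None
        archive.writestr("archive/data/0", b"\x00" * 16)     # fake tensor storage
        archive.writestr("archive/version", b"3\n")
    return buffer.getvalue()


if __name__ == "__main__":
    print(find_pickle_members(make_demo_archive()))
```

Each member found this way would then be read with `ZipFile.open` and inspected as a Pickle stream, never loaded.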

To scan NumPy's .npy files, install the numpy package first (pip install numpy).

Usage

Exit codes

The scanner's exit status codes (à la ClamAV) are:

  • 0: scan did not find malware
  • 1: scan found malware
  • 2: scan failed
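These codes make the scanner easy to call from a script. The following is a minimal sketch of a wrapper that runs the CLI and translates its exit status; `describe_exit_code` and `scan_path` are hypothetical helper names, and `scan_path` assumes the picklescan command is installed and on PATH.

```python
import subprocess

# Meanings taken from the list above.
EXIT_MEANINGS = {
    0: "scan did not find malware",
    1: "scan found malware",
    2: "scan failed",
}


def describe_exit_code(code: int) -> str:
    """Translate a picklescan exit status into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def scan_path(path: str) -> str:
    """Run `picklescan --path <path>` and describe the result.

    Assumes the picklescan CLI is installed; check=False keeps
    non-zero statuses from raising, since 1 and 2 are expected outcomes.
    """
    completed = subprocess.run(["picklescan", "--path", path], check=False)
    return describe_exit_code(completed.returncode)
```

In a CI pipeline, a caller would typically fail the job on any status other than 0.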

Filtering files and directories

When scanning directories, files and subdirectories can be filtered using regular expressions (again modeled after ClamAV). Each option can be specified multiple times:

Option                 Description
--exclude=REGEX        Don't scan files whose path matches the regex
--include=REGEX        Only scan files whose path matches the regex
--exclude-dir=REGEX    Don't descend into directories whose path matches the regex
--include-dir=REGEX    Only descend into directories whose path matches the regex

Key behaviors:

  • Excludes always win over includes. A file or directory matching both an exclude and an include pattern is skipped.
  • Multiple patterns OR together. A file is included if it matches any --include pattern.
  • No includes = everything eligible. Include patterns only narrow the scan when specified.
  • --exclude-dir prunes traversal. The directory and all of its contents are skipped entirely.

# Only scan .pkl files, skip the cache/ subdirectory
picklescan --path models/ --include='\.pkl$' --exclude-dir='cache'
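The filtering rules above can be sketched in a few lines of Python. This is an illustrative model of the described behavior, not the scanner's source: `is_selected` and `iter_files` are hypothetical names, and matching with `re.search` anywhere in the path is an assumption.

```python
import os
import re
from typing import Iterator, Sequence


def is_selected(path: str, includes: Sequence[str], excludes: Sequence[str]) -> bool:
    """Apply the rules: excludes win, includes OR together, no includes = all."""
    if any(re.search(pattern, path) for pattern in excludes):
        return False
    if not includes:
        return True
    return any(re.search(pattern, path) for pattern in includes)


def iter_files(
    root: str,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    include_dir: Sequence[str] = (),
    exclude_dir: Sequence[str] = (),
) -> Iterator[str]:
    """Yield file paths under root that pass the file and directory filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place prunes traversal: os.walk never
        # descends into directories removed from this list.
        dirnames[:] = [
            name
            for name in dirnames
            if is_selected(os.path.join(dirpath, name), include_dir, exclude_dir)
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_selected(path, include, exclude):
                yield path
```

Under these assumptions, `iter_files("models/", include=[r"\.pkl$"], exclude_dir=["cache"])` mirrors the command-line example above.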

Develop

Create and activate the conda environment (miniconda is sufficient):

conda env create -f conda.yaml
conda activate picklescan

Install the package in editable mode to develop and test:

python3 -m pip install -e .

Edit with VS Code:

code .

Run unit tests:

pytest tests

Run manual tests:

  • Local PyTorch (zip) file:

mkdir -p downloads
wget -O downloads/pytorch_model.bin https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin
picklescan -l DEBUG -p downloads/pytorch_model.bin

  • Remote PyTorch (zip) URL:

picklescan -l DEBUG -u https://huggingface.co/prajjwal1/bert-tiny/resolve/main/pytorch_model.bin

Lint the code:

black src tests --line-length 140
flake8 src tests --count --show-source

Publish the package to PyPI: bump the package version in setup.cfg and create a GitHub release. This triggers the publish workflow.

Alternative manual steps to publish the package:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m build
python3 -m twine upload dist/*

Test the package: bump the version of picklescan in conda.test.yaml and run:

conda env remove -n picklescan-test
conda env create -f conda.test.yaml
conda activate picklescan-test
picklescan --huggingface ykilcher/totally-harmless-model

Tested on Linux 5.10.102.1-microsoft-standard-WSL2 x86_64 (WSL2).
