A zero dependency lightweight static analyzer designed for adversarial-shape code in python to detect supply chain attacks before they reach your interpreter.

These details have been verified by PyPI

Project links

Owner

Nuclear Treestump Technologies

GitHub Statistics

Maintainers

0xIkari

These details have not been verified by PyPI

Project description

pydepgate

A zero dependency lightweight static analyzer designed for adversarial-shape code in python to detect supply chain attacks before they reach your interpreter.

pydepgate inspects Python packages and environments for code that executes silently at interpreter startup. This was the attack class used by the March 2026 LiteLLM supply-chain compromise and catalogued as MITRE ATT&CK T1546.018.

Available on PyPI as pydepgate.

The problem

Python's interpreter runs several kinds of code automatically at startup, before any user script executes:

.pth files in site-packages/. Any line beginning with import is passed to exec() by site.py during interpreter initialization.
sitecustomize.py and usercustomize.py. Imported automatically if present.
__init__.py top-level code in any imported package.
setup.py. Executed during pip install for source distributions.
Console-script entry points. Generated and executed by pip install.

Each of these is a legitimate Python feature. Each has been used in real-world supply-chain attacks. Existing Python security tooling (pip-audit, safety, bandit) does not inspect these startup vectors. The .pth vector in particular has been acknowledged as a security gap in CPython issue #113659 but has no patch.

Installation

pip install pydepgate

Requires Python 3.11 or later. No third-party runtime dependencies.

Docker Usage

docker pull ghcr.io/nuclear-treestump/pydepgate:0.4.5

See Docker README.md for usage details and suggested workflows

Quickstart

# Scan a wheel
pydepgate scan some-package-1.0.0-py3-none-any.whl

# Scan an installed package by name
pydepgate scan litellm

# Scan a single file
pydepgate scan --single suspicious_module.py

# Look up what a signal means
pydepgate explain DENS010

Exit code 0 is clean, 2 means at least one HIGH or CRITICAL finding. The full exit code contract is in docs/reference/exit-codes.md.

Usage

Scan an artifact

Wheels, source distributions, and installed packages all use the same positional scan invocation:

pydepgate scan some-package-1.0.0-py3-none-any.whl
pydepgate scan some-package-1.0.0.tar.gz
pydepgate scan litellm

The wheel and sdist paths read directly from disk. The installed-package form is resolved via importlib.metadata against the active environment.

Scan a single file

--single bypasses wheel/sdist/installed-package dispatch and analyzes the file directly. Useful for iterating on test fixtures, ad-hoc inspection of a suspicious file, or reproducing a finding without restructuring the file into a package:

pydepgate scan --single suspicious_module.py
pydepgate scan --single fixture.pth
pydepgate scan --single garbage.py --as init_py

The file kind is auto-detected from the filename. .pth files are treated as pth; files named setup.py, __init__.py, sitecustomize.py, or usercustomize.py are classified as their natural kind; anything else defaults to setup_py (the most permissive context). Override with --as: setup_py / init_py / pth / sitecustomize / usercustomize / library_py.

Scan with payload peek

The peek enricher attempts safe partial decoding of large encoded literals so you can see what's actually inside a flagged blob without ever executing it:

pydepgate scan some-package.whl --peek
pydepgate scan some-package.whl --peek --peek-chain

Peek handles base64, hex, zlib, gzip, bzip2, and lzma chains up to a configurable depth, classifies the terminal payload, and emits ENC002 when the unwrap chain is nested. Pickle data is detected but never deserialized; decompression bombs are bounded by an in-flight byte budget.

Full peek flag reference: docs/cli/index.md.

Scan with recursive decode

--decode-payload-depth=N runs a recursive re-scan over decoded payloads, catching the multi-layer attack shape used by LiteLLM 1.82.8 (a base64 outer payload whose decoded source contains a second base64 payload).

pydepgate scan --deep some-package.whl --peek \
    --decode-payload-depth=3 \
    --decode-iocs=full \
    --decode-location ./forensics

Output goes to a directory chosen with --decode-location (default ./decoded/). With --decode-iocs=full, the run produces an encrypted ZIP archive (default password infected, the malware-research convention) plus a plaintext IOC sidecar for grep-friendly hash extraction.

Full decode pipeline reference, including the IOC mode matrix and end-to-end forensic example: docs/guides/decode-payloads.md.

Scan in CI

--ci forces machine-readable JSON output and disables ANSI color. It does not change --min-severity; combine the two when CI should block only on HIGH and CRITICAL findings:

pydepgate scan --ci --min-severity high some-package.whl

Full CI guide (GitHub Actions, GitLab CI, Docker, pre-commit hooks): docs/guides/ci-integration.md.

Scan with custom rules

pydepgate scan some-package.whl --rules-file company-rules.gate

Auto-discovery checks ./pydepgate.gate then <venv>/pydepgate.gate when the flag is not set. The rule file format is TOML or JSON, auto-detected. Full format spec: docs/reference/rules-file.md.

Look up a signal or rule

pydepgate explain STDLIB001
pydepgate explain DENS010
pydepgate explain --rule default_stdlib001_in_pth
pydepgate explain --list

Scan an entire library archive

pydepgate scan --deep some-package.whl --min-severity high

--deep runs the density analyzer over ordinary library .py files in addition to startup vectors. The density layer produces enough informational signals at library scope that the --min-severity high filter is strongly recommended.

SARIF output

pydepgate scan some-package.whl --format sarif > findings.sarif

Emits a SARIF 2.1.0 document with codeFlows on decoded-payload findings, 24-character partial fingerprints for cross-run deduplication, and content-blind message text (no payload bytes leak into the document). For GitHub Code Scanning integration: docs/guides/sarif-integration.md.

What pydepgate detects

The current analyzer set covers five major classes of suspicious behavior in startup vectors. Each analyzer emits raw signals; the rules engine maps signals to severity-rated findings based on file kind and context.

Encoding abuse (ENC001, ENC002). Patterns where encoded content is decoded and executed in a single chain, for example exec(base64.b64decode(payload)). Catches base64, hex, codec-based, zlib, bz2, lzma, and gzip variants. With --peek enabled, ENC002 fires when the partial-decoder unwrap loop reaches 2+ chain layers or exhausts its configured depth, strong evidence that a literal is intentionally obfuscated rather than a benign encoded blob.

Dynamic execution (DYN001-007). Direct calls to exec, eval, compile, or __import__; access to exec primitives via getattr, globals(), locals(), vars(), or __builtins__ subscripts; compile-then-exec across the file; and aliased call shapes that catch e = exec; e(...) evasions.

String obfuscation (STR001-004). Obfuscated string expressions that resolve to the names of exec primitives or dangerous stdlib functions, computed by a safe partial evaluator that never executes user code. Catches concatenation ('ev' + 'al'), character codes (chr(101) + chr(118) + chr(97) + chr(108)), slicing ('lave'[::-1]), str.join of literal pieces, bytes.fromhex(...).decode(), f-string assembly, and single-assignment variables containing obfuscated values. The "harder they hide it the stronger the signal" model is realized through operation counting.

Suspicious stdlib usage (STDLIB001-003). Calls to stdlib functions that are highly unusual in startup vectors: process spawn (STDLIB001: os.system, subprocess.Popen, subprocess.run, os.exec*), network operations (STDLIB002: urllib.request.urlopen, socket.socket, http.client), and native code loading (STDLIB003: ctypes.CDLL, ctypes.WinDLL). The rules engine promotes these to CRITICAL when they appear in setup.py or .pth files.

Code density (DENS001-051). A broad layer covering the things obfuscated code looks like even when no single primitive call is suspicious on its own: high-entropy string literals, base64-alphabet strings, machine-generated identifiers, confusable single-character names, invisible Unicode characters, Unicode homoglyphs in identifiers, disproportionate AST depth, deeply nested lambdas, byte-range integer arrays, high-entropy docstrings, and dynamic __doc__ references passed to a callable. Calibrated so the same content scans differently depending on file kind: a high-entropy base64 literal in .pth is CRITICAL, in __init__.py is MEDIUM, anywhere else is LOW.

Complete signal reference with severity tables per file kind: docs/reference/signals.md.

Layered detection in practice

The LiteLLM 1.82.8 .pth payload is a single line:

import base64; exec(base64.b64decode('cHJpbnQoMSkK'))

A scanner that grepped for exec would catch it. A scanner that grepped for base64.b64decode would catch it. But an attacker who knew about either of those evasions could trivially defeat both. pydepgate fires five separate findings on this line from four independent analyzers:

ENC001 (encoding_abuse): decode-then-execute pattern
DYN002 (dynamic_execution): exec() with non-literal argument at module scope
DENS001 (code_density): token-dense single line
DENS010 (code_density): high-entropy string literal
DENS011 (code_density): base64-alphabet string literal

Plus the rule layer promotes all of them to CRITICAL because the file is a .pth. To evade pydepgate, an attacker has to defeat every analyzer simultaneously while still producing a working .pth payload. Each evasion narrows what's possible; the intersection of all evasions is the empty set for any shape that could realistically execute on Python startup.

The rules engine

Analyzers emit raw signals. The rules engine maps signals to severity-rated findings using a data-driven rule set. Default rules are built into pydepgate; users can override or augment them with a pydepgate.gate file (TOML or JSON, auto-detected) in the project root, the venv root, or specified via --rules-file.

A rule has three parts: identity, match conditions, and an effect:

[[rule]]
id = "litellm-pth-stdlib"
signal_id = "STDLIB001"
file_kind = "pth"
action = "set_severity"
severity = "critical"
explain = "subprocess calls in .pth files have no legitimate use case."

Three actions are supported: set_severity, suppress, and set_description. User rules always take precedence over default rules, regardless of specificity. Suppressed findings are tracked separately so users can see what would have fired and why it didn't.

Run pydepgate explain --list to see all default rules and signals with descriptions. Complete rules file specification: docs/reference/rules-file.md. Worked walkthroughs of common rule-writing tasks: docs/guides/custom-rules.md.

Status

Static analysis is functional end-to-end. pydepgate can statically analyze wheels, sdists, installed packages, and single loose files for the patterns used in real-world Python supply-chain attacks.

Stable today:

Static analysis of .whl files, sdists, installed packages by name, and individual loose files via --single.
Five production analyzers (encoding_abuse, dynamic_execution, string_ops, suspicious_stdlib, code_density).
A rules engine with file-kind-aware severity promotion, fully data-driven via TOML or JSON pydepgate.gate files. The default rule set includes 32 rules dedicated to density-layer signals alone.
A safe partial evaluator that resolves obfuscated string expressions without executing user code.
An optional payload-peek enricher (--peek) with bounded multi-layer decode, content classification, decompression-bomb budgets, and pickle detection without deserialization.
A recursive decode pipeline (--decode-payload-depth=N) that re-scans decoded payloads and produces a tree-shaped attack-chain report, with optional encrypted-archive IOC output.
An SSH-randomart-style finding-distribution map rendered inline with human-readable scan output.
Three output formats: human-readable terminal, JSON (schema v2), and SARIF 2.1.0 with codeFlow encoding, GitHub-compatible severity mapping, partial fingerprints, and content-blind message text. Validated in CI against the Microsoft SARIF Multitool.
Pre-commit hook integration via pydepgate and pydepgate-pth hook IDs.
Official Docker image at ghcr.io/nuclear-treestump/pydepgate. Multi-stage Alpine build under 50 MB, runs as non-root (uid 1000), published for linux/amd64 and linux/arm64.
Shell tab completion for bash, zsh, and fish.

In active development:

The comment_analysis analyzer.
Runtime interdiction (exec mode).
Environment auditing (preflight mode).
Aliased import resolution (from subprocess import Popen as P).
A pip-wrapper / transitive-dependency audit subcommand.

Documentation

Section	Contents
Getting Started	First scan, reading output, using `explain`
CLI Reference	All subcommands, all flags, environment variables
Signals Reference	Every signal ID with severity tables per file kind
Rules File	`pydepgate.gate` format specification
Exit Codes	Exit code contract and CI implications
Output Formats	Human, JSON, SARIF schemas
Guide: CI Integration	GitHub Actions, GitLab CI, pre-commit, Docker
Guide: Custom Rules	Suppressing false positives, scoping rules
Guide: Decode Payloads	Recursive decode, IOC sidecars, encrypted archives
Guide: SARIF Integration	GitHub Code Scanning ingestion

Design constraints

Zero runtime dependencies. Standard library only. This is a load-bearing design constraint, not a stylistic preference: every additional dependency is a supply-chain attack surface for a tool whose job is to defend against supply-chain attacks.
Safe by construction. Parsers and the partial evaluator never execute, compile, or import input content. Every operation modeled by the resolver is reimplemented from scratch using only Python builtins on values the resolver itself produced.
Self-integrity at bootstrap. Critical stdlib references are captured into locals before any untrusted code runs (relevant when the runtime engine ships in v0.4).
Lightweight. The full test suite runs in roughly twenty seconds on the lowest available options in Codespaces, including subprocess-based CLI tests against installed packages.

Relationship to PyDepGuard

pydepgate is a narrow, single-purpose tool focused on startup-vector interdiction. PyDepGuard is a broader Python security framework covering runtime sandboxing and dependency management. The startup-vector engine developed in pydepgate is intended to eventually integrate with PyDepGuard as a subsystem; until then, the two projects are developed independently.

Users who need only startup-vector protection should use pydepgate. Users who need the full runtime security model should use PyDepGuard directly.

Architecture

The codebase is organized as a layered pipeline:

parsers/           bytes -> structured representations (pth, pysource, wheel, sdist)
introspection/     installed package enumeration via importlib.metadata
traffic_control/   path-based triage; decides what to analyze
analyzers/         structured representations -> raw signals
  _resolver.py        safe partial evaluator (shared infrastructure)
  _visitor.py         scope tracking and AST utilities (shared)
  encoding_abuse      ENC001
  dynamic_execution   DYN001-007
  string_ops          STR001-004
  suspicious_stdlib   STDLIB001-003
  density_analyzer    DENS001-051
enrichers/         signal hints -> enriched signal context
  payload_peek        ENC002 emission and decoded context block
rules/             signals + context -> severity-rated findings
engines/           orchestration (currently: static)
visualizers/       inline rendering helpers for the human reporter
cli/               argparse, dispatch, reporters, explain subcommand

Analyzers do not see raw bytes. They walk parsed representations and emit Signal objects. The rules engine wraps signals with severity to produce Finding objects, applying user and default rules in priority order. The CLI renders findings in human, JSON, or SARIF format.

The _resolver.py module is reusable infrastructure for any analyzer that needs to know what an expression evaluates to. It returns structured ResolutionResult objects with success/failure status, operation counts, partial values, and resolved fragment lists.

The static engine exposes three entry points for single-file analysis. scan_file(path) reads bytes and routes through triage by filename. scan_bytes(content, internal_path, ...) is the per-file workhorse that artifact enumerators (wheel, sdist, installed) call once per in-scope file. scan_loose_file_as(path, file_kind) bypasses triage entirely and forces a file kind, preserving the real path through to finding contexts; this is the entry point used by pydepgate scan --single.

Development

git clone https://github.com/nuclear-treestump/pydepgate
cd pydepgate
pip install -e .
python -m unittest discover tests -v

The test suite has grown to approximately 500 tests as the analyzer set has expanded. Tests are organized by module and include happy-path coverage, evasion batteries, false-positive batteries, robustness checks against adversarial inputs, integration tests against synthetic wheels and sdists, and CLI tests via subprocess.

To regenerate the binary .pth test fixtures after editing them:

python scripts/generate_fixtures.py

Contributors: see CONTRIBUTING.md for the issue process, sign-off requirements, and contribution scope.

Safety notes

This project builds tooling to defend against Python supply-chain attacks. The test fixtures in tests/fixtures/ and the synthetic samples used in integration tests model the structural shape of known attacks (LiteLLM 1.82.8, Trojan Source CVE-2021-42574, others catalogued under T1546.018) but contain only inert payloads. No actual malicious code is present in this repository.

For regression testing against real malicious samples, use the OSSF malicious-packages, Datadog malicious-software-packages-dataset, or lxyeternal/pypi_malregistry datasets. Do so in disposable VMs or containers, and do not commit samples to this repository.

Known limitations

pydepgate's static analysis is honest about what it can and cannot catch. Documented gaps include:

Analysis gaps:

Function return tracking. code = make_payload() where make_payload() internally calls compile(...) is not flagged.
__builtins__ as a Name subscript (rather than via a function call).
Tuple unpacking, augmented assignment, and conditional assignments in the resolver's variable tracking.
Lambda scope precision (lambdas count as their enclosing scope).
Aliased stdlib imports such as from subprocess import Popen as P.

Density-layer caveats:

DENS020 (low-vowel-ratio identifiers) and DENS040 (AST depth) both produce false positives on legitimate machine-generated code (Cython output, parser tables, generated configuration). They ship at LOW severity outside startup vectors so they surface as contributing signals rather than standalone alerts.
DENS031 (homoglyphs) can fire on legitimate non-English variable names in non-Latin codebases. The default rule keeps it at HIGH rather than CRITICAL outside startup vectors so users with intentional non-Latin naming can suppress with a single user rule.

Author

Built by Ikari (@0xIkari) - Python and security engineering. Available for security engineering roles.

LinkedIn: zmillersecengineer
Email: ikari@nuclear-treestump.com

License

Apache 2.0. See LICENSE.

Project details

These details have been verified by PyPI

Project links

Owner

Nuclear Treestump Technologies

GitHub Statistics

Maintainers

0xIkari

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.6.1

Jun 20, 2026

0.6.0

Jun 14, 2026

0.5.0

May 30, 2026

This version

0.4.7

May 24, 2026

0.4.6

May 22, 2026

0.4.5

May 17, 2026

0.4.2

May 15, 2026

0.4.1

May 12, 2026

0.4.0

May 10, 2026

0.3.3

May 5, 2026

0.3.2

May 2, 2026

0.3.1

Apr 30, 2026

0.3.0

Apr 30, 2026

0.2.3

Apr 29, 2026

0.2.2

Apr 29, 2026

0.2.1

Apr 29, 2026

0.2.0

Apr 28, 2026

0.1.7

Apr 28, 2026

0.1.6

Apr 28, 2026

0.1.5

Apr 28, 2026

0.1.4

Apr 27, 2026

0.1.3

Apr 27, 2026

0.1.2

Apr 26, 2026

0.1.1

Apr 26, 2026

0.1.0

Apr 26, 2026

0.0.4

Apr 25, 2026

0.0.1

Apr 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydepgate-0.4.7.tar.gz (300.1 kB view details)

Uploaded May 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pydepgate-0.4.7-py3-none-any.whl (342.2 kB view details)

Uploaded May 24, 2026 Python 3

File details

Details for the file pydepgate-0.4.7.tar.gz.

File metadata

Download URL: pydepgate-0.4.7.tar.gz
Upload date: May 24, 2026
Size: 300.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pydepgate-0.4.7.tar.gz
Algorithm	Hash digest
SHA256	`db098560a6b73a0b732cd4af8f262489c779aec0fa01ab0a9a9c8fbfbed520a1`
MD5	`a55651a6dec2d110e6b895b6e66c4f57`
BLAKE2b-256	`bf1ce3283ffc588a751cd36d98ac41392dc37e7ec113ee3a5b8600d48d88f1ca`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pydepgate-0.4.7.tar.gz:

Publisher: python-publish.yml on nuclear-treestump/pydepgate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pydepgate-0.4.7.tar.gz
- Subject digest: db098560a6b73a0b732cd4af8f262489c779aec0fa01ab0a9a9c8fbfbed520a1
- Sigstore transparency entry: 1625027837
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: nuclear-treestump/pydepgate@6353712c3206fb0d61112a6da184064bdbf32822
- Branch / Tag: refs/tags/v0.4.7
- Owner: https://github.com/nuclear-treestump
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6353712c3206fb0d61112a6da184064bdbf32822
- Trigger Event: release

File details

Details for the file pydepgate-0.4.7-py3-none-any.whl.

File metadata

Download URL: pydepgate-0.4.7-py3-none-any.whl
Upload date: May 24, 2026
Size: 342.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pydepgate-0.4.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`45a9278d16b048b0859431dfc984f8810146e911f3fa2bf21fb4cef48becdbb2`
MD5	`048297f5bbd927493e8ae78879f206fc`
BLAKE2b-256	`290f720ff82e0e60140a57db30749816560aa1a9a24a531464b698608fc87645`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pydepgate-0.4.7-py3-none-any.whl:

Publisher: python-publish.yml on nuclear-treestump/pydepgate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pydepgate-0.4.7-py3-none-any.whl
- Subject digest: 45a9278d16b048b0859431dfc984f8810146e911f3fa2bf21fb4cef48becdbb2
- Sigstore transparency entry: 1625027869
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: nuclear-treestump/pydepgate@6353712c3206fb0d61112a6da184064bdbf32822
- Branch / Tag: refs/tags/v0.4.7
- Owner: https://github.com/nuclear-treestump
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6353712c3206fb0d61112a6da184064bdbf32822
- Trigger Event: release

pydepgate 0.4.7

Navigation

Verified details

Project links

Owner

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

pydepgate

The problem

Installation

Docker Usage

Quickstart

Usage

Scan an artifact

Scan a single file

Scan with payload peek

Scan with recursive decode

Scan in CI

Scan with custom rules

Look up a signal or rule

Scan an entire library archive

SARIF output

What pydepgate detects

Layered detection in practice

The rules engine

Status

Documentation

Design constraints

Relationship to PyDepGuard

Architecture

Development

Safety notes

Known limitations

Author

License

Project details

Verified details

Project links

Owner

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance