A high-performance, security-focused static analysis tool for Python, powered by Rust.

High-Performance Python/Rust Graph-Based SAST Framework

PySpector is a Static Analysis Security Testing (SAST) framework with a core built in Rust for performance, designed for modern Python projects and large codebases. Unlike traditional linters, PySpector uses a flow-sensitive, inter-procedural taint engine to track untrusted data across function boundaries and control-flow structures.

By compiling the core analysis engine to a native binary, PySpector avoids the performance limitations of traditional Python-only tools. This makes it well-suited for CI/CD pipelines and local development environments where speed and scalability matter.

PySpector is designed to be both comprehensive and intuitive, offering a multi-layered analysis approach that goes beyond simple pattern matching to understand the structure and data flow of your Python application.

Quick Demo

https://github.com/user-attachments/assets/0fe03961-0b62-4964-83ba-849f2357efba

Getting Started

Prerequisites

  • Python: Python 3.9 through 3.14 are supported.
  • Rust: The Rust compiler (rustc) and Cargo package manager are required. You can easily install the Rust toolchain via rustup and verify your installation by running cargo --version.

Installation

It is highly recommended to install PySpector in a dedicated Python 3.14 venv.

Create a Virtual Environment:

  • Linux (Bash):

    # Requires Python 3.14 to be installed
    python3.14 -m venv venv
    source venv/bin/activate
    
  • Windows (PowerShell):

    # Download Python 3.14 from the Microsoft Store and run:
    python3.14 -m venv venv
    .\venv\Scripts\Activate.ps1
    # or, depending on the Python 3.14 installation source:
    .\venv\bin\Activate.ps1
    

With PySpector now officially on PyPI🎉, installation is as simple as running:

pip install pyspector

Key Features

  • Flow-Sensitive Analysis: Utilizes a Control Flow Graph (CFG) to track variable states sequentially, accurately distinguishing between safe and vulnerable code paths.

  • Inter-Procedural Taint Tracking: Propagates untrusted data across function boundaries using global fixed-point iteration and function summaries.

  • Context-Aware Summaries: Sophisticated mapping of which function parameters flow to return values, allowing for high-precision tracking through complex utility functions.

  • Multi-Engine Hybrid Scanning:

    • Regex Engine: High-speed scanning for secrets, hardcoded credentials, and configuration errors.

    • AST Engine: Deep structural pattern matching to find Python-specific anti-patterns.

    • Graph Engine: Advanced CFG and Call-Graph-based data flow analysis for complex vulnerability chains.

  • Fast Analysis: The core analysis engine is implemented in Rust and uses Rayon for multi-threaded parallelization. In the project's benchmarks, PySpector scanned 71% faster than Bandit and 16.6x faster than Semgrep.

  • AI-Agent Security: Specialized rulesets designed to identify prompt injection, insecure tool use, and data leakage in LLM-integrated Python applications.
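To make the idea of a function summary more concrete, here is a minimal sketch that records which parameters of a function can reach its return value. It is not PySpector's implementation: the helper `params_flowing_to_return` and the `build_greeting` example are hypothetical, and the analysis is deliberately flow-insensitive and limited to simple assignments, which is much coarser than the CFG-based engine described above.

```python
import ast


def params_flowing_to_return(func_source: str) -> list:
    """Return the parameters whose data can reach a return statement."""
    func = ast.parse(func_source).body[0]
    params = [arg.arg for arg in func.args.args]
    # derived[p] holds every local name known to carry data from parameter p.
    derived = {p: {p} for p in params}

    changed = True
    while changed:  # repeat until no set grows any further (a fixed point)
        changed = False
        for node in ast.walk(func):
            if not isinstance(node, ast.Assign):
                continue
            used = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            targets = {t.id for t in node.targets if isinstance(t, ast.Name)}
            for p in params:
                if used & derived[p] and not targets <= derived[p]:
                    derived[p] |= targets
                    changed = True

    returned = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Return) and node.value is not None:
            returned |= {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
    return [p for p in params if derived[p] & returned]


# Hypothetical utility function used only for illustration.
EXAMPLE = '''
def build_greeting(name, count, unused):
    trimmed = name.strip()
    message = "Hello " + trimmed
    total = len(count)
    return message
'''

summary = params_flowing_to_return(EXAMPLE)
print(summary)
```

With a summary like this, a caller that passes tainted data as `name` can treat the return value as tainted without re-analysing the function body, while tainted data passed as `count` would not propagate.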

Core Engine Architecture

PySpector v0.1.5 represents a shift from partially static pattern matching to a full graph-based analysis engine:

  1. AST Parsing: Python source is converted into a structured JSON AST for semantic analysis.
  2. Call Graph Construction: PySpector builds a project-wide map of function definitions and call sites to enable cross-file analysis.
  3. CFG Generation: Each function is decomposed into a Control Flow Graph (CFG), allowing the engine to understand the order of operations and conditional Python logic.
  4. Fixed-Point Taint Propagation: Using a Worklist Algorithm, the engine propagates "taint" from defined Sources to Sinks, while respecting Sanitizers that clean the data along the way.
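The worklist step can be illustrated with a toy example. The sketch below is written in Python for readability and is not taken from the Rust engine: the CFG layout, the statement kinds, and the function names are all assumptions chosen to show how taint reaches one sink but is stopped by a sanitizer on another path.

```python
from collections import deque

# Hypothetical toy CFG: node -> (kind, target variable, source variables).
STATEMENTS = {
    "n1": ("source", "user", []),            # user = request.args["q"]
    "n2": ("assign", "query", ["user"]),     # query = "SELECT ... " + user
    "n3": ("sanitize", "clean", ["user"]),   # clean = escape(user)
    "n4": ("sink", None, ["query"]),         # cursor.execute(query)
    "n5": ("sink", None, ["clean"]),         # cursor.execute(clean)
}
SUCCESSORS = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n5"], "n4": [], "n5": []}


def transfer(node, tainted_in):
    """Compute the set of tainted variables after executing one node."""
    kind, target, sources = STATEMENTS[node]
    tainted_out = set(tainted_in)
    if kind == "source":
        tainted_out.add(target)
    elif kind == "assign":
        if any(s in tainted_in for s in sources):
            tainted_out.add(target)
        else:
            tainted_out.discard(target)
    elif kind == "sanitize":
        tainted_out.discard(target)  # the sanitizer's result is clean
    return tainted_out


def find_tainted_sinks():
    """Propagate taint until nothing changes, then report reachable sinks."""
    in_state = {node: set() for node in STATEMENTS}
    worklist = deque(STATEMENTS)  # start with every node queued once
    while worklist:
        node = worklist.popleft()
        out = transfer(node, in_state[node])
        for succ in SUCCESSORS[node]:
            merged = in_state[succ] | out  # join: union of incoming taint
            if merged != in_state[succ]:
                in_state[succ] = merged
                worklist.append(succ)  # revisit only nodes whose input changed
    return sorted(
        node
        for node, (kind, _, sources) in STATEMENTS.items()
        if kind == "sink" and any(s in in_state[node] for s in sources)
    )


print(find_tainted_sinks())
```

Only the sink fed by the unsanitized `query` variable is reported; the sink on the sanitized path is not.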

How It Works

PySpector's hybrid architecture is key to its performance and effectiveness.

  • Python CLI Orchestration: The process begins with the Python-based CLI. It handles command-line arguments, loads the configuration and rules, and prepares the target files for analysis. For each Python file, it uses the native ast module to generate an Abstract Syntax Tree, which is then serialized to JSON.

  • Invocation of the Rust Core: The serialized ASTs, along with the ruleset and configuration, are passed to the compiled Rust core. The handoff from Python to Rust is managed by the pyo3 library.

  • Parallel Analysis in Rust: The Rust engine takes over and performs the heavy lifting. It leverages the rayon crate to execute file scans and analysis in parallel, maximizing the use of available CPU cores. It builds a complete call graph of the application to understand inter-file function calls, which is essential for the taint analysis module.

  • Results and Reporting: Once the analysis is complete, the Rust core returns a structured list of findings to the Python CLI. The Python wrapper then handles the final steps of filtering the results based on the severity threshold and the baseline file, and generating the report in the user-specified format.

This architecture combines the best of both worlds: a flexible, user-friendly interface in Python and a high-performance, memory-safe analysis engine in Rust :)
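The AST serialization step performed by the CLI can be sketched with the standard library alone. The JSON layout below (a `_type` key plus one key per AST field) is an assumption made for illustration; the schema PySpector actually passes to the Rust core may differ.

```python
import ast
import json


def node_to_dict(node):
    """Recursively convert an ast node into JSON-serializable data."""
    if isinstance(node, ast.AST):
        result = {"_type": type(node).__name__}
        for field, value in ast.iter_fields(node):
            result[field] = node_to_dict(value)
        if hasattr(node, "lineno"):
            result["lineno"] = node.lineno  # keep positions for reporting
        return result
    if isinstance(node, list):
        return [node_to_dict(item) for item in node]
    if node is None or isinstance(node, (str, int, float, bool)):
        return node
    return repr(node)  # bytes, complex numbers, Ellipsis, and similar constants


source = "import os\nos.system(user_input)\n"
tree_json = json.dumps(node_to_dict(ast.parse(source)))
print(tree_json[:60])
```

A consumer in any language can then walk the resulting JSON without needing a Python parser of its own.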

Performance Benchmarks

Performance benchmarks demonstrate PySpector's competitive advantages in SAST scanning speed while maintaining comprehensive security analysis.

The benchmarks were executed in a controlled environment using automated stress-testing scripts to keep the measurements repeatable.

Benchmark Results

Comparative analysis across major Python codebases (Django, Flask, Pandas, Scikit-learn, Requests) shows:

Metric                | PySpector              | Bandit             | Semgrep
Throughput            | 25,607 lines/sec       | 14,927 lines/sec   | 1,538 lines/sec
Performance Advantage | 71% faster than Bandit | Baseline           | 16.6x slower than PySpector
Memory Usage          | 1.4 GB average         | 111 MB average     | 277 MB average
CPU Utilization       | 120% (multi-core)      | 100% (single-core) | 40%
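The headline ratios follow directly from the throughput figures above. The short check below recomputes them; the variable names are illustrative only.

```python
# Throughput in lines per second, copied from the benchmark results.
throughput = {"PySpector": 25_607, "Bandit": 14_927, "Semgrep": 1_538}

# Percentage by which PySpector's throughput exceeds Bandit's.
faster_than_bandit = (throughput["PySpector"] / throughput["Bandit"] - 1) * 100

# How many times Semgrep's throughput fits into PySpector's.
ratio_vs_semgrep = throughput["PySpector"] / throughput["Semgrep"]

print(f"{faster_than_bandit:.1f}% faster than Bandit")
print(f"{ratio_vs_semgrep:.1f}x the throughput of Semgrep")
```

Note that the 16.6x figure compares Semgrep with PySpector, not with the Bandit baseline.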

Key Performance Characteristics

  • Speed: Delivers 71% faster scanning than Bandit through Rust-powered parallel analysis
  • Scalability: Maintains high throughput on large codebases (500k+ lines of code)
  • Resource Profile: Optimized for modern multi-core environments with adequate memory allocation
  • Consistency: Stable performance across different project types and sizes

System Requirements for Optimal Performance

  • Minimum: 2 CPU cores, 2 GB RAM
  • Recommended: 4+ CPU cores, 4+ GB RAM for large codebases
  • Storage: SSD recommended for large repository scanning

Benchmark Methodology

Performance testing was conducted under the following conditions:

  • Test Environment: Debian-based Linux VM (2 cores, 4GB RAM)
  • Test Projects: 5 major Python repositories (13k-530k lines of code)
  • Measurement: Average of multiple runs with CPU settling periods
  • Comparison: Head-to-head against Bandit and Semgrep using identical configurations

Benchmark data available in the project repository for transparency and reproducibility.

Usage

PySpector is operated through a straightforward command-line interface.

Running a Scan

The primary command is scan, which can target a local file, a directory, or even a remote Git repository.

pyspector scan [PATH or --url REPO_URL] [OPTIONS]

Examples:

  • Scan a single file:
pyspector scan /path/to/your/project/app.py
  • Scan a local directory and save the report as HTML:
pyspector scan /path/to/your/project -o report.html -f html
  • Scan a public GitHub repository:
pyspector scan --url https://github.com/username/repo.git

Wizard Mode for Beginners (NEW FEATURE🚀)

  • Use the --wizard flag to enter the guided scan mode, ideal for first-time users, beginners, and students:
pyspector scan --wizard

Scan for AI and LLM Vulnerabilities

  • Use the --ai flag to enable a specialized ruleset for projects using Large Language Models:
pyspector scan /path/to/your/project --ai

Scan for Supply-Chain CVEs in Dependencies

  • Use the --supply-chain flag to check your project dependencies for known CVEs:
pyspector scan /path/to/your/project --supply-chain

Plugin System

PySpector ships with an extensible plugin architecture that lets you post-process findings, generate custom artefacts, or orchestrate follow-up actions after every scan. Plugins run in-process once the Rust core returns the final issue list, so they see exactly the same normalized data that drives the built-in reports.

Lifecycle Overview

  1. Discovery - Plugin files live in the repository's plugins directory (PySpector/plugins) and are discovered automatically.
  2. Registration - Trusted plugins are recorded in PySpector/plugins/plugin_registry.json together with their checksum and metadata.
  3. Validation - Before execution PySpector validates plugin configuration, statically inspects the source for dangerous APIs, and checks the on-disk checksum.
  4. Execution - The plugin is initialized, receives the full findings list, and can emit additional files or data. cleanup() is always called at the end.

Managing Plugins from the CLI

The CLI exposes helper commands for maintaining your local catalogue:

pyspector plugin list               # Show discovered plugins, trust status, version, author
pyspector plugin trust plugin_name     # Validate, checksum, and mark a plugin as trusted
pyspector plugin info plugin_name     # Display stored metadata and checksum verification
pyspector plugin install path/to/plugin.py --trust
pyspector plugin remove legacy_plugin

Only trusted plugins are executed automatically. When you trust a plugin, PySpector calculates its SHA256 checksum and stores the version, author, and description that the plugin declares via PluginMetadata. If the file is modified later, you will be warned before it runs again. To install a plugin and trust it in one step:

pyspector plugin install ./PySpector/plugins/aipocgen.py --trust
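The checksum comparison behind the trust workflow can be sketched as follows. This is an illustration using `hashlib`, not the plugin manager's own code; the function names `sha256_of` and `is_unmodified` are hypothetical, and the temporary file stands in for a real plugin.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large plugins do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: Path, trusted_checksum: str) -> bool:
    """Compare the on-disk checksum with the one recorded at trust time."""
    return sha256_of(path) == trusted_checksum


with tempfile.TemporaryDirectory() as tmp:
    plugin = Path(tmp) / "my_plugin.py"
    plugin.write_text("print('v1')\n", encoding="utf-8")
    trusted = sha256_of(plugin)              # recorded when the plugin is trusted
    unchanged = is_unmodified(plugin, trusted)

    plugin.write_text("print('v2')\n", encoding="utf-8")  # later modification
    change_detected = not is_unmodified(plugin, trusted)

print(unchanged, change_detected)
```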

Running Plugins During a Scan

Use one or more --plugin flags during pyspector scan and provide a JSON configuration file if the plugin expects custom settings:

pyspector scan vulnerableapp.py --plugin aipocgen --plugin-config ./PySpector/pluginconfig/aipocgen.json

The configuration file must be a JSON object whose keys match plugin names, for example:

{
  "aipocgen": {
    "api_key": "YOUR-GROQ-KEY",
    "model": "llama-3.3-70b",
    "severity_filter": ["HIGH", "CRITICAL"],
    "max_pocs": 5,
    "output_dir": "pocs",
    "dry_run": false
  }
}

Each plugin receives only its own configuration block. Results are printed in the CLI, and any paths returned in the output_files list are shown under “Generated files”.
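The per-plugin isolation of configuration blocks can be pictured like this. The helper `load_plugin_configs` is a hypothetical name and the fallback to an empty dictionary is an assumption; it only illustrates how a JSON object keyed by plugin name is split up.

```python
import json


def load_plugin_configs(raw_json: str, requested_plugins: list) -> dict:
    """Return one configuration block per requested plugin name."""
    document = json.loads(raw_json)
    if not isinstance(document, dict):
        raise ValueError("plugin configuration must be a JSON object")
    # Assumption: a plugin without a block receives an empty configuration.
    return {name: document.get(name, {}) for name in requested_plugins}


RAW = """
{
  "aipocgen": {"model": "llama-3.3-70b", "max_pocs": 5, "dry_run": false},
  "other_plugin": {"verbose": true}
}
"""

configs = load_plugin_configs(RAW, ["aipocgen"])
print(configs)
```

Here `other_plugin` has a block in the file but was not requested, so its settings are never handed to `aipocgen`.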

Authoring a Plugin

Create a new Python file in ~/.pyspector/plugins/<name>.py and subclass PySpectorPlugin:

from pathlib import Path
from typing import Any, Dict, List

from pyspector.plugin_system import PySpectorPlugin, PluginMetadata


class MyPlugin(PySpectorPlugin):
    @property
    def metadata(self) -> PluginMetadata:
        return PluginMetadata(
            name="my_plugin",
            version="0.1.0",
            author="Your Name",
            description="Summarises HIGH severity findings",
            category="reporting",
        )

    def validate_config(self, config: Dict[str, Any]) -> tuple[bool, str]:
        if "output_file" not in config:
            return False, "output_file is required"
        return True, ""

    def initialize(self, config: Dict[str, Any]) -> bool:
        self.output = Path(config["output_file"]).resolve()
        return True

    def process_findings(
        self,
        findings: List[Dict[str, Any]],
        scan_path: Path,
        **kwargs,
    ) -> Dict[str, Any]:
        highs = [f for f in findings if f.get("severity") == "HIGH"]
        self.output.write_text(f"{len(highs)} HIGH findings\n", encoding="utf-8")
        return {
            "success": True,
            "message": f"Summarised {len(highs)} HIGH findings",
            "output_files": [str(self.output)],
        }

A plugin implements the following members (those marked optional may be omitted):

  • metadata – Return a PluginMetadata instance describing the plugin.
  • validate_config(config) (optional but recommended) – Abort gracefully when required settings are missing by returning (False, "reason").
  • initialize(config) – Prepare state or dependencies; return False to skip execution.
  • process_findings(findings, scan_path, **kwargs) – Receive every finding as a dictionary and return a result object containing:
    • success: boolean status
    • message: short summary for the CLI
    • data: optional serializable payload
    • output_files: optional list of generated file paths
  • cleanup() (optional) – Release resources; called even if an exception occurs.

Tip: Plugins are plain Python modules, so you can run python my_plugin.py while developing to perform quick checks before trusting them through the CLI.

Configuration Tips and Best Practices

  • Store API keys or long-lived secrets in environment variables and read them during initialize. Provide helpful error messages when credentials are missing.
  • Keep side-effects inside the scan directory. When PySpector scans a single file, scan_path is that file, so the reference plugins switch to scan_path.parent before writing outputs.
  • Validate configuration early using validate_config; PySpector surfaces the error message in the CLI without executing the plugin.
  • Return meaningful message values and populate output_files so automation can pick up generated artifacts.
  • Document optional switches such as dry_run (see the bundled aipocgen plugin for an example) to support air-gapped testing.
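Two of these tips can be sketched as small helpers that a plugin might call from initialize and process_findings. The names `resolve_api_key` and `output_directory`, and the environment variable `MY_PLUGIN_API_KEY`, are hypothetical and not part of the PySpector API.

```python
import os
from pathlib import Path


def resolve_api_key(config: dict, env_var: str = "MY_PLUGIN_API_KEY") -> str:
    """Prefer the environment variable, fall back to the config block."""
    key = os.environ.get(env_var) or config.get("api_key")
    if not key:
        raise ValueError(f"Set {env_var} or provide 'api_key' in the plugin config")
    return key


def output_directory(scan_path: Path) -> Path:
    """Write next to a scanned file, or inside a scanned directory."""
    return scan_path.parent if scan_path.is_file() else scan_path


# Example: a single-file scan writes its artefacts beside that file.
print(output_directory(Path(os.__file__)))
```

Raising a descriptive error early gives the user an actionable message instead of a failure deep inside a network call.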

Security Model

The plugin manager enforces several safeguards:

  • AST-based static inspection blocks dangerous constructs (eval, exec, subprocess.*, etc.) and prints warnings when sensitive but acceptable calls (e.g., open) are used.
  • Trust workflow – you must explicitly trust a plugin before it can run; the CLI informs you about any warnings produced during validation.
  • Checksum verification – each trusted plugin has a stored SHA256 hash; changes are flagged before execution.
  • Argument isolation – the runner resets sys.argv to a minimal value so Click-based plugins cannot consume the parent CLI arguments accidentally.
  • Structured error handling – exceptions are caught, traced, and reported without aborting the main scan, and cleanup() still runs.

Together these measures let you extend PySpector confidently while maintaining a secure supply chain for third-party automation.
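As an illustration of what AST-based inspection can look like, the sketch below walks a plugin's source and separates blocked calls from calls that only merit a warning. The rule sets and the function name are assumptions; the real plugin manager may check more constructs, and a simple name match like this one can be bypassed by aliasing (for example `import subprocess as sp`).

```python
import ast

# Hypothetical rule sets, loosely based on the constructs named above.
BLOCKED_CALLS = {"eval", "exec"}
BLOCKED_MODULES = {"subprocess"}
WARN_CALLS = {"open"}


def inspect_plugin_source(source: str):
    """Return (errors, warnings) as sorted lists of (line, call) tuples."""
    errors, warnings = [], []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in BLOCKED_CALLS:
            errors.append((node.lineno, f"{func.id}()"))
        elif isinstance(func, ast.Name) and func.id in WARN_CALLS:
            warnings.append((node.lineno, f"{func.id}()"))
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id in BLOCKED_MODULES
        ):
            errors.append((node.lineno, f"{func.value.id}.{func.attr}()"))
    return sorted(errors), sorted(warnings)


SAMPLE = (
    "import subprocess\n"
    "subprocess.run(['ls'])\n"
    "handle = open('notes.txt')\n"
    "value = eval('1 + 1')\n"
)

errors, warnings = inspect_plugin_source(SAMPLE)
print(errors, warnings)
```

Because the source is parsed rather than executed, the inspection itself never runs any plugin code.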

Triaging and Baselining Findings

PySpector includes an interactive triage mode to help manage and baseline findings. This allows you to review issues and mark them as "ignored" so they don't appear in future scans.

  • Generate a JSON report:
pyspector scan /path/to/your/project -o report.json -f json
  • Start the triage TUI:
pyspector triage report.json

Inside the TUI, you can navigate with the arrow keys, press i to toggle the "ignored" status of an issue, and s to save your changes to a .pyspector_baseline.json file. This baseline file will be automatically loaded on subsequent scans.

Automation and Integration

PySpector includes Shell helper scripts to integrate security scanning directly into your development and operational workflows.

SARIF Output and Security Tool Integration

PySpector supports exporting scan results in SARIF (Static Analysis Results Interchange Format).
SARIF is a standardized format used by many security platforms and CI/CD systems to aggregate and visualize static analysis findings.

Why SARIF?

Using SARIF allows PySpector results to be easily integrated with:

  • GitHub Code Scanning
  • GitHub Advanced Security
  • Security dashboards and DevSecOps pipelines
  • Other SAST aggregation platforms

Example SARIF Output

Below is a simplified example of a SARIF result generated from a PySpector scan:

{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "PySpector",
          "informationUri": "https://github.com/ParzivalHack/PySpector"
        }
      },
      "results": [
        {
          "ruleId": "PYSEC001",
          "level": "warning",
          "message": {
            "text": "Potential command injection detected"
          }
        }
      ]
    }
  ]
}

CI/CD Integration

SARIF output can be uploaded to platforms like GitHub Code Scanning to visualize security findings directly in pull requests and repository security dashboards.

Example workflow:

pyspector scan ./project -f sarif -o report.sarif

The generated report.sarif file can then be uploaded to supported security platforms for analysis and visualization.
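Before uploading, a report can also be inspected locally. The sketch below counts results per level using only the fields shown in the simplified example above; the function name is hypothetical, and treating a missing level as "warning" is a simplification.

```python
import json
from collections import Counter


def summarize_sarif(sarif_text: str) -> dict:
    """Count SARIF results per level across all runs."""
    document = json.loads(sarif_text)
    levels = Counter()
    for run in document.get("runs", []):
        for result in run.get("results", []):
            levels[result.get("level", "warning")] += 1
    return dict(levels)


EXAMPLE_SARIF = """
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "PySpector"}},
      "results": [
        {
          "ruleId": "PYSEC001",
          "level": "warning",
          "message": {"text": "Potential command injection detected"}
        }
      ]
    }
  ]
}
"""

print(summarize_sarif(EXAMPLE_SARIF))
```

For a real report, pass the contents of report.sarif read from disk instead of the inline example.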

Git Pre-Commit Hook

To ensure that no new high-severity issues are introduced into the codebase, you can set up a Git pre-commit hook. This hook will automatically scan staged Python files before each commit and block the commit if any HIGH or CRITICAL issues are found.

To set up the hook, run the following script from the root of your Git repository:

./scripts/setup_hooks.sh

This script creates an executable .git/hooks/pre-commit file that performs the check. You can bypass the hook for a specific commit by using the --no-verify flag with your git commit command.
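The blocking rule itself is simple to express. The sketch below is not the generated hook, which is produced by the shell script above; it only illustrates the decision, assuming findings are dictionaries with a severity key as in the plugin API. The function names and sample findings are hypothetical.

```python
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(findings: list) -> list:
    """Keep only the findings severe enough to stop a commit."""
    return [
        finding
        for finding in findings
        if str(finding.get("severity", "")).upper() in BLOCKING_SEVERITIES
    ]


def hook_exit_code(findings: list) -> int:
    """Git aborts the commit when a pre-commit hook exits with a non-zero status."""
    return 1 if blocking_findings(findings) else 0


# Hypothetical findings for illustration.
sample = [{"severity": "LOW"}, {"severity": "CRITICAL"}]
print(hook_exit_code(sample))
```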

Scheduled Scans with Cron

For continuous monitoring, you can schedule regular scans of your projects using a cron job. PySpector provides an interactive script to help you generate the correct crontab entry.

To generate your cron job command, run:

./scripts/setup_cron.sh

🛡️ Security Hall of Fame

  • satoridev01
  • Shinigami
  • fg0x0

This project follows the all-contributors specification.
