Autonomous Offensive Security Intelligence — AI-powered penetration testing

PHANTOM

Autonomous Adversary Simulation Platform

"Why so Serious!" — Phantom doesn't ask. It finds.

AI-driven penetration testing that reasons, adapts, and verifies — like a human red-teamer.

Quick Start · Architecture · Usage · Contributing

Overview

Phantom is an autonomous AI penetration testing agent. It uses large language models to discover and verify real vulnerabilities in web applications, APIs, and network services — with zero human intervention.

Unlike signature-based scanners, Phantom reasons about targets: it reads responses, identifies attack surfaces, chains multi-step exploits, and produces working proof-of-concept code for every confirmed finding.

Aspect	Traditional Scanners	Phantom
Approach	Signature matching	LLM-guided reasoning
False Positives	40–70% typical	Every finding verified with working PoC
Depth	Single-pass	Multi-phase with chained exploits
Reports	Generic dumps	MITRE ATT&CK mapped, compliance-ready
Triage	Manual review	Actionable findings + remediation

Core Capabilities

Autonomous Operation — AI agents that plan, execute, and adapt penetration tests through a ReAct (Reasoning + Acting) loop. No hand-holding required.

30+ Security Tools — nmap, nuclei, sqlmap, ffuf, httpx, katana, semgrep, nikto, gobuster, arjun, Playwright browser, and more — all running inside an isolated Docker sandbox.

Sandboxed Execution — All offensive tools run in ephemeral Docker containers. No host filesystem access, restricted capabilities, automatic cleanup.

Multi-Agent System — Specialized sub-agents for parallel recon, exploitation, and validation. The root agent coordinates; sub-agents specialize.

7-Layer Security — Scope validator, tool firewall, Docker sandbox, cost controller, time limits, HMAC audit trail, output sanitizer.

Verified Findings — Every vulnerability includes a working PoC exploit, raw evidence, and reproducible steps. No guessing.

MITRE ATT&CK Enrichment — Automatic CWE, CAPEC, and technique-level tagging with CVSS 3.1 scoring.

Compliance Mapping — OWASP Top 10 (2021), PCI DSS v4.0, NIST 800-53 — mapped automatically per finding.

Knowledge Persistence — Cross-scan memory stores hosts, vulnerabilities, and false positive signatures. The agent learns from past scans.

Cost Tracking — Per-request and per-scan budget limits. Every token counted, every dollar tracked.
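
As a rough illustration of per-request and per-scan limits, the sketch below tracks spend and refuses a request that would break either limit. The `CostTracker` and `BudgetExceeded` names are hypothetical and not taken from Phantom's code; the default values reuse the $5 and $25 limits documented in this README.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a request or the whole scan would exceed its cost limit."""


@dataclass
class CostTracker:
    # Hypothetical class; defaults reuse the limits documented in this README.
    per_request_ceiling: float = 5.0
    max_scan_cost: float = 25.0
    spent: float = 0.0

    def record(self, request_cost: float) -> float:
        """Add one LLM request's cost (USD) and return the running total."""
        if request_cost > self.per_request_ceiling:
            raise BudgetExceeded(
                f"request cost {request_cost:.2f} exceeds per-request ceiling"
            )
        if self.spent + request_cost > self.max_scan_cost:
            raise BudgetExceeded("scan budget exhausted")
        self.spent += request_cost
        return self.spent
```

Checking the limit before adding the cost means a rejected request never changes the running total, so the caller can stop the scan cleanly.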

Architecture

High-Level Architecture

%%{init: {'theme': 'dark'}}%%
flowchart TB
    subgraph Interface["Interface Layer"]
        CLI["CLI / TUI"]
        Stream["Streaming Parser"]
    end

    subgraph Orchestration["Orchestration"]
        Profile["Scan Profiles"]
        Scope["Scope Validator"]
        Cost["Cost Controller"]
        Audit["Audit Logger"]
    end

    subgraph Agent["Agent Core"]
        ReAct["BaseAgent · ReAct Loop"]
        State["State Machine"]
        LLM["LLM Client · LiteLLM"]
        Memory["Memory Compressor"]
        Skills["Skills Engine"]
    end

    subgraph Security["Security Layer"]
        Firewall["Tool Firewall"]
        Verifier["Verification Engine"]
    end

    subgraph Execution["Docker Sandbox"]
        ToolServer["Tool Server · HTTP API"]
        Tools["30+ Security Tools"]
        Browser["Playwright · Chromium"]
    end

    subgraph Output["Output"]
        Reports["JSON / HTML / MD"]
        Graph["Attack Graph"]
        MITRE["MITRE ATT&CK"]
        Compliance["Compliance"]
    end

    Interface --> Agent
    Orchestration -.-> Agent
    Agent --> Security
    Security --> Execution
    Agent --> Output

    style Interface fill:#6c5ce7,stroke:#a29bfe,color:#fff
    style Orchestration fill:#00b894,stroke:#55efc4,color:#fff
    style Agent fill:#e17055,stroke:#fab1a0,color:#fff
    style Security fill:#d63031,stroke:#ff7675,color:#fff
    style Execution fill:#0984e3,stroke:#74b9ff,color:#fff
    style Output fill:#fdcb6e,stroke:#ffeaa7,color:#2d3436

Scan Execution Flow

%%{init: {'theme': 'dark'}}%%
sequenceDiagram
    participant User
    participant CLI as Phantom CLI
    participant Agent as Agent (ReAct)
    participant Firewall as Tool Firewall
    participant Sandbox as Docker Sandbox
    participant LLM as LLM Provider
    participant Target

    User->>CLI: phantom scan --target app.com
    CLI->>Sandbox: Create ephemeral container
    CLI->>Agent: Initialize with scope + profile

    rect rgba(108, 92, 231, 0.15)
        Note over Agent,LLM: Reconnaissance
        Agent->>LLM: Analyze target, plan strategy
        LLM-->>Agent: Use nmap, httpx, nuclei
        Agent->>Firewall: Validate tool call
        Firewall-->>Agent: Approved
        Agent->>Sandbox: Execute nmap -sV target
        Sandbox->>Target: Probes
        Target-->>Sandbox: Open ports & services
        Sandbox-->>Agent: Results
    end

    rect rgba(214, 48, 49, 0.15)
        Note over Agent,LLM: Exploitation
        Agent->>LLM: Plan attacks from findings
        LLM-->>Agent: SQLi on /api, XSS on /search
        Agent->>Sandbox: Execute sqlmap
        Sandbox->>Target: Injection payloads
        Target-->>Sandbox: Vulnerability confirmed
        Sandbox-->>Agent: Finding + evidence
    end

    rect rgba(0, 184, 148, 0.15)
        Note over Agent,LLM: Verification
        Agent->>Sandbox: Re-exploit with clean PoC
        Sandbox->>Target: Reproduce attack
        Target-->>Sandbox: Confirmed
    end

    Agent->>CLI: Reports (JSON/HTML/MD)
    CLI->>User: Findings + PoCs + compliance
    CLI->>Sandbox: Destroy container
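
The flow above initializes the agent with a scope before any probe reaches the target. A minimal sketch of such a scope check, assuming a plain host allowlist, might look like the following. The `is_in_scope` function is hypothetical, it covers only the allowlist and a basic internal-address guard, and it does not implement DNS pinning.

```python
import ipaddress
from urllib.parse import urlsplit


def is_in_scope(url: str, allowed_hosts: set[str]) -> bool:
    """Return True only if the URL's host is explicitly allowlisted."""
    host = urlsplit(url).hostname
    if host is None:
        return False  # no parseable host, so nothing to authorize
    host = host.lower()
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # a hostname, not an IP literal
    # Basic SSRF guard: reject internal addresses even if allowlisted by mistake.
    if addr is not None and (
        addr.is_private or addr.is_loopback or addr.is_link_local
    ):
        return False
    return host in allowed_hosts
```

A hostname that resolves to an internal address would still pass this check, which is the gap that resolving once and pinning the result is meant to close.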

Agent Decision Loop (ReAct)

%%{init: {'theme': 'dark'}}%%
stateDiagram-v2
    [*] --> Observe: Scan initialized
    Observe --> Reason: Tool results received
    Reason --> Plan: LLM analyzes context
    Plan --> Act: Select tool + arguments
    Act --> Validate: Tool Firewall check

    Validate --> Execute: Approved
    Validate --> Reason: Blocked — re-plan

    Execute --> Record: Results returned
    Record --> CheckStop: Update state

    CheckStop --> Observe: Continue
    CheckStop --> Finalize: Stop condition met

    Finalize --> Verify: Re-test critical findings
    Verify --> Enrich: MITRE + compliance
    Enrich --> Report: Generate reports
    Report --> [*]: Scan complete
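
The loop in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Phantom's implementation: `react_loop`, `firewall_allows`, the `plan` and `execute` callbacks, and the metacharacter list are all hypothetical, and the default of 60 iterations is borrowed from the `quick` profile.

```python
from typing import Callable, Optional

# Hypothetical deny-list of characters that could chain or redirect shell commands.
SHELL_METACHARS = set(";|&`$<>\n")

Action = tuple[str, list[str]]


def firewall_allows(tool: str, args: list[str], allowed_tools: set[str]) -> bool:
    """Approve only known tools whose arguments carry no shell metacharacters."""
    if tool not in allowed_tools:
        return False
    return not any(ch in SHELL_METACHARS for arg in args for ch in arg)


def react_loop(
    plan: Callable[[list], Optional[Action]],
    execute: Callable[[str, list[str]], str],
    allowed_tools: set[str],
    max_iterations: int = 60,
) -> list:
    """Reason, validate, act, record; stop when the planner returns None."""
    history: list = []
    for _ in range(max_iterations):
        action = plan(history)  # Reason + Plan: choose the next tool call
        if action is None:  # stop condition met
            break
        tool, args = action
        if not firewall_allows(tool, args, allowed_tools):
            # Blocked: record the refusal so the planner can re-plan.
            history.append((tool, args, "BLOCKED"))
            continue
        history.append((tool, args, execute(tool, args)))  # Execute + Record
    return history
```

Recording blocked calls in the history, rather than dropping them, gives the planner the feedback it needs to choose a different action on the next iteration.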

Sandbox Architecture

%%{init: {'theme': 'dark'}}%%
graph TB
    subgraph Host["Host Machine"]
        CLI["Phantom CLI"]
        Docker["Docker Engine"]
    end

    subgraph Container["Ephemeral Sandbox · Kali-based · ~13GB"]
        ToolServer["Tool Server API :48081"]

        subgraph Toolkit["Offensive Tools"]
            nmap & nuclei & sqlmap & ffuf
            httpx & katana & semgrep & nikto
            gobuster & arjun & more["20+ more"]
        end

        subgraph Runtime["Runtime"]
            Python["Python 3.12"]
            PW["Playwright + Chromium"]
            Caido["Caido Proxy"]
        end
    end

    CLI -->|"Authenticated HTTP"| ToolServer
    ToolServer --> Toolkit
    ToolServer --> Runtime
    Container -.->|"Network"| Target["Target System"]

    style Host fill:#2d3436,stroke:#636e72,color:#dfe6e9
    style Container fill:#0984e3,stroke:#74b9ff,color:#fff
    style Toolkit fill:#d63031,stroke:#ff7675,color:#fff
    style Runtime fill:#6c5ce7,stroke:#a29bfe,color:#fff

7-Layer Security Model

Layer 1  Scope Validator       Target allowlist, SSRF protection, DNS pinning
Layer 2  Tool Firewall         Argument validation, shell injection blocking
Layer 3  Docker Sandbox        Ephemeral container, restricted capabilities
Layer 4  Cost Controller       Per-request ceiling ($5), scan budget ($25)
Layer 5  Time Limits           Per-tool timeout, global scan timeout
Layer 6  HMAC Audit Trail      Tamper-evident, append-only event log
Layer 7  Output Sanitizer      PII stripping, credential redaction
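
To make the idea behind Layer 6 concrete, here is a minimal sketch of a tamper-evident log built from chained HMACs, using only the Python standard library. The function names and the entry layout are assumptions for illustration; Phantom's actual log format is not described in this README.

```python
import hashlib
import hmac
import json


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    """MAC over the previous entry's MAC plus this event's canonical JSON."""
    payload = json.dumps(event, sort_keys=True)
    return hmac.new(key, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()


def append_event(log: list[dict], key: bytes, event: dict) -> dict:
    """Append an event chained to the entry before it."""
    prev_mac = log[-1]["mac"] if log else ""
    entry = {"event": event, "mac": _mac(key, prev_mac, event)}
    log.append(entry)
    return entry


def verify_log(log: list[dict], key: bytes) -> bool:
    """Recompute the chain; an edited, reordered, or mid-log deleted entry fails."""
    prev_mac = ""
    for entry in log:
        expected = _mac(key, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Because each MAC covers the one before it, changing any entry invalidates every later entry. This sketch does not detect truncation of the tail, which would need an external record of the latest MAC.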

Quick Start

Requirements: Docker (running) · Python 3.12+ · An LLM API key

# Install
pip install phantom-agent
# or: pipx install phantom-agent

# Configure
export PHANTOM_LLM="openai/gpt-4o"
export LLM_API_KEY="your-api-key"

# Run your first scan
phantom scan --target https://your-app.com

The first run pulls the sandbox image (~13GB). Results are saved to phantom_runs/.

Docker

docker run --rm -it \
  -e PHANTOM_LLM="openai/gpt-4o" \
  -e LLM_API_KEY="your-key" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/usta0x001/phantom:latest \
  scan --target https://your-app.com

Usage

# Quick scan (~15 min)
phantom scan --target https://app.com --scan-mode quick

# Standard scan (~45 min)
phantom scan --target https://app.com

# Deep scan (1–3 hours, exhaustive)
phantom scan --target https://app.com --scan-mode deep

# Stealth (low-noise, IDS/WAF evasion)
phantom scan --target https://app.com --scan-mode stealth

# API-only (no browser)
phantom scan --target https://api.app.com --scan-mode api_only

# With custom instructions
phantom scan --target https://app.com \
  --instruction "Focus on SQL injection in /api/v2 endpoints"

# Resume interrupted scan
phantom scan --target https://app.com --resume

# Interactive TUI
phantom --target https://app.com

# Non-interactive (CI/CD)
phantom scan --target https://app.com --non-interactive

Scan Profiles

Profile	Iterations	Duration	Best For
`quick`	60	~15 min	CI/CD, rapid checks
`standard`	120	~45 min	Regular testing
`deep`	300	1–3 hours	Comprehensive audit
`stealth`	60	~30 min	Production, IDS evasion
`api_only`	100	~45 min	REST/GraphQL APIs
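
The profile table maps naturally onto a small lookup structure. The sketch below encodes the rows above; the `ScanProfile` class and `get_profile` helper are hypothetical names, and the `standard` default follows the documented `PHANTOM_SCAN_MODE` default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    name: str
    max_iterations: int
    typical_duration: str


# Values copied from the Scan Profiles table.
PROFILES = {
    p.name: p
    for p in (
        ScanProfile("quick", 60, "~15 min"),
        ScanProfile("standard", 120, "~45 min"),
        ScanProfile("deep", 300, "1-3 hours"),
        ScanProfile("stealth", 60, "~30 min"),
        ScanProfile("api_only", 100, "~45 min"),
    )
}


def get_profile(name: str = "standard") -> ScanProfile:
    """Look up a profile by name, failing loudly on an unknown scan mode."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(
            f"unknown scan mode {name!r}; choose from {sorted(PROFILES)}"
        ) from None
```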

Post-Scan Pipeline

Every scan runs a 7-stage enrichment pipeline automatically:

1. MITRE ATT&CK     CWE, CAPEC, OWASP mapping
2. Compliance        OWASP Top 10, PCI DSS v4, NIST 800-53
3. Attack Graph      NetworkX path analysis
4. Nuclei Templates  Auto-generated YAML for regression
5. Knowledge Store   Persistent memory updated
6. Notifications     Webhook/Slack for critical findings
7. Reports           JSON, HTML, Markdown output
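
One simple way to structure an ordered pipeline like this is a list of named stage functions applied in sequence. The sketch below is only an illustration of that pattern; `run_pipeline` and the stage signature are assumptions, and real stages would do far more than the placeholders in the usage example.

```python
from typing import Callable

# A stage takes the finding data so far and returns an enriched copy.
Stage = Callable[[dict], dict]


def run_pipeline(finding: dict, stages: list[tuple[str, Stage]]) -> dict:
    """Run stages in order; each one receives the previous stage's output."""
    for name, stage in stages:
        finding = stage(finding)
        # Track which stages ran, in order, for later reporting.
        finding.setdefault("completed", []).append(name)
    return finding


# Placeholder stages standing in for the first two pipeline steps.
stages = [
    ("mitre", lambda f: {**f, "cwe": "CWE-89"}),
    ("compliance", lambda f: {**f, "owasp": "A03:2021"}),
]
result = run_pipeline({"title": "SQL injection"}, stages)
```

Keeping each stage as a plain function makes it easy to reorder stages or skip one, for example notifications, without touching the others.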

Configuration

Environment Variables

Variable	Description	Default
`PHANTOM_LLM`	LLM model identifier	`openai/gpt-4o`
`LLM_API_KEY`	API key (comma-separated for rotation)	—
`PHANTOM_REASONING_EFFORT`	`low` / `medium` / `high`	`high`
`PHANTOM_SCAN_MODE`	Default scan profile	`standard`
`PHANTOM_IMAGE`	Sandbox Docker image	`ghcr.io/usta0x001/phantom-sandbox:latest`
`PHANTOM_MAX_COST`	Max cost per scan (USD)	`25.0`
`PHANTOM_PER_REQUEST_CEILING`	Max cost per LLM request	`5.0`
`PHANTOM_WEBHOOK_URL`	Webhook for critical findings	—
`PHANTOM_DISABLE_BROWSER`	Disable Playwright	`false`
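
The sketch below shows one way these variables and their documented defaults could be read into a typed settings object. The `Settings` class and `load_settings` function are hypothetical, and the boolean parsing rule (only the string `true`, case-insensitive, enables the flag) is an assumption rather than documented behavior.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    llm: str
    api_keys: tuple[str, ...]
    scan_mode: str
    max_cost: float
    per_request_ceiling: float
    disable_browser: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read Phantom's documented variables, falling back to the listed defaults."""
    env = os.environ if env is None else env
    # LLM_API_KEY may hold several comma-separated keys for rotation.
    keys = tuple(k.strip() for k in env.get("LLM_API_KEY", "").split(",") if k.strip())
    return Settings(
        llm=env.get("PHANTOM_LLM", "openai/gpt-4o"),
        api_keys=keys,
        scan_mode=env.get("PHANTOM_SCAN_MODE", "standard"),
        max_cost=float(env.get("PHANTOM_MAX_COST", "25.0")),
        per_request_ceiling=float(env.get("PHANTOM_PER_REQUEST_CEILING", "5.0")),
        # Assumed rule: only "true" (any case) disables the browser.
        disable_browser=env.get("PHANTOM_DISABLE_BROWSER", "false").lower() == "true",
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.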

Supported LLM Providers

Phantom uses LiteLLM, so any of its 100+ supported providers works:

Provider	Model Example	Notes
OpenAI	`openai/gpt-4o`	Best overall
Anthropic	`anthropic/claude-sonnet-4-20250514`	Strong reasoning
Google	`gemini/gemini-2.5-pro`	Large context
Groq	`groq/llama-3.3-70b-versatile`	Free tier
DeepSeek	`deepseek/deepseek-chat`	Cost-effective
OpenRouter	`openrouter/deepseek/deepseek-v3.2`	Multi-provider
Ollama	`ollama/llama3.1`	Local, no API key
Azure	`azure/gpt-4o`	Enterprise

CI/CD Integration

GitHub Actions

name: Security Scan
on:
  pull_request:
    branches: [main]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
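      - uses: actions/checkout@v4
      # Phantom requires Python 3.12+, so pin the interpreter explicitly
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install phantom-agent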
      - run: phantom scan --target ./ --non-interactive --scan-mode quick
        env:
          PHANTOM_LLM: ${{ secrets.PHANTOM_LLM }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}

Development

git clone https://github.com/Usta0x001/Phantom.git
cd Phantom
python -m venv .venv && source .venv/bin/activate  # .venv\Scripts\activate on Windows
pip install -e ".[dev]"
pytest tests/ -v

Project Structure

phantom/
├── phantom/                  # Core package
│   ├── agents/               #   Agent system (ReAct loop, state, delegation)
│   ├── core/                 #   Security, reporting, knowledge (20+ modules)
│   ├── tools/                #   30+ security tool wrappers
│   ├── llm/                  #   LLM client, memory compression
│   ├── runtime/              #   Docker sandbox management
│   ├── interface/            #   CLI, TUI, streaming
│   ├── models/               #   Pydantic domain models
│   ├── skills/               #   Domain knowledge files
│   └── telemetry/            #   Run tracing
├── tests/                    # 731+ tests
├── containers/               # Sandbox Dockerfile
├── scripts/                  # Build scripts
└── docs/                     # Documentation

Testing

731+ tests, 0 failures:

Suite	Tests	Scope
Integration	153	Full system E2E
Audit fixes	39	Security fix verification
Unit tests	~200	Module-level
Feature tests	~100	Regression
Coverage tests	~80	Gap coverage
Security tests	~50	Security-specific

Security Audit

Two deep offensive audits were run against the codebase, and all findings were resolved:

Severity	Found	Fixed
Critical	8	8
High	19	19
Medium	34	34
Low	22	22
Total	83	83

Documentation

Architecture — System design
Documentation — Full reference
Quick Start — Get scanning in 2 minutes
Contributing — Development guidelines

Contributing

Contributions welcome. See CONTRIBUTING.md.

Bugs — Open an issue
Features — Start a discussion
PRs — Fork, branch, test, submit

License

Apache License 2.0 — see LICENSE.

Acknowledgements

Built on: LiteLLM · Nuclei · Playwright · Textual · Rich · NetworkX · SQLMap

PHANTOM — "Why so Serious!"

Made by Usta0x001

WARNING: Only test systems you own or have explicit written authorization to test. Unauthorized access to computer systems is illegal.

Download files

Source Distribution

phantom_agent-0.9.38.tar.gz (406.2 kB)

Uploaded Mar 4, 2026 Source

Built Distribution

phantom_agent-0.9.38-py3-none-any.whl (497.8 kB)

Uploaded Mar 4, 2026 Python 3

File details

Details for the file phantom_agent-0.9.38.tar.gz.

File metadata

Download URL: phantom_agent-0.9.38.tar.gz
Upload date: Mar 4, 2026
Size: 406.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for phantom_agent-0.9.38.tar.gz
Algorithm	Hash digest
SHA256	`061d0018b5da85012540afe9f15d13f9f068eae22fd6b5f6cd4207875b2cdaef`
MD5	`e030f4181b28951afc6c315c366c2e4d`
BLAKE2b-256	`d5fde514a8cf460087d91bd9a8f2f5f2123ad1d969f0d890c6ddded2067d51da`

File details

Details for the file phantom_agent-0.9.38-py3-none-any.whl.

File metadata

Download URL: phantom_agent-0.9.38-py3-none-any.whl
Upload date: Mar 4, 2026
Size: 497.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for phantom_agent-0.9.38-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c1332e5def0e26a8a795c6a36de37cd2f84e61890d5a7c056ff3371467fa3859`
MD5	`9ed62837b8223b6660b05607286426fe`
BLAKE2b-256	`eaaa0693f79ac252230f4175498f1a39bf6da34d3657266186d2949d9c699f48`
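
A downloaded file can be checked against the published SHA256 digest with the Python standard library. The helper below is a minimal sketch; `sha256_matches` is a hypothetical name, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib
import hmac


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

To use it, pass the path of the downloaded wheel or source archive together with the SHA256 value from the matching table above.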