Autonomous Offensive Security Intelligence — AI-powered penetration testing
Project description
PHANTOM
Autonomous Adversary Simulation Platform
"Why so Serious!" — Phantom doesn't ask. It finds.
AI-driven penetration testing that reasons, adapts, and verifies — like a human red-teamer.
Overview
Phantom is an autonomous AI penetration testing agent. It uses large language models to discover and verify real vulnerabilities in web applications, APIs, and network services — with zero human intervention.
Unlike signature-based scanners, Phantom reasons about targets: it reads responses, identifies attack surfaces, chains multi-step exploits, and produces working proof-of-concept code for every confirmed finding.
| Traditional Scanners | Phantom | |
|---|---|---|
| Approach | Signature matching | LLM-guided reasoning |
| False Positives | 40–70% typical | Every finding verified with working PoC |
| Depth | Single-pass | Multi-phase with chained exploits |
| Reports | Generic dumps | MITRE ATT&CK mapped, compliance-ready |
| Triage | Manual review | Actionable findings + remediation |
Core Capabilities
Autonomous Operation — AI agents that plan, execute, and adapt penetration tests through a ReAct (Reasoning + Acting) loop. No hand-holding required.
30+ Security Tools — nmap, nuclei, sqlmap, ffuf, httpx, katana, semgrep, nikto, gobuster, arjun, Playwright browser, and more — all running inside an isolated Docker sandbox.
Sandboxed Execution — All offensive tools run in ephemeral Docker containers. No host filesystem access, restricted capabilities, automatic cleanup.
Multi-Agent System — Specialized sub-agents for parallel recon, exploitation, and validation. The root agent coordinates; sub-agents specialize.
7-Layer Security — Scope validator, tool firewall, Docker sandbox, cost controller, time limits, HMAC audit trail, output sanitizer.
Verified Findings — Every vulnerability includes a working PoC exploit, raw evidence, and reproducible steps. No guessing.
MITRE ATT&CK Enrichment — Automatic CWE, CAPEC, and technique-level tagging with CVSS 3.1 scoring.
Compliance Mapping — OWASP Top 10 (2021), PCI DSS v4.0, NIST 800-53 — mapped automatically per finding.
Knowledge Persistence — Cross-scan memory stores hosts, vulnerabilities, and false positive signatures. The agent learns from past scans.
Cost Tracking — Per-request and per-scan budget limits. Every token counted, every dollar tracked.
Architecture
High-Level Architecture
%%{init: {'theme': 'dark'}}%%
flowchart TB
subgraph Interface["Interface Layer"]
CLI["CLI / TUI"]
Stream["Streaming Parser"]
end
subgraph Orchestration["Orchestration"]
Profile["Scan Profiles"]
Scope["Scope Validator"]
Cost["Cost Controller"]
Audit["Audit Logger"]
end
subgraph Agent["Agent Core"]
ReAct["BaseAgent · ReAct Loop"]
State["State Machine"]
LLM["LLM Client · LiteLLM"]
Memory["Memory Compressor"]
Skills["Skills Engine"]
end
subgraph Security["Security Layer"]
Firewall["Tool Firewall"]
Verifier["Verification Engine"]
end
subgraph Execution["Docker Sandbox"]
ToolServer["Tool Server · HTTP API"]
Tools["30+ Security Tools"]
Browser["Playwright · Chromium"]
end
subgraph Output["Output"]
Reports["JSON / HTML / MD"]
Graph["Attack Graph"]
MITRE["MITRE ATT&CK"]
Compliance["Compliance"]
end
Interface --> Agent
Orchestration -.-> Agent
Agent --> Security
Security --> Execution
Agent --> Output
style Interface fill:#6c5ce7,stroke:#a29bfe,color:#fff
style Orchestration fill:#00b894,stroke:#55efc4,color:#fff
style Agent fill:#e17055,stroke:#fab1a0,color:#fff
style Security fill:#d63031,stroke:#ff7675,color:#fff
style Execution fill:#0984e3,stroke:#74b9ff,color:#fff
style Output fill:#fdcb6e,stroke:#ffeaa7,color:#2d3436
Scan Execution Flow
%%{init: {'theme': 'dark'}}%%
sequenceDiagram
participant User
participant CLI as Phantom CLI
participant Agent as Agent (ReAct)
participant Firewall as Tool Firewall
participant Sandbox as Docker Sandbox
participant LLM as LLM Provider
participant Target
User->>CLI: phantom scan --target app.com
CLI->>Sandbox: Create ephemeral container
CLI->>Agent: Initialize with scope + profile
rect rgba(108, 92, 231, 0.15)
Note over Agent,LLM: Reconnaissance
Agent->>LLM: Analyze target, plan strategy
LLM-->>Agent: Use nmap, httpx, nuclei
Agent->>Firewall: Validate tool call
Firewall-->>Agent: Approved
Agent->>Sandbox: Execute nmap -sV target
Sandbox->>Target: Probes
Target-->>Sandbox: Open ports & services
Sandbox-->>Agent: Results
end
rect rgba(214, 48, 49, 0.15)
Note over Agent,LLM: Exploitation
Agent->>LLM: Plan attacks from findings
LLM-->>Agent: SQLi on /api, XSS on /search
Agent->>Sandbox: Execute sqlmap
Sandbox->>Target: Injection payloads
Target-->>Sandbox: Vulnerability confirmed
Sandbox-->>Agent: Finding + evidence
end
rect rgba(0, 184, 148, 0.15)
Note over Agent,LLM: Verification
Agent->>Sandbox: Re-exploit with clean PoC
Sandbox->>Target: Reproduce attack
Target-->>Sandbox: Confirmed
end
Agent->>CLI: Reports (JSON/HTML/MD)
CLI->>User: Findings + PoCs + compliance
CLI->>Sandbox: Destroy container
Agent Decision Loop (ReAct)
%%{init: {'theme': 'dark'}}%%
stateDiagram-v2
[*] --> Observe: Scan initialized
Observe --> Reason: Tool results received
Reason --> Plan: LLM analyzes context
Plan --> Act: Select tool + arguments
Act --> Validate: Tool Firewall check
Validate --> Execute: Approved
Validate --> Reason: Blocked — re-plan
Execute --> Record: Results returned
Record --> CheckStop: Update state
CheckStop --> Observe: Continue
CheckStop --> Finalize: Stop condition met
Finalize --> Verify: Re-test critical findings
Verify --> Enrich: MITRE + compliance
Enrich --> Report: Generate reports
Report --> [*]: Scan complete
Sandbox Architecture
%%{init: {'theme': 'dark'}}%%
graph TB
subgraph Host["Host Machine"]
CLI["Phantom CLI"]
Docker["Docker Engine"]
end
subgraph Container["Ephemeral Sandbox · Kali-based · ~13GB"]
ToolServer["Tool Server API :48081"]
subgraph Toolkit["Offensive Tools"]
nmap & nuclei & sqlmap & ffuf
httpx & katana & semgrep & nikto
gobuster & arjun & more["20+ more"]
end
subgraph Runtime["Runtime"]
Python["Python 3.12"]
PW["Playwright + Chromium"]
Caido["Caido Proxy"]
end
end
CLI -->|"Authenticated HTTP"| ToolServer
ToolServer --> Toolkit
ToolServer --> Runtime
Container -.->|"Network"| Target["Target System"]
style Host fill:#2d3436,stroke:#636e72,color:#dfe6e9
style Container fill:#0984e3,stroke:#74b9ff,color:#fff
style Toolkit fill:#d63031,stroke:#ff7675,color:#fff
style Runtime fill:#6c5ce7,stroke:#a29bfe,color:#fff
7-Layer Security Model
Layer 1 Scope Validator Target allowlist, SSRF protection, DNS pinning
Layer 2 Tool Firewall Argument validation, shell injection blocking
Layer 3 Docker Sandbox Ephemeral container, restricted capabilities
Layer 4 Cost Controller Per-request ceiling ($5), scan budget ($25)
Layer 5 Time Limits Per-tool timeout, global scan timeout
Layer 6 HMAC Audit Trail Tamper-evident, append-only event log
Layer 7 Output Sanitizer PII stripping, credential redaction
Quick Start
Requirements: Docker (running) · Python 3.12+ · An LLM API key
# Install
pip install phantom-agent
# or: pipx install phantom-agent
# Configure
export PHANTOM_LLM="openai/gpt-4o"
export LLM_API_KEY="your-api-key"
# Run your first scan
phantom scan --target https://your-app.com
First run pulls the sandbox image (~13GB). Results saved to phantom_runs/.
Docker
docker run --rm -it \
-e PHANTOM_LLM="openai/gpt-4o" \
-e LLM_API_KEY="your-key" \
-v /var/run/docker.sock:/var/run/docker.sock \
ghcr.io/usta0x001/phantom:latest \
scan --target https://your-app.com
Usage
# Quick scan (~15 min)
phantom scan --target https://app.com --scan-mode quick
# Standard scan (~45 min)
phantom scan --target https://app.com
# Deep scan (1–3 hours, exhaustive)
phantom scan --target https://app.com --scan-mode deep
# Stealth (low-noise, IDS/WAF evasion)
phantom scan --target https://app.com --scan-mode stealth
# API-only (no browser)
phantom scan --target https://api.app.com --scan-mode api_only
# With custom instructions
phantom scan --target https://app.com \
--instruction "Focus on SQL injection in /api/v2 endpoints"
# Resume interrupted scan
phantom scan --target https://app.com --resume
# Interactive TUI
phantom --target https://app.com
# Non-interactive (CI/CD)
phantom scan --target https://app.com --non-interactive
Scan Profiles
| Profile | Iterations | Duration | Best For |
|---|---|---|---|
quick |
60 | ~15 min | CI/CD, rapid checks |
standard |
120 | ~45 min | Regular testing |
deep |
300 | 1–3 hours | Comprehensive audit |
stealth |
60 | ~30 min | Production, IDS evasion |
api_only |
100 | ~45 min | REST/GraphQL APIs |
Post-Scan Pipeline
Every scan runs a 7-stage enrichment pipeline automatically:
1. MITRE ATT&CK CWE, CAPEC, OWASP mapping
2. Compliance OWASP Top 10, PCI DSS v4, NIST 800-53
3. Attack Graph NetworkX path analysis
4. Nuclei Templates Auto-generated YAML for regression
5. Knowledge Store Persistent memory updated
6. Notifications Webhook/Slack for critical findings
7. Reports JSON, HTML, Markdown output
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
PHANTOM_LLM |
LLM model identifier | openai/gpt-4o |
LLM_API_KEY |
API key (comma-separated for rotation) | — |
PHANTOM_REASONING_EFFORT |
low / medium / high |
high |
PHANTOM_SCAN_MODE |
Default scan profile | standard |
PHANTOM_IMAGE |
Sandbox Docker image | ghcr.io/usta0x001/phantom-sandbox:latest |
PHANTOM_MAX_COST |
Max cost per scan (USD) | 25.0 |
PHANTOM_PER_REQUEST_CEILING |
Max cost per LLM request | 5.0 |
PHANTOM_WEBHOOK_URL |
Webhook for critical findings | — |
PHANTOM_DISABLE_BROWSER |
Disable Playwright | false |
Supported LLM Providers
Phantom uses LiteLLM — any of 100+ providers work:
| Provider | Model Example | Notes |
|---|---|---|
| OpenAI | openai/gpt-4o |
Best overall |
| Anthropic | anthropic/claude-sonnet-4-20250514 |
Strong reasoning |
gemini/gemini-2.5-pro |
Large context | |
| Groq | groq/llama-3.3-70b-versatile |
Free tier |
| DeepSeek | deepseek/deepseek-chat |
Cost-effective |
| OpenRouter | openrouter/deepseek/deepseek-v3.2 |
Multi-provider |
| Ollama | ollama/llama3.1 |
Local, no API key |
| Azure | azure/gpt-4o |
Enterprise |
CI/CD Integration
GitHub Actions
name: Security Scan
on:
pull_request:
branches: [main]
jobs:
scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: pip install phantom-agent
- run: phantom scan --target ./ --non-interactive --scan-mode quick
env:
PHANTOM_LLM: ${{ secrets.PHANTOM_LLM }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
Development
git clone https://github.com/Usta0x001/Phantom.git
cd Phantom
python -m venv .venv && source .venv/bin/activate # .venv\Scripts\activate on Windows
pip install -e ".[dev]"
pytest tests/ -v
Project Structure
phantom/
├── phantom/ # Core package
│ ├── agents/ # Agent system (ReAct loop, state, delegation)
│ ├── core/ # Security, reporting, knowledge (20+ modules)
│ ├── tools/ # 30+ security tool wrappers
│ ├── llm/ # LLM client, memory compression
│ ├── runtime/ # Docker sandbox management
│ ├── interface/ # CLI, TUI, streaming
│ ├── models/ # Pydantic domain models
│ ├── skills/ # Domain knowledge files
│ └── telemetry/ # Run tracing
├── tests/ # 731+ tests
├── containers/ # Sandbox Dockerfile
├── scripts/ # Build scripts
└── docs/ # Documentation
Testing
731+ tests, 0 failures:
| Suite | Tests | Scope |
|---|---|---|
| Integration | 153 | Full system E2E |
| Audit fixes | 39 | Security fix verification |
| Unit tests | ~200 | Module-level |
| Feature tests | ~100 | Regression |
| Coverage tests | ~80 | Gap coverage |
| Security tests | ~50 | Security-specific |
Security Audit
Two deep offensive audits on the codebase — all findings resolved:
| Severity | Found | Fixed |
|---|---|---|
| Critical | 8 | 8 |
| High | 19 | 19 |
| Medium | 34 | 34 |
| Low | 22 | 22 |
| Total | 83 | 83 |
Documentation
- Architecture — System design
- Documentation — Full reference
- Quick Start — Get scanning in 2 minutes
- Contributing — Development guidelines
Contributing
Contributions welcome. See CONTRIBUTING.md.
- Bugs — Open an issue
- Features — Start a discussion
- PRs — Fork, branch, test, submit
License
Apache License 2.0 — see LICENSE.
Acknowledgements
Built on: LiteLLM · Nuclei · Playwright · Textual · Rich · NetworkX · SQLMap
PHANTOM — "Why so Serious!"
Made by Usta0x001
WARNING: Only test systems you own or have explicit written authorization to test. Unauthorized access to computer systems is illegal.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file phantom_agent-0.9.38.tar.gz.
File metadata
- Download URL: phantom_agent-0.9.38.tar.gz
- Upload date:
- Size: 406.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
061d0018b5da85012540afe9f15d13f9f068eae22fd6b5f6cd4207875b2cdaef
|
|
| MD5 |
e030f4181b28951afc6c315c366c2e4d
|
|
| BLAKE2b-256 |
d5fde514a8cf460087d91bd9a8f2f5f2123ad1d969f0d890c6ddded2067d51da
|
File details
Details for the file phantom_agent-0.9.38-py3-none-any.whl.
File metadata
- Download URL: phantom_agent-0.9.38-py3-none-any.whl
- Upload date:
- Size: 497.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c1332e5def0e26a8a795c6a36de37cd2f84e61890d5a7c056ff3371467fa3859
|
|
| MD5 |
9ed62837b8223b6660b05607286426fe
|
|
| BLAKE2b-256 |
eaaa0693f79ac252230f4175498f1a39bf6da34d3657266186d2949d9c699f48
|