Autonomous AI pentesting with 200+ tools, exploit chaining, PoC validation, and credential-safe MCP server
pentest-ai
Autonomous pentests from one command. Real tools. Real PoCs. Real reports.
⚠️ Offensive tooling, authorized testing only. By installing you accept the AUP and Terms. Full text in Responsible use ↓
Point it at a target. It runs recon, logs into the app, chains vulnerabilities into attack paths, proves every finding with a working PoC, and hands back a report your blue team can act on.
No cloud. No telemetry. Your laptop, your keys, your data.
See it run
The demo is a single prompt to Claude Code. The MCP server picked up the request, ran ptai's tools against the target, and Claude streamed the findings back into the session. It runs on your Claude subscription, with no API key.
The scan returned 17 critical, 53 high, 107 total findings, 7 confirmed attack chains, and 264 generated detection rules against a stock OWASP Juice Shop instance. JWT alg:none accepted on 8+ protected endpoints, SQLi auth bypass on /rest/user/login, UNION-based SQLi on /rest/products/search, path-filter bypass via NUL byte, XXE disclosing /etc/passwd, file upload polyglot, mass assignment, password reset bypass. All real bugs, all proven with PoCs.
The recording is the actual output of claude -p against a local OWASP Juice Shop with pentest-ai registered as an MCP server. The cast file is in assets/realdemo.cast; the time-paced re-render used for the GIF is in assets/realdemo-paced.cast. The findings are real. Inter-line timing was reconstructed for watchability, because claude -p buffers its output and dumps it all at once in non-interactive mode. A deterministic synthesized fallback (assets/demo.tape + assets/demo.sh) is kept for reproducible re-renders.
Honesty caveat: Juice Shop is the most-written-about deliberately vulnerable app on the internet, so LLMs and probe authors both have a head start. Against a novel target, the catch rate is whatever the curated probe library actually covers: ~40 web probes today, growing each release. The LLM coordinates and reasons about results; it doesn't replace the probes. A private honeypot harness in tests/honeypot/ measures coverage against bugs we wrote ourselves. Numbers there are lower than on Juice Shop, and that's the point: we publish both.
Install
pip install ptai
Already paying for Claude Pro or Max? Wire ptai into Claude Code as an MCP server and your subscription runs the engagement. Zero API spend:
claude mcp add pentest-ai -- ptai mcp
Restart Claude Code, then ask: "Run an authenticated pentest against staging.acme.com. Login is at /login, password is in $APP_PASS."
Other install paths (Cursor / Copilot / VS Code wizard, API keys, no-LLM menu, REST API, MCP composition, HITL teleoperation, benchmarks, cloud workspace)
Interactive wizard (Claude Desktop, Cursor, VS Code Copilot)
ptai setup --mcp
Auto-detects the clients you have installed, writes their config files, and tells you to restart them.
Or use an API key
For CI pipelines, scheduled runs, or standalone use without an MCP client:
export ANTHROPIC_API_KEY=sk-ant-... # Claude (best results)
# or
export OPENAI_API_KEY=sk-... # OpenAI
# or, fully local, no cloud
export OLLAMA_HOST=localhost:11434 # Ollama
# or, any of 300+ models via LiteLLM (OpenRouter, Azure, DeepSeek, Groq, Mistral, Together AI, Bedrock, Vertex AI, Cohere, ...)
pip install 'ptai[litellm]'
ptai start <target> --provider litellm --model openrouter/anthropic/claude-sonnet-4
ptai start https://your-target.com
The first run installs the tool dependencies it needs (nmap, nuclei, ffuf, sqlmap, gobuster, and more). No setup afterwards.
No LLM at all (interactive launcher)
ptai menu
Numeric category navigation, search (/term), tag filtering (t web), keyword-based recommendation. Real engagements still go through ptai start with full scope confirmation.
HTTP REST API (for dashboards and integrations)
pip install 'ptai[api]'
ptai serve --port 8888
Endpoints: /health, /version, /agents, /tools, /engagements (list, detail, findings, chains, detection rules, SARIF export). Write endpoints (POST /engagements, POST /engagements/{id}/abort) require Authorization: Bearer $PENTEST_AI_API_TOKEN. Live event stream at WS /engagements/{id}/stream.
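For scripts that don't go through MCP, here is a minimal standard-library sketch of calling these endpoints. The paths and the Bearer header come from the endpoint list above; the base URL (matching ptai serve --port 8888), the helper names, and any JSON body you pass to POST /engagements are assumptions, so check the real request schema before relying on them.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: `ptai serve --port 8888` is running on this machine.
BASE_URL = "http://localhost:8888"


def build_request(method: str, path: str, body: dict | None = None,
                  token: str | None = None) -> urllib.request.Request:
    """Build (but do not send) a request to the ptai REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if token:
        # Write endpoints expect: Authorization: Bearer $PENTEST_AI_API_TOKEN
        req.add_header("Authorization", f"Bearer {token}")
    return req


def send(req: urllib.request.Request) -> object:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

With the server up, send(build_request("GET", "/engagements")) lists engagements; read endpoints need no token, while POST calls should pass token= from PENTEST_AI_API_TOKEN. Separating build from send keeps the request shape testable without a live server.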
Load other MCP servers as tool sources
Compose with hexstrike or any other MCP-compatible security server. Edit ~/.pentest-ai/mcp_servers.json:
{
"servers": [
{"name": "hexstrike", "command": "python3 hexstrike_mcp.py", "transport": "stdio"}
]
}
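A small sketch of validating that file before use, assuming only the shape shown above: a servers list of objects with name, command, and transport. The allowed-transport set, the stdio default, and the function name are assumptions; ptai's own loader may accept more fields. Splitting command with shlex shows how a stdio server's command line becomes an argv.

```python
from __future__ import annotations

import json
import shlex
from dataclasses import dataclass

# Assumption: only "stdio" appears in the example; other transports may exist.
KNOWN_TRANSPORTS = {"stdio"}


@dataclass(frozen=True)
class McpServer:
    name: str
    argv: tuple[str, ...]  # command split the way a POSIX shell would
    transport: str


def load_mcp_servers(text: str) -> list[McpServer]:
    """Parse mcp_servers.json content and validate each server entry."""
    doc = json.loads(text)
    servers: list[McpServer] = []
    seen: set[str] = set()
    for entry in doc.get("servers", []):
        name = entry["name"]
        if name in seen:
            raise ValueError(f"duplicate server name: {name}")
        seen.add(name)
        transport = entry.get("transport", "stdio")  # assumed default
        if transport not in KNOWN_TRANSPORTS:
            raise ValueError(f"{name}: unsupported transport {transport!r}")
        argv = tuple(shlex.split(entry["command"]))
        if not argv:
            raise ValueError(f"{name}: empty command")
        servers.append(McpServer(name, argv, transport))
    return servers
```

Failing fast on duplicates or unknown transports means a typo in the file surfaces at startup rather than halfway through an engagement.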
Take over mid-run (HITL teleoperation)
While an engagement is running, press Ctrl+C twice within 600ms to pause the orchestrator and drop into a REPL with step, inspect findings, inject <instruction>, skip, resume, and abort. This acknowledges that current LLMs aren't fully autonomous: the operator owns the call when it matters.
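The "twice within 600ms" rule is a timing window. A sketch of how such a detector could work, with an injectable clock so it can be tested; the class and function names are hypothetical, and ptai's real signal handling may differ (for example, in what a single Ctrl+C does, which this sketch simply ignores).

```python
from __future__ import annotations

import signal
import time
from typing import Callable


class DoubleInterrupt:
    """Report True when two presses arrive within `window` seconds."""

    def __init__(self, window: float = 0.6,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.clock = clock
        self._last: float | None = None

    def press(self) -> bool:
        now = self.clock()
        is_double = self._last is not None and (now - self._last) <= self.window
        # After a double press, reset so a third press starts a new pair.
        self._last = None if is_double else now
        return is_double


def install(detector: DoubleInterrupt, on_pause: Callable[[], None]) -> None:
    """Route SIGINT through the detector; only a double press pauses."""
    def handler(signum, frame):
        if detector.press():
            on_pause()  # e.g. open the step/inspect/inject REPL
        # A single press is swallowed here; a real tool would likely warn.
    signal.signal(signal.SIGINT, handler)
```

A monotonic clock is used so wall-clock adjustments can't fake or break the window.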
Public benchmarks
Reproducible solve-rate measurements live in benchmarks/:
./benchmarks/scripts/run_all.sh # writes JSON per run + RESULTS.md
Spec, harness, results all in git. No "98.7% detection rate" claims you can't audit.
Cloud workspace (Pro / Team / Enterprise)
The CLI is free forever and stores everything locally. If you want engagement history, branded client-ready PDF reports, and team collaboration, link the CLI to an app.pentestai.xyz workspace:
# Sign up, then Dashboard → API Keys → Generate → copy ptai_...
ptai auth login # paste the key (hidden prompt)
ptai auth status # confirm link
# or use an env var for CI:
export PENTESTAI_API_KEY=ptai_...
Once linked, ptai start runs auto-sync findings to your cloud workspace. If you never log in, the integration stays off and no cloud calls are made.
Why it's different
| 🤖 LLM-coordinated, not LLM-dependent | Seventeen specialist agents cover recon, web, API, AD, cloud, mobile, wireless, browser, credentials, privesc, vuln scan, chaining, PoC, detection, report, social-engineer, and LLM red team. The LLM orchestrates the phase loop and reasons about results; the bug-class detection lives in a curated deterministic probe library. Set no API key and the same probes still run — the LLM is the coordinator, not the scanner. |
| 🔐 It logs in | Most scanners die at the login page. This one holds a session, rotates credentials, and passes the session cookie to every downstream tool. Auth profiles store references (env vars, op://, Vault paths, AWS Secrets Manager ARNs), never the secret value. |
| 🧪 Every finding is proven | A non-destructive proof of concept runs against the target. No more triaging 40 maybes from a noisy scanner. |
| 🔌 Drop-in for Claude, Cursor, Copilot | An MCP server with 44 tools. Talk to your assistant: "diff last week's engagement against today's." |
| ⚡ CI-native | A GitHub Action, severity gates, SARIF output, and PR comments. Works the day you drop it in. |
| 💾 Runs on your laptop | MIT licensed. No cloud calls. Works offline with Ollama. Your findings stay on your disk. |
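Storing references instead of values means an auth profile on disk only says where a secret lives; the value is fetched at run time. A minimal sketch of that idea follows. The env: and vault: prefixes and the function names are assumptions for illustration (ptai's actual profile syntax isn't shown here), and only environment variables are resolved; the op://, Vault, and AWS backends are left as stubs that raise.

```python
from __future__ import annotations

import os
import re
from typing import Mapping

# Hypothetical reference formats, for illustration only.
_AWS_SM_ARN = re.compile(r"^arn:aws[a-z-]*:secretsmanager:")


def classify(ref: str) -> str:
    """Name the backend a reference points at; reject literal secrets."""
    if ref.startswith("env:"):
        return "env"
    if ref.startswith("op://"):
        return "1password"
    if ref.startswith("vault:"):
        return "vault"
    if _AWS_SM_ARN.match(ref):
        return "aws-secrets-manager"
    raise ValueError("not a secret reference; refusing to store a literal value")


def resolve(ref: str, environ: Mapping[str, str] = os.environ) -> str:
    """Fetch the secret at run time. Only env: references are implemented."""
    kind = classify(ref)
    if kind == "env":
        name = ref[len("env:"):]
        if name not in environ:
            raise LookupError(f"environment variable {name} is not set")
        return environ[name]
    # The 1Password, Vault, and AWS backends would call their own clients here.
    raise NotImplementedError(f"{kind} resolution is not part of this sketch")
```

With this shape, a profile holding env:APP_PASS (the variable from the install example) can be committed safely, and a pasted plaintext password is rejected by classify before it is ever written.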
How it works
┌─────────────────────────────────────────────────────────────┐
│ ptai start <target> │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌─────────┐
│ recon │ → │ auth │ → │ web │
└────────┘ └────────┘ └─────────┘
│
┌────────────────────────────────────┤
▼ ▼
┌────────┐ ┌─────────┐
│ ad │ ┌──────────────────┐ │ cloud │
└────────┘ │ Findings DB │ └─────────┘
│ │ (sqlite + evidence)│ │
└───────▶│ scope-guarded │◀──────┘
│ deduplicated │
└──────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌──────┐ ┌─────────┐ ┌──────────┐
│chain │ │validate │ │ detect │
└──────┘ └─────────┘ └──────────┘
│
▼
┌──────────┐
│ report │ md · html · pdf · SARIF · JUnit
└──────────┘
Each agent runs with an LLM when you've set a key, or as a deterministic tool loop when you haven't. Either way the phase order is the same.
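The "same phase order with or without a key" property can be sketched as an orchestrator that sorts agents by phase and swaps only the per-agent strategy. Phase numbers are copied from the Agents table below (optional agents omitted); the run_with_llm and run_deterministic callables and the environment-variable check are simplifying assumptions, not ptai's internals.

```python
from __future__ import annotations

import os
from typing import Callable

# Phase numbers from the Agents table; "opt" agents are left out.
PHASES = {
    "recon": 1, "web": 2, "api_security": 2, "browser": 2, "ad": 3,
    "cloud": 4, "credential_tester": 4, "privesc": 5, "vuln_scanner": 5,
    "exploit_chain": 6, "poc_validator": 7, "detection": 8, "report": 9,
}

Runner = Callable[[str], str]


def plan(agents: list[str]) -> list[str]:
    """Order agents by phase; agents in the same phase keep input order."""
    return sorted(agents, key=lambda a: PHASES[a])


def run_engagement(agents: list[str], run_with_llm: Runner,
                   run_deterministic: Runner,
                   has_llm: bool | None = None) -> list[str]:
    if has_llm is None:
        # Simplification: any one provider setting counts as "LLM available".
        has_llm = any(os.environ.get(k) for k in
                      ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "OLLAMA_HOST"))
    runner = run_with_llm if has_llm else run_deterministic
    # Same plan either way; only the strategy per agent changes.
    return [runner(agent) for agent in plan(agents)]
```

Because Python's sort is stable, agents sharing a phase (web, api_security, browser) run in the order they were requested.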
Agents
| Agent | Phase | Does |
|---|---|---|
| recon | 1 | Port scan, DNS and subdomain enum, service fingerprinting |
| web | 2 | Authenticated OWASP Testing Guide v4 pass |
| api_security | 2 | OpenAPI/GraphQL/REST surface analysis, OWASP API Top 10 |
| browser | 2 | Playwright-driven DOM analysis, XHR capture, security-header grading |
| ad | 3 | AD enum, Kerberoasting, BloodHound pathfinding, delegation abuse |
| cloud | 4 | AWS, Azure, GCP IAM, misconfig, K8s RBAC, serverless |
| credential_tester | 4 | Password spraying, credential stuffing, MFA bypass checks |
| privesc | 5 | Local and lateral privilege-escalation advice from collected context |
| vuln_scanner | 5 | Cross-cutting vuln aggregation against the findings DB |
| exploit_chain | 6 | Correlates findings into multi-step attack paths |
| poc_validator | 7 | Non-destructive proof of concept per finding |
| detection | 8 | Sigma, SPL, KQL rules for the blue team |
| report | 9 | Markdown, HTML, PDF, SARIF, JUnit, compliance maps |
| llm_redteam | opt | OWASP LLM Top 10 probes |
| social_engineer | opt | Phishing corpus and pretext generation |
| mobile | opt | Android/iOS static + dynamic checks |
| wireless | opt | Wireless reconnaissance and handshake capture |
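The architecture diagram describes the Findings DB as sqlite-backed and deduplicated, and vuln_scanner aggregates against it. One way to get deduplication for free is a primary key on a finding fingerprint, so a second agent reporting the same issue is a no-op. The schema, the fingerprint fields (host, type, location), and the function names below are assumptions for illustration, not ptai's actual schema.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS findings (
    fingerprint TEXT PRIMARY KEY,  -- dedup key: same host/type/location
    host        TEXT NOT NULL,
    type        TEXT NOT NULL,
    location    TEXT NOT NULL,
    severity    TEXT NOT NULL,
    agent       TEXT NOT NULL,     -- first agent that reported it
    evidence    TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def fingerprint(host: str, ftype: str, location: str) -> str:
    return hashlib.sha256(f"{host}|{ftype}|{location}".encode()).hexdigest()


def record(db: sqlite3.Connection, host: str, ftype: str, location: str,
           severity: str, agent: str, evidence: str = "") -> bool:
    """Store a finding; return False if an equivalent one is already stored."""
    cur = db.execute(
        "INSERT OR IGNORE INTO findings VALUES (?, ?, ?, ?, ?, ?, ?)",
        (fingerprint(host, ftype, location), host, ftype, location,
         severity, agent, evidence),
    )
    db.commit()
    return cur.rowcount == 1
```

INSERT OR IGNORE keeps the first report and its evidence; a real store might instead merge evidence from later agents.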
Playbooks
Your methodology as a file. Checked into git. Shared with your team.
name: internal-ad-pentest
inputs:
  domain: { required: true, prompt: "AD domain" }
  dc_ip: { required: true, prompt: "DC IP" }
phases:
  - id: recon
    tools: [nmap, masscan]
  - id: ad-enum
    depends_on: [recon]
    condition: "any_finding(type='open_port', port=445)"
    tools: [enum4linux, ldapsearch, bloodhound-python]
  - id: kerberoast
    requires_finding: { type: ad_user_enumerated }
    tools: [impacket-getuserspns]
    llm_decide: true # let the LLM skip if context says useless
ptai playbook list # show installed playbooks
ptai playbook show web-app-quick # preview before running
ptai playbook run ./my-ad.yaml # execute
Five playbooks ship built-in. A community catalog is coming.
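Phase ordering in a playbook like the one above combines static dependencies (depends_on) with dynamic ones (requires_finding). A sketch of a simple fixed-point scheduler for that, assuming the YAML has already been parsed into dicts; the schedule function and the run callback are hypothetical, and condition and llm_decide are left for the callback to handle rather than evaluated here.

```python
from __future__ import annotations

from typing import Callable


def schedule(phases: list[dict],
             run: Callable[[dict], set[str]]) -> tuple[list[str], list[str]]:
    """Run phases once their depends_on and requires_finding are satisfied.

    `run` executes one phase and returns the finding types it produced.
    Returns (ran, skipped) phase ids.
    """
    done: set[str] = set()
    found: set[str] = set()
    pending = list(phases)
    ran: list[str] = []
    progress = True
    while pending and progress:
        progress = False
        for phase in list(pending):
            deps = set(phase.get("depends_on", []))
            need = phase.get("requires_finding", {}).get("type")
            if deps <= done and (need is None or need in found):
                found |= run(phase)
                done.add(phase["id"])
                ran.append(phase["id"])
                pending.remove(phase)
                progress = True
    # Anything left never had its dependencies or required findings met.
    return ran, [p["id"] for p in pending]
```

Looping until no phase makes progress lets kerberoast wait for ad_user_enumerated even though it declares no depends_on, and it is reported as skipped if that finding never appears.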
Drop it into your CI
# .github/workflows/security.yml
name: Security scan
on: [pull_request]
jobs:
  ptai:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ptai
      - run: |
          ptai start ${{ vars.STAGING_URL }} \
            --ci \
            --fail-on high \
            --sarif pentest.sarif
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: pentest.sarif
Findings post as a PR comment, SARIF uploads to GitHub Code Scanning, and the build fails on gated severity. GitLab CI and Jenkins templates plus advanced options (auth profiles in CI, cost gates, scope files) → docs/ci-cd.md.
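A rough sketch of what a --fail-on gate does with a SARIF file: count results at or above a threshold and turn that into an exit code. How ptai encodes its own severities in SARIF isn't stated here, so this assumes a hypothetical properties.severity string on each result and otherwise falls back to an assumed mapping from the standard SARIF level values.

```python
import json

ORDER = ["info", "low", "medium", "high", "critical"]
# Fallback from SARIF 2.1.0 result levels; this mapping is an assumption.
LEVEL_TO_SEVERITY = {"none": "info", "note": "low",
                     "warning": "medium", "error": "high"}


def severity_of(result: dict) -> str:
    # Hypothetical: prefer a tool-specific properties.severity string.
    sev = result.get("properties", {}).get("severity")
    if sev in ORDER:
        return sev
    # SARIF treats a missing level as "warning" (absent rule configuration).
    return LEVEL_TO_SEVERITY.get(result.get("level", "warning"), "medium")


def count_blocking(sarif: dict, fail_on: str) -> int:
    """Count results whose severity is at or above the fail_on threshold."""
    threshold = ORDER.index(fail_on)
    return sum(
        1
        for run in sarif.get("runs", [])
        for result in run.get("results", [])
        if ORDER.index(severity_of(result)) >= threshold
    )


def gate(path: str, fail_on: str = "high") -> int:
    """Return a process exit code: 1 if anything blocks, else 0."""
    with open(path, encoding="utf-8") as fh:
        blocking = count_blocking(json.load(fh), fail_on)
    print(f"{blocking} finding(s) at or above {fail_on}")
    return 1 if blocking else 0
```

In a script, sys.exit(gate("pentest.sarif", "high")) fails the CI step the same way a non-zero ptai exit would.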
vs the field
| | ptai | Sn1per | Nuclei | Burp Pro | PentestGPT |
|---|---|---|---|---|---|
| Autonomous phase loop | ✓ | ✓ | ✓ | ||
| Authenticated scanning | ✓ | partial | raw HTTP | ✓ | |
| Exploit chaining | ✓ | partial | |||
| PoC validation | ✓ | partial | |||
| Diff and retest | ✓ | ||||
| CI-native (SARIF + gates) | ✓ | partial | partial | ||
| LLM red team | ✓ | ||||
| YAML playbooks | ✓ | templates | |||
| MCP server | ✓ | ||||
| License | MIT | GPL | MIT | commercial | MIT |
What's inside
- 17 agents across recon, web, API security, AD, cloud, mobile, wireless, browser, credential testing, privilege escalation, vulnerability scanning, exploit chaining, PoC validation, detection, reporting, LLM red team, and social engineering
- 200+ tool wrappers with auto-install: nmap, masscan, nuclei, ffuf, sqlmap, gobuster, wapiti, nikto, dalfox, xsstrike, enum4linux, bloodhound-python, impacket's full suite, trufflehog, gitleaks, kube-hunter, trivy, and more
- 4000+ Nuclei templates integrated for atomic vulnerability detection across web, network, cloud, and CVE-specific checks
- 44 MCP tools for LLM-driven engagements
- 300+ LLM models via the LiteLLM provider (Anthropic, OpenAI, Ollama direct; Azure, OpenRouter, DeepSeek, Groq, Mistral, Together AI, Bedrock, Vertex AI, Cohere via LiteLLM)
- HTTP REST API + WebSocket surface (ptai serve) for non-MCP integrations
- Local web dashboard with live engagement view, findings table, attack chain visualization, SARIF export
- Browser automation agent with screenshot capture, DOM analysis, network capture, security header grading (Playwright-driven)
- Human-In-The-Loop teleoperation (Ctrl+C twice to take over an engagement mid-run)
- MCP client capability to load external MCP servers as tool sources
- Public reproducible benchmark harness in benchmarks/. Your numbers, your code, in git.
- 6 output formats: Markdown, HTML, PDF, SARIF 2.1.0, JUnit XML, compliance mappings (OWASP, CWE, CVE, CVSS v3.1)
- 1,000+ tests with CI on Python 3.10, 3.11, 3.12, and 3.13
- MIT licensed, 100% yours
Who uses it for what
AppSec teams. Wire ptai into your CI. Every PR against staging gets an authenticated scan. The build fails on high-severity findings. The fix → retest → confirm loop runs on its own.
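The fix → retest → confirm loop, like the "diff last week's engagement against today's" prompt, comes down to set arithmetic over stable finding identifiers. A sketch assuming each finding is keyed by a hypothetical (host, type, location) tuple; ptai's real matching may be fuzzier.

```python
from __future__ import annotations

from typing import NamedTuple


class Key(NamedTuple):
    host: str
    type: str
    location: str


def diff(before: set[Key], after: set[Key]) -> dict[str, set[Key]]:
    """Classify findings between two runs against the same scope."""
    return {
        "new": after - before,         # introduced since the last run
        "fixed": before - after,       # not reproduced on retest: candidate to close
        "persisting": before & after,  # still reproducible
    }
```

"Fixed" here only means the finding was not reproduced, which is why a confirm step still matters.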
Consultants. Scope a week-long engagement, point ptai at the estate, and spend your time on the creative work instead of gluing scanners together and writing the report. The report is already written.
Bug bounty hunters. Run it over breakfast. Come back to a list of validated findings with PoCs ready to paste into HackerOne.
Red teamers. Drop your internal AD methodology into a YAML playbook. Run it against every new engagement. Share it with your team.
Developers shipping AI features. Run ptai with --enable-llm-redteam against your chatbot. Get an OWASP LLM Top 10 report in minutes.
Responsible use
pentest-ai is offensive security tooling. It executes real network and host operations against the targets you specify. You are solely responsible for ensuring you have explicit, written authorization to test every target.
By installing or running ptai you agree to the Acceptable Use Policy and the Terms of Service. Testing systems you do not own without written authorization may violate the Computer Fraud and Abuse Act, the Computer Misuse Act 1990, GDPR Article 32, and equivalents in your jurisdiction. Misuse is your sole responsibility.
First-run prompts you to confirm AUP acceptance and persists the choice to ~/.pentest-ai/aup-consent.txt. Set PENTEST_AI_AUP_ACCEPTED=1 in CI to bypass the prompt non-interactively.
On startup ptai loads a scope file. Out-of-scope hosts are refused at tool-invocation time. PoCs are non-destructive by default. Rate limits kick in automatically in stealth mode. Don't be that person.
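Refusing out-of-scope hosts at tool-invocation time means every tool call passes through one check before anything is sent. A sketch using the standard library's ipaddress module; the scope format (CIDRs, exact hostnames, and *. patterns that match subdomains) and the function names are assumptions, since the scope file's real syntax isn't shown here. Hostnames are matched by name only; a stricter guard would also check the addresses they resolve to.

```python
from __future__ import annotations

import ipaddress
from urllib.parse import urlsplit


def parse_scope(entries: list[str]) -> tuple[list, list[str]]:
    """Split scope entries into IP networks and hostname patterns."""
    nets, names = [], []
    for entry in entries:
        try:
            nets.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            names.append(entry.lower().rstrip("."))
    return nets, names


def in_scope(target: str, nets: list, names: list[str]) -> bool:
    host = urlsplit(target).hostname if "://" in target else target
    if not host:
        return False
    host = host.lower().rstrip(".")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None
    if addr is not None:
        return any(addr in net for net in nets)  # v4/v6 mismatch is just False
    for pattern in names:
        if pattern.startswith("*."):
            if host.endswith(pattern[1:]):  # "*.acme.com" matches "a.acme.com"
                return True
        elif host == pattern:
            return True
    return False


def guard(tool: str, target: str, nets: list, names: list[str]) -> None:
    """Raise before a tool runs against anything outside the scope."""
    if not in_scope(target, nets, names):
        raise PermissionError(f"{tool}: {target} is out of scope")
```

Note that *.acme.com deliberately does not match the bare acme.com; listing the apex separately keeps scope explicit.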
Ecosystem
| Repo | What |
|---|---|
| pentest-ai | This repo. The CLI and MCP server. Python product. |
| pentest-ai-agents | Standalone Claude Code subagent markdown files. Optional, runs without this CLI. |
Need shared workspaces, branded PDF reports, SSO, or a managed engagement? The website has Pro / Team / Enterprise dashboards and a one-shot Launch Engagement option. The OSS tool stays OSS, free forever.
Community
- Discord: join the server. Chat, get help, share findings, lurk.
- Questions, ideas, feedback: GitHub Discussions
- Bug reports: GitHub Issues
- Show and tell: post the wildest finding ptai gave you
Contributing
PRs welcome. Before you submit:
ruff check . && mypy . && pytest -q
See CONTRIBUTING.md for the full flow.
License
MIT. Do whatever you want with it.
If ptai saved you a Sunday, star the repo. It's the only payment I ask for.
File details
Details for the file ptai-0.12.0.tar.gz.
File metadata
- Download URL: ptai-0.12.0.tar.gz
- Upload date:
- Size: 598.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df32c2330113982307453d338a956b09a9faab6df8bf60a67466d63512214374 |
| MD5 | bf88736431497cbcfac7b3fdeb66d2a4 |
| BLAKE2b-256 | 94c1755bd0d78b1876fc7e55f9ecc464e1384405d94f386d21ff22cfc9b56d78 |
File details
Details for the file ptai-0.12.0-py3-none-any.whl.
File metadata
- Download URL: ptai-0.12.0-py3-none-any.whl
- Upload date:
- Size: 462.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22310800ab2874843fc966a42683f861786fd881db0a73e8a20de40abbeee5ac |
| MD5 | 2a01085264cc3bfce114cc5273516c39 |
| BLAKE2b-256 | 1c0dc9476c1ebd672a136269e2cb684a5b2110b2cc8a08db867943ce15197573 |