Autonomous AI pentesting with 200+ tools, exploit chaining, PoC validation, and credential-safe MCP server

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

0xSteve

These details have not been verified by PyPI

Project links

Project description

pentest-ai

Autonomous pentests from one command. Real tools, working PoCs, audit-ready reports.

Website · Install · Docs · Agents · Discord

⚠️ Offensive tooling, authorized testing only. By installing you accept the AUP and Terms. Full text in Responsible use ↓

Point it at a target. It runs recon, logs in, and ties findings into multi-step attack paths. Every finding comes with a working PoC. The report writes itself.

Runs on your laptop. No cloud, no telemetry.

What's new in 0.13.0 (2026-05-11). MCP-native iterative driving. Claude Code (or any MCP client) can drive probes one at a time via list_probes, run_probe, and http_request. The last one lets the LLM send arbitrary HTTP requests under a hard scope guard, so it can chain attacks no canned probe covers (stored SSTI, PATCH mass assignment, race orchestration). Also a critical fix to the MCP auth wiring that was silently skipping every requires_auth=True probe on the fire-and-forget path. End-to-end honeypot catch rate via MCP: 6/10 → 10/10. See CHANGELOG.

See it run

ptai-via-Claude-Code-MCP scanning OWASP Juice Shop: 17 critical findings, 7 attack chains, 264 detection rules generated

One prompt to Claude Code. The MCP server ran ptai's tools against the target, and Claude streamed findings back into the session. Subscription-driven, no API key.

Against a stock OWASP Juice Shop instance, the scan returned 107 findings in total (17 critical, 53 high), 7 confirmed attack chains, and 264 generated detection rules. Findings include JWT alg:none accepted on 8+ protected endpoints, SQLi auth bypass on /rest/user/login, UNION-based SQLi on /rest/products/search, path-filter bypass via NUL byte, XXE disclosing /etc/passwd, a file upload polyglot, mass assignment, and a password reset bypass. Each one has a working PoC.

Recording is the actual output of claude -p against a local OWASP Juice Shop with pentest-ai registered as an MCP server. Cast file in assets/realdemo.cast; the time-paced re-render used for the GIF is in assets/realdemo-paced.cast. Findings are real; inter-line timing was reconstructed for watchability since claude -p buffers and dumps in non-interactive mode. A deterministic synthesized fallback (assets/demo.tape + assets/demo.sh) is kept for reproducible re-renders.

Honesty caveat: Juice Shop is the most-written-about deliberately-vulnerable app on the internet, so LLMs and probe authors both have a head start. Against a novel target the catch rate is whatever the curated probe library actually covers: 60 web probes today, growing each release. The LLM coordinates and reasons about results; it doesn't replace the probes. A private honeypot harness in tests/honeypot/ measures coverage against bugs we wrote ourselves and is asserted in CI at 10/10 caught (tests/honeypot/test_mcp_honeypot_e2e.py). Numbers there are lower than Juice Shop, and that's the point. We publish both.

Install

pip install ptai

Already paying for Claude Pro or Max? Wire ptai into Claude Code as an MCP server and your subscription runs the engagement. Zero API spend:

claude mcp add pentest-ai -- ptai mcp

Restart Claude Code, then ask: "Run an authenticated pentest against staging.acme.com. Login is at /login, password is in $APP_PASS."

Other install paths (Cursor / Copilot / VS Code wizard, API keys, no-LLM menu, REST API, MCP composition, HITL teleoperation, benchmarks, cloud workspace)

Interactive wizard (Claude Desktop, Cursor, VS Code Copilot)

ptai setup --mcp

Auto-detects the clients you have installed, writes their config files, and tells you to restart them.

Or use an API key

For CI pipelines, scheduled runs, or standalone use without an MCP client:

export ANTHROPIC_API_KEY=sk-ant-...   # Claude (best results)
# or
export OPENAI_API_KEY=sk-...          # OpenAI
# or, fully local, no cloud
export OLLAMA_HOST=localhost:11434    # Ollama
# or, any of 300+ models via LiteLLM (OpenRouter, Azure, DeepSeek, Groq, Mistral, Together AI, Bedrock, Vertex AI, Cohere, ...)
pip install "ptai[litellm]"   # quotes keep zsh from globbing the brackets
ptai start <target> --provider litellm --model openrouter/anthropic/claude-sonnet-4

ptai start https://your-target.com

The first run offers to install the tool dependencies it needs (nmap, nuclei, ffuf, sqlmap, gobuster, and more). No setup is needed afterwards.

Installing tools

ptai wraps 200+ external security tools. Three ways to install them:

# 1. Zero-config (recommended). The agent predicts what tools the engagement
#    needs at start and asks ONCE to install the missing ones. Decline once
#    and your answer persists in ~/.pentest-ai/install-preferences.json.
ptai start https://target.example.com   # prompts on first run if anything is missing

# 2. Batch install upfront — skips the agent prompt entirely.
ptai setup --tier core            # ~6 essentials, ~30s
ptai setup --tier recommended     # + fuzzers, crawlers, password tools, ~5m
ptai setup --tier full            # everything, ~30m

# 3. Install specific tools by name.
ptai setup --per-tool wpscan,dalfox,paramspider
ptai setup --wizard               # interactive picker

The agent's tool-install prompt is aware of non-interactive runs: in CI or with PTAI_NON_INTERACTIVE=1, ptai uses whatever tools are already on PATH and logs anything missing instead of prompting.
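The behaviour described above can be pictured with a short sketch. This is not ptai's code: the function names are hypothetical, and treating a non-empty CI environment variable as "running in CI" is an assumption based on common CI conventions.

```python
import os
import shutil
from typing import Callable, Iterable, List, Mapping, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def is_non_interactive(env: Mapping[str, str] = os.environ) -> bool:
    """True when prompting should be skipped.

    PTAI_NON_INTERACTIVE=1 comes from this README; checking a CI variable
    is an ASSUMPTION about how "in CI" might be detected.
    """
    return env.get("PTAI_NON_INTERACTIVE") == "1" or bool(env.get("CI"))


def handle_missing(tools: Iterable[str], env: Mapping[str, str] = os.environ) -> str:
    """Decide what to do about missing tools: nothing, log, or prompt."""
    missing = missing_tools(tools)
    if not missing:
        return "ok"
    return "log" if is_non_interactive(env) else "prompt"
```

Passing the `which` lookup and the environment as arguments keeps the sketch easy to exercise without touching the real PATH.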

No LLM at all (interactive launcher)

ptai menu

Numeric category navigation, search (/term), tag filtering (t web), keyword-based recommendation. Real engagements still go through ptai start with full scope confirmation.

HTTP REST API (for dashboards and integrations)

pip install "ptai[api]"   # quotes keep zsh from globbing the brackets
ptai serve --port 8888

Endpoints: /health, /version, /agents, /tools, /engagements (list, detail, findings, chains, detection rules, SARIF export). Write endpoints (POST /engagements, POST /engagements/{id}/abort) require Authorization: Bearer $PENTEST_AI_API_TOKEN. Live event stream at WS /engagements/{id}/stream.
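A client for these endpoints might look roughly like the sketch below, using only the standard library. The base URL (127.0.0.1 on port 8888) and the helper name `build_request` are assumptions; only the paths, the Bearer header, and the PENTEST_AI_API_TOKEN variable come from the text above.

```python
import json
import urllib.request
from typing import Optional


def build_request(
    base_url: str,
    path: str,
    method: str = "GET",
    token: Optional[str] = None,
    body: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url.rstrip("/") + path, data=data, method=method
    )
    if data is not None:
        request.add_header("Content-Type", "application/json")
    if token:
        # Write endpoints require a Bearer token.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def fetch_health(base_url: str = "http://127.0.0.1:8888") -> str:
    """Send GET /health and return the raw response body."""
    with urllib.request.urlopen(build_request(base_url, "/health")) as response:
        return response.read().decode("utf-8")
```

To abort an engagement you would build a POST to /engagements/{id}/abort with the token read from PENTEST_AI_API_TOKEN and pass it to `urllib.request.urlopen`.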

Load other MCP servers as tool sources

Compose with hexstrike or any other MCP-compatible security server. Edit ~/.pentest-ai/mcp_servers.json:

{
  "servers": [
    {"name": "hexstrike", "command": "python3 hexstrike_mcp.py", "transport": "stdio"}
  ]
}
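If you would rather script that edit than do it by hand, something like the following could work. It assumes the file holds exactly the "servers" list shown above; the helper names are illustrative and not part of ptai.

```python
import json
from pathlib import Path


def add_server(config: dict, name: str, command: str, transport: str = "stdio") -> dict:
    """Return a copy of config with the named server added or replaced."""
    servers = [s for s in config.get("servers", []) if s.get("name") != name]
    servers.append({"name": name, "command": command, "transport": transport})
    return {**config, "servers": servers}


def update_config_file(path: Path, name: str, command: str) -> None:
    """Read the JSON config (if present), add the server, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config, name, command), indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".pentest-ai" / "mcp_servers.json", "hexstrike", "python3 hexstrike_mcp.py")` would reproduce the entry above; running it twice leaves a single entry because servers are matched by name.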

Take over mid-run (HITL teleoperation)

While an engagement is running, press Ctrl+C twice within 600ms to pause the orchestrator and drop into a REPL with these commands: step, inspect findings, inject <instruction>, skip, resume, abort. Current LLMs aren't fully autonomous, so the operator owns the call when it matters.
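The double-press rule is simple enough to sketch. The class below is an illustration of the idea, not ptai's implementation; the name `DoublePressDetector` and the choice to reset after a successful double press are assumptions.

```python
import time
from typing import Optional


class DoublePressDetector:
    """Detect two presses that arrive within a short time window."""

    def __init__(self, window_seconds: float = 0.6) -> None:
        self.window_seconds = window_seconds
        self._last_press: Optional[float] = None

    def press(self, now: Optional[float] = None) -> bool:
        """Record a press; return True if it completes a double press."""
        now = time.monotonic() if now is None else now
        is_double = (
            self._last_press is not None
            and now - self._last_press <= self.window_seconds
        )
        # After a double press, reset so the next press starts a new pair.
        self._last_press = None if is_double else now
        return is_double
```

In a real program `press()` would be called from a SIGINT handler registered with `signal.signal`, and a True result would pause the run instead of exiting.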

Public benchmarks

Reproducible solve-rate measurements live in benchmarks/:

./benchmarks/scripts/run_all.sh   # writes JSON per run + RESULTS.md

Spec, harness, results all in git. No "98.7% detection rate" claims you can't audit.

Cloud workspace (Pro / Team / Enterprise)

The CLI is free forever and stores everything locally. If you want engagement history, branded client-ready PDF reports, and team collaboration, link the CLI to an app.pentestai.xyz workspace:

# Sign up, then Dashboard → API Keys → Generate → copy ptai_...
ptai auth login        # paste the key (hidden prompt)
ptai auth status       # confirm link
# or use an env var for CI:
export PENTESTAI_API_KEY=ptai_...

Once you are logged in, ptai start runs sync findings to your cloud workspace automatically. If you never log in, the integration stays off and ptai makes no cloud calls.

Why it's different


🤖 LLM-coordinated, not LLM-dependent	Seventeen agents cover recon, web, API, AD, cloud, mobile, wireless, browser, credentials, privesc, vuln scan, chaining, PoC, detection, report, social engineering, and LLM red team. The LLM runs the phase loop and reasons about results; bug detection is in the curated deterministic probe library. Set no API key and the same probes still run. The LLM coordinates; it doesn't scan.
🔐 It logs in	Most scanners die at the login page. This one holds a session, refreshes credentials when they expire, and every downstream tool inherits the cookie. Auth profiles store references (env vars, `op://`, Vault paths, AWS Secrets Manager ARNs), never the value.
🧪 Every finding is proven	A non-destructive proof of concept runs against the target. No more triaging 40 maybes from a noisy scanner.
🔌 Drop-in for Claude, Cursor, Copilot	An MCP server with 47 tools. Ask your assistant: "diff last week's engagement against today's."
⚡ CI-native	GitHub Action, severity gates, SARIF output, PR comments. Drop it into your workflow file and it runs on the next PR.
💾 Runs on your laptop	MIT licensed, no cloud calls. Runs offline with Ollama. Findings stay on your disk.
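To make the "references, never the value" idea concrete, here is a minimal sketch of resolving a reference only at the moment it is needed. The reference syntax is an assumption: `$NAME` for env vars mirrors the $APP_PASS example earlier, `op://` and the Secrets Manager ARN prefix are the standard forms, and the `vault:` prefix is purely hypothetical. Only env vars are actually resolved here.

```python
import os
from typing import Mapping


def reference_kind(ref: str) -> str:
    """Classify a credential reference by its prefix (syntax is assumed)."""
    if ref.startswith("$"):
        return "env"
    if ref.startswith("op://"):
        return "1password"
    if ref.startswith("arn:aws:secretsmanager:"):
        return "aws-secrets-manager"
    if ref.startswith("vault:"):  # hypothetical prefix for Vault paths
        return "vault"
    raise ValueError(f"unrecognised credential reference: {ref!r}")


def resolve(ref: str, env: Mapping[str, str] = os.environ) -> str:
    """Look up the secret behind a reference at use time."""
    kind = reference_kind(ref)
    if kind != "env":
        raise NotImplementedError(f"{kind} lookups are outside this sketch")
    name = ref[1:]
    if name not in env:
        raise LookupError(f"environment variable {name} is not set")
    return env[name]
```

The stored profile would contain only the string "$APP_PASS"; the secret itself lives in the environment and never touches the profile file.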

How it works

┌─────────────────────────────────────────────────────────────┐
│                    ptai start <target>                      │
└─────────────────────────────────────────────────────────────┘
                             │
          ┌──────────────────┼──────────────────┐
          ▼                  ▼                  ▼
      ┌────────┐        ┌────────┐        ┌─────────┐
      │ recon  │   →    │  auth  │   →    │   web   │
      └────────┘        └────────┘        └─────────┘
                                               │
          ┌────────────────────────────────────┤
          ▼                                    ▼
      ┌────────┐                          ┌─────────┐
      │   ad   │   ┌──────────────────┐   │ cloud   │
      └────────┘   │  Findings DB     │   └─────────┘
          │        │  (sqlite +       │        │
          │        │   evidence)      │        │
          └───────▶│  scope-guarded   │◀───────┘
                   │  deduplicated    │
                   └──────────────────┘
                             │
                ┌────────────┼────────────┐
                ▼            ▼            ▼
           ┌──────┐    ┌─────────┐  ┌──────────┐
           │chain │    │validate │  │ detect   │
           └──────┘    └─────────┘  └──────────┘
                             │
                             ▼
                       ┌──────────┐
                       │  report  │   md · html · pdf · SARIF · JUnit
                       └──────────┘

Each agent runs with an LLM when you've set a key, or as a deterministic tool loop when you haven't. Either way the phase order is the same.
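One way a deduplicated findings store could work on top of sqlite is sketched below. The schema, column names, and the choice of (host, type, location) as the deduplication key are assumptions for illustration; ptai's real schema is not described here.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a findings database with a uniqueness constraint."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS findings (
            id       INTEGER PRIMARY KEY,
            host     TEXT NOT NULL,
            type     TEXT NOT NULL,
            location TEXT NOT NULL,
            severity TEXT NOT NULL,
            UNIQUE (host, type, location)
        )
        """
    )
    return conn


def record_finding(
    conn: sqlite3.Connection, host: str, type_: str, location: str, severity: str
) -> bool:
    """Insert a finding; return False if an identical one was already stored."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO findings (host, type, location, severity) "
        "VALUES (?, ?, ?, ?)",
        (host, type_, location, severity),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Letting the database enforce uniqueness means several agents can report the same issue without any of them having to check first.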

Agents

Agent	Phase	Does
`recon`	1	Port scan, DNS and subdomain enum, service fingerprinting
`web`	2	Authenticated OWASP Testing Guide v4 pass
`api_security`	2	OpenAPI/GraphQL/REST surface analysis, OWASP API Top 10
`browser`	2	Playwright-driven DOM analysis, XHR capture, security-header grading
`ad`	3	AD enum, Kerberoasting, BloodHound pathfinding, delegation abuse
`cloud`	4	AWS, Azure, GCP IAM, misconfig, K8s RBAC, serverless
`credential_tester`	4	Password spraying, credential stuffing, MFA bypass checks
`privesc`	5	Local and lateral privilege-escalation advice from collected context
`vuln_scanner`	5	Cross-cutting vuln aggregation against the findings DB
`exploit_chain`	6	Correlates findings into multi-step attack paths
`poc_validator`	7	Non-destructive proof of concept per finding
`detection`	8	Sigma, SPL, KQL rules for the blue team
`report`	9	Markdown, HTML, PDF, SARIF, JUnit, compliance maps
`llm_redteam`	opt	OWASP LLM Top 10 probes
`social_engineer`	opt	Phishing corpus and pretext generation
`mobile`	opt	Android/iOS static + dynamic checks
`wireless`	opt	Wireless reconnaissance and handshake capture

Playbooks

Your methodology as a file. Checked into git. Shared with your team.

name: internal-ad-pentest
inputs:
  domain: { required: true, prompt: "AD domain" }
  dc_ip:  { required: true, prompt: "DC IP" }

phases:
  - id: recon
    tools: [nmap, masscan]

  - id: ad-enum
    depends_on: [recon]
    condition: "any_finding(type='open_port', port=445)"
    tools: [enum4linux, ldapsearch, bloodhound-python]

  - id: kerberoast
    requires_finding: { type: ad_user_enumerated }
    tools: [impacket-getuserspns]
    llm_decide: true         # let the LLM skip if context says useless

ptai playbook list                  # show installed playbooks
ptai playbook show web-app-quick    # preview before running
ptai playbook run ./my-ad.yaml      # execute

Five playbooks ship built-in. A community catalog is coming.
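The depends_on key in the example playbook implies an ordering between phases. A minimal sketch of deriving that order is shown here, with the playbook already loaded into a dict (so no YAML library is needed). It only handles depends_on; condition, requires_finding, and llm_decide are runtime checks and are ignored. The function name is hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# The example playbook above, reduced to the keys this sketch uses.
PLAYBOOK = {
    "name": "internal-ad-pentest",
    "phases": [
        {"id": "recon", "tools": ["nmap", "masscan"]},
        {"id": "ad-enum", "depends_on": ["recon"]},
        {"id": "kerberoast"},
    ],
}


def phase_order(playbook: dict) -> List[str]:
    """Return phase ids in an order that respects every depends_on edge."""
    graph: Dict[str, Set[str]] = {
        phase["id"]: set(phase.get("depends_on", []))
        for phase in playbook["phases"]
    }
    # static_order() raises graphlib.CycleError if phases depend on each other.
    return list(TopologicalSorter(graph).static_order())
```

Phases without dependencies (kerberoast here) may land anywhere in the result; only the relative order of dependent phases is guaranteed.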

Drop it into your CI

# .github/workflows/security.yml
name: Security scan
on: [pull_request]

jobs:
  ptai:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ptai
      - run: |
          ptai start ${{ vars.STAGING_URL }} \
            --ci \
            --fail-on high \
            --sarif pentest.sarif
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: pentest.sarif

Findings post as a PR comment, SARIF uploads to GitHub Code Scanning, and the build fails on gated severity. GitLab CI and Jenkins templates plus advanced options (auth profiles in CI, cost gates, scope files) → docs/ci-cd.md.
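A severity gate such as --fail-on high boils down to comparing each finding against a threshold. The sketch below shows the idea; the five-level severity scale and the function names are assumptions, since only "critical" and "high" are named in this README.

```python
from typing import Iterable

# ASSUMED severity scale, lowest to highest.
SEVERITY_ORDER = ("info", "low", "medium", "high", "critical")


def should_fail(severities: Iterable[str], fail_on: str) -> bool:
    """True if any finding is at or above the fail_on threshold."""
    threshold = SEVERITY_ORDER.index(fail_on)
    return any(SEVERITY_ORDER.index(s) >= threshold for s in severities)


def exit_code(severities: Iterable[str], fail_on: str) -> int:
    """Map the gate decision to a process exit code for CI."""
    return 1 if should_fail(severities, fail_on) else 0
```

A CI step fails the build on any non-zero exit code, which is why the gate is expressed as one.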

vs the field

Capability	`ptai`	Hexstrike	ZAP	Nuclei	Burp Pro	PentestGPT
LLM drives via MCP (no API key needed)	✓	✓
LLM-synthesized HTTP under scope guard	✓	partial
Authenticated scanning via MCP	✓	partial	partial	raw HTTP	✓
Exploit chaining	✓	partial				partial
Non-destructive PoC validation	✓				partial
Stored injection chains (POST → GET verify)	✓	manual	partial		manual
Curated probes (specialised, not template-driven)	60	tool-wrapper-driven	rule-driven	8000+ templates	manual + scan	—
Wrapped CLI security tools	200+	150+	—	—	—	—
CI-native (SARIF + severity gates)	✓		partial	partial	partial
LLM red team probes	✓
YAML playbooks	✓			templates
License	MIT	MIT	Apache-2.0	MIT	commercial	MIT

Recent benchmarks: fully autonomous LLM-pentest agents finish 21–31% of tasks end-to-end, while human-assisted setups reach 64% (ARTEMIS, DARPA AIxCC Atlantis, xOffense). ptai is built for the human-assisted regime: the LLM reasons about results, the curated probe library does the detection, and Ctrl+C twice lets the operator take over mid-engagement.

What's inside

17 agents across recon, web, API security, AD, cloud, mobile, wireless, browser, credential testing, privilege escalation, vulnerability scanning, exploit chaining, PoC validation, detection, reporting, LLM red team, and social engineering
60 curated web probes covering OWASP Top 10 + API Top 10: SQLi (reflected/blind/login-bypass), reflected/stored/DOM XSS, SSRF, XXE, path traversal, IDOR (anonymous + authenticated + auth-fallback), JWT alg-confusion + variants, race conditions, type confusion, mass assignment (POST and PATCH), trusted-header bypass, host-header poisoning, insecure session cookies, security misconfigurations
200+ tool wrappers with auto-install: nmap, masscan, nuclei, ffuf, sqlmap, gobuster, wapiti, nikto, dalfox, xsstrike, enum4linux, bloodhound-python, impacket's full suite, trufflehog, gitleaks, kube-hunter, trivy, and more
4000+ Nuclei templates integrated for atomic vulnerability detection across web, network, cloud, and CVE-specific checks
47 MCP tools for LLM-driven engagements, including the new iterative trio (list_probes / run_probe / http_request) that lets Claude Code drive an engagement probe-by-probe under a hard scope guard
300+ LLM models via the LiteLLM provider (Anthropic, OpenAI, Ollama direct; Azure, OpenRouter, DeepSeek, Groq, Mistral, Together AI, Bedrock, Vertex AI, Cohere via LiteLLM)
HTTP REST API + WebSocket surface (ptai serve) for non-MCP integrations
Local web dashboard with live engagement view, findings table, attack chain visualization, SARIF export
Browser automation agent with screenshot capture, DOM analysis, network capture, security header grading (Playwright-driven)
Human-In-The-Loop teleoperation (Ctrl+C twice to take over an engagement mid-run)
MCP client capability to load external MCP servers as tool sources
Public reproducible benchmark harness in benchmarks/. Your numbers, your code, in git.
6 output formats: Markdown, HTML, PDF, SARIF 2.1.0, JUnit XML, compliance mappings (OWASP, CWE, CVE, CVSS v3.1)
1,000+ tests with CI on Python 3.10, 3.11, 3.12, and 3.13
MIT licensed, 100% yours
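For readers who have not seen SARIF 2.1.0 before, the sketch below builds a minimal log from a list of findings. The input field names (rule_id, severity, message, uri) and the severity-to-level mapping are assumptions; the output keys follow the SARIF 2.1.0 structure, but this is not ptai's exporter.

```python
import json
from typing import Iterable, Mapping

# ASSUMED mapping from finding severity to SARIF result levels.
LEVELS = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}


def to_sarif(findings: Iterable[Mapping[str, str]], tool_name: str = "ptai") -> dict:
    """Build a minimal SARIF 2.1.0 log with one run and one result per finding."""
    results = [
        {
            "ruleId": finding["rule_id"],
            "level": LEVELS.get(finding["severity"], "warning"),
            "message": {"text": finding["message"]},
            "locations": [
                {"physicalLocation": {"artifactLocation": {"uri": finding["uri"]}}}
            ],
        }
        for finding in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


def dump_sarif(findings: Iterable[Mapping[str, str]]) -> str:
    """Serialize the log as indented JSON, ready to write to a .sarif file."""
    return json.dumps(to_sarif(findings), indent=2)
```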

Who uses it for what

AppSec teams. Wire ptai into your CI. Every PR against staging gets an authenticated scan. The build fails on high-severity findings. The fix → retest → confirm loop runs on its own.

Consultants. Set up a week-long engagement, point ptai at the target list, and spend your time on the parts that need a human: analyzing findings, picking chains to demonstrate, talking to the client. The report writes itself.

Bug bounty hunters. Run it over breakfast. Come back to a list of validated findings with PoCs ready to paste into HackerOne.

Red teamers. Encode your AD methodology as a YAML playbook. Every new engagement runs it. Same methodology, shared across the team.

Developers shipping AI features. Run with --enable-llm-redteam against your chatbot. Get an OWASP LLM Top 10 report in minutes.

Responsible use

pentest-ai is offensive security tooling. It executes real network and host operations against the targets you specify. You are solely responsible for ensuring you have explicit, written authorization to test every target.

By installing or running ptai you agree to the Acceptable Use Policy and the Terms of Service. Testing systems you do not own without written authorization may violate the Computer Fraud and Abuse Act, the Computer Misuse Act 1990, GDPR Article 32, and equivalents in your jurisdiction. Misuse is your sole responsibility.

The first run prompts you to confirm AUP acceptance and persists the choice to ~/.pentest-ai/aup-consent.txt. Set PENTEST_AI_AUP_ACCEPTED=1 in CI to accept non-interactively and skip the prompt.

On startup ptai loads a scope file. Out-of-scope hosts are refused at tool-invocation time. PoCs are non-destructive by default. Rate limits kick in automatically in stealth mode. Don't be that person.
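Refusing out-of-scope hosts at tool-invocation time can be pictured as a check that runs before every request. The sketch below is one possible shape for it, not ptai's guard: the scope entries (exact hostnames, wildcard hostnames, CIDR ranges) and the function names are assumptions, since the scope file format is not described here.

```python
import ipaddress
from fnmatch import fnmatch
from typing import Iterable
from urllib.parse import urlsplit


def in_scope(url: str, allowed: Iterable[str]) -> bool:
    """True if the URL's host matches a hostname pattern or falls in a CIDR range."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    for entry in allowed:
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Not an IP or CIDR entry: treat it as a hostname pattern.
            if fnmatch(host, entry.lower()):
                return True
            continue
        try:
            if ipaddress.ip_address(host) in network:
                return True
        except ValueError:
            pass  # host is a name, entry is a network: no match
    return False


def guard(url: str, allowed: Iterable[str]) -> str:
    """Return the URL unchanged, or raise if it is out of scope."""
    if not in_scope(url, allowed):
        raise PermissionError(f"out of scope: {url}")
    return url
```

Placing the check at the point of use, not at startup, means a redirect or an LLM-synthesized URL gets the same treatment as the original target.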

Ecosystem

Repo	What
pentest-ai	This repo. The CLI and MCP server. Python product.
pentest-ai-agents	Standalone Claude Code subagent markdown files. Optional, runs without this CLI.

Need shared workspaces, branded PDF reports, SSO, or a managed engagement? The website has Pro / Team / Enterprise dashboards and a one-shot Launch Engagement option. The OSS tool stays OSS, free forever.

Community

Discord: join the server. Chat, get help, share findings, lurk.
Questions, ideas, feedback: GitHub Discussions
Bug reports: GitHub Issues
Show and tell: post the wildest finding ptai gave you in Show and tell

Contributing

PRs welcome. Before you submit:

ruff check . && mypy . && pytest -q

See CONTRIBUTING.md for the full flow.

Contributors

PRs welcome. See Contributing above.

License

MIT. Do whatever you want with it.

If ptai saved you a Sunday, star the repo. It's the only payment I ask for.

Release history

0.17.1

May 27, 2026

0.16.4

May 26, 2026

0.16.2

May 26, 2026

0.16.1

May 26, 2026

0.16.0

May 26, 2026

0.15.3

May 20, 2026

0.15.2

May 18, 2026

0.15.1

May 16, 2026

0.15.0

May 15, 2026

This version

0.14.0

May 13, 2026

0.13.0

May 11, 2026

0.12.0

May 11, 2026

0.11.0

May 6, 2026

0.10.5

Apr 29, 2026

0.10.4

Apr 29, 2026

0.10.3

Apr 29, 2026

0.10.2

Apr 27, 2026

0.10.1

Apr 24, 2026

0.10.0

Apr 23, 2026

0.9.2

Apr 18, 2026

0.9.1

Apr 18, 2026

0.9.0

Apr 16, 2026

0.8.0

Apr 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ptai-0.14.0.tar.gz (634.4 kB)

Uploaded May 13, 2026 Source

Built Distribution

ptai-0.14.0-py3-none-any.whl (493.4 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file ptai-0.14.0.tar.gz.

File metadata

Download URL: ptai-0.14.0.tar.gz
Upload date: May 13, 2026
Size: 634.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for ptai-0.14.0.tar.gz
Algorithm	Hash digest
SHA256	`cdc73d3c13f5a76b3d244f0d75b3108af8ee5887dccbc96c93a9db0122fb5c24`
MD5	`21c76d8cb5b6762e62b43baa17f6a05f`
BLAKE2b-256	`7e7fc95b430cbbbeccf562b59d4d7dd93e356e8f52cfb388c92708c77ee7d15d`

See more details on using hashes here.
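To check a downloaded file against the SHA256 digest listed above, a small script along these lines is enough. The digest is the one published for ptai-0.14.0.tar.gz; the function names and the local file path you pass in are up to you.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for ptai-0.14.0.tar.gz.
EXPECTED_SHA256 = "cdc73d3c13f5a76b3d244f0d75b3108af8ee5887dccbc96c93a9db0122fb5c24"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected.lower())
```

For example, `matches(Path("ptai-0.14.0.tar.gz"))` should return True for an intact copy of the source distribution.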

Provenance

The following attestation bundles were made for ptai-0.14.0.tar.gz:

Publisher: release.yml on 0xSteph/pentest-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ptai-0.14.0.tar.gz
- Subject digest: cdc73d3c13f5a76b3d244f0d75b3108af8ee5887dccbc96c93a9db0122fb5c24
- Sigstore transparency entry: 1522549553
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: 0xSteph/pentest-ai@60869f0d03a77c897c0b6251d629bd6b10760139
- Branch / Tag: refs/tags/v0.14.0
- Owner: https://github.com/0xSteph
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@60869f0d03a77c897c0b6251d629bd6b10760139
- Trigger Event: push

File details

Details for the file ptai-0.14.0-py3-none-any.whl.

File metadata

Download URL: ptai-0.14.0-py3-none-any.whl
Upload date: May 13, 2026
Size: 493.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for ptai-0.14.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4efaf9ce01639ebe2926168a8086cc0fe9c2fc7dfc11eed1030e6a128ec0e0ca`
MD5	`acd6aa35bfb2db99c08a15734af1b044`
BLAKE2b-256	`c8e33969ecd8c687c6340cc2a0a80ff963dcc3a7b6b3a8f980241948ac93ddaa`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ptai-0.14.0-py3-none-any.whl:

Publisher: release.yml on 0xSteph/pentest-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ptai-0.14.0-py3-none-any.whl
- Subject digest: 4efaf9ce01639ebe2926168a8086cc0fe9c2fc7dfc11eed1030e6a128ec0e0ca
- Sigstore transparency entry: 1522549605
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: 0xSteph/pentest-ai@60869f0d03a77c897c0b6251d629bd6b10760139
- Branch / Tag: refs/tags/v0.14.0
- Owner: https://github.com/0xSteph
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@60869f0d03a77c897c0b6251d629bd6b10760139
- Trigger Event: push

ptai 0.14.0
