Offensive security testing for AI agents
Project description
ProbeAgent
Offensive security testing for AI agents. They scan configs. We attack your agent.
What is ProbeAgent?
ProbeAgent is a CLI tool that performs automated red-teaming of AI agents. It launches realistic multi-turn attacks — prompt injection, credential exfiltration, indirect injection, social manipulation, and more — against any HTTP-accessible agent.
Most AI security tools scan static configurations or check for known patterns. ProbeAgent actually attacks your running agent and tells you whether it's Safe, At Risk, or Compromised.
How It Works
probeagent attack <url>
→ Engine (for each category)
→ Attack Module (reset conversation)
→ multi-turn prompts → Target → response
→ Analyzer
→ Grade: Safe / At Risk / Compromised
Why ProbeAgent?
| Feature | mcp-scan | SecureClaw | Aguara | ProbeAgent |
|---|---|---|---|---|
| Offensive testing | - | - | Partial | Yes |
| Multi-turn attacks | - | - | - | Yes |
| Indirect injection testing | - | - | - | Yes |
| PyRIT integration | - | - | - | Yes |
| Evasion converters | - | - | - | Yes |
| CLI-first | - | - | Yes | Yes |
| Security grading | - | - | - | Yes |
| HTTP + OpenClaw targets | - | - | - | Yes |
| Rich terminal reports | - | - | - | Yes |
Installation
Requires Python 3.10+.
pip install probeagent-ai
Or install from source for development:
git clone https://github.com/sumamovva/probeagent.git
cd probeagent
pip install -e ".[dev]"
For PyRIT integration (evasion converters + dynamic red teaming):
pip install 'probeagent-ai[pyrit]'
Quickstart
Instant demo (no setup required)
pip install probeagent-ai
probeagent demo
This attacks a built-in mock target — a vulnerable agent and a hardened one — and shows a side-by-side comparison. No API keys, no server, no config.
Scan your own agent
# Validate your target is reachable
probeagent validate https://your-agent.example.com/api
# Run a quick security scan
probeagent attack https://your-agent.example.com/api --profile quick
# Full scan with parallel execution
probeagent attack https://your-agent.example.com/api --profile standard --parallel
Scan an OpenClaw agent
# Validate an OpenClaw instance (auto-detects OpenAI chat format)
probeagent validate http://localhost:3000/v1/chat/completions \
-H 'Authorization: Bearer YOUR_TOKEN'
# Attack it
probeagent attack http://localhost:3000/v1/chat/completions \
-H 'Authorization: Bearer YOUR_TOKEN' \
--profile standard --parallel
Demo
Instant demo
Run a complete security assessment in seconds with zero setup:
probeagent demo
Add the War Room tactical display for a visual experience:
probeagent demo --game
Live demo (real API)
For demos against a real Claude-powered email agent with built-in vulnerabilities:
export ANTHROPIC_API_KEY=sk-ant-...
pip install 'probeagent-ai[demo]'
probeagent demo --live
The live demo starts a local email agent server with three endpoints at increasing security hardness, then attacks them.
Commands
probeagent demo
Run a full demo — attack a vulnerable + hardened target and compare results.
probeagent demo # Instant, uses mock target
probeagent demo --game # With War Room tactical display
probeagent demo --live # Real API (requires ANTHROPIC_API_KEY)
probeagent demo --profile standard # Use a different attack profile
Options:
--live— Use real API (starts demo email agent server)--game— Launch War Room UI after attacks--profile,-p— Attack profile:quick,standard, orthorough(default:quick)
probeagent attack <url>
Run security attacks against a target AI agent.
probeagent attack https://agent.example.com/api --profile quick
probeagent attack https://agent.example.com/api --profile standard --output json -f report.json
probeagent attack https://agent.example.com/api -p standard --converters stealth --parallel
Options:
--profile,-p— Attack profile:quick,standard, orthorough(default:quick)--target-type— Target type:httporopenclaw(default:http)--output,-o— Output format:terminal,markdown,json,log(default:terminal)--output-file,-f— Write report to file--timeout,-t— Request timeout in seconds (default: 30)--parallel— Run attack categories in parallel for faster scans--converters— Apply evasion converters:basic,advanced,stealth, or comma-separated names (requires PyRIT)--redteam— Enable dynamic LLM-driven attacks via PyRIT RedTeamOrchestrator (requires PyRIT)--header,-H— HTTP header asKey: Value(repeatable, e.g.-H 'Authorization: Bearer token')
probeagent validate <url>
Check if a target is reachable and detect its API format. Supports --header/-H for authenticated targets.
probeagent list-attacks
List all available attack modules with severity and description.
probeagent init
Create a default .probeagent.yaml config file in the current directory.
probeagent game [url]
Launch the War Room tactical display UI in your browser for interactive testing.
Attack Categories
12 attack categories with 85 strategies total:
| Category | Severity | Strategies | Technique |
|---|---|---|---|
| Prompt Injection | CRITICAL | 6 | Override system instructions |
| Credential Exfiltration | CRITICAL | 8 | Extract API keys and secrets |
| Identity Spoofing | CRITICAL | 7 | Impersonate trusted entities |
| Indirect Injection | CRITICAL | 7 | Inject instructions via agent-processed content (emails, docs) |
| Config Manipulation | CRITICAL | 6 | Manipulate agent configuration, integrations, and permissions |
| Goal Hijacking | HIGH | 5 | Redirect agent behavior |
| Social Manipulation | HIGH | 14 | Psychological pressure (Cialdini, FOG, gradual escalation) |
| Cognitive Exploitation | HIGH | 6 | Exploit reasoning weaknesses (Socratic traps, frame control) |
| Resource Abuse | HIGH | 4 | Trigger unbounded computation |
| Tool Misuse | HIGH | 6 | Trick agent into misusing tools |
| Agentic Exploitation | CRITICAL | 10 | SSRF, command injection, path traversal, supply chain (CVE-based) |
| Data Exfiltration | MEDIUM | 6 | Extract sensitive context data |
Attack Profiles
| Profile | Categories | Max Turns | Use Case |
|---|---|---|---|
quick |
5 high-priority | 1 | CI/CD gates, quick checks |
standard |
All 12 | 3 | Regular security assessments |
thorough |
All 12 | 10 | Pre-release deep scans |
PyRIT Integration
ProbeAgent optionally integrates with Microsoft PyRIT for advanced capabilities:
- Evasion Converters (
--converters): Transform attack payloads with Base64, ROT13, Unicode substitution, leetspeak, and more to test resilience against obfuscated attacks - Dynamic Red Teaming (
--redteam): Use an LLM-driven orchestrator to generate novel attack strategies in real time
# Apply stealth evasion converters
probeagent attack https://agent.example.com/api -p standard --converters stealth
# Dynamic red teaming
probeagent attack https://agent.example.com/api -p standard --redteam
# Combine both
probeagent attack https://agent.example.com/api -p standard --converters advanced --redteam
Install with: pip install 'probeagent-ai[pyrit]'
Responsible Use
ProbeAgent is designed for authorized security testing only. Before using ProbeAgent:
- Ensure you have explicit permission to test the target system
- Only test systems you own or have written authorization to test
- Follow your organization's security testing policies
- Report vulnerabilities through proper disclosure channels
Unauthorized use of this tool against systems you don't own or have permission to test may violate laws and regulations.
Attribution
ProbeAgent's indirect injection and config manipulation attacks are inspired by research from Zenity Labs. PyRIT integration uses components from Microsoft PyRIT (MIT License). See ATTRIBUTION.md for full credits.
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
python -m pytest tests/ -v
# Lint
ruff check src/ tests/
# Format
ruff format src/ tests/
See CONTRIBUTING.md for full development guidelines.
Roadmap
- CLI, HTTP target, scoring, 4 output formats (terminal, markdown, json, log)
- 12 attack categories, 85 multi-turn strategies
- OpenClaw target adapter, parallel execution, War Room UI
- Zenity-inspired attacks, CVE-based agentic exploitation, PyRIT integration
- MCP target adapter, CI/CD integration, SaaS dashboard
License
Apache 2.0 — see LICENSE for details.
Changelog
See CHANGELOG.md for version history.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file probeagent_ai-0.1.4.tar.gz.
File metadata
- Download URL: probeagent_ai-0.1.4.tar.gz
- Upload date:
- Size: 104.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4c967857f829dc868aeb6cbacb43cdca8945b2c40055935c131430da97179f18
|
|
| MD5 |
66cac7c35d22ed1999be8e735d4fd392
|
|
| BLAKE2b-256 |
01311fe5347089b9f8a881a9312a6908df100b671c2948857ac7a595fa3b95ed
|
File details
Details for the file probeagent_ai-0.1.4-py3-none-any.whl.
File metadata
- Download URL: probeagent_ai-0.1.4-py3-none-any.whl
- Upload date:
- Size: 110.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
43e1a73c58d3b7883ef19c71906b290bea6a781e52624005731273cb4d4d9f7d
|
|
| MD5 |
1ea36a149aa624a0afc7d0ec40e3375c
|
|
| BLAKE2b-256 |
505fff76f3a867a685ff12368ab38989e21b0c95753c24f44f2445bff6c4eb30
|