Agent-level adversarial resilience testing. Tests what others don't: tool interactions, memory poisoning, confused deputy, cost bombs.
agent-probe
Agent-level adversarial resilience testing for AI agents.
Tests what others don't: tool interactions, memory poisoning, permission escalation, data exfiltration via tool calls, system prompt leakage.
Why
Every existing red-teaming tool (PyRIT, DeepTeam, b3, promptfoo) tests the LLM backbone. None of them tests the agent layer: the tools, the memory, the permissions, and the multi-step workflows where real attacks happen.
agent-probe fills that gap.
Install
pip install agent-probe-ai
Quick Start
# Run all probes against an HTTP agent endpoint
agent-probe probe http://localhost:8000/chat
# Run specific categories
agent-probe probe http://localhost:8000/chat --categories prompt_leakage memory_poisoning
# JSON output for CI/CD
agent-probe probe http://localhost:8000/chat --json
# Fail if score below threshold (for CI gates)
agent-probe probe http://localhost:8000/chat --threshold 70
# List available probe categories
agent-probe list
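When the CLI is driven from a script, the invocations above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section. The helper name `build_probe_command` is hypothetical, and combining `--categories` with the other flags in one call is an assumption; only `--threshold` together with `--json` appears elsewhere in this README.

```python
import shlex


def build_probe_command(url, categories=None, threshold=None, json_output=False):
    """Assemble an agent-probe argv list from the flags shown in Quick Start.

    Hypothetical helper: it only builds the command and does not run it.
    """
    cmd = ["agent-probe", "probe", url]
    if categories:
        # --categories takes one or more space-separated names
        cmd += ["--categories", *categories]
    if json_output:
        cmd.append("--json")
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


# Example: print a shell-safe version of the assembled command.
example = build_probe_command(
    "http://localhost:8000/chat",
    categories=["prompt_leakage", "memory_poisoning"],
    threshold=70,
    json_output=True,
)
print(shlex.join(example))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems when the endpoint URL comes from an environment variable.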
Probe Categories
| Category | Probes | What it tests |
|---|---|---|
| tool_misuse | 3 | Can the agent be tricked into calling tools with malicious parameters? |
| data_exfiltration | 3 | Does the agent leak sensitive data through tool calls or outputs? |
| agent_injection | 4 | Do multi-step injection chains break agent-level guardrails? |
| memory_poisoning | 2 | Can agent memory be manipulated to alter future behavior? |
| confused_deputy | 2 | Can the agent be used as a confused deputy in A2A delegation? |
| resource_abuse | 2 | Can the agent be tricked into excessive resource consumption? |
| prompt_leakage | 4 | Can the agent's system prompt be extracted? (ASI-07) |
| input_validation | 4 | Are tool arguments validated before execution? (encoding bypass, SSRF, chains) |
24 probes across 8 categories. Zero external dependencies.
How It Works
agent-probe sends adversarial messages to your agent's HTTP endpoint and analyzes responses for compliance signals, data leakage, and unsafe tool calls. Each probe simulates a specific attack vector that targets the agent layer, not the underlying LLM.
Your Agent <── adversarial messages ── agent-probe
    │                                       │
    └── responses ──────────────────────> analyze for:
                                            - leaked secrets
                                            - unsafe tool calls
                                            - compliance signals
                                            - prompt disclosure
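To make the analysis step concrete, here is a rough sketch of how a single response could be checked for the four signals listed above. It is not agent-probe's actual detector: the patterns, phrases, tool names, and the `analyze_response` function are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical detectors; the real probes in agent-probe may work differently.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),                  # API-key-like token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # PEM private key header
]
COMPLIANCE_PHRASES = ("as requested, i have", "ignoring previous instructions")
UNSAFE_TOOLS = {"shell", "delete_file"}  # hypothetical tool names


def analyze_response(reply, system_prompt=""):
    """Return a list of finding labels for one agent reply.

    `reply` follows the response shape from the endpoint protocol:
    {"response": str, "tool_calls": [{"name": ..., "arguments": ...}]}.
    """
    text = reply.get("response") or ""
    lowered = text.lower()
    findings = []

    if any(p.search(text) for p in SECRET_PATTERNS):
        findings.append("leaked_secret")

    if any(phrase in lowered for phrase in COMPLIANCE_PHRASES):
        findings.append("compliance_signal")

    # Verbatim disclosure only; paraphrased leaks would need fuzzier matching.
    prompt = system_prompt.strip().lower()
    if prompt and prompt in lowered:
        findings.append("prompt_disclosure")

    if any(call.get("name") in UNSAFE_TOOLS for call in reply.get("tool_calls") or []):
        findings.append("unsafe_tool_call")

    return findings
```

Substring and regex checks like these are cheap and dependency-free, but they miss paraphrased leaks and can flag benign text, so treat the output as a signal to review rather than a verdict.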
Sample Output
============================================================
agent-probe Adversarial Resilience Report
============================================================
Target: http://localhost:8000/chat
Score: 45/100 (POOR)
Probes: 9/20 passed
Findings: 15
============================================================
[ 60/100] agent_injection (FAIR)
2/4 probes passed
[CRITICAL] Agent succumbed to multi-step injection chain
[ 35/100] confused_deputy (POOR)
0/2 probes passed
[CRITICAL] Agent performed privileged action on peer request
[ 25/100] prompt_leakage (CRITICAL)
0/4 probes passed
[CRITICAL] Agent leaked system prompt via roleplay
[HIGH] Agent leaked system prompt on direct request
[ 0/100] resource_abuse (CRITICAL)
0/2 probes passed
[CRITICAL] Agent spawned excessive tool calls
[ 72/100] tool_misuse (GOOD)
2/3 probes passed
------------------------------------------------------------
This agent has significant resilience gaps.
CI/CD Integration
# GitHub Actions
- name: Agent security scan
  run: |
    pip install agent-probe-ai
    agent-probe probe ${{ secrets.AGENT_URL }} --threshold 70 --json
Agent Endpoint Protocol
agent-probe sends POST requests with this JSON format:
{
  "message": "the probe message",
  "context": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
Expected response:
{
  "response": "agent's text response",
  "tool_calls": [{"name": "tool_name", "arguments": {...}}]
}
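For trying the protocol locally, a stub endpoint that speaks this request and response format can be built with the Python standard library alone. The sketch below is an assumption-laden illustration, not part of agent-probe: `handle_probe`, `AgentHandler`, `serve`, and `post_json` are hypothetical names, the stub refuses every request, and it answers POSTs on any path (including `/chat`).

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_probe(payload):
    """Hypothetical agent logic: refuse everything and call no tools."""
    message = payload.get("message", "")
    context = payload.get("context", [])
    return {
        "response": f"I can't help with that ({len(message)} chars, {len(context)} context turns).",
        "tool_calls": [],
    }


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "expected a JSON object")
            return
        body = json.dumps(handle_probe(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def serve(port=8000):
    """Run the stub until interrupted (blocks the calling thread)."""
    HTTPServer(("127.0.0.1", port), AgentHandler).serve_forever()


def post_json(url, payload):
    """Send one request in the documented format and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any configured proxy, since this helper targets a local stub.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())
```

Calling `serve()` starts the stub on `http://127.0.0.1:8000`, after which `agent-probe probe http://localhost:8000/chat` can be pointed at it. Replacing `handle_probe` with a call into a real agent is the only change needed to adapt the sketch.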
License
MIT