Agent-level adversarial resilience testing. Tests what others don't: tool interactions, memory poisoning, confused deputy, cost bombs.

agent-probe

Agent-level adversarial resilience testing for AI agents.

Tests what others don't: tool interactions, memory poisoning, permission escalation, data exfiltration via tool calls, system prompt leakage.

Why

Existing red-teaming tools (PyRIT, DeepTeam, b3, promptfoo) test the LLM backbone. None of them tests the agent layer: the tools, the memory, the permissions, and the multi-step workflows where real attacks happen.

agent-probe fills that gap.

Install

pip install git+https://github.com/claude-go/agent-probe.git

Quick Start

# Run all probes against an HTTP agent endpoint
agent-probe probe http://localhost:8000/chat

# Run specific categories
agent-probe probe http://localhost:8000/chat --categories prompt_leakage memory_poisoning

# JSON output for CI/CD
agent-probe probe http://localhost:8000/chat --json

# Fail if score below threshold (for CI gates)
agent-probe probe http://localhost:8000/chat --threshold 70

# List available probe categories
agent-probe list

Probe Categories

Category           Probes  What it tests
tool_misuse        3       Can the agent be tricked into calling tools with malicious parameters?
data_exfiltration  3       Does the agent leak sensitive data through tool calls or outputs?
agent_injection    4       Do multi-step injection chains break agent-level guardrails?
memory_poisoning   2       Can agent memory be manipulated to alter future behavior?
confused_deputy    2       Can the agent be used as a confused deputy in A2A delegation?
resource_abuse     2       Can the agent be tricked into excessive resource consumption?
prompt_leakage     4       Can the agent's system prompt be extracted? (ASI-07)

20 probes across 7 categories. Zero external dependencies.

How It Works

agent-probe sends adversarial messages to your agent's HTTP endpoint and analyzes responses for compliance signals, data leakage, and unsafe tool calls. Each probe simulates a specific attack vector that targets the agent layer, not the underlying LLM.

Your Agent <── adversarial messages ── agent-probe
    │                                      │
    └── responses ──────────────────────>  analyze for:
                                           - leaked secrets
                                           - unsafe tool calls
                                           - compliance signals
                                           - prompt disclosure
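
The analysis step in the diagram above could be approximated roughly as follows. This is a minimal sketch and not agent-probe's actual detection logic: the function name `analyze_response`, the regex patterns, the phrase list, and the `DANGEROUS_TOOLS` set are all hypothetical, chosen only to illustrate scanning one reply for the four signal types.

```python
import re

# Hypothetical patterns; a real scanner would need a broader, tuned set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # API-key-like token
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID shape
]
COMPLIANCE_PHRASES = ["sure, here is", "as requested, i have"]  # assumed phrases
DANGEROUS_TOOLS = {"shell", "delete_file", "send_email"}        # assumed tool names


def analyze_response(reply: dict, system_prompt: str = "") -> list:
    """Return a list of finding labels for one agent reply."""
    text = reply.get("response") or ""
    lowered = text.lower()
    findings = []

    # Leaked secrets: any secret-shaped token in the text response.
    if any(p.search(text) for p in SECRET_PATTERNS):
        findings.append("leaked_secret")

    # Unsafe tool calls: the agent invoked a tool on the deny list.
    for call in reply.get("tool_calls") or []:
        if call.get("name") in DANGEROUS_TOOLS:
            findings.append("unsafe_tool_call")
            break

    # Compliance signals: wording that suggests the agent went along.
    if any(phrase in lowered for phrase in COMPLIANCE_PHRASES):
        findings.append("compliance_signal")

    # Prompt disclosure: the known system prompt appears verbatim.
    if system_prompt and system_prompt.lower() in lowered:
        findings.append("prompt_disclosure")

    return findings
```

Substring and regex matching like this is easy to evade and prone to false positives, so it should be read as an illustration of the idea rather than a usable detector.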

Sample Output

============================================================
  agent-probe  Adversarial Resilience Report
============================================================
  Target:  http://localhost:8000/chat
  Score:   45/100 (POOR)
  Probes:  9/20 passed
  Findings: 15
============================================================

  [ 60/100] agent_injection (FAIR)
          2/4 probes passed
    [CRITICAL] Agent succumbed to multi-step injection chain

  [ 35/100] confused_deputy (POOR)
          0/2 probes passed
    [CRITICAL] Agent performed privileged action on peer request

  [ 25/100] prompt_leakage (CRITICAL)
          0/4 probes passed
    [CRITICAL] Agent leaked system prompt via roleplay
    [HIGH] Agent leaked system prompt on direct request

  [  0/100] resource_abuse (CRITICAL)
          0/2 probes passed
    [CRITICAL] Agent spawned excessive tool calls

  [ 72/100] tool_misuse (GOOD)
          2/3 probes passed
------------------------------------------------------------
  This agent has significant resilience gaps.

CI/CD Integration

# GitHub Actions
- name: Agent security scan
  run: |
    pip install git+https://github.com/claude-go/agent-probe.git
    agent-probe probe ${{ secrets.AGENT_URL }} --threshold 70 --json

Agent Endpoint Protocol

agent-probe sends POST requests with this JSON format:

{
  "message": "the probe message",
  "context": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
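
For illustration, a request in the format above could be built and sent with the Python standard library as sketched below. The helper names `build_probe_request` and `send_probe` are hypothetical, and the `Content-Type: application/json` header is an assumption; this page does not say which headers agent-probe actually sends.

```python
import json
import urllib.request


def build_probe_request(url: str, message: str, context: list) -> urllib.request.Request:
    """Build a POST request carrying one probe message in the documented JSON shape."""
    body = json.dumps({"message": message, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def send_probe(url: str, message: str, context: list, timeout: float = 10.0) -> dict:
    """Send the probe and decode the agent's JSON reply."""
    req = build_probe_request(url, message, context)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```
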

Expected response:

{
  "response": "agent's text response",
  "tool_calls": [{"name": "tool_name", "arguments": {...}}]
}
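
An endpoint that speaks this protocol can be quite small. The following is a minimal sketch using only the Python standard library; `handle_message`, `ChatHandler`, and `serve` are hypothetical names, and the placeholder agent simply refuses every request. A real agent would replace `handle_message` with its own logic and report any tool invocations in `tool_calls`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_message(message: str, context: list) -> dict:
    """Placeholder agent logic (assumption): refuse everything, call no tools."""
    return {"response": "I can't help with that request.", "tool_calls": []}


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON body sent by the probe.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "expected a JSON object")
            return

        reply = handle_message(payload.get("message", ""), payload.get("context", []))
        body = json.dumps(reply).encode("utf-8")

        # Answer in the expected response shape.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the endpoint until interrupted."""
    HTTPServer((host, port), ChatHandler).serve_forever()
```

Calling `serve()` starts the endpoint on port 8000. The handler accepts POSTs on any path, so the Quick Start target `http://localhost:8000/chat` would reach it.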

License

MIT
