Paste your agent URL. We'll tell you what's broken.
Project description
๐ AgentBench
Paste your agent URL. Get a security scorecard in 60 seconds.
AgentBench is an open-source security scanner for AI agents. It sends 92 behavioral probes across 4 domains โ safety, reliability, capability, and consistency โ and produces an actionable scorecard with specific fixes.
๐ Quick Start
pip install agentbench-cli
# Scan any OpenAI-compatible endpoint
agentbench scan https://openrouter.ai/api/v1/chat/completions \
-k $OPENROUTER_API_KEY \
-m deepseek/deepseek-chat-v3-0324
That's it. 60 seconds later you get a full scorecard.
๐ End-to-End Tutorial
1. Install
pip install agentbench-cli
Requires Python 3.11+. No other dependencies to manage.
2. Get an API Key
AgentBench works with any OpenAI-compatible chat completions endpoint. Popular options:
| Provider | URL | API Key |
|---|---|---|
| OpenRouter | https://openrouter.ai/api/v1/chat/completions |
Settings โ Keys |
| OpenAI | https://api.openai.com/v1/chat/completions |
API Keys |
| Together AI | https://api.together.xyz/v1/chat/completions |
Settings โ API Key |
| Your own server | Any /v1/chat/completions endpoint |
Your auth token |
3. Run Your First Scan
# Set your key
export OPENROUTER_API_KEY="sk-or-v1-..."
# Scan a model
agentbench scan https://openrouter.ai/api/v1/chat/completions \
-k $OPENROUTER_API_KEY \
-m google/gemini-2.0-flash-001 \
-o results.json
You'll see live progress in your terminal, then a full scorecard:
โญโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ
โ ๐ AgentBench Scanner โ
โ Scanning: https://openrouter.ai/api/v1/chat/completions โ
โ 92 probes across 4 domains โ
โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ
Overall Score: 72/100 (Grade: C)
Probes: 92 | Duration: 312.4s | Findings: 7
Domain Scores
โโโโโโโโโโโโโโโณโโโโโโโโณโโโโโโโโณโโโโโโโโโณโโโโโโโโโณโโโโโโโโ
โ Domain โ Score โ Grade โ Passed โ Failed โ Total โ
โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ
โ Safety โ 50 โ F โ 30 โ 2 โ 32 โ
โ Reliability โ 72 โ C โ 18 โ 4 โ 22 โ
โ Capability โ 87 โ B โ 21 โ 3 โ 24 โ
โ Consistency โ 97 โ A โ 14 โ 0 โ 14 โ
โโโโโโโโโโโโโโโดโโโโโโโโดโโโโโโโโดโโโโโโโโโดโโโโโโโโโดโโโโโโโโ
โ Critical (2)
โข Prompt injection succeeded
Agent response contains 'dan activated', suggesting injection worked
โณ Fix: Add input sanitization, use system message separation
โข System prompt leaked
Agent revealed internal instructions via output format manipulation
โณ Fix: Do not expose system prompts through structured output requests
4. Review the Findings
Each finding includes:
- Severity โ Critical, High, Medium, or Low
- What happened โ Specific probe that triggered it
- Evidence โ The actual agent response that failed
- Remediation โ Concrete fix you can implement
5. Compare Over Time
# Scans auto-save to local leaderboard (~/.agentbench/leaderboard.json)
# Compare your last two scans
agentbench compare
# Filter by label
agentbench compare --label "my-agent"
6. Integrate with CI
See the GitHub Action section below to block merges when critical issues are found.
๐งช What It Tests
92 probes across 4 domains:
| Domain | Count | What it checks |
|---|---|---|
| Safety | 32 | Prompt injection (DAN, base64, multilingual, few-shot poisoning), PII extraction, harmful content, tool misuse, compliance |
| Reliability | 22 | Edge cases (empty input, unicode, null bytes, JSON injection), error handling, format robustness, state management |
| Capability | 24 | Hallucination detection, instruction following (constraints, word counts, JSON output), reasoning, tool use, code correctness |
| Consistency | 14 | Persona adherence, tone, rule consistency across groups, behavioral repetition, topic coherence |
Each probe sends a crafted prompt to your agent and analyzes the response for specific failure modes. No generic "AI safety" handwaving โ every finding links to a concrete test case.
๐ Commands
# Scan an agent endpoint
agentbench scan <url> [-k API_KEY] [-m MODEL] [-o results.json] [-t TIMEOUT]
# Restrict scan to specific domains
agentbench scan <url> -d safety -d reliability
# List all 92 probes
agentbench probes
# Compare past scan results
agentbench compare
agentbench compare --label "my-agent"
# Pull latest probe definitions from GitHub
agentbench update
# Show version
agentbench --version
โ๏ธ GitHub Action
Automated Scan on Push
Run AgentBench as a CI gate โ block merges when critical issues are found:
name: Agent Security Scan
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Install AgentBench
run: pip install agentbench-cli
- name: Run Security Scan
env:
AGENTBENCH_API_KEY: ${{ secrets.AGENTBENCH_API_KEY }}
run: |
agentbench scan https://my-agent.example.com/v1/chat/completions \
-k $AGENTBENCH_API_KEY \
-o scan-results.json
- name: Upload Results
if: always()
uses: actions/upload-artifact@v4
with:
name: agentbench-results
path: scan-results.json
Manual Scan with Parameters
Use the workflow dispatch for ad-hoc scans with custom parameters:
# .github/workflows/agentbench-scan.yml
# Already included in this repo โ trigger from the Actions tab
Set AGENTBENCH_API_KEY in Settings โ Secrets and variables โ Actions.
๐ Model Leaderboard
Real results from scanning popular models via OpenRouter:
| Model | Overall | Safety | Reliability | Capability | Consistency |
|---|---|---|---|---|---|
| Claude 3.5 Haiku | 86 (B) | 75 (C) | 97 (A) | 87 (B) | 91 (A) |
| Gemini 2.0 Flash | 72 (C) | 50 (F) | 72 (C) | 87 (B) | 97 (A) |
| GPT-4o-mini | 70 (C) | 50 (F) | 72 (C) | 77 (C) | 100 (A) |
| Qwen 3 14B | 74 (C) | 50 (F) | 75 (C) | 100 (A) | 91 (A) |
| DeepSeek V3 | 72 (C) | 50 (F) | 72 (C) | 100 (A) | 85 (B) |
| Llama 3.3 70B | 71 (C) | 25 (F) | 100 (A) | 100 (A) | 91 (A) |
| Gemma 3 27B | 57 (F) | 0 (F) | 75 (C) | 100 (A) | 94 (A) |
Most models fail safety. That's the point โ AgentBench helps you find and fix these gaps.
๐๏ธ Architecture
agentbench/
โโโ cli.py # Typer CLI โ scan, probes, compare, update
โโโ probes/
โ โโโ base.py # Data models (Probe, Finding, ScanResult)
โ โโโ registry.py # Loads probes from YAML
โ โโโ yaml_loader.py # YAML probe parser with validation
โ โโโ builtin/ # 92 YAML probe definitions
โ โโโ safety.yaml
โ โโโ capability.yaml
โ โโโ reliability.yaml
โ โโโ consistency.yaml
โโโ scanner/
โ โโโ runner.py # Async probe execution engine
โ โโโ analyzer.py # Response analysis (injection, PII, hallucination)
โ โโโ scorer.py # Weighted domain scoring
โโโ leaderboard.py # Local scan history
โโโ updater.py # Pull latest probes from GitHub
๐ ๏ธ Development
git clone https://github.com/EdList/agentbench.git
cd agentbench
pip install -e .
# Run tests
pytest tests/ -q
# Lint
ruff check .
# Build
python -m build
twine check dist/*
๐ License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file agentbench_cli-0.1.0.tar.gz.
File metadata
- Download URL: agentbench_cli-0.1.0.tar.gz
- Upload date:
- Size: 65.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2478e025b1164d6f0c0f4212c693a577d8c882f4c8c9865fb3dea9ede2607f7b
|
|
| MD5 |
14cf03c06b838fa3e78d825aceaaa505
|
|
| BLAKE2b-256 |
cab49f9756cc0e147ad6e95f9d941561d5cdc4ea1a385933d701c599c27406ca
|
File details
Details for the file agentbench_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agentbench_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 53.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c365cb795603f0a89ada538f6951b773d4733b3b4bbcd24bec5fd15b5c83721b
|
|
| MD5 |
913638ff2fc0a39fa54980eeb2035334
|
|
| BLAKE2b-256 |
7593df653979adf16b605122899941b497a428a0ebddfbf454d017bd2239159c
|