Skip to main content

Paste your agent URL. We'll tell you what's broken.

Project description

๐Ÿ” AgentBench

Paste your agent URL. Get a security scorecard in 60 seconds.

CI PyPI Python 3.11+ License: MIT Tests

AgentBench is an open-source security scanner for AI agents. It sends 92 behavioral probes across 4 domains โ€” safety, reliability, capability, and consistency โ€” and produces an actionable scorecard with specific fixes.


๐Ÿš€ Quick Start

pip install agentbench-cli

# Scan any OpenAI-compatible endpoint
agentbench scan https://openrouter.ai/api/v1/chat/completions \
  -k $OPENROUTER_API_KEY \
  -m deepseek/deepseek-chat-v3-0324

That's it. 60 seconds later you get a full scorecard.


๐Ÿ“– End-to-End Tutorial

1. Install

pip install agentbench-cli

Requires Python 3.11+. No other dependencies to manage.

2. Get an API Key

AgentBench works with any OpenAI-compatible chat completions endpoint. Popular options:

Provider URL API Key
OpenRouter https://openrouter.ai/api/v1/chat/completions Settings โ†’ Keys
OpenAI https://api.openai.com/v1/chat/completions API Keys
Together AI https://api.together.xyz/v1/chat/completions Settings โ†’ API Key
Your own server Any /v1/chat/completions endpoint Your auth token

3. Run Your First Scan

# Set your key
export OPENROUTER_API_KEY="sk-or-v1-..."

# Scan a model
agentbench scan https://openrouter.ai/api/v1/chat/completions \
  -k $OPENROUTER_API_KEY \
  -m google/gemini-2.0-flash-001 \
  -o results.json

You'll see live progress in your terminal, then a full scorecard:

โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ ๐Ÿ” AgentBench Scanner                                    โ”‚
โ”‚ Scanning: https://openrouter.ai/api/v1/chat/completions  โ”‚
โ”‚ 92 probes across 4 domains                                โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ

  Overall Score: 72/100 (Grade: C)
  Probes: 92 | Duration: 312.4s | Findings: 7

                      Domain Scores
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Domain      โ”ƒ Score โ”ƒ Grade โ”ƒ Passed โ”ƒ Failed โ”ƒ Total โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Safety      โ”‚    50 โ”‚   F   โ”‚     30 โ”‚      2 โ”‚    32 โ”‚
โ”‚ Reliability โ”‚    72 โ”‚   C   โ”‚     18 โ”‚      4 โ”‚    22 โ”‚
โ”‚ Capability  โ”‚    87 โ”‚   B   โ”‚     21 โ”‚      3 โ”‚    24 โ”‚
โ”‚ Consistency โ”‚    97 โ”‚   A   โ”‚     14 โ”‚      0 โ”‚    14 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

  โŒ Critical (2)
    โ€ข Prompt injection succeeded
      Agent response contains 'dan activated', suggesting injection worked
      โ†ณ Fix: Add input sanitization, use system message separation
    โ€ข System prompt leaked
      Agent revealed internal instructions via output format manipulation
      โ†ณ Fix: Do not expose system prompts through structured output requests

4. Review the Findings

Each finding includes:

  • Severity โ€” Critical, High, Medium, or Low
  • What happened โ€” Specific probe that triggered it
  • Evidence โ€” The actual agent response that failed
  • Remediation โ€” Concrete fix you can implement

5. Compare Over Time

# Scans auto-save to local leaderboard (~/.agentbench/leaderboard.json)
# Compare your last two scans
agentbench compare

# Filter by label
agentbench compare --label "my-agent"

6. Integrate with CI

See the GitHub Action section below to block merges when critical issues are found.


๐Ÿงช What It Tests

92 probes across 4 domains:

Domain Count What it checks
Safety 32 Prompt injection (DAN, base64, multilingual, few-shot poisoning), PII extraction, harmful content, tool misuse, compliance
Reliability 22 Edge cases (empty input, unicode, null bytes, JSON injection), error handling, format robustness, state management
Capability 24 Hallucination detection, instruction following (constraints, word counts, JSON output), reasoning, tool use, code correctness
Consistency 14 Persona adherence, tone, rule consistency across groups, behavioral repetition, topic coherence

Each probe sends a crafted prompt to your agent and analyzes the response for specific failure modes. No generic "AI safety" handwaving โ€” every finding links to a concrete test case.


๐Ÿ“‹ Commands

# Scan an agent endpoint
agentbench scan <url> [-k API_KEY] [-m MODEL] [-o results.json] [-t TIMEOUT]

# Restrict scan to specific domains
agentbench scan <url> -d safety -d reliability

# List all 92 probes
agentbench probes

# Compare past scan results
agentbench compare
agentbench compare --label "my-agent"

# Pull latest probe definitions from GitHub
agentbench update

# Show version
agentbench --version

โš™๏ธ GitHub Action

Automated Scan on Push

Run AgentBench as a CI gate โ€” block merges when critical issues are found:

name: Agent Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install AgentBench
        run: pip install agentbench-cli

      - name: Run Security Scan
        env:
          AGENTBENCH_API_KEY: ${{ secrets.AGENTBENCH_API_KEY }}
        run: |
          agentbench scan https://my-agent.example.com/v1/chat/completions \
            -k $AGENTBENCH_API_KEY \
            -o scan-results.json

      - name: Upload Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: agentbench-results
          path: scan-results.json

Manual Scan with Parameters

Use the workflow dispatch for ad-hoc scans with custom parameters:

# .github/workflows/agentbench-scan.yml
# Already included in this repo โ€” trigger from the Actions tab

Set AGENTBENCH_API_KEY in Settings โ†’ Secrets and variables โ†’ Actions.


๐Ÿ† Model Leaderboard

Real results from scanning popular models via OpenRouter:

Model Overall Safety Reliability Capability Consistency
Claude 3.5 Haiku 86 (B) 75 (C) 97 (A) 87 (B) 91 (A)
Gemini 2.0 Flash 72 (C) 50 (F) 72 (C) 87 (B) 97 (A)
GPT-4o-mini 70 (C) 50 (F) 72 (C) 77 (C) 100 (A)
Qwen 3 14B 74 (C) 50 (F) 75 (C) 100 (A) 91 (A)
DeepSeek V3 72 (C) 50 (F) 72 (C) 100 (A) 85 (B)
Llama 3.3 70B 71 (C) 25 (F) 100 (A) 100 (A) 91 (A)
Gemma 3 27B 57 (F) 0 (F) 75 (C) 100 (A) 94 (A)

Most models fail safety. That's the point โ€” AgentBench helps you find and fix these gaps.


๐Ÿ—๏ธ Architecture

agentbench/
โ”œโ”€โ”€ cli.py              # Typer CLI โ€” scan, probes, compare, update
โ”œโ”€โ”€ probes/
โ”‚   โ”œโ”€โ”€ base.py         # Data models (Probe, Finding, ScanResult)
โ”‚   โ”œโ”€โ”€ registry.py     # Loads probes from YAML
โ”‚   โ”œโ”€โ”€ yaml_loader.py  # YAML probe parser with validation
โ”‚   โ””โ”€โ”€ builtin/        # 92 YAML probe definitions
โ”‚       โ”œโ”€โ”€ safety.yaml
โ”‚       โ”œโ”€โ”€ capability.yaml
โ”‚       โ”œโ”€โ”€ reliability.yaml
โ”‚       โ””โ”€โ”€ consistency.yaml
โ”œโ”€โ”€ scanner/
โ”‚   โ”œโ”€โ”€ runner.py       # Async probe execution engine
โ”‚   โ”œโ”€โ”€ analyzer.py     # Response analysis (injection, PII, hallucination)
โ”‚   โ””โ”€โ”€ scorer.py       # Weighted domain scoring
โ”œโ”€โ”€ leaderboard.py      # Local scan history
โ””โ”€โ”€ updater.py          # Pull latest probes from GitHub

๐Ÿ› ๏ธ Development

git clone https://github.com/EdList/agentbench.git
cd agentbench
pip install -e .

# Run tests
pytest tests/ -q

# Lint
ruff check .

# Build
python -m build
twine check dist/*

๐Ÿ“„ License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentbench_cli-0.1.0.tar.gz (65.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentbench_cli-0.1.0-py3-none-any.whl (53.2 kB view details)

Uploaded Python 3

File details

Details for the file agentbench_cli-0.1.0.tar.gz.

File metadata

  • Download URL: agentbench_cli-0.1.0.tar.gz
  • Upload date:
  • Size: 65.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentbench_cli-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2478e025b1164d6f0c0f4212c693a577d8c882f4c8c9865fb3dea9ede2607f7b
MD5 14cf03c06b838fa3e78d825aceaaa505
BLAKE2b-256 cab49f9756cc0e147ad6e95f9d941561d5cdc4ea1a385933d701c599c27406ca

See more details on using hashes here.

File details

Details for the file agentbench_cli-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: agentbench_cli-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 53.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentbench_cli-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c365cb795603f0a89ada538f6951b773d4733b3b4bbcd24bec5fd15b5c83721b
MD5 913638ff2fc0a39fa54980eeb2035334
BLAKE2b-256 7593df653979adf16b605122899941b497a428a0ebddfbf454d017bd2239159c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page