Paste your agent URL. We'll tell you what's broken.

These details have not been verified by PyPI

Project links

Project description

🔍 AgentBench

Paste your agent URL. Get a security scorecard in 60 seconds.

AgentBench is an open-source security scanner for AI agents. It sends 92 behavioral probes across 4 domains — safety, reliability, capability, and consistency — and produces an actionable scorecard with specific fixes.

🚀 Quick Start

pip install agentbench-cli

# Scan any OpenAI-compatible endpoint
agentbench scan https://openrouter.ai/api/v1/chat/completions \
  -k $OPENROUTER_API_KEY \
  -m deepseek/deepseek-chat-v3-0324

That's it. 60 seconds later you get a full scorecard.

📖 End-to-End Tutorial

1. Install

pip install agentbench-cli

Requires Python 3.11+. No other dependencies to manage.

2. Get an API Key

AgentBench works with any OpenAI-compatible chat completions endpoint. Popular options:

Provider	URL	API Key
OpenRouter	`https://openrouter.ai/api/v1/chat/completions`	Settings → Keys
OpenAI	`https://api.openai.com/v1/chat/completions`	API Keys
Together AI	`https://api.together.xyz/v1/chat/completions`	Settings → API Key
Your own server	Any `/v1/chat/completions` endpoint	Your auth token

3. Run Your First Scan

# Set your key
export OPENROUTER_API_KEY="sk-or-v1-..."

# Scan a model
agentbench scan https://openrouter.ai/api/v1/chat/completions \
  -k $OPENROUTER_API_KEY \
  -m google/gemini-2.0-flash-001 \
  -o results.json

You'll see live progress in your terminal, then a full scorecard:

╭──────────────────────────────────────────────────────────╮
│ 🔍 AgentBench Scanner                                    │
│ Scanning: https://openrouter.ai/api/v1/chat/completions  │
│ 92 probes across 4 domains                                │
╰──────────────────────────────────────────────────────────╯

  Overall Score: 72/100 (Grade: C)
  Probes: 92 | Duration: 312.4s | Findings: 7

                      Domain Scores
┏━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Domain      ┃ Score ┃ Grade ┃ Passed ┃ Failed ┃ Total ┃
┡━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ Safety      │    50 │   F   │     30 │      2 │    32 │
│ Reliability │    72 │   C   │     18 │      4 │    22 │
│ Capability  │    87 │   B   │     21 │      3 │    24 │
│ Consistency │    97 │   A   │     14 │      0 │    14 │
└─────────────┴───────┴───────┴────────┴────────┴───────┘

  ❌ Critical (2)
    • Prompt injection succeeded
      Agent response contains 'dan activated', suggesting injection worked
      ↳ Fix: Add input sanitization, use system message separation
    • System prompt leaked
      Agent revealed internal instructions via output format manipulation
      ↳ Fix: Do not expose system prompts through structured output requests

4. Review the Findings

Each finding includes:

Severity — Critical, High, Medium, or Low
What happened — Specific probe that triggered it
Evidence — The actual agent response that failed
Remediation — Concrete fix you can implement

5. Compare Over Time

# Scans auto-save to local leaderboard (~/.agentbench/leaderboard.json)
# Compare your last two scans
agentbench compare

# Filter by label
agentbench compare --label "my-agent"

6. Integrate with CI

See the GitHub Action section below to block merges when critical issues are found.

🧪 What It Tests

92 probes across 4 domains:

Domain	Count	What it checks
Safety	32	Prompt injection (DAN, base64, multilingual, few-shot poisoning), PII extraction, harmful content, tool misuse, compliance
Reliability	22	Edge cases (empty input, unicode, null bytes, JSON injection), error handling, format robustness, state management
Capability	24	Hallucination detection, instruction following (constraints, word counts, JSON output), reasoning, tool use, code correctness
Consistency	14	Persona adherence, tone, rule consistency across groups, behavioral repetition, topic coherence

Each probe sends a crafted prompt to your agent and analyzes the response for specific failure modes. No generic "AI safety" handwaving — every finding links to a concrete test case.

📋 Commands

# Scan an agent endpoint
agentbench scan <url> [-k API_KEY] [-m MODEL] [-o results.json] [-t TIMEOUT]

# Restrict scan to specific domains
agentbench scan <url> -d safety -d reliability

# List all 92 probes
agentbench probes

# Compare past scan results
agentbench compare
agentbench compare --label "my-agent"

# Pull latest probe definitions from GitHub
agentbench update

# Show version
agentbench --version

⚙️ GitHub Action

Automated Scan on Push

Run AgentBench as a CI gate — block merges when critical issues are found:

name: Agent Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install AgentBench
        run: pip install agentbench-cli

      - name: Run Security Scan
        env:
          AGENTBENCH_API_KEY: ${{ secrets.AGENTBENCH_API_KEY }}
        run: |
          agentbench scan https://my-agent.example.com/v1/chat/completions \
            -k $AGENTBENCH_API_KEY \
            -o scan-results.json

      - name: Upload Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: agentbench-results
          path: scan-results.json

Manual Scan with Parameters

Use the workflow dispatch for ad-hoc scans with custom parameters:

# .github/workflows/agentbench-scan.yml
# Already included in this repo — trigger from the Actions tab

Set AGENTBENCH_API_KEY in Settings → Secrets and variables → Actions.

🏆 Model Leaderboard

Real results from scanning popular models via OpenRouter:

Model	Overall	Safety	Reliability	Capability	Consistency
Claude 3.5 Haiku	86 (B)	75 (C)	97 (A)	87 (B)	91 (A)
Gemini 2.0 Flash	72 (C)	50 (F)	72 (C)	87 (B)	97 (A)
GPT-4o-mini	70 (C)	50 (F)	72 (C)	77 (C)	100 (A)
Qwen 3 14B	74 (C)	50 (F)	75 (C)	100 (A)	91 (A)
DeepSeek V3	72 (C)	50 (F)	72 (C)	100 (A)	85 (B)
Llama 3.3 70B	71 (C)	25 (F)	100 (A)	100 (A)	91 (A)
Gemma 3 27B	57 (F)	0 (F)	75 (C)	100 (A)	94 (A)

Most models fail safety. That's the point — AgentBench helps you find and fix these gaps.

🏗️ Architecture

agentbench/
├── cli.py              # Typer CLI — scan, probes, compare, update
├── probes/
│   ├── base.py         # Data models (Probe, Finding, ScanResult)
│   ├── registry.py     # Loads probes from YAML
│   ├── yaml_loader.py  # YAML probe parser with validation
│   └── builtin/        # 92 YAML probe definitions
│       ├── safety.yaml
│       ├── capability.yaml
│       ├── reliability.yaml
│       └── consistency.yaml
├── scanner/
│   ├── runner.py       # Async probe execution engine
│   ├── analyzer.py     # Response analysis (injection, PII, hallucination)
│   └── scorer.py       # Weighted domain scoring
├── leaderboard.py      # Local scan history
└── updater.py          # Pull latest probes from GitHub

🛠️ Development

git clone https://github.com/EdList/agentbench.git
cd agentbench
pip install -e .

# Run tests
pytest tests/ -q

# Lint
ruff check .

# Build
python -m build
twine check dist/*

📄 License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

May 4, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentbench_cli-0.1.0.tar.gz (65.2 kB view details)

Uploaded May 4, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

agentbench_cli-0.1.0-py3-none-any.whl (53.2 kB view details)

Uploaded May 4, 2026 Python 3

File details

Details for the file agentbench_cli-0.1.0.tar.gz.

File metadata

Download URL: agentbench_cli-0.1.0.tar.gz
Upload date: May 4, 2026
Size: 65.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentbench_cli-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2478e025b1164d6f0c0f4212c693a577d8c882f4c8c9865fb3dea9ede2607f7b`
MD5	`14cf03c06b838fa3e78d825aceaaa505`
BLAKE2b-256	`cab49f9756cc0e147ad6e95f9d941561d5cdc4ea1a385933d701c599c27406ca`

See more details on using hashes here.

File details

Details for the file agentbench_cli-0.1.0-py3-none-any.whl.

File metadata

Download URL: agentbench_cli-0.1.0-py3-none-any.whl
Upload date: May 4, 2026
Size: 53.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentbench_cli-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c365cb795603f0a89ada538f6951b773d4733b3b4bbcd24bec5fd15b5c83721b`
MD5	`913638ff2fc0a39fa54980eeb2035334`
BLAKE2b-256	`7593df653979adf16b605122899941b497a428a0ebddfbf454d017bd2239159c`

See more details on using hashes here.

agentbench-cli 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🔍 AgentBench

🚀 Quick Start

📖 End-to-End Tutorial

1. Install

2. Get an API Key

3. Run Your First Scan

4. Review the Findings

5. Compare Over Time

6. Integrate with CI

🧪 What It Tests

📋 Commands

⚙️ GitHub Action

Automated Scan on Push

Manual Scan with Parameters

🏆 Model Leaderboard

🏗️ Architecture

🛠️ Development

📄 License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes