AI Agent Security Scanner — detect tool-calling vulnerabilities in LLM agents
Project description
🔍 Agent Security Scanner (agentsec)
Detect tool-calling vulnerabilities in AI agents — before attackers do.
English | 中文
agentsec is a security scanner purpose-built for AI agents that call tools. Unlike traditional LLM security scanners that focus on prompt injection in chat, agentsec tests the unique attack surface of tool-using agents: argument injection, privilege escalation, tool chain contamination, MCP protocol abuse, cross-session memory poisoning, and more.
⚠️ Alpha stage — works, tested, but APIs may change. Contributions welcome.
Why agentsec?
Existing tools (Garak, Rebuff, Prompt Guard) focus on prompt injection in chat. But the real risk for AI agents is tool-calling abuse — when an attacker makes your agent:
- Read
/etc/shadowvia a file-read tool - Execute
rm -rf /via a shell tool - Send your database contents to a third party via email tool
- Leak its own system prompt via a crafted prompt
agentsec is the only open-source scanner that specifically targets tool-calling agents (Claude Code, ChatGPT with functions, LangChain agents, MCP-based agents, etc.).
Quick Start
# Install
pip install agentvuln
# Scan your local Hermes agent
agentsec scan hermes --profile quick
# Or scan an offline trace file
agentsec scan agent_trace.json -o report.html
Feature Overview
| Feature | Description |
|---|---|
| 18 attack vectors | Prompt injection, privilege escalation, data leaks, tool abuse, MCP attacks, and more |
| Online + Offline | Scan live agents (API) or offline trace files (JSON/JSONL) |
| Multi-provider | DeepSeek, OpenAI, Anthropic, OpenRouter, Google, xAI, and custom endpoints |
| Agent templates | 6 simulation templates: LangChain ReAct, Claude Code, Codex CLI, OpenAI Functions, MCP Agent, Default |
| CI/CD ready | Native GitHub Action (action.yml) for automated scanning in pipelines |
| Auto-fix | Some vulnerabilities can be automatically mitigated |
| 3 report formats | JSON (CI), Markdown (PRs), HTML (dashboards) |
| Scan profiles | quick (5 attacks, ~1 min), daily (8 attacks, ~2 min), full (all 18, ~4.5 min) |
| Custom attacks | Bring your own YAML attack templates |
| Interactive shell | Probe agents manually with agentsec shell |
| Watch mode | Schedule recurring scans via cron |
| Trace adapters | Import traces from LangSmith, LangChain, Claude Code, OpenAI format |
| Result database | SQLite-backed persistent storage for trend analysis |
Usage
Scan a Live Agent
# Quick scan (5 most critical attacks)
agentsec scan hermes --profile quick
# Daily scan (8 common attacks)
agentsec scan hermes --profile daily
# Full scan (all 18 attacks)
agentsec scan hermes --profile full
Scan with a Specific Provider
# Direct API to any provider
agentsec scan deepseek:deepseek-v4-flash
agentsec scan openai:gpt-4o
agentsec scan openrouter:anthropic/claude-sonnet-4
agentsec scan google:gemini-2.0-flash
agentsec scan xai:grok-3
Scan with Agent Templates
Simulate different agent architectures without running the actual framework:
# Simulate a LangChain ReAct agent on top of GPT-4o
agentsec scan openai:gpt-4o --template langchain-react
# Simulate Claude Code agent behavior
agentsec scan deepseek:deepseek-v4-flash --template claude-code
# List all available templates
agentsec scan hermes --list-templates
Available templates: langchain-react, claude-code, codex-cli, openai-functions, mcp-agent, default.
Scan Offline Traces
# Auto-detect trace format
agentsec scan trace.json
# Supported formats: LangSmith, LangChain, Claude Code, OpenAI chat format
# agentsec auto-detects based on file extension and content signature
Run in CI/CD (GitHub Actions)
Create .github/workflows/agentsec-scan.yml:
name: AI Agent Security Scan
on:
schedule:
- cron: '0 6 * * *' # Daily at 6 AM
workflow_dispatch: # Manual trigger
jobs:
scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run agent security scan
uses: nousresearch/agentsec@v1
with:
target: hermes
profile: daily
fail-on: high
output-format: html
Interactive Shell
Manually test prompts against an agent in real-time:
agentsec shell hermes
# Or with a specific provider
agentsec shell openai:gpt-4o --template langchain-react
Scheduled Scanning
# Scan every 24 hours
agentsec watch hermes --every 24h
# With auto-fix for discovered vulnerabilities
agentsec watch hermes --every 12h --fix
Auto-Fix Vulnerabilities
# Apply fixes (modifies target config/SOUL.md)
agentsec scan hermes --fix
# Dry run: show what would change without modifying anything
agentsec scan hermes --fix --dry-run
Attack Coverage (18 Attacks)
| Severity | Attack Name | What It Tests |
|---|---|---|
| 🔴 CRITICAL | tool_argument_injection | Agent executes injected payloads in tool arguments (SQL/Shell/path traversal) |
| 🔴 CRITICAL | indirect_injection | Agent treats data from external sources as instructions |
| 🔴 CRITICAL | privilege_escalation | Agent performs actions beyond its intended authority |
| 🟠 HIGH | system_prompt_leak | Agent reveals its system prompt or configuration |
| 🟠 HIGH | data_leak | Agent leaks sensitive information in responses or tool calls |
| 🟠 HIGH | credential_hijacking | Agent is tricked into leaking or mishandling OAuth tokens/API keys |
| 🟠 HIGH | tool_chain_contamination | Data flowing between chained tools is not validated |
| 🟠 HIGH | memory_poisoning | Agent can be poisoned via multi-turn conversation |
| 🟠 HIGH | tool_confusion | Agent uses wrong tools due to ambiguous descriptions or crafted tool names |
| 🟠 HIGH | rag_poisoning | Agent treats retrieved data as instructions rather than information |
| 🟡 MEDIUM | context_overflow | Filling the context window causes agent to lose safety constraints |
| 🟡 MEDIUM | multi_agent_collusion | Malicious instructions propagate when delegating to sub-agents |
| 🟡 MEDIUM | cross_session_memory_poisoning | Agent's persistent memory can be contaminated across sessions |
| 🟡 MEDIUM | agent_to_agent_attack | Agent-to-agent communication channels can be poisoned or hijacked |
| 🟡 MEDIUM | mcp_protocol_security | MCP-specific: tool discovery abuse, argument injection, sandbox escape |
| 🟡 MEDIUM | tool_output_manipulation | Agent blindly trusts tool return values and acts on embedded instructions |
| 🔵 LOW | hallucination_trigger | Agent fabricates information about non-existent entities |
| 🔵 LOW | dos_attack | Agent lacks safeguards against denial-of-service via infinite loops or resource exhaustion |
Report Formats
# Machine-readable JSON — ideal for CI integration
agentsec scan hermes --profile full -o report.json
# Markdown — paste into PRs or documentation
agentsec scan hermes --profile full -o report.md
# HTML — visual dashboard for stakeholders
agentsec scan hermes --profile full -o report.html
HTML Report Preview
┌────────────────────────────────────────────────────────┐
│ 🔍 Agent Security Scan Report │
│ Target: hermes (deepseek/deepseek-v4-flash) │
│ Profile: full · 18 attacks · 2026-06-02 │
├────────────────────────────────────────────────────────┤
│ │
│ ⚠️ HIGH system_prompt_leak │
│ Leaked: Agent responded with content containing │
│ "system prompt" text (248 chars) │
│ Fix: Add explicit refusal instruction │
│ │
│ ✅ PASS tool_argument_injection │
│ ✅ PASS privilege_escalation │
│ ✅ PASS credential_hijacking │
│ ... │
│ │
│ Summary: 17 passed · 1 vulnerable · 0 errors │
│ Duration: 4m 23s │
└────────────────────────────────────────────────────────┘
Project Status
agentsec CLI v0.2.0
├─ scan — 18 attacks, 9 providers, 6 templates, 5 trace formats
├─ shell — interactive probe shell
├─ watch — cron-based recurring scanning
├─ db — SQLite-backed result database
└─ self-test — 7/7 ✅
CI/CD: GitHub Action (action.yml + example workflow)
Architecture
┌─────────────┐ ┌────────────────┐ ┌──────────────┐
│ User │───▶│ ScanEngine │───▶│ AgentTarget │
│ (CLI/CI) │ │ │ │ (online) │
└─────────────┘ │ │ └──────┬───────┘
│ - profiles │ │
│ - scheduling │ ▼
│ - reporting │ ┌──────────────┐
│ │ │ LLM Provider│
│ │ │ (API call) │
│ │ └──────────────┘
│ │
│ │ ┌──────────────┐
│ │ │ Trace File │
│ │───▶│ (offline) │
│ │ └──────────────┘
└───────┬────────┘
│
▼
┌────────────────┐
│ Detection │
│ Pipeline │
│ │
│ 1. Tool │
│ Analysis │
│ 2. LLM Judge │
│ 3. Auto-fix │
└───────┬────────┘
│
▼
┌────────────────┐
│ Report │
│ (JSON/MD/HTML)│
└────────────────┘
Comparison with Other Tools
| Feature | agentsec | Garak | Rebuff | Prompt Guard |
|---|---|---|---|---|
| Tool-calling attacks | ✅ 18 vectors | ❌ Chat only | ❌ Chat only | ❌ Chat only |
| MCP protocol attacks | ✅ Native | ❌ | ❌ | ❌ |
| Agent trace analysis | ✅ 5 formats | ❌ | ❌ | ❌ |
| Online API scanning | ✅ 9 providers | ❌ | ❌ | ❌ |
| CI/CD integration | ✅ GitHub Action | ❌ | ❌ | ❌ |
| Custom attack templates | ✅ YAML | ✅ Similar | ❌ | ❌ |
| Auto-fix | ✅ 4 vectors | ❌ | ❌ | ❌ |
| Agent simulation | ✅ 6 templates | ❌ | ❌ | ❌ |
Requirements
- Python 3.10+
- API key for the LLM provider you want to scan (DeepSeek, OpenAI, Anthropic, etc.)
Development
git clone https://github.com/nousresearch/hermes
cd hermes/agentsec
# Install in editable mode
pip install -e .
# Run self-tests
agentsec self-test
# Build distribution
python -m build
License
MIT
🔍 AI Agent 安全扫描器 (agentsec)
检测 AI Agent 的工具调用漏洞 — 在攻击者之前发现风险。
agentsec 是专为调用工具的 AI Agent 设计的安全扫描器。与传统 LLM 安全工具只关注聊天式 prompt 注入不同,agentsec 测试 tool-using agent 独有的攻击面:参数注入、权限逃逸、工具链污染、MCP 协议滥用、跨会话记忆毒化等。
快速开始
pip install agentvuln
agentsec scan hermes --profile quick
为什么用 agentsec
现有工具(Garak、Rebuff、Prompt Guard)只检测聊天中的 prompt 注入。但 AI agent 的真实风险在于工具调用滥用——攻击者让 agent:
- 通过文件读取工具读取
/etc/shadow - 通过 shell 工具执行
rm -rf / - 通过邮件工具将数据库内容发给第三方
- 通过精心构造的 prompt 泄露自己的系统提示词
agentsec 是唯一专门针对 tool-calling agent(Claude Code、ChatGPT Functions、LangChain agents、MCP-based agents 等)的开源扫描器。
功能一览
| 功能 | 说明 |
|---|---|
| 18 个攻击向量 | Prompt 注入、权限逃逸、数据泄露、工具滥用、MCP 攻击等 |
| 在线 + 离线 | 扫描在线 agent(API)或离线 trace 文件(JSON/JSONL) |
| 多 Provider | DeepSeek、OpenAI、Anthropic、OpenRouter、Google、xAI 等 |
| Agent 模板 | 6 种模拟模板:LangChain ReAct、Claude Code、Codex CLI 等 |
| CI/CD 集成 | 原生 GitHub Action,自动扫描 |
| 自动修复 | 部分漏洞可自动修复 |
| 3 种报告格式 | JSON(CI)、Markdown(PR)、HTML(仪表盘) |
| 扫描配置 | quick(5项,~1分钟)、daily(8项,~2分钟)、full(全18项) |
| 自定义攻击 | 支持 YAML 自定义攻击模板 |
| 交互 Shell | agentsec shell 手工探测 |
| 定时扫描 | agentsec watch cron 集成 |
| Trace 适配 | 支持 LangSmith、LangChain、Claude Code、OpenAI 格式 |
| 结果数据库 | SQLite 持久化存储,支持趋势分析 |
攻击覆盖(共18项)
| 等级 | 攻击名称 | 检测内容 |
|---|---|---|
| 🔴 严重 | tool_argument_injection | Agent 执行注入到工具参数中的 SQL/Shell/路径遍历载荷 |
| 🔴 严重 | indirect_injection | Agent 将外部数据源中的指令当作上下文执行 |
| 🔴 严重 | privilege_escalation | Agent 执行越权操作 |
| 🟠 高 | system_prompt_leak | Agent 泄露系统提示词或配置信息 |
| 🟠 高 | data_leak | Agent 在响应或工具调用中泄露敏感信息 |
| 🟠 高 | credential_hijacking | Agent 被诱导泄露或错误处理 OAuth 令牌/API Key |
| 🟠 高 | tool_chain_contamination | 链式工具间的数据流未经校验 |
| 🟠 高 | memory_poisoning | 多轮对话中植入恶意指令 |
| 🟠 高 | tool_confusion | Agent 因模糊描述或构造的工具名使用错误的工具 |
| 🟠 高 | rag_poisoning | Agent 将检索数据当作指令而非信息 |
| 🟡 中 | context_overflow | 填满上下文窗口导致 Agent 丢失安全约束 |
| 🟡 中 | multi_agent_collusion | 子 agent 间的恶意指令传播 |
| 🟡 中 | cross_session_memory_poisoning | 跨会话污染 Agent 持久记忆 |
| 🟡 中 | agent_to_agent_attack | Agent 间通信通道被投毒或劫持 |
| 🟡 中 | mcp_protocol_security | MCP 协议攻击:工具发现滥用、参数注入、沙箱逃逸 |
| 🟡 中 | tool_output_manipulation | Agent 盲目信任工具返回值并执行嵌入指令 |
| 🔵 低 | hallucination_trigger | Agent 对不存在的实体捏造虚假信息 |
| 🔵 低 | dos_attack | Agent 缺乏对 DoS 攻击的防护(死循环、资源耗尽) |
使用方法
扫描在线 Agent
# 快速扫描(5项最关键攻击,~1分钟)
agentsec scan hermes --profile quick
# 全量扫描(全部18项)
agentsec scan hermes --profile full
# 指定 provider
agentsec scan deepseek:deepseek-v4-flash
agentsec scan openai:gpt-4o
扫描离线 Trace 文件
agentsec scan langsmith_trace.json -o report.html
agentsec scan claude_code_log.json -o report.md
CI/CD 集成
创建 .github/workflows/agentsec-scan.yml:
name: AI Agent 安全扫描
on:
schedule: [{ cron: '0 6 * * *' }]
workflow_dispatch: {}
jobs:
scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: 运行安全扫描
uses: nousresearch/agentsec@v1
with:
target: hermes
profile: daily
fail-on: high
与竞品对比
| 功能 | agentsec | Garak | Rebuff | Prompt Guard |
|---|---|---|---|---|
| 工具调用攻击 | ✅ 18种 | ❌ 仅聊天 | ❌ 仅聊天 | ❌ 仅聊天 |
| MCP 协议攻击 | ✅ 原生 | ❌ | ❌ | ❌ |
| Agent Trace 分析 | ✅ 5种格式 | ❌ | ❌ | ❌ |
| 在线 API 扫描 | ✅ 9个provider | ❌ | ❌ | ❌ |
| CI/CD 集成 | ✅ GitHub Action | ❌ | ❌ | ❌ |
| 自定义攻击模板 | ✅ YAML | ✅ 类似 | ❌ | ❌ |
| 自动修复 | ✅ 4种向量 | ❌ | ❌ | ❌ |
| Agent 仿真 | ✅ 6种模板 | ❌ | ❌ | ❌ |
许可证
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file agentvuln-0.2.1.tar.gz.
File metadata
- Download URL: agentvuln-0.2.1.tar.gz
- Upload date:
- Size: 82.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2d60b931648444e5efe6fa7a61b92b5742da866e4d7d1c836f5737f7417cbeb0
|
|
| MD5 |
c906cfd844c6ea19aab105ccb3113d13
|
|
| BLAKE2b-256 |
1cf7a19a0110fc6f9a739e30534a4a5200b6f719b2e7e1b83c517734991b8cc6
|
File details
Details for the file agentvuln-0.2.1-py3-none-any.whl.
File metadata
- Download URL: agentvuln-0.2.1-py3-none-any.whl
- Upload date:
- Size: 102.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
551568dd843af13daa13d66a32ea892cffe0a92704ce078fbcf26576387a133d
|
|
| MD5 |
0de192c5a71135cf0c6b3575ce23ab69
|
|
| BLAKE2b-256 |
af57dbfdfe4adbbe73c2cad3253e3a0a45f0ba38a994178090fd8a69aa38d6d6
|