AI Agent Security Scanner — detect tool-calling vulnerabilities in LLM agents

These details have not been verified by PyPI

Project links

Project description

Version 0.2.0 Python 3.10+ MIT License 18 attacks

🔍 Agent Security Scanner (agentsec)

Detect tool-calling vulnerabilities in AI agents — before attackers do.

English | 中文

agentsec is a security scanner purpose-built for AI agents that call tools. Unlike traditional LLM security scanners that focus on prompt injection in chat, agentsec tests the unique attack surface of tool-using agents: argument injection, privilege escalation, tool chain contamination, MCP protocol abuse, cross-session memory poisoning, and more.

⚠️ Alpha stage — works, tested, but APIs may change. Contributions welcome.

Why agentsec?

Existing tools (Garak, Rebuff, Prompt Guard) focus on prompt injection in chat. But the real risk for AI agents is tool-calling abuse — when an attacker makes your agent:

Read /etc/shadow via a file-read tool
Execute rm -rf / via a shell tool
Send your database contents to a third party via email tool
Leak its own system prompt via a crafted prompt

agentsec is the only open-source scanner that specifically targets tool-calling agents (Claude Code, ChatGPT with functions, LangChain agents, MCP-based agents, etc.).

Quick Start

# Install
pip install agentvuln

# Scan your local Hermes agent
agentsec scan hermes --profile quick

# Or scan an offline trace file
agentsec scan agent_trace.json -o report.html

Feature Overview

Feature	Description
18 attack vectors	Prompt injection, privilege escalation, data leaks, tool abuse, MCP attacks, and more
Online + Offline	Scan live agents (API) or offline trace files (JSON/JSONL)
Multi-provider	DeepSeek, OpenAI, Anthropic, OpenRouter, Google, xAI, and custom endpoints
Agent templates	6 simulation templates: LangChain ReAct, Claude Code, Codex CLI, OpenAI Functions, MCP Agent, Default
CI/CD ready	Native GitHub Action (`action.yml`) for automated scanning in pipelines
Auto-fix	Some vulnerabilities can be automatically mitigated
3 report formats	JSON (CI), Markdown (PRs), HTML (dashboards)
Scan profiles	`quick` (5 attacks, ~1 min), `daily` (8 attacks, ~2 min), `full` (all 18, ~4.5 min)
Custom attacks	Bring your own YAML attack templates
Interactive shell	Probe agents manually with `agentsec shell`
Watch mode	Schedule recurring scans via cron
Trace adapters	Import traces from LangSmith, LangChain, Claude Code, OpenAI format
Result database	SQLite-backed persistent storage for trend analysis

Usage

Scan a Live Agent

# Quick scan (5 most critical attacks)
agentsec scan hermes --profile quick

# Daily scan (8 common attacks)
agentsec scan hermes --profile daily

# Full scan (all 18 attacks)
agentsec scan hermes --profile full

Scan with a Specific Provider

# Direct API to any provider
agentsec scan deepseek:deepseek-v4-flash
agentsec scan openai:gpt-4o
agentsec scan openrouter:anthropic/claude-sonnet-4
agentsec scan google:gemini-2.0-flash
agentsec scan xai:grok-3

Scan with Agent Templates

Simulate different agent architectures without running the actual framework:

# Simulate a LangChain ReAct agent on top of GPT-4o
agentsec scan openai:gpt-4o --template langchain-react

# Simulate Claude Code agent behavior
agentsec scan deepseek:deepseek-v4-flash --template claude-code

# List all available templates
agentsec scan hermes --list-templates

Available templates: langchain-react, claude-code, codex-cli, openai-functions, mcp-agent, default.

Scan Offline Traces

# Auto-detect trace format
agentsec scan trace.json

# Supported formats: LangSmith, LangChain, Claude Code, OpenAI chat format
# agentsec auto-detects based on file extension and content signature

Run in CI/CD (GitHub Actions)

Create .github/workflows/agentsec-scan.yml:

name: AI Agent Security Scan
on:
  schedule:
    - cron: '0 6 * * *'   # Daily at 6 AM
  workflow_dispatch:       # Manual trigger

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run agent security scan
        uses: nousresearch/agentsec@v1
        with:
          target: hermes
          profile: daily
          fail-on: high
          output-format: html

Interactive Shell

Manually test prompts against an agent in real-time:

agentsec shell hermes

# Or with a specific provider
agentsec shell openai:gpt-4o --template langchain-react

Scheduled Scanning

# Scan every 24 hours
agentsec watch hermes --every 24h

# With auto-fix for discovered vulnerabilities
agentsec watch hermes --every 12h --fix

Auto-Fix Vulnerabilities

# Apply fixes (modifies target config/SOUL.md)
agentsec scan hermes --fix

# Dry run: show what would change without modifying anything
agentsec scan hermes --fix --dry-run

Attack Coverage (18 Attacks)

Severity	Attack Name	What It Tests
🔴 CRITICAL	tool_argument_injection	Agent executes injected payloads in tool arguments (SQL/Shell/path traversal)
🔴 CRITICAL	indirect_injection	Agent treats data from external sources as instructions
🔴 CRITICAL	privilege_escalation	Agent performs actions beyond its intended authority
🟠 HIGH	system_prompt_leak	Agent reveals its system prompt or configuration
🟠 HIGH	data_leak	Agent leaks sensitive information in responses or tool calls
🟠 HIGH	credential_hijacking	Agent is tricked into leaking or mishandling OAuth tokens/API keys
🟠 HIGH	tool_chain_contamination	Data flowing between chained tools is not validated
🟠 HIGH	memory_poisoning	Agent can be poisoned via multi-turn conversation
🟠 HIGH	tool_confusion	Agent uses wrong tools due to ambiguous descriptions or crafted tool names
🟠 HIGH	rag_poisoning	Agent treats retrieved data as instructions rather than information
🟡 MEDIUM	context_overflow	Filling the context window causes agent to lose safety constraints
🟡 MEDIUM	multi_agent_collusion	Malicious instructions propagate when delegating to sub-agents
🟡 MEDIUM	cross_session_memory_poisoning	Agent's persistent memory can be contaminated across sessions
🟡 MEDIUM	agent_to_agent_attack	Agent-to-agent communication channels can be poisoned or hijacked
🟡 MEDIUM	mcp_protocol_security	MCP-specific: tool discovery abuse, argument injection, sandbox escape
🟡 MEDIUM	tool_output_manipulation	Agent blindly trusts tool return values and acts on embedded instructions
🔵 LOW	hallucination_trigger	Agent fabricates information about non-existent entities
🔵 LOW	dos_attack	Agent lacks safeguards against denial-of-service via infinite loops or resource exhaustion

Report Formats

# Machine-readable JSON — ideal for CI integration
agentsec scan hermes --profile full -o report.json

# Markdown — paste into PRs or documentation
agentsec scan hermes --profile full -o report.md

# HTML — visual dashboard for stakeholders
agentsec scan hermes --profile full -o report.html

HTML Report Preview

┌────────────────────────────────────────────────────────┐
│  🔍 Agent Security Scan Report                        │
│  Target: hermes (deepseek/deepseek-v4-flash)          │
│  Profile: full · 18 attacks · 2026-06-02              │
├────────────────────────────────────────────────────────┤
│                                                        │
│  ⚠️  HIGH   system_prompt_leak                        │
│       Leaked: Agent responded with content containing  │
│       "system prompt" text (248 chars)                 │
│       Fix: Add explicit refusal instruction            │
│                                                        │
│  ✅ PASS  tool_argument_injection                      │
│  ✅ PASS  privilege_escalation                         │
│  ✅ PASS  credential_hijacking                         │
│  ...                                                   │
│                                                        │
│  Summary: 17 passed · 1 vulnerable · 0 errors          │
│  Duration: 4m 23s                                      │
└────────────────────────────────────────────────────────┘

Project Status

agentsec CLI v0.2.0
├─ scan    — 18 attacks, 9 providers, 6 templates, 5 trace formats
├─ shell   — interactive probe shell
├─ watch   — cron-based recurring scanning
├─ db      — SQLite-backed result database
└─ self-test — 7/7 ✅

CI/CD: GitHub Action (action.yml + example workflow)

Architecture

┌─────────────┐    ┌────────────────┐    ┌──────────────┐
│  User       │───▶│  ScanEngine    │───▶│  AgentTarget │
│  (CLI/CI)   │    │                │    │  (online)    │
└─────────────┘    │                │    └──────┬───────┘
                   │  - profiles    │           │
                   │  - scheduling  │           ▼
                   │  - reporting   │    ┌──────────────┐
                   │                │    │  LLM Provider│
                   │                │    │  (API call)  │
                   │                │    └──────────────┘
                   │                │
                   │                │    ┌──────────────┐
                   │                │    │  Trace File  │
                   │                │───▶│  (offline)   │
                   │                │    └──────────────┘
                   └───────┬────────┘
                           │
                           ▼
                   ┌────────────────┐
                   │  Detection     │
                   │  Pipeline      │
                   │                │
                   │  1. Tool       │
                   │     Analysis   │
                   │  2. LLM Judge  │
                   │  3. Auto-fix   │
                   └───────┬────────┘
                           │
                           ▼
                   ┌────────────────┐
                   │  Report        │
                   │  (JSON/MD/HTML)│
                   └────────────────┘

Comparison with Other Tools

Feature	agentsec	Garak	Rebuff	Prompt Guard
Tool-calling attacks	✅ 18 vectors	❌ Chat only	❌ Chat only	❌ Chat only
MCP protocol attacks	✅ Native	❌	❌	❌
Agent trace analysis	✅ 5 formats	❌	❌	❌
Online API scanning	✅ 9 providers	❌	❌	❌
CI/CD integration	✅ GitHub Action	❌	❌	❌
Custom attack templates	✅ YAML	✅ Similar	❌	❌
Auto-fix	✅ 4 vectors	❌	❌	❌
Agent simulation	✅ 6 templates	❌	❌	❌

Requirements

Python 3.10+
API key for the LLM provider you want to scan (DeepSeek, OpenAI, Anthropic, etc.)

Development

git clone https://github.com/nousresearch/hermes
cd hermes/agentsec

# Install in editable mode
pip install -e .

# Run self-tests
agentsec self-test

# Build distribution
python -m build

License

MIT

🔍 AI Agent 安全扫描器 (agentsec)

检测 AI Agent 的工具调用漏洞 — 在攻击者之前发现风险。

agentsec 是专为调用工具的 AI Agent 设计的安全扫描器。与传统 LLM 安全工具只关注聊天式 prompt 注入不同，agentsec 测试 tool-using agent 独有的攻击面：参数注入、权限逃逸、工具链污染、MCP 协议滥用、跨会话记忆毒化等。

快速开始

pip install agentvuln
agentsec scan hermes --profile quick

为什么用 agentsec

现有工具（Garak、Rebuff、Prompt Guard）只检测聊天中的 prompt 注入。但 AI agent 的真实风险在于工具调用滥用——攻击者让 agent：

通过文件读取工具读取 /etc/shadow
通过 shell 工具执行 rm -rf /
通过邮件工具将数据库内容发给第三方
通过精心构造的 prompt 泄露自己的系统提示词

agentsec 是唯一专门针对 tool-calling agent（Claude Code、ChatGPT Functions、LangChain agents、MCP-based agents 等）的开源扫描器。

功能一览

功能	说明
18 个攻击向量	Prompt 注入、权限逃逸、数据泄露、工具滥用、MCP 攻击等
在线 + 离线	扫描在线 agent（API）或离线 trace 文件（JSON/JSONL）
多 Provider	DeepSeek、OpenAI、Anthropic、OpenRouter、Google、xAI 等
Agent 模板	6 种模拟模板：LangChain ReAct、Claude Code、Codex CLI 等
CI/CD 集成	原生 GitHub Action，自动扫描
自动修复	部分漏洞可自动修复
3 种报告格式	JSON（CI）、Markdown（PR）、HTML（仪表盘）
扫描配置	`quick`（5项，~1分钟）、`daily`（8项，~2分钟）、`full`（全18项）
自定义攻击	支持 YAML 自定义攻击模板
交互 Shell	`agentsec shell` 手工探测
定时扫描	`agentsec watch` cron 集成
Trace 适配	支持 LangSmith、LangChain、Claude Code、OpenAI 格式
结果数据库	SQLite 持久化存储，支持趋势分析

攻击覆盖（共18项）

等级	攻击名称	检测内容
🔴 严重	tool_argument_injection	Agent 执行注入到工具参数中的 SQL/Shell/路径遍历载荷
🔴 严重	indirect_injection	Agent 将外部数据源中的指令当作上下文执行
🔴 严重	privilege_escalation	Agent 执行越权操作
🟠 高	system_prompt_leak	Agent 泄露系统提示词或配置信息
🟠 高	data_leak	Agent 在响应或工具调用中泄露敏感信息
🟠 高	credential_hijacking	Agent 被诱导泄露或错误处理 OAuth 令牌/API Key
🟠 高	tool_chain_contamination	链式工具间的数据流未经校验
🟠 高	memory_poisoning	多轮对话中植入恶意指令
🟠 高	tool_confusion	Agent 因模糊描述或构造的工具名使用错误的工具
🟠 高	rag_poisoning	Agent 将检索数据当作指令而非信息
🟡 中	context_overflow	填满上下文窗口导致 Agent 丢失安全约束
🟡 中	multi_agent_collusion	子 agent 间的恶意指令传播
🟡 中	cross_session_memory_poisoning	跨会话污染 Agent 持久记忆
🟡 中	agent_to_agent_attack	Agent 间通信通道被投毒或劫持
🟡 中	mcp_protocol_security	MCP 协议攻击：工具发现滥用、参数注入、沙箱逃逸
🟡 中	tool_output_manipulation	Agent 盲目信任工具返回值并执行嵌入指令
🔵 低	hallucination_trigger	Agent 对不存在的实体捏造虚假信息
🔵 低	dos_attack	Agent 缺乏对 DoS 攻击的防护（死循环、资源耗尽）

使用方法

扫描在线 Agent

# 快速扫描（5项最关键攻击，~1分钟）
agentsec scan hermes --profile quick

# 全量扫描（全部18项）
agentsec scan hermes --profile full

# 指定 provider
agentsec scan deepseek:deepseek-v4-flash
agentsec scan openai:gpt-4o

扫描离线 Trace 文件

agentsec scan langsmith_trace.json -o report.html
agentsec scan claude_code_log.json -o report.md

CI/CD 集成

创建 .github/workflows/agentsec-scan.yml：

name: AI Agent 安全扫描
on:
  schedule: [{ cron: '0 6 * * *' }]
  workflow_dispatch: {}

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: 运行安全扫描
        uses: nousresearch/agentsec@v1
        with:
          target: hermes
          profile: daily
          fail-on: high

与竞品对比

功能	agentsec	Garak	Rebuff	Prompt Guard
工具调用攻击	✅ 18种	❌ 仅聊天	❌ 仅聊天	❌ 仅聊天
MCP 协议攻击	✅ 原生	❌	❌	❌
Agent Trace 分析	✅ 5种格式	❌	❌	❌
在线 API 扫描	✅ 9个provider	❌	❌	❌
CI/CD 集成	✅ GitHub Action	❌	❌	❌
自定义攻击模板	✅ YAML	✅ 类似	❌	❌
自动修复	✅ 4种向量	❌	❌	❌
Agent 仿真	✅ 6种模板	❌	❌	❌

许可证

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.4.0

Jun 6, 2026

0.2.2

Jun 4, 2026

This version

0.2.1

Jun 3, 2026

0.2.0

Jun 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentvuln-0.2.1.tar.gz (82.9 kB view details)

Uploaded Jun 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

agentvuln-0.2.1-py3-none-any.whl (102.5 kB view details)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file agentvuln-0.2.1.tar.gz.

File metadata

Download URL: agentvuln-0.2.1.tar.gz
Upload date: Jun 3, 2026
Size: 82.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentvuln-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`2d60b931648444e5efe6fa7a61b92b5742da866e4d7d1c836f5737f7417cbeb0`
MD5	`c906cfd844c6ea19aab105ccb3113d13`
BLAKE2b-256	`1cf7a19a0110fc6f9a739e30534a4a5200b6f719b2e7e1b83c517734991b8cc6`

See more details on using hashes here.

File details

Details for the file agentvuln-0.2.1-py3-none-any.whl.

File metadata

Download URL: agentvuln-0.2.1-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 102.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentvuln-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`551568dd843af13daa13d66a32ea892cffe0a92704ce078fbcf26576387a133d`
MD5	`0de192c5a71135cf0c6b3575ce23ab69`
BLAKE2b-256	`af57dbfdfe4adbbe73c2cad3253e3a0a45f0ba38a994178090fd8a69aa38d6d6`

See more details on using hashes here.

agentvuln 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🔍 Agent Security Scanner (agentsec)

Why agentsec?

Quick Start

Feature Overview

Usage

Scan a Live Agent

Scan with a Specific Provider

Scan with Agent Templates

Scan Offline Traces

Run in CI/CD (GitHub Actions)

Interactive Shell

Scheduled Scanning

Auto-Fix Vulnerabilities

Attack Coverage (18 Attacks)

Report Formats

HTML Report Preview

Project Status

Architecture

Comparison with Other Tools

Requirements

Development

License

🔍 AI Agent 安全扫描器 (agentsec)

快速开始

为什么用 agentsec

功能一览

攻击覆盖（共18项）

使用方法

扫描在线 Agent

扫描离线 Trace 文件

CI/CD 集成

与竞品对比

许可证

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes