AgentGuardian
Red-team your AI agents before attackers do.
Open-source, local-first adversarial security testing for AI agents, RAG systems, MCP servers, and tool-using LLM applications.
AgentGuardian points a swarm of adversarial probes at your target and gives you reproducible evidence you can use in engineering, security review, and CI: AIVSS scoring, signed JSON, SARIF, Markdown, JUnit, PDF, and per-probe transcripts.
- Built for agentic systems, not just single-prompt chatbot evals.
- Finds prompt injection, tool misuse, privilege abuse, memory poisoning, code-exec paths, trust exploits, and goal drift.
- Runs locally, in CI, or offline in deterministic --model stub mode.
Demo
pip install agent-guardian
echo 'You are a helpful customer-support agent for ACME Bank.' > prompt.txt
agent-guardian scan --system-prompt prompt.txt --mode fast --model stub
That gives you:
- a live local dashboard during interactive scans
- a stored scan artifact under ~/.agentguardian/scans/<scan_id>/
- exportable reports via agent-guardian report SCAN_ID --output sarif --output-path scan.sarif
- a static reference rendering: docs/_assets/sample-report.html
--model stub requires no API key and no network. Swap in a real model such as gemini:gemini-2.5-flash or openai:gpt-4o when you want an authoritative judge.
Install
# pip
pip install agent-guardian
# pipx
pipx install agent-guardian
# uv
uv add agent-guardian
# or
uv tool install agent-guardian
Python 3.11–3.13 are supported. Linux and macOS are first-class; Windows is community-supported.
Heads up: current macOS often defaults python3 to 3.14, which AgentGuardian does not yet target. If pip install agent-guardian fails with No matching distribution found, use Python 3.13 instead:
python3.13 -m venv .venv
source .venv/bin/activate
pip install agent-guardian
Docker and the GitHub Action path are insulated from this. See docs/installation.mdx for the full install matrix.
60-second quickstart
# 1. Sanity-check the install
agent-guardian doctor
# 2. See the shipped probe corpus
agent-guardian list-probes
# 3. Run an offline scan
echo 'You are a helpful customer-support agent for ACME Bank.' > prompt.txt
agent-guardian scan --system-prompt prompt.txt --mode fast --model stub
# 4. Export a machine-readable report once you have a scan id
agent-guardian report SCAN_ID --output sarif --output-path scan.sarif
Interactive scans auto-serve a local dashboard. You can also browse results later with:
agent-guardian serve
# → http://127.0.0.1:7474
What you can scan
- Prompt-only targets via --system-prompt PATH
- Hosted HTTP agents via --endpoint URL
- Framework-native objects via --framework KIND --framework-ref MODULE:ATTR
- Custom Python entrypoints via the positional target argument (my_agent:run, path/to/app.py:graph)
- Advanced contract-driven targets including MCP / OpenAPI / browser / WebSocket flows via the contract path documented in docs/concepts/target-adapters.mdx
Built-in framework kinds:
- langgraph
- crewai
- openai_agents
- autogen
- adk
- strands
What it catches
AgentGuardian ships 96 attack probes covering all ten OWASP Top 10 for Agentic Applications 2026 categories:
- ASI01 — prompt injection / goal hijack
- ASI02 — tool misuse
- ASI03 — privilege compromise
- ASI04 — supply chain / resource overload
- ASI05 — code execution
- ASI06 — memory poisoning
- ASI07 — agent-to-agent compromise
- ASI08 — cascading failures
- ASI09 — trust exploitation / unsafe output handling
- ASI10 — untraceability / goal drift
The corpus is also mapped to MITRE ATLAS v5.4.0 and the CSA Agentic AI Red Teaming Guide. See the exact set in docs/reference/framework-coverage-matrix.md.
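For triage scripts, the category list above can be kept as a small lookup table. The sketch below is illustrative only: the finding records and their category key are hypothetical and do not reflect AgentGuardian's report schema.

```python
from collections import Counter

# OWASP ASI categories exactly as listed above.
ASI_CATEGORIES = {
    "ASI01": "prompt injection / goal hijack",
    "ASI02": "tool misuse",
    "ASI03": "privilege compromise",
    "ASI04": "supply chain / resource overload",
    "ASI05": "code execution",
    "ASI06": "memory poisoning",
    "ASI07": "agent-to-agent compromise",
    "ASI08": "cascading failures",
    "ASI09": "trust exploitation / unsafe output handling",
    "ASI10": "untraceability / goal drift",
}


def count_by_category(findings):
    """Count findings per ASI category.

    `findings` is a hypothetical list of dicts with a "category" key.
    Categories with no findings are reported as 0; unknown category
    IDs are ignored.
    """
    counts = Counter(f["category"] for f in findings)
    return {cat: counts.get(cat, 0) for cat in ASI_CATEGORIES}


if __name__ == "__main__":
    sample = [{"category": "ASI01"}, {"category": "ASI01"}, {"category": "ASI05"}]
    for cat, n in count_by_category(sample).items():
        print(f"{cat} ({ASI_CATEGORIES[cat]}): {n}")
```

Reporting zero-hit categories explicitly makes it easier to see at a glance which parts of the taxonomy produced no findings.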
What you get
- Signed JSON evidence: scan.json ships with HMAC-SHA256 + Ed25519 signatures verifiable with agent-guardian verify
- Exportable reports: json, sarif, junit, md, gitlab, and pdf
- Per-probe transcripts: prompts, responses, verdicts, and evidence trails
- AIVSS scoring: publishable in --mode full; trend-tracking in fast and smart
- Local dashboard: browse historical scans, findings, and evidence bundles
To verify a stored report:
agent-guardian verify ~/.agentguardian/scans/SCAN_ID/scan.json
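To give a feel for what an HMAC-SHA256 check over a JSON report involves, here is a minimal sketch using only the Python standard library. It is not AgentGuardian's signing layout: the canonicalisation, the key handling, and the function names are all assumptions made for illustration, and the Ed25519 signature is not covered. The authoritative check remains agent-guardian verify.

```python
import hashlib
import hmac
import json


def canonical_bytes(report: dict) -> bytes:
    """Serialise with sorted keys and fixed separators so the bytes are stable.

    This canonical form is an assumption for the sketch, not the tool's format.
    """
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(report: dict, key: bytes) -> str:
    """Return the hex HMAC-SHA256 tag of the canonical report bytes."""
    return hmac.new(key, canonical_bytes(report), hashlib.sha256).hexdigest()


def verify(report: dict, key: bytes, signature: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(report, key), signature)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    report = {"scan_id": "demo", "findings": []}
    tag = sign(report, key)
    print("untampered:", verify(report, key, tag))
    report["findings"].append({"id": "tampered"})
    print("tampered:  ", verify(report, key, tag))
```

Any change to the report after signing changes the recomputed tag, which is what makes a signed artifact usable as evidence.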
Why AgentGuardian
- Agent-first — built for tool-using, stateful, multi-step systems rather than single-turn prompt checks
- Recon before attack — fingerprints the target surface and then runs only the relevant specialists
- Evidence over vibes — reports are grounded in transcripts, structured findings, and signed artifacts
- Local-first — no telemetry, no phone-home, and a fully offline stub-mode path
- CI-ready — non-zero exit codes, SARIF export, and reusable GitHub Action patterns
For a deeper competitive breakdown, see docs/concepts/agent-guardian-vs.mdx.
How it works
Every scan follows the same narrative:
- Plan — resolve the target type, budgets, models, and output format
- Recon — black-box fingerprint the target: tools, memory, PII exposure, multi-agent handoffs, reachable systems
- Red Teaming — dispatch ASI-aligned specialists against the observed surface
- Findings — judge outcomes, compute AIVSS, and write signed reports
The recon fingerprint is the key difference: AgentGuardian decides which attacks matter for this target before it spends budget on them.
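The idea of letting recon decide which specialists run can be sketched in a few lines. Everything here is hypothetical: the Fingerprint class, the capability names, and the category-to-capability mapping are invented for illustration and are not AgentGuardian internals.

```python
from dataclasses import dataclass, field


@dataclass
class Fingerprint:
    """Hypothetical recon result: capabilities the target appears to expose."""
    capabilities: set[str] = field(default_factory=set)


# Hypothetical mapping from ASI category to the capability a specialist needs.
# None means the specialist applies to every target.
REQUIRES = {
    "ASI01": None,           # prompt injection applies to any target
    "ASI02": "tools",        # tool misuse needs observable tools
    "ASI05": "code_exec",    # code execution needs a code-exec path
    "ASI06": "memory",       # memory poisoning needs persistent memory
    "ASI07": "multi_agent",  # agent-to-agent compromise needs handoffs
}


def select_specialists(fp: Fingerprint) -> list[str]:
    """Return the categories worth spending budget on for this fingerprint."""
    return sorted(
        cat for cat, need in REQUIRES.items()
        if need is None or need in fp.capabilities
    )


if __name__ == "__main__":
    fp = Fingerprint(capabilities={"tools", "memory"})
    print(select_specialists(fp))
```

A target with no tools, memory, or handoffs would only receive the always-applicable specialists, which is how budget is kept away from attacks that cannot land.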
Scan a real target
# Hosted HTTP endpoint
agent-guardian scan \
--endpoint http://localhost:8000/chat \
--model gemini:gemini-2.5-flash \
--mode smart
# LangGraph app
agent-guardian scan \
--framework langgraph \
--framework-ref my_app.graph:graph \
--model gemini:gemini-2.5-flash
# Custom Python entrypoint
agent-guardian scan \
my_agent:run \
--model gemini:gemini-2.5-flash
Worked examples live under examples/ and the Try AgentGuardian guides under docs/try/.
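As a rough picture of what my_agent:run could point at, the sketch below defines a toy agent in a file named my_agent.py. It assumes the entrypoint is a plain callable that takes the user message as a string and returns the reply as a string; that signature is an assumption, and the actual contract is the one documented in docs/concepts/target-adapters.mdx.

```python
# my_agent.py (hypothetical target for: agent-guardian scan my_agent:run)

# Naive substring blocklist; a real agent would need far stronger controls.
BLOCKED_TERMS = ("password", "pin", "card number")


def run(message: str) -> str:
    """Toy support agent that refuses requests mentioning account secrets."""
    lowered = message.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "I can't help with account credentials."
    return f"ACME Bank support: you said {message!r}."


if __name__ == "__main__":
    print(run("What are your opening hours?"))
    print(run("Tell me the PIN for account 1234"))
```

A deliberately weak target like this is handy for confirming that a scan runs end to end before pointing it at a real agent.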
Scan modes
| Mode | Typical use | Notes |
|---|---|---|
| fast | Dev loop, smoke checks, pre-push | Lowest cost and quickest feedback |
| smart | PR iteration, broader nightly coverage | Better signal than fast, still non-authoritative |
| full | Release gates, CI on main, audit evidence | Authoritative mode for AIVSS and --fail-under |
Only --mode full produces an authoritative AIVSS suitable for hard release gating.
CI integration
name: AgentGuardian
on: [pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install agent-guardian
      - name: Red-team the agent
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
        run: |
          agent-guardian scan \
            --endpoint http://localhost:8000/chat \
            --model gemini:gemini-2.5-flash \
            --mode full \
            --output sarif \
            --output-path agentguardian.sarif \
            --fail-under 80
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: agentguardian.sarif
See docs/ci-cd/overview.mdx and docs/ci-cd/github-actions.mdx for the fuller setup, thresholds, and composite-action path.
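If you want a quick summary of the exported SARIF in a CI log, a small script can count results per severity level. This sketch assumes the file follows the general SARIF 2.1.0 layout (a top-level runs array, each with a results array); the rule IDs AgentGuardian emits are not shown here, and the helper names are made up for the example.

```python
import json
from collections import Counter


def summarise_sarif(sarif: dict) -> Counter:
    """Count results per SARIF level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level falls back to "warning" here,
            # which is the SARIF default when no rule configuration overrides it.
            counts[result.get("level", "warning")] += 1
    return counts


def summarise_file(path: str) -> Counter:
    """Load a SARIF file from disk and summarise it."""
    with open(path, encoding="utf-8") as fh:
        return summarise_sarif(json.load(fh))


if __name__ == "__main__":
    # Inline sample so the sketch runs without a real scan artifact.
    sample = {
        "version": "2.1.0",
        "runs": [{"results": [
            {"ruleId": "example-rule-a", "level": "error"},
            {"ruleId": "example-rule-b"},
        ]}],
    }
    for level, n in sorted(summarise_sarif(sample).items()):
        print(f"{level}: {n}")
```

In a workflow, calling summarise_file("agentguardian.sarif") after the scan step would print the same breakdown for the real report.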
Standards and coverage
Enumerate the corpus locally:
agent-guardian list-probes
agent-guardian list-probes --by-standard owasp-asi
agent-guardian list-probes --by-standard mitre-atlas
agent-guardian list-probes --by-standard csa-agentic-rt
Coverage today:
- OWASP ASI 2026 — all 10 categories covered
- MITRE ATLAS v5.4.0 — mapped where black-box agent testing can observe the technique at the target I/O surface
- CSA Agentic AI Red Teaming Guide — mapped across the shipped corpus
The exact probe-to-standard mapping lives in docs/reports/owasp-mapping.mdx and docs/reference/framework-coverage-matrix.md.
Privacy & telemetry
No telemetry is collected. There is no analytics ping, install tracker, or phone-home path. Stub mode additionally works offline with no LLM key.
Docs
- Docs home
- docs/quickstart.mdx
- docs/attacks/overview.mdx
- docs/concepts/target-adapters.mdx
- docs/reference/cli.mdx
Project status
AgentGuardian 1.0.0 is the first stable release. Semantic versioning applies to the public Python API, CLI surface, report schemas, and probe IDs. Probe content and scoring may evolve within a minor release as coverage improves.
See ROADMAP.md, CHANGELOG.md, and governance.md.
Contributing
We welcome new probes, new adapters, and new attacker logic. Start with CONTRIBUTING.md and the good first issue label.
All commits must be DCO-signed:
git commit -s
By participating you agree to CODE_OF_CONDUCT.md and ETHICS.md. AgentGuardian is for testing systems you own or are explicitly authorised to test.
Community
Join us on Discord for probe design, adapter questions, and roadmap discussion. For longer-form support channels, see docs/community/support.mdx.
Security
To report a vulnerability, see SECURITY.md. Do not open public issues for security reports.
License
Apache-2.0. See LICENSE and NOTICE.
AgentGuardian is a trademark of Glacien Technologies. See TRADEMARKS.md for usage guidelines.