Open-source red teaming toolkit for AI agents, RAG systems, MCP servers, and tool-using LLM applications.

AgentGuardian

Open-source red-team testing toolkit for agentic AI systems.

96 attack probes · 11 attacker agents · mappings to OWASP ASI 2026 (all 10 categories), MITRE ATLAS v5.4.0 (11+ techniques; see the coverage matrix for the exact set, since roughly 85% of ATLAS techniques are out of scope for a black-box agent scanner), and CSA Agentic-RT (all 12 categories) · SARIF + PDF reports · runs offline.


What it is

AgentGuardian is a testing toolkit that runs adversarial probes against your agent (LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, ADK, Strands, or any HTTP endpoint) and produces an evidence bundle you can hand to security review. The bundle is unsigned in 1.0.0 but deterministic and hash-stable, so it can be signed externally.

It ships 96 attack probes organised against three public taxonomies: OWASP ASI 2026, MITRE ATLAS v5.4.0, and CSA Agentic-RT.

It is deterministic in stub mode (no LLM key required), reproducible by seed, and emits SARIF + PDF + HTML + JSON.


Install

pip install agent-guardian

Requires Python 3.10+. Apache-2.0 licensed.


How it compares

| Tool | Multi-agent swarm | Agentic-AI focus | Standards alignment | License |
|---|---|---|---|---|
| PyRIT | no | no | PyRIT risk taxonomy | MIT |
| garak | no | no | own taxonomy | Apache-2.0 |
| Promptfoo | no | no | OWASP LLM Top 10 + ATLAS + EU AI Act | MIT |
| Inspect | no | no | own taxonomy | MIT |
| DeepTeam | no | no | OWASP LLM Top 10 | Apache-2.0 |
| AgentGuardian | yes | yes | OWASP ASI 2026 + MITRE ATLAS v5.4.0 + CSA | Apache-2.0 |

Coverage by OWASP ASI 2026 category

All 96 attack probes are distributed across the ten OWASP ASI 2026 categories below. Each finding is triple-tagged with its ASI, MITRE ATLAS, and CSA Agentic-RT identifiers.

  • ASI01 — Memory Poisoning
  • ASI02 — Tool Misuse
  • ASI03 — Privilege Compromise
  • ASI04 — Supply Chain
  • ASI05 — Code Execution
  • ASI06 — Intent Breaking & Goal Manipulation
  • ASI07 — Agent-to-Agent Compromise
  • ASI08 — Cascading Failures
  • ASI09 — Trust Exploitation
  • ASI10 — Rogue Agents (drift)

Enumerate locally with agent-guardian list-probes; full catalogue at agentguardian.io/attacks/overview.


60-second quickstart

# 1. Sanity check the install (no API key, no network)
agent-guardian doctor

# 2. List the 96 shipped probes
agent-guardian list-probes

# 3. Run an offline scan against the built-in stub target
agent-guardian scan --target stub --mode fast --llm stub

# 4. Open the HTML report (macOS; on Linux use xdg-open instead of open)
open reports/latest/report.html

Stub mode requires no LLM API key, no network, no environment variables — it uses canned deterministic responses so you can verify the toolchain end-to-end before pointing it at a real target.
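As an illustration of what "canned deterministic responses" that are reproducible by seed can look like, here is a minimal sketch. It is not AgentGuardian's implementation: `CANNED`, `stub_reply`, and the hashing scheme are all hypothetical, chosen only to show that the reply is a pure function of the seed and the prompt.

```python
import hashlib

# Hypothetical canned replies; a real stub backend would ship its own set.
CANNED = [
    "I cannot help with that request.",
    "Sure, here is the information you asked for.",
    "Calling tool: search(query=...)",
]


def stub_reply(prompt: str, seed: int = 0) -> str:
    """Pick a canned reply as a pure function of (seed, prompt)."""
    # Hash the seed and prompt together so the choice never depends on
    # global random state, wall-clock time, or the network.
    digest = hashlib.sha256(f"{seed}:{prompt}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(CANNED)
    return CANNED[index]
```

Because nothing outside the arguments influences the result, two runs with the same seed produce the same transcript, which is the property that makes offline toolchain checks repeatable.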


Scan a real agent

# Against an HTTP endpoint (any framework, any language)
agent-guardian scan \
  --target http://localhost:8000/chat \
  --framework http \
  --mode smart \
  --llm openai \
  --fail-under 80

# Against a LangGraph app
agent-guardian scan \
  --target ./my_graph.py:graph \
  --framework langgraph \
  --mode full

Exit code is non-zero if the posture score falls below --fail-under, so the same command works inside CI.
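If you drive scans from a Python build script rather than a shell step, the exit-code contract above can be consumed with the standard library alone. The sketch below assumes nothing about AgentGuardian beyond that contract; `passes_gate` is a hypothetical helper name.

```python
import subprocess


def passes_gate(command: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    # check=False: a non-zero exit is an expected outcome here, not an error.
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0
```

Pass it the same argument list you would type in a shell, for example the HTTP scan command above split into a list of strings, and fail your own build when it returns `False`.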


Scan modes

| Mode | What it runs | Typical wall time |
|---|---|---|
| fast | High-signal probe subset, single attacker per family | ~2 min |
| smart | Curated coverage with adaptive attacker selection | ~10 min |
| full | Every probe, every applicable attacker, full mutation set | 30+ min |

Default mode is full. Pick fast for pre-commit / PR checks, smart for nightly runs.


Framework adapters

Shipped first-class adapters (pluggable via --framework):

  • langgraph — LangGraph state graphs
  • crewai — CrewAI crews
  • openai-agents — OpenAI Agents SDK
  • autogen — Microsoft AutoGen
  • adk — Google ADK
  • strands — AWS Strands
  • http — any HTTP/JSON endpoint (works for FastAPI, Flask, Express, anything)

MCP servers and RAG pipelines are covered via the http adapter and worked examples under examples/ (examples/mcp_server, examples/rag_app, examples/fastapi_chatbot).
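For a quick local target to point the http adapter at, a toy HTTP/JSON endpoint can be built with the standard library. This is only a sketch: the request and response shape (`{"message": ...}` in, `{"reply": ...}` out) is an assumption made for illustration, not the schema the adapter is documented to use, so check the shipped examples for the real contract. `ChatHandler` and `make_server` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChatHandler(BaseHTTPRequestHandler):
    """Toy agent endpoint: POST /chat with a JSON object, get JSON back."""

    def do_POST(self) -> None:
        if self.path != "/chat":
            self.send_error(404, "unknown path")
            return
        length = int(self.headers.get("Content-Length", "0"))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "body must be JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "body must be a JSON object")
            return
        # Assumed schema; a real agent call would replace this echo.
        reply = {"reply": f"echo: {payload.get('message', '')}"}
        body = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence per-request logging so scan output stays readable


def make_server(port: int = 8000) -> HTTPServer:
    """Bind the toy endpoint on 127.0.0.1; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), ChatHandler)
```

Calling `make_server().serve_forever()` exposes the endpoint at http://127.0.0.1:8000/chat, which matches the shape of the `--target` URL used in the scan example.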


Attacker swarm

The core swarm contains 11 attacker agents, each scoped to a distinct family of agent-stack failure modes:

| Agent | Targets |
|---|---|
| recon-agent | Surface mapping, tool discovery |
| goal-hijack-agent | Goal redirection, system-prompt override |
| tool-abuse-agent | Tool misuse, argument injection |
| privilege-agent | Privilege escalation, role confusion |
| supply-chain-agent | Tool/model/data supply-chain attacks |
| code-exec-agent | Sandbox escape, code execution |
| memory-poison-agent | Long-term memory poisoning |
| a2a-agent | Agent-to-agent trust exploits |
| cascade-agent | Cascading hallucination / cross-agent contagion |
| trust-exploit-agent | Operator/system trust boundary abuse |
| drift-agent | Behavioural drift, policy erosion over conversation |

Additional specialist classes (FuzzingAgent, OutputHandlingAgent, DenialOfWalletAgent, DetectionEvasionAgent, SecretExtractionAgent, IdentityLeakAgent, CriticAgent) ship as building blocks for custom swarms and are documented under agentguardian.io/attackers.


Reports & evidence

Every scan produces a timestamped bundle under reports/<run-id>/:

  • report.html — interactive dashboard, drillable per-probe
  • report.pdf — print-ready evidence (ReportLab)
  • report.sarif — SARIF 2.1.0 for GitHub Code Scanning / Defender / Snyk ingest
  • report.json — full machine-readable record
  • evidence/ — per-probe transcripts, prompts, responses, and verdicts

A sample HTML report lives at docs/_assets/sample-report.html.
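Because report.sarif follows SARIF 2.1.0, it can be post-processed with a few lines of standard-library Python. The sketch below relies only on the generic SARIF layout (`runs[].results[].level`); which optional fields AgentGuardian actually populates is not stated here, so treat the field access as an assumption. `summarize_sarif` and `load_sarif` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_sarif(sarif: dict) -> Counter:
    """Count SARIF results per severity level across all runs."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: treat an omitted level as "warning",
            # the fallback default in SARIF 2.1.0.
            counts[result.get("level", "warning")] += 1
    return counts


def load_sarif(path: str) -> dict:
    """Read a SARIF file from disk into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, `summarize_sarif(load_sarif("reports/latest/report.sarif"))` gives a per-level tally you can print or compare between runs.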

Signing: the sign_evidence config flag is wired but Sigstore-backed signing of the evidence bundle is planned, not shipped in 1.0.0. Until then the bundle ships unsigned; the SARIF / JSON / PDF are deterministic and hash-stable for external signing.
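Until built-in signing ships, one way to prepare a bundle for external signing is to write a digest manifest and sign that single file with whatever tool your organisation already uses. The following is a minimal sketch under that assumption; `bundle_manifest` is a hypothetical helper, not part of AgentGuardian.

```python
import hashlib
from pathlib import Path


def bundle_manifest(bundle_dir: str) -> dict[str, str]:
    """Map each file's bundle-relative path to its SHA-256 hex digest."""
    root = Path(bundle_dir)
    manifest: dict[str, str] = {}
    # Sort paths so the manifest itself is stable across runs and platforms.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(root).as_posix()] = digest
    return manifest
```

Serialising the returned dictionary with sorted keys yields a small, reproducible file whose signature covers every artifact in the run directory.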


Local dashboard

agent-guardian serve
# → http://localhost:7474

Browse historical runs, diff posture scores across releases, and download evidence bundles.


CI integration

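# .github/workflows/agent-guardian.yml
# Steps of a job; uploading SARIF requires the security-events: write permission.
- uses: actions/checkout@v4  # the scan target ./agent.py lives in the repository
- uses: actions/setup-python@v5
  with: { python-version: '3.11' }
- run: pip install agent-guardian
- run: agent-guardian scan --target ./agent.py:app --mode smart --fail-under 80 --output sarif
- uses: github/codeql-action/upload-sarif@v3
  if: always()  # upload findings even when the scan fails the --fail-under gate
  with: { sarif_file: reports/latest/report.sarif }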

Worked examples under examples/ci/.


Privacy & telemetry

No telemetry is collected. TelemetryConfig.enabled defaults to False. There is no phone-home, no analytics ping, no install tracker. Stub mode additionally requires no network access at all.


Standards mappings

Each probe is tagged with its ASI category, ATLAS technique, and CSA category. Run:

agent-guardian list-probes --by-standard owasp-asi
agent-guardian list-probes --by-standard mitre-atlas
agent-guardian list-probes --by-standard csa-agentic-rt

The honest, auto-generated coverage table lives at docs/reference/framework-coverage-matrix.md — it lists every ATLAS technique the shipped corpus actually cites, and marks zero-coverage CSA categories explicitly rather than hiding them. Full mapping tables also at agentguardian.io/standards.


Docs


Project status

AgentGuardian 1.0.0 is the first stable release. Semver applies to: the public Python API, the CLI surface, the SARIF / JSON report schemas, and the probe IDs. Probe content (prompts, scoring) may evolve within a minor version.

See ROADMAP.md for what is next, CHANGELOG.md for what shipped, and governance.md for how decisions are made.


Contributing

We welcome new probes, new adapters, and new attacker classes. Start with CONTRIBUTING.md and the good first issue label.

All commits must be DCO-signed (git commit -s). The pre-commit hook will block unsigned commits.

By participating you agree to the Code of Conduct and the Ethics Policy — AgentGuardian is for testing systems you own or are authorised to test.


Security

To report a vulnerability, see SECURITY.md. Please do not open public issues for security reports.


License

Apache-2.0. See LICENSE and NOTICE.

AgentGuardian is a trademark of Glacien Technologies — see TRADEMARKS.md for usage guidelines.

Download files

Source Distribution

agent_guardian-1.0.0rc3.tar.gz (1.7 MB)

Uploaded Source

Built Distribution

agent_guardian-1.0.0rc3-py3-none-any.whl (1.4 MB)

Uploaded Python 3

File details

Details for the file agent_guardian-1.0.0rc3.tar.gz.

File metadata

  • Download URL: agent_guardian-1.0.0rc3.tar.gz
  • Upload date:
  • Size: 1.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_guardian-1.0.0rc3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70029b82237f8aaf4da4a0a082af39b79328af25f5fa4a0c0b785398fddb7c27 |
| MD5 | aa643905c6f2605ff767cb65c86b847c |
| BLAKE2b-256 | f8ed91915e53e2a520a0ee221804717cfdc58008c89720450fd582122112dcc4 |

Provenance

The following attestation bundles were made for agent_guardian-1.0.0rc3.tar.gz:

Publisher: publish.yml on glacien-technologies/agent-guardian

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agent_guardian-1.0.0rc3-py3-none-any.whl.

File hashes

Hashes for agent_guardian-1.0.0rc3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e0fd4aeb2b8db161f7c0ebe781857e421283c13c8493e2757078c4fdd61b32e |
| MD5 | 3820fa31f7e5503c088ee247f26acd21 |
| BLAKE2b-256 | 1b80c9525e0f73e5b7e77042d382b8ecdb5e3a15346fc357c9d4e721e52f5b1c |

Provenance

The following attestation bundles were made for agent_guardian-1.0.0rc3-py3-none-any.whl:

Publisher: publish.yml on glacien-technologies/agent-guardian

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
