Per-action AI agent risk scoring and governance: 5-dimension scoring, HITL enforcement, Markov drift detection, injection scanning, and auditable decision logs.

fivedrisk — AI Agent Risk Governance Engine

License: Apache 2.0 | Python 3.10+

Per-action risk scoring and governance for AI agents. Drop one decorator on your tool functions, or wire in the Agent SDK hooks — every action is scored, gated, and logged before it runs.

from fivedrisk.hooks import gate

@gate(tool_name="write_to_database", autonomy_context=2)
async def write_record(table: str, data: dict) -> None:
    ...  # only executes if 5D scores GREEN or YELLOW
         # ORANGE → human approval required
         # RED    → blocked, never runs

What it does

fivedrisk scores every AI agent action on 5 risk dimensions (0–4 each):

Dim  Name               What it measures
D    Data Sensitivity   Public → PII → financial → credentials
T    Tool Privilege     Read-only → write → admin → destructive
R    Reversibility      Undoable → hard-to-undo → irreversible
E    External Impact    Local → internal API → external → untrusted
A    Autonomy Context   User-direct → agent-supervised → fully autonomous

5D stays deterministic. Bands signal what fivedrisk wants your stack to do; your stack chooses the LLM and the workflow.

Default 3-band:

  • GREEN — execute, normal logging.
  • ORANGE — HITL approval required. fivedrisk signals; your stack handles the LLM choice. No auto model promotion.
  • RED — blocked.

Opt-in 4-band compliance mode (enable_yellow_band: true in policy.yaml): surfaces YELLOW as a stable moderate-risk tier for audit queries and dashboards. Optional model escalation within YELLOW via yellow_model_escalation: true.
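
One way to read the band rules above, as a minimal sketch rather than fivedrisk's actual scorer. It assumes that a single dimension at or above a per-dimension threshold forces the band, and that an unweighted mean stands in for the composite score. The function name band_for, the use of a mean, and the default thresholds (borrowed from the sample policy.yaml) are illustrative assumptions.

```python
from enum import Enum
from statistics import mean


class Band(Enum):
    GREEN = "GREEN"
    YELLOW = "YELLOW"
    ORANGE = "ORANGE"
    RED = "RED"


def band_for(scores, *, red_threshold=4, orange_threshold=3,
             orange_score=1.8, yellow_score=1.0, enable_yellow_band=False):
    """Map five 0-4 dimension scores (D, T, R, E, A) to a band.

    Hypothetical logic: the real scorer may combine dimensions differently.
    """
    if len(scores) != 5 or any(not 0 <= s <= 4 for s in scores):
        raise ValueError("expected five scores in the range 0-4")
    worst = max(scores)
    composite = mean(scores)  # assumption: unweighted mean of the five axes
    if worst >= red_threshold:
        return Band.RED
    if worst >= orange_threshold or composite >= orange_score:
        return Band.ORANGE
    # YELLOW only exists in the opt-in 4-band mode; otherwise it folds into GREEN.
    if enable_yellow_band and composite >= yellow_score:
        return Band.YELLOW
    return Band.GREEN
```

Because the function is pure and uses no randomness or model calls, the same scores always give the same band, which is the property the deterministic claim relies on.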

fivedrisk is one layer in a defence-in-depth AI governance stack.


Features

  • 5D scoring engine — deterministic, ~40µs per action (p50) on M1, no LLM calls
  • Markov SafetyDrift — 16-state Markov chain detects cumulative risk across action sequences; catches compositional attacks that individual scoring misses
  • Session accumulator — O(1) counter-based drift tracking for the common case
  • Injection scanner — 24+ regex patterns covering GPT-5/Opus-era evasion (Base64, zero-width Unicode, role hijacks, encoded exec calls)
  • Output leakage scanner — PII, credentials, crypto keys, injection-echo detection
  • @gate decorator — wrap any sync or async function with full 5D gating
  • Agent SDK hooks: fivedrisk_pre_tool / fivedrisk_post_tool for Anthropic Agent SDK
  • LangGraph node — drop-in integration for LangGraph pipelines
  • Destination policy — allowlist/denylist for outbound endpoints
  • Audit log — append-only SQLite decision log plus optional NDJSON event stream for SIEM delivery
  • Policy floor enforcement — floor rules in policy.yaml cannot be overridden at runtime
  • Defence-in-depth test suite — pytest markers per OWASP LLM Top 10 category, plus 39-scenario attack-class benchmark (python -m fivedrisk benchmark)
  • 424 tests with 0 failures
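
To make the destination policy item in the list above concrete, here is a small sketch of a deny-then-allow host check. It is not fivedrisk's implementation: the function name destination_allowed, the parent-domain matching rule, and the "denylist wins" precedence are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def destination_allowed(url, allowlist, denylist):
    """Return True if the URL's host passes a deny-then-allow check.

    Entries match the host exactly or as a parent domain, so
    "example.com" also covers "api.example.com".
    """
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # URLs without a parseable host are rejected

    def matches(entry):
        entry = entry.lower()
        return host == entry or host.endswith("." + entry)

    if any(matches(e) for e in denylist):
        return False  # assumption: the denylist takes precedence
    return any(matches(e) for e in allowlist)
```

Matching on the parsed hostname rather than on the raw URL string avoids treating "notexample.com" or a path segment as if it were the allowed domain.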

Install

pip install fivedrisk

For LangGraph integration:

pip install "fivedrisk[langgraph]"

Quick start (30 seconds)

from fivedrisk import classify_tool_call, score, load_policy, Band

policy = load_policy("policy.yaml")  # or use defaults
action = classify_tool_call("Bash", {"command": "rm -rf /tmp/cache"}, policy)
result = score(action, policy)

print(result.band)                      # Band.ORANGE
print(result.rationale)  # "ORANGE — Bash: Reversibility=3 (≥ ORANGE threshold 3)"
print(result.routing)    # RoutingDecision(model_floor=M3, approval_required=True)

With the @gate decorator:

from fivedrisk.hooks import gate, configure
from fivedrisk import load_policy

configure(policy=load_policy("policy.yaml"))

@gate(tool_name="send_email", autonomy_context=1)
def send_email(to: str, body: str) -> None:
    # only executes if 5D scores GREEN or YELLOW
    smtp.send(to, body)  # smtp is a placeholder for your own mail client
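
For readers curious how a gating decorator can cover both sync and async functions, the following is a stripped-down sketch of the general technique. It is not the fivedrisk @gate: simple_gate, Blocked, and toy_score are hypothetical names, and ORANGE is simply refused here instead of being routed to human approval.

```python
import asyncio
import functools
import inspect


class Blocked(Exception):
    """Raised when the (hypothetical) scorer returns a non-executable band."""


def simple_gate(score_fn, tool_name):
    """Decorator factory: call score_fn before the wrapped tool runs.

    score_fn(tool_name, args, kwargs) returns "GREEN", "YELLOW", "ORANGE" or "RED".
    """
    def check(args, kwargs):
        band = score_fn(tool_name, args, kwargs)
        if band not in ("GREEN", "YELLOW"):
            # A real gate would hand ORANGE to a human approver instead.
            raise Blocked(f"{tool_name}: {band}")

    def decorator(fn):
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                check(args, kwargs)
                return await fn(*args, **kwargs)
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            check(args, kwargs)
            return fn(*args, **kwargs)
        return sync_wrapper

    return decorator


def toy_score(tool_name, args, kwargs):
    # Toy rule for the demo only: anything mentioning DROP TABLE is RED.
    return "RED" if "DROP TABLE" in str(args) else "GREEN"


@simple_gate(toy_score, "run_sql")
def run_sql(query):
    return f"ran: {query}"


@simple_gate(toy_score, "fetch")
async def fetch(url):
    return f"fetched: {url}"


def demo():
    return run_sql("SELECT 1"), asyncio.run(fetch("https://example.org"))
```

The check runs before the wrapped body in both wrappers, so a blocked call never reaches the tool function.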

With Anthropic Agent SDK:

from fivedrisk.hooks import fivedrisk_pre_tool, fivedrisk_post_tool

# Register as PreToolUse and PostToolUse hooks in your agent

With LangGraph:

from fivedrisk.langgraph_node import fivedrisk_gate_node
# Add fivedrisk_gate_node to your StateGraph before any tool-executing node

SafetyDrift — why sequence risk matters

A single READ of a config file scores GREEN. But 10 GREENs followed by a write to an external API, using credentials extracted two steps earlier, is a RED sequence. Scoring each action in isolation misses this.

fivedrisk tracks cumulative session state via a 16-state Markov chain over (data_exposure_tier × activity_risk_tier). When absorption probability into a dangerous state crosses 0.3, the next action is escalated to ORANGE. At 0.7, it's escalated to RED.

from fivedrisk.markov import MarkovDriftTracker, make_default_transition_matrix

tracker = MarkovDriftTracker(make_default_transition_matrix(), session_id="abc")
bump = tracker.record(scored_action)
if bump:
    print(f"Drift: {bump.reason}, escalated to {bump.escalated_band}")
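
The absorption-probability idea can be shown on a toy chain. The sketch below is independent of fivedrisk.markov: it uses two transient states instead of sixteen, invented transition numbers, and exact fractions, and it solves (I - Q) x = r by Gauss-Jordan elimination, where Q holds transient-to-transient probabilities and r holds the one-step probabilities into the dangerous absorbing state. The 0.3 and 0.7 cut-offs are the ones quoted above; every name is hypothetical.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def absorption_into_danger(q, r_danger):
    """Probability of eventually reaching the dangerous state from each transient state."""
    n = len(q)
    i_minus_q = [[(1 if i == j else 0) - q[i][j] for j in range(n)]
                 for i in range(n)]
    return solve(i_minus_q, r_danger)


def escalation(p, orange_at=F(3, 10), red_at=F(7, 10)):
    """Return the band to escalate to, or None when below both cut-offs."""
    if p >= red_at:
        return "RED"
    if p >= orange_at:
        return "ORANGE"
    return None


# Toy chain, numbers invented for illustration.
Q = [[F(1, 2), F(1, 10)],     # low      -> low, elevated
     [F(1, 10), F(1, 2)]]     # elevated -> low, elevated
R_DANGER = [F(1, 100), F(7, 20)]  # one-step probability into "dangerous"

probs = absorption_into_danger(Q, R_DANGER)
print(probs)                           # [Fraction(1, 6), Fraction(11, 15)]
print([escalation(p) for p in probs])  # [None, 'RED']
```

In this toy chain the "low" state has a 1/6 chance of ending in the dangerous state, so nothing is escalated, while "elevated" sits at 11/15 (about 0.73) and crosses the RED cut-off even though no single step from it is certain to be harmful.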

Policy configuration

# policy.yaml
thresholds:
  red_threshold: 4
  orange_threshold: 3
  orange_score: 1.8
  yellow_score: 1.0

floor:
# These rules block regardless of per-action score
- tool_name: "Bash"
  command_contains: "DROP TABLE"
  band: RED
  reason: "floor:no-destructive-sql"
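
How a floor rule like the one above could be applied is sketched below. This is an assumption about the semantics, not the library's code: apply_floor is a hypothetical helper, matching is a case-sensitive substring test on the command, and a floor is treated as something that can raise a band but never lower it.

```python
FLOOR_RULES = [
    {
        "tool_name": "Bash",
        "command_contains": "DROP TABLE",
        "band": "RED",
        "reason": "floor:no-destructive-sql",
    },
]

BAND_ORDER = ["GREEN", "YELLOW", "ORANGE", "RED"]


def apply_floor(tool_name, tool_input, scored_band, rules=FLOOR_RULES):
    """Return (band, reason) after applying floor rules to a scored action.

    reason is None when no floor rule changed the band.
    """
    command = str(tool_input.get("command", ""))
    band, reason = scored_band, None
    for rule in rules:
        if rule["tool_name"] != tool_name:
            continue
        if rule["command_contains"] not in command:
            continue
        # Only ever move the band upwards.
        if BAND_ORDER.index(rule["band"]) > BAND_ORDER.index(band):
            band, reason = rule["band"], rule["reason"]
    return band, reason
```

For example, apply_floor("Bash", {"command": "psql -c 'DROP TABLE users'"}, "GREEN") returns ("RED", "floor:no-destructive-sql") under these assumptions, regardless of what the per-action score was.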

Benchmark

python -m fivedrisk benchmark

Runs 39 offline attack scenarios across: prompt injection (12 categories), output leakage (8 categories), runtime tool misuse (19 scenarios). No external API calls. Deterministic. Safe to run in CI.


Performance

Measured on Apple M1, single-thread. Reproducible from benchmarks/bench_minimal.py.

Operation                                 p50      p99
5D core (classify + score)                40µs     42µs
Injection scan, 30 char clean             11µs     12µs
Injection scan, 3000 char clean           669µs    685µs
Leakage scan, 200 char clean              23µs     23µs
5D + injection + leakage scan             64µs     65µs
5D + Markov drift                         43µs     44µs
5D + SQLite audit-log write (I/O)         440µs    1ms
@gate sync overhead (incl. log write)     439µs    1.1ms
@gate async overhead                      421µs    660µs

External optional layers (not in fivedrisk):

Injection scanner is linear in input length; for large RAG contexts, chunk and parallelize. Run the bench script on your target hardware for numbers that match your install.
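
A sketch of the "chunk and parallelize" advice, under stated assumptions. The single regex is a stand-in for the real detector corpus, the chunk size and overlap are arbitrary, and the helper names are hypothetical. Chunks overlap so that a match straddling a boundary is still seen whole, which only holds if the overlap is at least as long as the longest possible match.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Stand-in pattern; fivedrisk's detector corpus is not reproduced here.
PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


def chunk_text(text, size=2000, overlap=200):
    """Yield (offset, chunk) pairs covering text with overlapping windows."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        yield start, text[start:start + size]
        if start + size >= len(text):
            break


def scan_chunk(item):
    """Return absolute offsets of matches found in one chunk."""
    offset, chunk = item
    return [offset + m.start() for m in PATTERN.finditer(chunk)]


def scan_parallel(text, size=2000, overlap=200, workers=4):
    """Return sorted, de-duplicated match offsets across all chunks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(scan_chunk, chunk_text(text, size, overlap))
        # The set removes matches reported twice from overlapping regions.
        return sorted({pos for chunk_hits in hits for pos in chunk_hits})
```

A thread pool keeps the example short, but in CPython threads are unlikely to speed up CPU-bound regex matching; swapping in a process pool is the usual way to get real parallelism for this kind of work.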


Audit log

fivedrisk produces an append-only decision log entry for every agent action. Each entry records:

  • Risk band and rationale
  • Dimension scores (all 5 axes)
  • Model routing decision and approval history
  • Session drift state
  • Injection and leakage scan results
  • Optional agent identity claim (see below)
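
One common way to make a SQLite table behave as append-only is to reject UPDATE and DELETE with triggers. The sketch below shows that technique using only the standard library; the table name, columns, and helper functions are invented for illustration and are not fivedrisk's DecisionLog schema.

```python
import json
import sqlite3
import time


def open_decision_log(path=":memory:"):
    """Create a decisions table whose rows cannot be updated or deleted."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS decisions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            ts REAL NOT NULL,
            session_id TEXT NOT NULL,
            tool_name TEXT NOT NULL,
            band TEXT NOT NULL,
            rationale TEXT NOT NULL,
            scores_json TEXT NOT NULL
        );
        CREATE TRIGGER IF NOT EXISTS decisions_no_update
        BEFORE UPDATE ON decisions
        BEGIN SELECT RAISE(ABORT, 'decision log is append-only'); END;
        CREATE TRIGGER IF NOT EXISTS decisions_no_delete
        BEFORE DELETE ON decisions
        BEGIN SELECT RAISE(ABORT, 'decision log is append-only'); END;
    """)
    return conn


def record_decision(conn, session_id, tool_name, band, rationale, scores):
    """Append one decision row; scores is a dict of the five dimension values."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO decisions "
            "(ts, session_id, tool_name, band, rationale, scores_json) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (time.time(), session_id, tool_name, band, rationale,
             json.dumps(scores)),
        )
```

Triggers stop accidental or in-process tampering through SQL, but anyone with write access to the database file can still drop them, so this is a guard rail and not a tamper-proof ledger.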

Agent identity passthrough

fivedrisk accepts opaque agent identity claims through Action.metadata["agent_identity"]. The string flows through unchanged into the audit log for SOC/SIEM correlation. SVID, JWT, and X.509 subject strings are supported as opaque data today.

action.metadata["agent_identity"] = "spiffe://example.org/agents/triage-bot"

Cryptographic validation, structured parsing, and identity-aware policy hooks are post-OSS scope.

Reserved metadata keys

  • agent_identity — opaque identity claim string. Do not overwrite with arbitrary values.

Identity capture

Action.acting_identity is a typed pass-through primitive for the principal on whose behalf an action is taken. It is distinct from agent_identity, which is the AI agent's own workload identity: acting_identity records who authorized the action.

from fivedrisk import gate, ActingIdentity, PrincipalType, AttestationSource

ai = ActingIdentity(
    principal_id="user-42",
    principal_type=PrincipalType.USER,
    attestation_source=AttestationSource.HTTP_HEADER,
)

# Per-call override (fn is any @gate-decorated function)
fn("...", session_id="s1", _fivedrisk_acting_identity=ai)

Declare identity_required: true in policy.yaml to deny actions where the caller supplied no identity. The deny surfaces as IdentityRequiredError and emits an identity_required_denial NDJSON event.
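
The admission behaviour described above can be pictured roughly as follows. This is a self-contained sketch, not the library's code: IdentityRequiredError is redefined locally as a stand-in, Principal is a simplified substitute for ActingIdentity, and the NDJSON field names other than the event name are assumptions.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Optional


class IdentityRequiredError(Exception):
    """Local stand-in for the error named above; not imported from fivedrisk."""


@dataclass(frozen=True)
class Principal:
    principal_id: str
    principal_type: str
    attestation_source: str


def admit(tool_name, identity: Optional[Principal], *, identity_required, events):
    """Deny the call when policy requires an identity and none was supplied."""
    if identity_required and identity is None:
        # One JSON object per line is what makes the stream NDJSON.
        events.write(json.dumps({"event": "identity_required_denial",
                                 "tool_name": tool_name}) + "\n")
        raise IdentityRequiredError(f"{tool_name}: no acting identity supplied")
    return asdict(identity) if identity else None


events = io.StringIO()  # stands in for the NDJSON event file
user = Principal("user-42", "USER", "HTTP_HEADER")
admitted = admit("send_email", user, identity_required=True, events=events)
```

Note that the identity is passed through unvalidated here, in line with the statement that cryptographic validation is out of scope for now.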

Identity-aware policy evaluation beyond admission, cryptographic validation, and SPIFFE/SPIRE native binding are post-OSS scope.


Cost management primitives

Per-session token budgeting with direct DENY at @gate when a reservation would exceed the session cap.

# policy.yaml
max_session_budget_tokens: 100000
max_tool_call_budget_tokens: 4096

from fivedrisk import gate, configure

configure(event_path="audit.ndjson", default_model_class="claude-sonnet-class")

@gate(tool_name="summarize", estimated_input_tokens=2000)
def summarize(text: str, session_id: str) -> str:
    ...

If the projected token spend exceeds max_session_budget_tokens, @gate raises BudgetExceededError and emits a budget_intervention NDJSON event.
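
A rough picture of the reservation check, as a sketch under assumptions. SessionBudget is a hypothetical class, BudgetExceededError is redefined locally as a stand-in for the library's error, and the event fields other than the event name are invented. Only the session cap is modelled, since the text does not describe how max_tool_call_budget_tokens is enforced.

```python
import io
import json


class BudgetExceededError(Exception):
    """Local stand-in for the error named above; not imported from fivedrisk."""


class SessionBudget:
    """Track reserved tokens per session and refuse reservations over the cap."""

    def __init__(self, max_session_tokens, events):
        self.max_session_tokens = max_session_tokens
        self.events = events   # file-like object, one JSON object per line
        self.reserved = {}     # session_id -> tokens reserved so far

    def reserve(self, session_id, tool_name, estimated_tokens):
        """Reserve tokens before a call; return the tokens left in the session."""
        projected = self.reserved.get(session_id, 0) + estimated_tokens
        if projected > self.max_session_tokens:
            self.events.write(json.dumps({
                "event": "budget_intervention",
                "session_id": session_id,
                "tool_name": tool_name,
                "projected_tokens": projected,
                "cap": self.max_session_tokens,
            }) + "\n")
            raise BudgetExceededError(
                f"{tool_name}: {projected} > {self.max_session_tokens} tokens")
        self.reserved[session_id] = projected
        return self.max_session_tokens - projected


events = io.StringIO()  # stands in for audit.ndjson
budget = SessionBudget(max_session_tokens=100_000, events=events)
remaining = budget.reserve("s1", "summarize", 2_000)
```

Checking the projected total before the call runs is what allows a deny at the gate, as opposed to discovering the overrun after the tokens are already spent.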

Additional Operational FinOps capabilities (Tool Manifest admission layers, useful-progress monitoring, multi-agent budget envelopes, wall-clock / retry / delegation caps) are on the project roadmap.


Planned

These capabilities are planned, not implemented. They are listed for search visibility and to set contributor expectations; no dates are committed.

  • SPIFFE / MCP reference example (end-to-end workload identity demo)
  • MITRE ATLAS coverage with real tests
  • NIST AI RMF mapping
  • OWASP Agentic Top 10 coverage doc
  • Regulatory crosswalks (AI Act Article 12, NIS2, DORA, ISO 42001)
  • Decision log analysis cookbook (sample SQL for common compliance queries)

Architecture

fivedrisk/
├── schema.py        # Band, Action, ScoredAction, HITLCard, ModelClass
├── scorer.py        # score(), model routing (§12-19)
├── classifier.py    # classify_tool_call() with policy baselines
├── hooks.py         # @gate, Agent SDK hooks, injection/leakage scanners
├── drift.py         # SessionAccumulator (O(1) counter-based)
├── markov.py        # MarkovDriftTracker, Gauss-Jordan, absorption probs
├── detectors.py     # Versioned detector corpus (2026-04-14.2)
├── policy.py        # Policy dataclass + YAML loader
├── router.py        # ModelRouter, EscalationSignal
├── logger.py        # DecisionLog (SQLite, append-only)
├── langgraph_node.py # LangGraph integration
├── benchmarks.py    # 39-case offline benchmark harness
└── tests/           # 424 tests

Coverage: 14/21 governance spec sections fully implemented.


License

Apache 2.0. See LICENSE.

Built by Loren Angoni. Contributions welcome.

"An ambition that doesn't get executed is a hallucination."
