Per-action AI agent risk scoring and governance: 5-dimension scoring, HITL enforcement, Markov drift detection, injection scanning, and auditable decision logs.

fivedrisk — AI Agent Risk Governance Engine

fivedrisk is the fast deterministic policy gate that runs before your LLM-based safety stack.

Every AI agent action is scored on five risk dimensions, banded GREEN / YELLOW / ORANGE / RED, and resolved in 0.2 to 2.9 ms on a single CPU thread. No LLM in the decision path. Architecturally sovereign: no external services, no hyperscaler dependency, runs entirely on your own infrastructure. Apache 2.0. Built in Vienna, Austria.

The two-stage gate

Sequence-aware deterministic policy resolves the obvious GREEN and RED in microseconds. LLM-based scanners (LLM Guard, Lakera, Pangea) run 100 to 700 ms per check and are reserved for YELLOW and ORANGE escalation where semantic judgment actually earns its cost. In typical deployments, fivedrisk takes 90%+ of the action volume off the LLM-based scanners.

agent action
    │
    ▼
[fivedrisk]  ← 0.2 to 2.9 ms, deterministic, audited
    │
    ├── GREEN  ─────────────► execute
    ├── YELLOW ─► LLM scanner ──► execute / log / escalate  (100–700 ms only when needed)
    ├── ORANGE ─► HITL ────────► approve / deny
    └── RED    ─────────────► block, audit, alert
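
The routing in the diagram above can be sketched as a small dispatcher. This is an illustration only: the `Band` enum here is a local stand-in, and `dispatch`, `llm_scan`, and `request_approval` are hypothetical names for whatever your own stack provides, not part of fivedrisk's API.

```python
from enum import Enum
from typing import Callable


class Band(Enum):
    """Local stand-in for the four bands; not imported from fivedrisk."""
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"


def dispatch(
    band: Band,
    execute: Callable[[], str],
    llm_scan: Callable[[], bool],
    request_approval: Callable[[], bool],
) -> str:
    """Route one already-scored action according to its band."""
    if band is Band.GREEN:
        return execute()
    if band is Band.YELLOW:
        # Pay for the slow semantic check only on the moderate-risk tier.
        return execute() if llm_scan() else "escalated"
    if band is Band.ORANGE:
        # A human decides; the action runs only on explicit approval.
        return execute() if request_approval() else "denied"
    return "blocked"  # RED: the action never runs


print(dispatch(Band.GREEN, lambda: "ran", lambda: True, lambda: True))
```

The point of the shape is that the expensive callables are only invoked on the branches that need them, so GREEN and RED traffic never touches the LLM scanner or the approval queue.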

What fivedrisk is

A runtime action-governance layer for AI agents. Per-action 5D scoring, HITL escalation, append-only audit log, 16-state Markov SafetyDrift for compositional attacks, identity passthrough, NDJSON event stream for SIEM. Think OPA for AI agents.

What fivedrisk is not

A general LLM guardrail suite. A semantic content scanner. A replacement for LLM Guard, Lakera, or Pangea. A replacement for best practices in AI governance (tool and scope narrowing, prompt guardrails, system prompts). It is the deterministic pre-filter that lets those scanners and practices scale.

Quickstart in 5 minutes

pip install fivedrisk
python -c "import fivedrisk; print(fivedrisk.__version__)"

Full walkthrough including scope-narrowing guidance and per-deployment tuning: docs/quickstart.md. Copy-paste-runnable integrations: examples/. Policy presets for common deployment archetypes: fivedrisk/policies/presets/.

from fivedrisk.hooks import gate

@gate(tool_name="write_to_database", autonomy_context=2)
async def write_record(table: str, data: dict) -> None:
    # GREEN or YELLOW → the body executes
    # ORANGE → human approval required
    # RED    → blocked, never runs
    ...

What it does

fivedrisk scores every AI agent action on 5 risk dimensions (0–4 each):

Dimension	What it measures
D — Data Sensitivity	Public → PII → financial → credentials
T — Tool Privilege	Read-only → write → admin → destructive
R — Reversibility	Undoable → hard-to-undo → irreversible
E — External Impact	Local → internal API → external → untrusted
A — Autonomy Context	User-direct → agent-supervised → fully autonomous

5D stays deterministic. Bands signal what fivedrisk wants your stack to do; your stack chooses the LLM and the workflow.

Default 3-band:

GREEN — execute, normal logging.
ORANGE — HITL approval required. fivedrisk signals; your stack handles the LLM choice. No auto model promotion.
RED — blocked.

Opt-in 4-band compliance mode (enable_yellow_band: true in policy.yaml): surfaces YELLOW as a stable moderate-risk tier for audit queries and dashboards. Optional model escalation within YELLOW via yellow_model_escalation: true.

fivedrisk is one layer in a defence-in-depth AI governance stack.

Features

5D scoring engine — deterministic, ~40µs per action (p50) on M1, no LLM calls
Markov SafetyDrift — 16-state Markov chain detects cumulative risk across action sequences; catches compositional attacks that individual scoring misses
Session accumulator — O(1) counter-based drift tracking for the common case
Injection scanner — 24+ regex patterns covering GPT-5/Opus-era evasion (Base64, zero-width Unicode, role hijacks, encoded exec calls)
Output leakage scanner — PII, credentials, crypto keys, injection-echo detection
@gate decorator — wrap any sync or async function with full 5D gating
Agent SDK hooks — fivedrisk_pre_tool / fivedrisk_post_tool for Anthropic Agent SDK
LangGraph node — drop-in integration for LangGraph pipelines
Destination policy — allowlist/denylist for outbound endpoints
Audit log — append-only SQLite decision log plus optional NDJSON event stream for SIEM delivery
Policy floor enforcement — floor rules in policy.yaml cannot be overridden at runtime
Defence-in-depth test suite — pytest markers per OWASP LLM Top 10 category, plus 39-scenario attack-class benchmark (python -m fivedrisk benchmark)
424 tests with 0 failures
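
To make the destination policy item in the list above concrete, here is a minimal sketch of an allowlist/denylist check for outbound endpoints. It assumes deny rules win over allow rules and that a rule matches a host and its subdomains; both are assumptions for illustration, and `destination_allowed` is a hypothetical helper, not fivedrisk's own function.

```python
from urllib.parse import urlsplit


def host_matches(host: str, rule: str) -> bool:
    """True if host equals the rule or is a subdomain of it."""
    return host == rule or host.endswith("." + rule)


def destination_allowed(url: str, allowlist: list[str], denylist: list[str]) -> bool:
    """Fail closed: unknown or unparseable destinations are rejected."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    if any(host_matches(host, rule) for rule in denylist):
        return False  # assumption: deny takes precedence over allow
    return any(host_matches(host, rule) for rule in allowlist)


ALLOW = ["example.com"]
DENY = ["evil.example.com"]
print(destination_allowed("https://api.example.com/v1", ALLOW, DENY))
```

Matching on a leading dot avoids the classic suffix mistake where `notexample.com` would pass a naive `endswith("example.com")` check.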

Install

pip install fivedrisk

For LangGraph integration:

pip install "fivedrisk[langgraph]"

Quick start (30 seconds)

from fivedrisk import classify_tool_call, score, load_policy, Band

policy = load_policy("policy.yaml")  # or use defaults
action = classify_tool_call("Bash", {"command": "rm -rf /tmp/cache"}, policy)
result = score(action, policy)

print(result.band)       # Band.ORANGE
print(result.rationale)  # "ORANGE — Bash: Reversibility=3 (≥ ORANGE threshold 3)"
print(result.routing)    # RoutingDecision(model_floor=M3, approval_required=True)

With the @gate decorator:

from fivedrisk.hooks import gate, configure
from fivedrisk import load_policy

configure(policy=load_policy("policy.yaml"))

@gate(tool_name="send_email", autonomy_context=1)
def send_email(to: str, body: str) -> None:
    # only executes if 5D scores GREEN or YELLOW
    smtp.send(to, body)  # smtp: placeholder for your own mail client object
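
For readers curious how one decorator can wrap both sync and async functions, the following is a rough sketch of the general pattern. It is not fivedrisk's implementation: `toy_gate`, `Blocked`, and the `decide` callback are hypothetical, and the real scoring step is replaced by a simple boolean check.

```python
import asyncio
import functools
import inspect


class Blocked(Exception):
    """Raised by this sketch when the decision callback says no."""


def toy_gate(decide):
    """decide(args, kwargs) -> bool stands in for the real scoring step."""
    def decorator(fn):
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                if not decide(args, kwargs):
                    raise Blocked(fn.__name__)
                return await fn(*args, **kwargs)
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            if not decide(args, kwargs):
                raise Blocked(fn.__name__)
            return fn(*args, **kwargs)
        return sync_wrapper
    return decorator


@toy_gate(lambda args, kwargs: "rm -rf" not in str(args))
def run(cmd: str) -> str:
    return f"ran {cmd}"


@toy_gate(lambda args, kwargs: True)
async def fetch(x: int) -> int:
    return x * 2


print(run("ls"), asyncio.run(fetch(21)))
```

The decision runs before the wrapped body in both branches, so a blocked call never starts executing the tool.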

With Anthropic Agent SDK:

from fivedrisk.hooks import fivedrisk_pre_tool, fivedrisk_post_tool

# Register as PreToolUse and PostToolUse hooks in your agent

With LangGraph:

from fivedrisk.langgraph_node import fivedrisk_gate_node
# Add fivedrisk_gate_node to your StateGraph before any tool-executing node

SafetyDrift — why sequence risk matters

A single READ of a config file scores GREEN. But 10 GREENs followed by a write to an external API using credentials extracted two steps earlier is a RED sequence. Scorers that judge each action in isolation miss this, because every step looks safe on its own.

fivedrisk tracks cumulative session state via a 16-state Markov chain over (data_exposure_tier × activity_risk_tier). When absorption probability into a dangerous state crosses 0.3, the next action is escalated to ORANGE. At 0.7, it's escalated to RED.

from fivedrisk.markov import MarkovDriftTracker, make_default_transition_matrix

tracker = MarkovDriftTracker(make_default_transition_matrix(), session_id="abc")
bump = tracker.record(scored_action)
if bump:
    print(f"Drift: {bump.reason}, escalated to {bump.escalated_band}")
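
The absorption-probability idea can be worked through by hand on a toy chain. The sketch below uses two transient states instead of sixteen, invented transition numbers, and hypothetical helper names; it only illustrates the standard calculation (solve (I − Q)·b = r for the probability of eventually reaching the dangerous absorbing state) and the 0.3 / 0.7 escalation thresholds described above.

```python
def solve_gauss_jordan(a, b):
    """Solve a·x = b for a small dense system by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


def absorption_into_danger(q, r_danger):
    """q: transient-to-transient probabilities; r_danger: one-step probability
    of moving from each transient state into the dangerous absorbing state."""
    n = len(q)
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
                 for i in range(n)]
    return solve_gauss_jordan(i_minus_q, r_danger)


def escalation(p, orange_at=0.3, red_at=0.7):
    """Map an absorption probability to an escalated band, or None."""
    if p >= red_at:
        return "RED"
    if p >= orange_at:
        return "ORANGE"
    return None


# Invented toy chain: state 0 = low exposure, state 1 = credentials seen.
# The remaining probability mass in each row goes to a safe absorbing state.
Q = [[0.5, 0.3],
     [0.0, 0.6]]
R_DANGER = [0.0, 0.3]

probs = absorption_into_danger(Q, R_DANGER)
print([round(p, 2) for p in probs], [escalation(p) for p in probs])
```

In this toy chain, state 0 never jumps straight into danger, yet its absorption probability still lands in the ORANGE range because it can drift into state 1 first. That is the compositional effect per-action scoring cannot see.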

Policy configuration

# policy.yaml
thresholds:
  red_threshold: 4
  orange_threshold: 3
  orange_score: 1.8
  yellow_score: 1.0

floor:
  # These rules block regardless of per-action score
  - tool_name: "Bash"
    command_contains: "DROP TABLE"
    band: RED
    reason: "floor:no-destructive-sql"
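
One plausible reading of how these settings combine is sketched below. It is an assumption, not the documented algorithm: the sketch treats `red_threshold` and `orange_threshold` as limits on the highest single dimension, treats `orange_score` and `yellow_score` as limits on the plain mean of the five dimensions, and checks floor rules first. `band_for` is a hypothetical helper; the real scorer may weight dimensions differently.

```python
from statistics import mean

# Values copied from the policy.yaml example above.
THRESHOLDS = {"red_threshold": 4, "orange_threshold": 3,
              "orange_score": 1.8, "yellow_score": 1.0}
FLOOR_RULES = [{"tool_name": "Bash", "command_contains": "DROP TABLE",
                "band": "RED", "reason": "floor:no-destructive-sql"}]


def band_for(tool_name: str, command: str, scores: dict[str, int]) -> tuple[str, str]:
    """Return (band, reason) under the assumptions stated in the lead-in."""
    # Floor rules win regardless of the per-action score.
    for rule in FLOOR_RULES:
        if rule["tool_name"] == tool_name and rule["command_contains"] in command:
            return rule["band"], rule["reason"]

    peak = max(scores.values())
    composite = mean(scores.values())  # assumption: unweighted mean
    if peak >= THRESHOLDS["red_threshold"]:
        return "RED", f"dimension score {peak} >= red_threshold"
    if peak >= THRESHOLDS["orange_threshold"] or composite >= THRESHOLDS["orange_score"]:
        return "ORANGE", "orange threshold reached"
    if composite >= THRESHOLDS["yellow_score"]:
        return "YELLOW", "yellow score reached"
    return "GREEN", "below all thresholds"


print(band_for("Bash", "rm -rf /tmp/cache", {"D": 0, "T": 2, "R": 3, "E": 0, "A": 1}))
```

Under this reading, a single dimension at 3 is enough to reach ORANGE even when the average is low, which matches the rationale string shown in the quick start (Reversibility=3 against an ORANGE threshold of 3).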

Benchmark

python -m fivedrisk benchmark

Runs 39 offline attack scenarios across: prompt injection (12 categories), output leakage (8 categories), runtime tool misuse (19 scenarios). No external API calls. Deterministic. Safe to run in CI.

Performance

Measured on Apple M1, single-thread. Reproducible from benchmarks/bench_minimal.py.

Operation	p50	p99
5D core (classify + score)	40µs	42µs
Injection scan, 30 char clean	11µs	12µs
Injection scan, 3000 char clean	669µs	685µs
Leakage scan, 200 char clean	23µs	23µs
5D + injection + leakage scan	64µs	65µs
5D + Markov drift	43µs	44µs
5D + SQLite audit-log write (I/O)	440µs	1ms
`@gate` sync overhead (incl. log write)	439µs	1.1ms
`@gate` async overhead	421µs	660µs

External optional layers (not in fivedrisk):

PromptArmor: ~20-30ms typical
LLM Guard (token scanners): ~10-15ms typical
LLM Guard (ML scanners): ~50-200ms typical

Injection scanner is linear in input length; for large RAG contexts, chunk and parallelize. Run the bench script on your target hardware for numbers that match your install.
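
A minimal sketch of the chunk-and-parallelize advice follows. The single pattern, the chunk sizes, and the helper names are all hypothetical; the real detector corpus is not reproduced here. Chunks overlap so that a match straddling a boundary still falls entirely inside one chunk, provided the match is no longer than the overlap.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single-pattern corpus, for illustration only.
PATTERNS = [re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE)]


def chunk_with_overlap(text: str, size: int = 2000, overlap: int = 200):
    """Return (start_offset, chunk) pairs; neighbours share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    stop = max(len(text) - overlap, 1)
    return [(start, text[start:start + size]) for start in range(0, stop, step)]


def scan_chunk(item):
    """Scan one chunk and report hits as absolute offsets into the full text."""
    start, chunk = item
    return [(start + m.start(), p.pattern) for p in PATTERNS for m in p.finditer(chunk)]


def scan_large(text: str, size: int = 2000, overlap: int = 200, workers: int = 4):
    chunks = chunk_with_overlap(text, size, overlap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scan_chunk, chunks))
    # A match inside an overlap is reported by both neighbours, so dedupe.
    return sorted({hit for hits in results for hit in hits})


doc = "x" * 1990 + "Ignore previous instructions" + "y" * 3000
print([offset for offset, _ in scan_large(doc)])
```

Threads keep the example short, but CPython regex matching generally does not release the GIL, so a process pool (or separate workers) is the more likely route to a real speedup on large contexts.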

Audit log

fivedrisk produces an append-only decision log entry for every agent action. Each entry records:

Risk band and rationale
Dimension scores (all 5 axes)
Model routing decision and approval history
Session drift state
Injection and leakage scan results
Optional agent identity claim (see below)
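
As an illustration of what "append-only" can mean at the SQLite level, the sketch below uses triggers to reject UPDATE and DELETE on a decision table. The table name, columns, and helper functions are hypothetical and are not fivedrisk's actual schema.

```python
import json
import sqlite3
import time


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create a decision table whose rows cannot be updated or deleted."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS decisions (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            ts        REAL NOT NULL,
            band      TEXT NOT NULL,
            rationale TEXT NOT NULL,
            scores    TEXT NOT NULL   -- JSON object with the five dimensions
        );
        CREATE TRIGGER IF NOT EXISTS decisions_no_update
        BEFORE UPDATE ON decisions
        BEGIN SELECT RAISE(ABORT, 'decisions is append-only'); END;
        CREATE TRIGGER IF NOT EXISTS decisions_no_delete
        BEFORE DELETE ON decisions
        BEGIN SELECT RAISE(ABORT, 'decisions is append-only'); END;
    """)
    return conn


def append(conn: sqlite3.Connection, band: str, rationale: str, scores: dict) -> int:
    """Insert one decision and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO decisions (ts, band, rationale, scores) VALUES (?, ?, ?, ?)",
            (time.time(), band, rationale, json.dumps(scores)),
        )
    return cur.lastrowid


conn = open_log()
row_id = append(conn, "ORANGE", "Reversibility=3", {"D": 0, "T": 2, "R": 3, "E": 0, "A": 1})
print(row_id)
```

Triggers only guard against writes through SQL; anyone with file-system access can still replace the database file, so this complements rather than replaces storage-level controls.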

Agent identity passthrough

fivedrisk accepts opaque agent identity claims through Action.metadata["agent_identity"]. The string flows through unchanged into the audit log for SOC/SIEM correlation. SVID, JWT, and X.509 subject strings are supported as opaque data today.

action.metadata["agent_identity"] = "spiffe://example.org/agents/triage-bot"

Cryptographic validation, structured parsing, and identity-aware policy hooks are post-OSS scope.

Reserved metadata keys

agent_identity — opaque identity claim string. Do not overwrite with arbitrary values.

Identity capture

Action.acting_identity is a typed pass-through primitive for the principal an action is being taken on behalf of. Distinct from agent_identity (the AI agent's own workload identity); acting_identity is who authorized the action.

from fivedrisk import gate, ActingIdentity, PrincipalType, AttestationSource

ai = ActingIdentity(
    principal_id="user-42",
    principal_type=PrincipalType.USER,
    attestation_source=AttestationSource.HTTP_HEADER,
)

# Per-call override; fn stands for any @gate-wrapped function
fn("...", session_id="s1", _fivedrisk_acting_identity=ai)

Declare identity_required: true in policy.yaml to deny actions where the caller supplied no identity. The deny surfaces as IdentityRequiredError and emits an identity_required_denial NDJSON event.

Identity-aware policy evaluation beyond admission, cryptographic validation, and SPIFFE/SPIRE native binding are post-OSS scope.

Cost management primitives

Per-session token budgeting with direct DENY at @gate when a reservation would exceed the session cap.

# policy.yaml
max_session_budget_tokens: 100000
max_tool_call_budget_tokens: 4096

from fivedrisk import gate, configure

configure(event_path="audit.ndjson", default_model_class="claude-sonnet-class")

@gate(tool_name="summarize", estimated_input_tokens=2000)
def summarize(text: str, session_id: str) -> str:
    ...

If the projected token spend exceeds max_session_budget_tokens, @gate raises BudgetExceededError and emits a budget_intervention NDJSON event.
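
The reserve-then-deny behaviour can be sketched in a few lines. Everything here except the `budget_intervention` event name and the two cap values is an assumption: `SessionBudget`, `BudgetExceeded`, and the event fields are hypothetical stand-ins, and the sketch also denies on the per-call cap, which the text above does not spell out.

```python
import io
import json


class BudgetExceeded(Exception):
    """Local stand-in for the BudgetExceededError named above."""


class SessionBudget:
    def __init__(self, max_session_tokens: int, max_call_tokens: int, sink):
        self.max_session_tokens = max_session_tokens
        self.max_call_tokens = max_call_tokens
        self.sink = sink   # any file-like object; one JSON object per line
        self.spent = {}    # session_id -> tokens reserved so far

    def reserve(self, session_id: str, tokens: int) -> int:
        """Reserve tokens before the call runs; return what is left."""
        used = self.spent.get(session_id, 0)
        over_call = tokens > self.max_call_tokens
        over_session = used + tokens > self.max_session_tokens
        if over_call or over_session:
            event = {"event": "budget_intervention", "session_id": session_id,
                     "requested": tokens, "used": used}
            self.sink.write(json.dumps(event) + "\n")  # NDJSON: one line per event
            raise BudgetExceeded(f"{session_id}: {tokens} tokens denied")
        self.spent[session_id] = used + tokens
        return self.max_session_tokens - self.spent[session_id]


sink = io.StringIO()
budget = SessionBudget(100_000, 4_096, sink)
remaining = budget.reserve("s1", 2_000)
print(remaining)
```

Reserving before execution means a denied call costs nothing, and the NDJSON line gives the SIEM a record of the intervention even though the tool never ran.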

Additional operational FinOps capabilities (Tool Manifest admission layers, useful-progress monitoring, multi-agent budget envelopes, wall-clock / retry / delegation caps) are on the project roadmap.

Planned

The capabilities listed here signal direction for searchers and prospective contributors. No dates are committed.

SPIFFE / MCP reference example (end-to-end workload identity demo)
MITRE ATLAS coverage with real tests
NIST AI RMF mapping
OWASP Agentic Top 10 coverage doc
Regulatory crosswalks (AI Act Article 12, NIS2, DORA, ISO 42001)
Decision log analysis cookbook (sample SQL for common compliance queries)

Architecture

fivedrisk/
├── schema.py        # Band, Action, ScoredAction, HITLCard, ModelClass
├── scorer.py        # score(), model routing (§12-19)
├── classifier.py    # classify_tool_call() with policy baselines
├── hooks.py         # @gate, Agent SDK hooks, injection/leakage scanners
├── drift.py         # SessionAccumulator (O(1) counter-based)
├── markov.py        # MarkovDriftTracker, Gauss-Jordan, absorption probs
├── detectors.py     # Versioned detector corpus (2026-04-14.2)
├── policy.py        # Policy dataclass + YAML loader
├── router.py        # ModelRouter, EscalationSignal
├── logger.py        # DecisionLog (SQLite, append-only)
├── langgraph_node.py # LangGraph integration
├── benchmarks.py    # 39-case offline benchmark harness
└── tests/           # 424 tests

Coverage: 14/21 governance spec sections fully implemented.

License

Apache 2.0. See LICENSE.

Built by Loren Angoni. Contributions welcome.

"An ambition that doesn't get executed is a hallucination."

Release history

0.5.3 (this version): Jun 20, 2026
0.5.2: May 23, 2026
0.5.0: May 15, 2026

