
agent-risk-engine

A layered protocol and reference implementation for codifying risk in autonomous agent actions.

See PROTOCOL.md for the language-agnostic protocol specification.

Installation

uv

uv add agent-risk-engine

pip

pip install agent-risk-engine

Quick Start

from agent_risk_engine import RuleGate, RiskEvaluator, Action, GateResult

gate = RuleGate(threshold="cautious")
evaluator = RiskEvaluator(rule_gate=gate)

result = await evaluator.evaluate(Action(kind="tool_call", name="read_file", risk=1))
assert result.decision == GateResult.ALLOWED

result = await evaluator.evaluate(
    Action(kind="tool_call", name="execute_shell", parameters={"command": "rm -rf /"}, risk=5)
)
assert result.decision == GateResult.NEEDS_APPROVAL

Architecture

Actions pass through a 3-layer pipeline:

flowchart LR
    A["Action arrives"] --> B

    B["**RuleGate** · L1\nFast static rules\nNo LLM · Microseconds"]
    B -->|denied| Z["DENIED"]
    B -->|passes| C

    C["**ActionAnalyzer** · L2\nArgument-aware scoring\nPassthrough stub by default"]
    C -->|scored| D

    D["**ActionGate** · L3\nRisk vs utility tradeoff\nOnly escalates, never relaxes"]
    D --> E["ALLOWED / NEEDS_APPROVAL / DENIED"]

L1 (RuleGate) and the RiskUtilityGate implementation of L3 are fully implemented. L2 ships as a passthrough stub — plug in your own ActionAnalyzer.
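The pipeline's contract can be sketched with plain functions. This is a hypothetical stand-in, not the library's internals; the function names and the numeric decision ordering are illustrative assumptions. It shows the three properties described above: L1 can short-circuit to DENIED, L2 only re-scores, and L3 may tighten but never loosen L1's decision.

```python
from enum import IntEnum

class Decision(IntEnum):
    # Ordered so that "more restrictive" compares greater.
    ALLOWED = 0
    NEEDS_APPROVAL = 1
    DENIED = 2

def rule_gate(risk: int, threshold: int) -> Decision:
    """L1: fast static rule -- allow at or below threshold, else ask."""
    return Decision.ALLOWED if risk <= threshold else Decision.NEEDS_APPROVAL

def analyzer(risk: int) -> int:
    """L2: passthrough, mirroring the default stub's behavior."""
    return risk

def action_gate(l1: Decision, scored_risk: int, threshold: int) -> Decision:
    """L3: may escalate based on the analyzer's score, never relaxes."""
    l3 = Decision.ALLOWED if scored_risk <= threshold else Decision.NEEDS_APPROVAL
    return max(l1, l3)  # escalate-only: take the stricter of the two

def evaluate(risk: int, threshold: int = 2) -> Decision:
    l1 = rule_gate(risk, threshold)
    if l1 is Decision.DENIED:
        return l1  # an L1 denial short-circuits the rest of the pipeline
    return action_gate(l1, analyzer(risk), threshold)
```

With a "cautious" threshold of 2, a level-1 read passes and a level-5 destructive action is held for approval, matching the Quick Start above.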

Risk Levels

Level  Label     Meaning
1      Info      Read-only, no side effects
2      Low       Reads potentially sensitive data
3      Moderate  Reversible mutations
4      High      Hard-to-reverse mutations
5      Critical  Destructive or irreversible

RuleGate

Fast, deterministic, no LLM required. Supports per-kind threshold routing:

gate = RuleGate(
    threshold="cautious",
    kind_thresholds={
        "tool_call": "standard",
        "file_write": 2,
        "code_execution": 1,
    },
    denied={"delete_database"},
    allowed={"read_logs"},
)

Evaluation order: denied → allowed → threshold comparison.
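That precedence can be sketched as a single function. This is a hypothetical stand-in for illustration (the function name and string return values are assumptions, not the library's API):

```python
def rule_gate_decision(name: str, risk: int, threshold: int,
                       denied: frozenset = frozenset(),
                       allowed: frozenset = frozenset()) -> str:
    """Sketch of the documented precedence: denied, then allowed, then threshold."""
    if name in denied:       # 1. explicit denylist wins outright
        return "DENIED"
    if name in allowed:      # 2. explicit allowlist bypasses the threshold
        return "ALLOWED"
    # 3. otherwise compare the action's risk to the gate's threshold
    return "ALLOWED" if risk <= threshold else "NEEDS_APPROVAL"
```

Note the consequence: a name in both sets is denied, and an allowlisted action passes even when its risk exceeds the threshold.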

Threshold Aliases

Alias       Level  Description
read-only   1      Only info-level actions
cautious    2      Info + low-risk actions
standard    3      Up to reversible mutations
full-trust  5      Everything auto-allowed

PatternAnalyzer

Scores actions by matching regex patterns against serialized parameters. Supports kind-scoped patterns:

from agent_risk_engine import PatternAnalyzer, RiskPattern

analyzer = PatternAnalyzer(extra_patterns=[
    RiskPattern(r"\bDROP\b", 5, "SQL drop", kinds=frozenset({"database_query"})),
])

Pass it to RiskEvaluator(rule_gate=gate, action_analyzer=analyzer).
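The matching strategy can be sketched as follows. This is a stand-in, not the library's code; the Pattern class, score function, and default level are assumptions. The idea: serialize the parameters, apply each regex, honor kind scoping, and keep the highest-severity match.

```python
import json
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    regex: str
    level: int
    reason: str
    kinds: frozenset = frozenset()   # empty = applies to every action kind

def score(kind: str, parameters: dict, patterns: list, default: int = 1) -> tuple:
    """Return (level, reason) for the highest-severity matching pattern."""
    blob = json.dumps(parameters)    # serialize args so regexes see the values
    best = (default, "no pattern matched")
    for p in patterns:
        if p.kinds and kind not in p.kinds:
            continue                 # kind-scoped pattern, wrong kind: skip
        if re.search(p.regex, blob) and p.level > best[0]:
            best = (p.level, p.reason)
    return best
```

A DROP statement inside a database_query scores 5, while the same text in an unscoped tool_call falls back to the default level.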

RiskUtilityGate

Weighs risk against caller-provided utility. Utility is an input, not computed internally — the library evaluates risk; your framework understands agent goals.

from agent_risk_engine import RiskEvaluator, RuleGate, RiskUtilityGate, UtilityScore

evaluator = RiskEvaluator(
    rule_gate=RuleGate(threshold="standard"),
    action_gate=RiskUtilityGate(),
)

result = await evaluator.evaluate(
    Action(kind="tool_call", name="write_file", risk=3),
    utility=UtilityScore(level=4, reasoning="User explicitly requested"),
)

The gate only escalates, never relaxes — it cannot make a decision less restrictive than L1.
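One way to picture that property is a hypothetical sketch (not the shipped heuristic; the margin rule and names are assumptions): high utility can justify keeping the upstream decision, but the combined result is always the stricter of the two.

```python
# Ordering from least to most restrictive, used to pick the stricter decision.
ORDER = {"ALLOWED": 0, "NEEDS_APPROVAL": 1, "DENIED": 2}

def risk_utility_gate(upstream: str, risk: int, utility: int) -> str:
    """Escalate when risk clearly outweighs utility; never relax upstream.

    The margin (risk - utility >= 2) is an illustrative assumption."""
    proposed = "NEEDS_APPROVAL" if risk - utility >= 2 else "ALLOWED"
    return max(upstream, proposed, key=ORDER.get)  # stricter decision wins
```

Even when utility far exceeds risk, an upstream NEEDS_APPROVAL stays NEEDS_APPROVAL: the gate cannot undo L1.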

Extending with Custom Analyzers

ActionAnalyzer is a Protocol. Implement an async analyze(action: Action) -> RiskScore method:

from agent_risk_engine import RiskEvaluator, RuleGate, RiskScore, Action

class LLMAnalyzer:
    """Use an LLM to evaluate the actual risk of action arguments."""

    async def analyze(self, action: Action) -> RiskScore:
        # Inspect action.parameters, reason about consequences
        assessed_level = await my_llm_judge(action.name, action.parameters)
        return RiskScore(level=assessed_level, reasoning="LLM analysis")

evaluator = RiskEvaluator(
    rule_gate=RuleGate(threshold="cautious"),
    action_analyzer=LLMAnalyzer(),
)

CallTracker

Standalone loop and repetition detection. Not a pipeline layer — use it to build context before evaluating:

from agent_risk_engine import CallTracker

tracker = CallTracker()
tracker.record(action.name)
context = tracker.check()
# Merge into action metadata before evaluating
action = Action(kind=action.kind, name=action.name, risk=action.risk, metadata=context)

check() returns {"healthy": bool, "warnings": list[str]}.
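A minimal stand-in shows the shape of that contract (the class, window size, and repeat limit here are hypothetical, not CallTracker's actual heuristics): record names into a bounded window and warn when the same action repeats back to back.

```python
from collections import deque

class TrackerSketch:
    """Hypothetical stand-in for CallTracker: flags short repetition loops."""

    def __init__(self, window: int = 10, repeat_limit: int = 3):
        self.window = deque(maxlen=window)  # bounded history of action names
        self.repeat_limit = repeat_limit

    def record(self, name: str) -> None:
        self.window.append(name)

    def check(self) -> dict:
        warnings = []
        recent = list(self.window)[-self.repeat_limit:]
        # The same action N times in a row suggests the agent is stuck.
        if len(recent) == self.repeat_limit and len(set(recent)) == 1:
            warnings.append(f"'{recent[0]}' repeated {self.repeat_limit}x in a row")
        return {"healthy": not warnings, "warnings": warnings}
```

The returned dict matches the documented shape, so it can be merged into Action metadata exactly as shown above.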

Framework Integration

Write a thin adapter that maps your framework's action primitives to Action:

async def before_action_hook(action_name, args, action_risk):
    action = Action(kind="tool_call", name=action_name, parameters=args, risk=action_risk)
    result = await evaluator.evaluate(action)

    if result.decision == GateResult.DENIED:
        raise PermissionError(result.reasoning)
    if result.decision == GateResult.NEEDS_APPROVAL:
        approved = await prompt_user(f"Allow '{action_name}'? Risk: {result.risk_score.level}/5")
        if not approved:
            raise PermissionError("User denied")

License

MIT
