Guardrail lifecycle for LLM agent tool calls — define, lint, enforce, test, measure, audit
Project description
frenum
Guardrail lifecycle for LLM agent tool calls. Define. Lint. Enforce. Test. Measure. Audit.
Named after the Latin word for bridle or restraint. The rein that keeps your LLM agent in check.
Why
LLM agents call tools autonomously — execute SQL, send emails, make API calls. Teams are shipping guardrails, but the lifecycle around them is fragmented: enforcement in one tool, testing cobbled together with pytest, audit as an afterthought, policy definitions scattered across code and config.
frenum puts the full guardrail lifecycle under one YAML schema:
policy.yaml → frenum lint → Engine.evaluate() → frenum test → coverage % → audit.jsonl
Define Lint Enforce Test Measure Audit
- YAML config — rules live in config files that compliance teams can review, version, and audit
- Zero-LLM enforcement — every decision is deterministic and reproducible
- Guardrail coverage — know exactly which rules are tested and which aren't
- Policy linting — catch broken regex and missing params before deployment
- Compliance-first — OPA-inspired audit trail with PII redaction
- Framework-agnostic — works standalone or with LangGraph
Quick Start
pip install frenum[yaml]
from frenum import Engine, ToolCall
engine = Engine.from_yaml("policy.yaml")
result = engine.evaluate(
ToolCall(name="execute_sql", args={"query": "DROP TABLE users"})
)
print(result.decision) # Decision.BLOCK
print(result.reason) # "Pattern matched in 'query': DROP TABLE"
Scaffold a new project
frenum init
Creates a starter policy.yaml and tests.yaml in the current directory. Lint and test immediately:
frenum lint --config policy.yaml
frenum test --config policy.yaml --tests tests.yaml
CLI
Regression Testing
frenum test --config policy.yaml --tests tests/ --format text
frenum — guardrail regression test report
==================================================
Results: 5/5 passed, 0 failed
[PASS] SQL injection blocked
[PASS] Clean query allowed
[PASS] PII in email body blocked
[PASS] Admin can call any tool
[PASS] Analyst blocked from execute_sql
Coverage: 100.0% (4/4 deterministic rules)
Semantic (manual validation required): tone_check
Evidence hash: a3f8c1d2e5b7...
Exit code 0 = all pass, 1 = failures. CI-ready.
Coverage Threshold
Fail CI if coverage drops below a minimum:
frenum test --config policy.yaml --tests tests/ --min-coverage 80
Exit code 1 if tests pass but coverage is below the threshold.
Policy Linting
frenum lint --config policy.yaml
ERROR E001 [bad_regex]: Invalid regex pattern '[a-z': unterminated character set
ERROR E002 [pii_scan]: Unknown PII detector: 'passport'
WARN W002 [incomplete]: Missing required parameter 'patterns' for rule type 'regex_block'
2 error(s), 1 warning(s)
Exit code 0 = clean, 1 = errors found.
YAML Config
policy_version: "1.0.0"
rules:
# Block dangerous SQL patterns
- name: block_sql_injection
type: regex_block
applies_to: ["execute_sql", "run_query"]
params:
fields: ["query"]
patterns:
- "(?i)(DROP|DELETE|TRUNCATE)\\s+TABLE"
# Require confirmation IDs on sensitive operations
- name: require_confirmation
type: regex_require
applies_to: ["send_email", "transfer_funds"]
params:
fields: ["confirmation_id"]
pattern: "^CONF-[A-Z0-9]{8}$"
# Scan all tool calls for PII leakage
- name: detect_pii
type: pii_detect
applies_to: ["*"]
params:
detectors: [email, phone_intl, hk_id, credit_card, ssn]
action: block
# Role-based tool access
- name: tool_entitlement
type: entitlement
applies_to: ["*"]
params:
roles:
analyst: ["search", "get_data", "summarize"]
admin: ["*"]
default: block
# Cost threshold
- name: budget_limit
type: budget
applies_to: ["*"]
params:
max_cost: 10.0
cost_field: estimated_cost
# Tool allowlist
- name: allowed_tools_only
type: tool_allowlist
applies_to: ["*"]
params:
allowed_tools: ["search", "get_data", "summarize", "execute_sql"]
# Semantic rules are tracked but not enforced in CI
- name: tone_check
type: regex_block
kind: semantic
applies_to: ["*"]
params:
fields: ["response"]
patterns: ["placeholder"]
Test Cases
tests:
- description: SQL injection blocked
tool_call:
name: execute_sql
args:
query: "DROP TABLE users"
expected: block
expected_rule: block_sql_injection
- description: Clean query allowed
tool_call:
name: execute_sql
args:
query: "SELECT * FROM users WHERE id = 1"
expected: allow
- description: PII in email body blocked
tool_call:
name: send_email
args:
body: "Customer HKID is A123456(7)"
expected: block
Rule Types
| Type | Purpose | Key Params |
|---|---|---|
regex_block |
Block if field matches pattern | fields, patterns |
regex_require |
Block if required field is missing/invalid | fields, pattern |
pii_detect |
Scan args for PII (email, phone, HKID, credit card, SSN) | detectors, action |
entitlement |
Role-based tool access control | roles, default |
budget |
Block if estimated cost exceeds threshold | max_cost, cost_field |
tool_allowlist |
Block tools not in allowed list | allowed_tools |
Guardrail Coverage
guardrail coverage = rules_exercised / total_deterministic_rules
Rules tagged kind: semantic are excluded from the denominator and listed as "manual validation required" in every report. Honest boundaries over inflated numbers.
engine = Engine.from_yaml("policy.yaml")
results = engine.run_tests(test_cases)
coverage = engine.calculate_coverage(results)
print(f"Coverage: {coverage.coverage_pct}%")
print(f"Not exercised: {coverage.rules_not_exercised}")
print(f"Semantic (manual): {coverage.semantic_rules}")
Policy Linting
Catch config errors before deployment:
| Code | Severity | What it catches |
|---|---|---|
| E001 | Error | Invalid regex pattern |
| E002 | Error | Unknown PII detector name |
| E003 | Error | Duplicate rule names |
| W001 | Warning | Empty applies_to (rule will never match) |
| W002 | Warning | Missing required parameters for rule type |
| W003 | Warning | Unknown rule type |
from frenum import lint_policy
warnings = lint_policy(engine.rules)
for w in warnings:
print(f"{w.severity.upper()} {w.code} [{w.rule_name}]: {w.message}")
Audit Trail
Every evaluation produces a structured JSONL record with PII redaction:
from frenum import AuditLogger, Engine
logger = AuditLogger("audit.jsonl", redact_args=True)
engine = Engine.from_yaml("policy.yaml", audit_logger=logger.log)
Each record includes: decision_id, timestamp, policy_version, tool_name, tool_args (redacted), decision, rules_evaluated, blocking_rule, human_override, trace_id.
Audit Reports
from frenum import AuditReporter
reporter = AuditReporter("audit.jsonl")
report = reporter.generate()
print(report.to_text())
========================================
FRENUM AUDIT REPORT
========================================
Total evaluations: 500
Allow: 450 (90.0%)
Block: 50 (10.0%)
Top blocked tools:
1. execute_sql — 25 blocks
2. send_email — 15 blocks
Human override rate: 4.0% (2 of 50 blocks overridden)
========================================
Reports
Test reports in three formats:
frenum test --config policy.yaml --tests tests/ --format json --output report.json
frenum test --config policy.yaml --tests tests/ --format html --output report.html
HTML reports include a coverage bar, pass/fail matrix, and SHA-256 evidence hashing for tamper-evidence. Install frenum[html] for Jinja2 templates; stdlib fallback works without it.
LangGraph Integration
pip install frenum[langgraph]
from langgraph.prebuilt import ToolNode
from frenum import Engine
from frenum.adapters.langgraph import guarded_tool_node
tools = [search, calculator]
engine = Engine.from_yaml("policy.yaml")
safe_tools = guarded_tool_node(ToolNode(tools), engine)
builder.add_node("tools", safe_tools)
Blocked tool calls return a ToolMessage with the block reason — the LLM sees why its call was rejected and can adjust. Each tool call in a multi-call message is evaluated independently.
Programmatic Use (No YAML)
from frenum import Engine, RuleConfig, ToolCall
engine = Engine(rules=[
RuleConfig(
name="block_drops",
rule_type="regex_block",
params={"fields": ["query"], "patterns": [r"(?i)DROP\s+TABLE"]},
applies_to=["execute_sql"],
),
])
result = engine.evaluate(ToolCall(name="execute_sql", args={"query": "SELECT 1"}))
assert result.decision.value == "allow"
Zero dependencies — the core engine runs on stdlib alone. YAML loading, HTML reports, and LangGraph are optional extras.
Design Philosophy
- Config is reviewable. Compliance teams review YAML, not Python.
- No LLM in the enforcement path. Every decision is deterministic and reproducible.
- First BLOCK wins. Short-circuit evaluation matches firewall semantics that security teams already know.
- Honest about limits. Semantic rules can't be tested deterministically, so they're carved out explicitly.
- Lint before deploy. Catch policy config errors at authoring time, not at runtime.
- Audit everything. Every decision is logged with enough context to investigate, but matched values are redacted.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file frenum-0.3.0.tar.gz.
File metadata
- Download URL: frenum-0.3.0.tar.gz
- Upload date:
- Size: 108.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
22631eb42242530a689dc31dfc308268cc5ff8f9e56f687f066f79e7ee67f0c9
|
|
| MD5 |
15c5c6a4c22ed3ed0ef2047f278918c6
|
|
| BLAKE2b-256 |
21894051b84d682d2ce8764254d1de6e5b4e70d2da1120885b6329a35778b9be
|
File details
Details for the file frenum-0.3.0-py3-none-any.whl.
File metadata
- Download URL: frenum-0.3.0-py3-none-any.whl
- Upload date:
- Size: 25.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
86e7b1f4245dfc58c323b1717f102a0b6bff8dc4d4589e5020c0cbdcbb2c5c6c
|
|
| MD5 |
01a6aba07311caeed1c1e1a0a60a1dd6
|
|
| BLAKE2b-256 |
37682babba97e59924649f7541e76a66c9837d5cfc1e994f88cfcfc18ff3c435
|