Deterministic validation layer for AI agents and autonomous systems
agentguard-trustlayer
AgentGuard-TrustLayer is a runtime safety layer that prevents AI agents from taking invalid or unsafe actions — and audits whether the safety rules themselves have drifted.
Why this exists
AI agents can generate actions. But they don't understand consequences.
Without a validation layer, an agent can:
- break invariants
- corrupt system state
- execute invalid operations
agentguard-trustlayer sits between AI and execution. It ensures every action is checked, every rule is enforced, and every failure is contained.
The harder problem: who guards the guardian?
In self-evolving agent systems, the constraint set itself can drift toward permissiveness over time — rules get removed, thresholds weakened, bypasses accumulate. v2.1 adds ConstraintAudit to track this: the safety layer now audits itself using the same SHA-256 chain mechanism it uses for action validation.
Core Idea
AI Agent --> Proposal --> TrustLayer --> Execution
                               ^
                          Constraints
                               ^
                        ConstraintAudit
                 (are the rules still intact?)
Every update passes through four gates:
- Auth — is the token valid and unexpired?
- Locks — is the target key frozen?
- Constraints — does the new state pass all rules?
- Rollback — if anything fails, state is fully restored
And now a fifth, ongoing check:
- Constraint drift — has the rule set drifted from its original baseline?
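The gate sequence above can be sketched with the standard library alone. This is an illustrative sketch, not TrustLayer's actual code: `apply_update`, `Rejected`, and the reduction of a token to a bare expiry timestamp are all assumptions made for the example.

```python
import copy
import time


class Rejected(Exception):
    """Raised when a proposed update fails any gate (hypothetical name)."""


def apply_update(state, key, value, *, token_expiry, locked_keys, rules):
    """Set state[key] = value only if every gate passes; otherwise restore state."""
    # Gate 1: auth. The token is reduced to an expiry timestamp (assumption).
    if time.time() >= token_expiry:
        raise Rejected("auth: token expired")
    # Gate 2: locks. Frozen keys cannot be written.
    if key in locked_keys:
        raise Rejected(f"locks: {key!r} is frozen")
    # Snapshot before mutating so a failed rule check can be undone.
    snapshot = copy.deepcopy(state)
    state[key] = value
    # Gate 3: constraints. Each rule is (name, predicate over the new state).
    for name, predicate in rules:
        if not predicate(state):
            # Gate 4: rollback. Restore the snapshot in place.
            state.clear()
            state.update(snapshot)
            raise Rejected(f"constraints: {name}")
    return state


state = {"score": 50}
rules = [("score 0-100", lambda s: 0 <= s["score"] <= 100)]
expiry = time.time() + 60

apply_update(state, "score", 75, token_expiry=expiry, locked_keys=set(), rules=rules)
try:
    apply_update(state, "score", 500, token_expiry=expiry, locked_keys=set(), rules=rules)
except Rejected as err:
    print("blocked:", err)
print(state)
```

The rejected write leaves `state` exactly as it was after the last accepted update, which is the property the rollback gate is meant to guarantee.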
Features
- Constraint-based validation with composable logic (&, |, ~)
- Delta-aware constraints: rules can compare proposed vs original state
- Authenticated authority (HMAC-signed tokens with TTL)
- Safe state updates with automatic rollback
- set, increment, and update action types
- Async agent loop with retry, backoff, and error feedback to model
- Tamper-evident audit chain: every ValidationEvent carries a SHA-256 hash linked to the previous event
- Constraint drift tracking: ConstraintAudit hashes and chains the constraint set, detects permissive drift
- GuardedAgent high-level API: one object, one call
- Zero dependencies (standard library only)
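To make the first two features concrete, here is one way composable, delta-aware rules can be built with Python operator overloading. The `Rule` class below is hypothetical and written only for illustration; the library's own `LambdaConstraint` may have a different signature and behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Check = Callable[[dict, Optional[dict]], bool]


@dataclass(frozen=True)
class Rule:
    """Hypothetical constraint: a name plus a predicate over (proposed, original)."""
    name: str
    check: Check

    def __call__(self, proposed, original=None):
        return self.check(proposed, original)

    def __and__(self, other):
        return Rule(f"({self.name} & {other.name})",
                    lambda p, o: self(p, o) and other(p, o))

    def __or__(self, other):
        return Rule(f"({self.name} | {other.name})",
                    lambda p, o: self(p, o) or other(p, o))

    def __invert__(self):
        return Rule(f"~{self.name}", lambda p, o: not self(p, o))


in_range = Rule("score 0-100", lambda p, o: 0 <= p["score"] <= 100)
# Delta-aware: compares the proposed value with the original one.
small_step = Rule("step <= 30",
                  lambda p, o: o is None or abs(p["score"] - o["score"]) <= 30)
frozen = Rule("frozen", lambda p, o: p.get("frozen", False))

policy = in_range & small_step & ~frozen
original = {"score": 50}
print(policy.name)
print(policy({"score": 75}, original))  # in range, step of 25
print(policy({"score": 95}, original))  # in range, but step of 45
```

Composed rules carry a derived name, so a rejection message can still say which combination failed.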
Install
pip install trustlayer-py
Quick Start
import asyncio, json
from trustlayer import GuardedAgent, LambdaConstraint

async def my_model(prompt: str) -> str:
    return json.dumps({"type": "set", "target": "score", "value": 75})

agent = GuardedAgent(
    model=my_model,
    rules=[LambdaConstraint("score 0-100", lambda v: 0 <= v.get("score", 0) <= 100)],
    initial_state={"score": 50},
)
result = asyncio.run(agent.run("raise the score"))
print(result)
# {'status': 'success', 'state': {'score': 75}, 'audit': '<sha256>'}
Constraint Drift — auditing the guardian
In long-running or self-evolving agent systems, the rules themselves can change. ConstraintAudit tracks those changes with the same tamper-evident chain used for action validation.
How it works
Every time constraints are recorded, the names and structure are hashed and chained to the previous state. Drift is measured against the original baseline.
from trustlayer import GuardedAgent, LambdaConstraint

rules = [
    LambdaConstraint("budget cap", lambda v: v["spend"] <= 100),
    LambdaConstraint("no self-modify", lambda v: not v["modifying_rules"]),
]
# my_model is the async model function defined in Quick Start
agent = GuardedAgent(
    model=my_model,
    rules=rules,
    initial_state={"spend": 0, "modifying_rules": False},
)

# Baseline: no drift
print(agent.constraint_drift())
# {
#   "divergence_from_baseline": 0.0,
#   "trend": "stable",
#   "baseline_count": 2,
#   "current_count": 2,
#   "removed_constraints": [],
#   "added_constraints": [],
#   "snapshots": 1,
#   "unchanged": True
# }

# Evolve the rules: remove a constraint
agent.update_rules([rules[0]])
print(agent.constraint_drift())
# {
#   "divergence_from_baseline": 0.5,
#   "trend": "permissive_drift",
#   "baseline_count": 2,
#   "current_count": 1,
#   "removed_constraints": ["no self-modify"],
#   "added_constraints": [],
#   "snapshots": 2,
#   "unchanged": False
# }
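The hashing-and-chaining step described under "How it works" can be sketched as follows. This is a minimal illustration under stated assumptions, not the internals of ConstraintAudit: the helper names (`snapshot_hash`, `record`, `verify`), the all-zero genesis value, and the choice to hash only sorted rule names are all made up for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first snapshot


def snapshot_hash(names, prev_hash):
    """Hash a canonical encoding of the rule names together with the previous link."""
    payload = json.dumps({"names": sorted(names), "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record(chain, names, label=""):
    """Append a snapshot whose hash commits to every earlier snapshot."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"label": label, "names": sorted(names), "prev": prev,
                  "hash": snapshot_hash(names, prev)})
    return chain


def verify(chain):
    """Recompute every link; an edited snapshot breaks the chain from that point on."""
    prev = GENESIS
    for snap in chain:
        if snap["prev"] != prev or snap["hash"] != snapshot_hash(snap["names"], prev):
            return False
        prev = snap["hash"]
    return True


chain = []
record(chain, ["budget cap", "no self-modify"], label="baseline")
record(chain, ["budget cap"], label="after-update")
print(verify(chain))  # True

# Rewriting the stored baseline to hide the removed rule is detectable.
chain[0]["names"] = ["budget cap"]
print(verify(chain))  # False
```

Because each hash covers the previous one, hiding a removed rule would require recomputing every later link, not just editing one record.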
Drift states
| trend | meaning |
|---|---|
| stable | Constraint set unchanged from baseline |
| changed | Rules added or renamed, no net loss |
| permissive_drift | Constraints removed; the system is less safe than at baseline |
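One plausible way to derive these states from two lists of rule names is sketched below. The classification rule (a net loss in rule count means permissive drift) and the divergence metric (share of baseline rules no longer present) are assumptions chosen to agree with the example output above; the library may compute them differently.

```python
def drift(baseline, current):
    """Compare two lists of rule names and classify the change (illustrative only)."""
    base, cur = set(baseline), set(current)
    removed = sorted(base - cur)
    added = sorted(cur - base)
    if not removed and not added:
        trend = "stable"
    elif len(cur) < len(base):
        trend = "permissive_drift"  # net loss of constraints
    else:
        trend = "changed"           # added or renamed, no net loss
    # Assumed metric: share of baseline rules that are no longer present.
    divergence = len(removed) / len(base) if base else 0.0
    return {
        "divergence_from_baseline": divergence,
        "trend": trend,
        "baseline_count": len(base),
        "current_count": len(cur),
        "removed_constraints": removed,
        "added_constraints": added,
        "unchanged": trend == "stable",
    }


report = drift(["budget cap", "no self-modify"], ["budget cap"])
print(report["trend"], report["divergence_from_baseline"])
```

Under this definition a rename appears as one removal plus one addition, so it is reported as changed rather than permissive_drift.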
Using ConstraintAudit directly
from trustlayer import ConstraintAudit, LambdaConstraint
rules = [LambdaConstraint("rule_a", lambda v: v["x"] < 10)]
audit = ConstraintAudit(rules)
# later, after rules change
audit.record(rules, label="after-update")
print(audit.drift())
print(audit.history()) # full snapshot chain, oldest first
With the low-level Validator
import secrets
from trustlayer import Validator, State, LambdaConstraint

rules = [LambdaConstraint("cap", lambda v: v["n"] < 5)]
state = State({"n": 0})
validator = Validator(state, rules, secret=secrets.token_bytes(32))

new_rules = [
    LambdaConstraint("cap", lambda v: v["n"] < 5),
    LambdaConstraint("floor", lambda v: v["n"] >= 0),
]
validator.update_constraints(new_rules, label="added floor")
print(validator.constraint_drift())
Try to break the agent
git clone https://github.com/AILIFE1/agentguard-trustlayer
cd agentguard-trustlayer
python examples/demo_break_the_agent.py
An agent tries to set balance = 1,000,000. TrustLayer blocks it. The error feeds back into the prompt. The agent self-corrects.
[MODEL OUTPUT] Attempting INVALID action...
[MODEL INPUT] Increase balance as much as possible | Last error: balance <= max_limit
[MODEL OUTPUT] Attempting SAFE action...
FINAL STATE: {'balance': 110, 'max_limit': 200}
RESULT: [OK] Increase balance as much as possible
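The feedback loop in that demo can be approximated in a few lines of asyncio. This sketch does not use the library; `run_with_feedback`, `toy_model`, and `validate` are hypothetical stand-ins, and the backoff schedule is an assumption.

```python
import asyncio
import json


async def run_with_feedback(model, validate, task, max_attempts=3, base_delay=0.01):
    """Ask the model for an action; on rejection, retry with the error in the prompt."""
    prompt = task
    error = None
    for attempt in range(max_attempts):
        action = json.loads(await model(prompt))
        error = validate(action)  # None means the action is allowed
        if error is None:
            return {"status": "success", "action": action, "attempts": attempt + 1}
        # Feed the rejection back so the next proposal can correct itself.
        prompt = f"{task} | Last error: {error}"
        await asyncio.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return {"status": "failed", "error": error, "attempts": max_attempts}


async def toy_model(prompt):
    # Stand-in for an LLM: overreaches first, backs off once it sees an error.
    value = 110 if "Last error" in prompt else 1_000_000
    return json.dumps({"type": "set", "target": "balance", "value": value})


def validate(action, max_limit=200):
    """Return None if the action is allowed, else the name of the violated rule."""
    return None if action["value"] <= max_limit else "balance <= max_limit"


result = asyncio.run(
    run_with_feedback(toy_model, validate, "Increase balance as much as possible")
)
print(result)
```

The first proposal is rejected, the error text is appended to the prompt, and the second proposal passes, mirroring the two MODEL OUTPUT lines in the demo log.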
Full API example
import asyncio, json
from trustlayer import (
    Agent, AuthorityLevel, AuthToken, Cathedral,
    LambdaConstraint, RetryConfig, State, Validator,
)

SECRET = b"my-secret"
score_ok = LambdaConstraint("score_ok", lambda v: 0 <= v.get("score", 0) <= 100)
state = State(values={"score": 50})
validator = Validator(state, [score_ok], SECRET)
token = AuthToken.issue(AuthorityLevel.SYSTEM, "agent", 60, SECRET)

async def model(prompt: str) -> str:
    return json.dumps({"type": "set", "target": "score", "value": 75})

async def main():
    cathedral = Cathedral(validator, Agent(model), retry=RetryConfig(max_attempts=3))
    event = await cathedral.step("raise the score", token)
    print(event)                         # [OK] raise the score
    print(event.audit_hash)              # sha256 chain link
    print(validator.constraint_drift())  # drift from baseline

asyncio.run(main())
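For readers curious what an HMAC-signed token with a TTL involves, the following standard-library sketch shows the general technique. It is not the library's AuthToken implementation: `issue_token`, `verify_token`, the claim names, and the `payload.signature` wire format are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(subject, level, ttl_seconds, secret, now=None):
    """Return 'payload.signature', both parts URL-safe base64."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"sub": subject, "level": level, "exp": issued + ttl_seconds},
        sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(signature).decode("ascii"))


def verify_token(token, secret, now=None):
    """Return the claims if the signature matches and the token is unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # compare_digest takes constant time, so timing does not leak the signature.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims if current < claims["exp"] else None


SECRET = b"my-secret"
token = issue_token("agent", "SYSTEM", 60, SECRET, now=1000.0)
print(verify_token(token, SECRET, now=1030.0))           # claims: still valid
print(verify_token(token, SECRET, now=1061.0))           # None: expired
print(verify_token(token, b"other-secret", now=1030.0))  # None: bad signature
```

The `now` parameter exists only to make the expiry check reproducible in examples. In real use, the secret should come from a secure source such as `secrets.token_bytes(32)` rather than a literal.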
Project Structure
agentguard-trustlayer/
├── trustlayer/
│   ├── __init__.py          # Public API
│   ├── auth.py              # AuthToken, AuthorityLevel
│   ├── constraints.py       # Constraint, LambdaConstraint, And/Or/Not
│   ├── constraint_audit.py  # ConstraintAudit: drift tracking for the rules
│   ├── types.py             # State, Action, Update
│   ├── validator.py         # Validator, ValidationEvent, audit chain
│   └── engine.py            # Agent, Cathedral, GuardedAgent, RetryConfig
└── examples/
    ├── demo.py
    └── demo_break_the_agent.py
Used with Cathedral
Cathedral provides persistent memory and identity drift tracking for AI agents. AgentGuard provides the action validation layer. Together:
- Cathedral tracks agent identity drift — has the agent changed from what it was?
- AgentGuard tracks constraint drift — have the rules governing the agent changed?
Neither knows about the other. They compose cleanly.
Cathedral Nexus (orchestrator)
├── Cathedral API — who to trust (identity + memory drift)
└── AgentGuard — what actions are allowed (constraint drift)
Cathedral Nexus is a reference implementation of this architecture.
Philosophy
agentguard-trustlayer doesn't make decisions — it decides whether decisions are allowed.
And now it checks whether the rules for what's allowed have themselves been tampered with.
License
MIT