# agent-safety-layer

Production-grade safety boundaries for AI agents — policies, runtime limits, execution tracing, replay, and human-in-the-loop approval.

## Why?

AI agents can do dangerous things — delete files, drop databases, send emails, make API calls. This library provides guardrails:
- Block dangerous operations before they execute
- Sandbox file and network access to allowed paths/hosts
- Trace everything for debugging and auditing
- Replay sessions to test policy changes safely
- Human approval gates for sensitive operations

## Installation

```bash
pip install agent-safety-layer
```

## Quick Start

```python
from agent_safety_layer import SafetyLayer, Policy, PathBoundary, NetworkBoundary

# Create a safety layer with policies and boundaries
safety = SafetyLayer(
    policies=[
        # Block dangerous shell commands
        Policy.block_pattern(r"rm\s+-rf\s+/", "No recursive delete from root"),
        Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE in production"),
        # Require approval for emails
        Policy.require_approval("send_email", timeout=300),
    ],
    boundaries=[
        # Only allow file access to these paths
        PathBoundary(allowed_paths=["/tmp", "/home/user/workspace"]),
        # Only allow these API hosts
        NetworkBoundary(allowed_hosts=["api.openai.com", "api.anthropic.com"]),
    ],
)

# Use the decorator to guard functions
@safety.guard
def execute_command(cmd: str) -> str:
    import subprocess
    return subprocess.run(cmd, shell=True, capture_output=True).stdout.decode()

# This works fine
execute_command("ls -la /tmp")

# This raises SafetyViolation
execute_command("rm -rf /")  # Blocked by policy!
```

## Features

### Policy-Based Blocking

Define rules for what operations should be blocked, warned, or audited:
```python
from agent_safety_layer import Policy, PolicyAction, PolicyResult

# Block by pattern
Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE")

# Warn on pattern (logs but doesn't block)
Policy.warn_pattern(r"sudo", "Warning: using sudo")

# Audit pattern (just records)
Policy.audit_pattern(r"SELECT.*FROM", "Auditing DB queries")

# Custom policy logic
def check_cost(operation: str, context: dict):
    if context.get("estimated_cost", 0) > 100:
        return PolicyResult(
            action=PolicyAction.BLOCK,
            policy_name="cost_limit",
            reason="Operation exceeds cost limit",
        )
    return None  # None means this policy does not apply

Policy.custom("cost_check", check_cost)
```

### Runtime Boundaries

Restrict what resources your agent can access:
```python
from agent_safety_layer import (
    PathBoundary,
    NetworkBoundary,
    TimeBoundary,
    ResourceBoundary,
)

# File system sandboxing
PathBoundary(
    allowed_paths=["/tmp", "/home/user/workspace"],
    blocked_paths=["/etc", "/var"],
    block_patterns=["*.exe", "*.dll"],
)

# Network access control
NetworkBoundary(
    allowed_hosts=["api.openai.com", "*.anthropic.com"],
    blocked_hosts=["localhost", "127.0.0.1"],
    blocked_ports=[22, 23, 3389],  # SSH, Telnet, RDP
    allow_private_ips=False,
)

# Execution time limits
TimeBoundary(
    max_execution_time=60.0,  # Per operation
    max_total_time=3600.0,  # Total session time
)

# Resource limits
ResourceBoundary(
    max_memory_mb=1024,
    max_cpu_percent=80,
    max_operations=1000,
)
```
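
A minimal sketch of how a boundary violation surfaces at runtime, assuming `SafetyViolation` (named in the Quick Start) is importable from the package root and that the guard checks string arguments against the configured boundaries; neither detail is spelled out in this README:

```python
from agent_safety_layer import SafetyLayer, PathBoundary, SafetyViolation

safety = SafetyLayer(
    boundaries=[PathBoundary(allowed_paths=["/tmp"], blocked_paths=["/etc"])],
)

@safety.guard
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

try:
    read_file("/etc/passwd")  # /etc is blocked and outside allowed_paths
except SafetyViolation as exc:
    print(f"Blocked: {exc}")  # assumes the exception carries the reason
```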

### Execution Tracing

Record everything for debugging and auditing:
```python
from agent_safety_layer import SafetyLayer, TraceExporter

safety = SafetyLayer(trace=True)

# read_file and process stand in for your own functions
with safety.session(name="my_session") as session:
    session.execute("read_file", lambda: read_file("/tmp/data.txt"))
    session.execute("process_data", lambda: process(data))
    session.log("Processing complete")

# Export trace
trace = session.finish()
print(TraceExporter.to_summary(trace))
TraceExporter.to_file(trace, "trace.json")
```
Output:

```
Trace: my_session (abc-123)
  Started: 2024-01-15T10:30:00
  Ended: 2024-01-15T10:30:05
  Duration: 5000.00ms
  Entries: 3
  Errors: 0
  Blocked: 0

Operations:
  ✓ read_file (50.0ms)
  ✓ process_data (4900.0ms)
  ✓ Processing complete (0.0ms)
```

### Session Replay

Record sessions and replay them with different policies:
```python
from agent_safety_layer import SafetyLayer, SessionRecorder, SessionReplayer, Policy

# Record a session (do_thing_1, do_thing_2, cleanup are placeholders)
safety = SafetyLayer()
with safety.session(record=True) as session:
    session.execute("op1", lambda: do_thing_1())
    session.execute("op2", lambda: do_thing_2())
    session.execute("rm -rf /tmp/test", lambda: cleanup())

recording = session.get_recording()
recording.save("session.json")

# Replay with stricter policies
replayer = SessionReplayer(policies=[
    Policy.block_pattern(r"rm\s+-rf", "No rm -rf allowed"),
])
result = replayer.replay(recording)
print(f"Blocked: {result.blocked_operations}/{result.total_operations}")
print(f"Would block: {result.blocked_details}")

# Compare policy sets
results = replayer.compare_policies(recording, {
    "permissive": [],
    "moderate": [Policy.warn_pattern(r"rm", "Warning on rm")],
    "strict": [Policy.block_pattern(r"rm", "Block all rm")],
})
for name, result in results.items():
    print(f"{name}: {result.block_rate}% blocked")
```

### Human-in-the-Loop Approval

Gate sensitive operations on human approval:
```python
from agent_safety_layer import SafetyLayer, ApprovalGate, Policy
import threading

# Set up approval gate
gate = ApprovalGate(
    default_timeout=300,  # 5 minutes
    on_request=lambda r: print(f"Approval needed: {r.operation}"),
)

safety = SafetyLayer(
    policies=[
        Policy.require_approval("send_email", timeout=60),
        Policy.require_approval("delete_user", timeout=300),
    ],
    approval_gate=gate,
)

# In another thread/process, handle approvals
def approval_handler():
    while True:
        for request in gate.get_pending():
            print(f"Approve {request.operation}? (y/n)")
            if input() == "y":
                gate.approve(request.id, responder="admin")
            else:
                gate.deny(request.id, responder="admin", message="Not allowed")

threading.Thread(target=approval_handler, daemon=True).start()

# Agent code
with safety.session() as session:
    # This will block until approved or timeout
    session.execute(
        "send_email",
        lambda: send_email(to="user@example.com", body="Hello!"),
        context={"operation_type": "send_email"},
    )
```
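
What happens when a request is denied or times out isn't spelled out above. Continuing the example, a sketch assuming the result surfaces as the same `SafetyViolation` the Quick Start names (an assumption, not documented behavior); `delete_user` is a hypothetical helper:

```python
from agent_safety_layer import SafetyViolation

with safety.session() as session:
    try:
        session.execute(
            "delete_user",
            lambda: delete_user(user_id=42),  # hypothetical helper
            context={"operation_type": "delete_user"},
        )
    except SafetyViolation as exc:
        # Assumption: denial and timeout are both reported via SafetyViolation.
        print(f"Not executed: {exc}")
```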

### Convenience Function

For common setups:
```python
from agent_safety_layer import create_safety_layer

safety = create_safety_layer(
    block_dangerous_commands=True,  # rm -rf, DROP TABLE, etc.
    block_production_access=True,  # Block *prod* database access
    allowed_paths=["/tmp", "/home/user"],
    allowed_hosts=["api.openai.com"],
    enable_tracing=True,
    enable_approval=False,
)
```

## API Reference

### SafetyLayer

Main class that ties everything together.
```python
SafetyLayer(
    policies: List[Policy] = None,
    boundaries: List[Boundary] = None,
    approval_gate: ApprovalGate = None,
    trace: bool = True,
    raise_on_violation: bool = True,
    on_violation: Callable = None,
)
```
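
Neither `raise_on_violation` nor `on_violation` appears in the examples above. A minimal sketch of non-raising mode; the callback's argument is an assumption (the README documents it only as a `Callable`):

```python
from agent_safety_layer import SafetyLayer, Policy

def log_violation(violation):
    # Assumed callback shape: a single violation object with a readable repr.
    print(f"Violation recorded: {violation}")

safety = SafetyLayer(
    policies=[Policy.block_pattern(r"rm\s+-rf", "No rm -rf")],
    raise_on_violation=False,  # don't raise SafetyViolation...
    on_violation=log_violation,  # ...notify the callback instead
)
```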
Methods:

- `check(operation, context)` — Check if an operation is allowed
- `guard` — Decorator to guard functions
- `session(name, record)` — Context manager for traced sessions
- `add_policy(policy)` / `remove_policy(name)` — Add or remove policies at runtime (see the sketch after this list)
- `add_boundary(boundary)` / `remove_boundary(name)`
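
A minimal sketch of runtime policy mutation, assuming `remove_policy` matches the name given to `Policy.custom`; the policy itself is hypothetical:

```python
from agent_safety_layer import SafetyLayer, Policy, PolicyAction, PolicyResult

def no_weekend_deploys(operation: str, context: dict):
    # Hypothetical check, used only to illustrate add/remove.
    if context.get("is_weekend") and "deploy" in operation:
        return PolicyResult(
            action=PolicyAction.BLOCK,
            policy_name="weekend_freeze",
            reason="No deploys on weekends",
        )
    return None

safety = SafetyLayer()
safety.add_policy(Policy.custom("weekend_freeze", no_weekend_deploys))
safety.remove_policy("weekend_freeze")  # matched by the name given to Policy.custom
```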

### Policy

Factory methods for creating policies:
- `Policy.block_pattern(pattern, reason)` — Block matching operations
- `Policy.warn_pattern(pattern, reason)` — Warn on matches
- `Policy.audit_pattern(pattern, reason)` — Just record matches
- `Policy.require_approval(op_type, timeout)` — Require human approval
- `Policy.custom(name, check_fn)` — Custom logic

### Boundaries

- `PathBoundary(allowed_paths, blocked_paths, allow_patterns, block_patterns)`
- `NetworkBoundary(allowed_hosts, blocked_hosts, allowed_ports, blocked_ports, allow_private_ips)`
- `TimeBoundary(max_execution_time, max_total_time)`
- `ResourceBoundary(max_memory_mb, max_cpu_percent, max_open_files, max_operations)`

### Tracing

- `Tracer` — Records operations
- `Trace` — Container for trace entries
- `TraceExporter` — Export to JSON, files, summaries

### Replay

- `SessionRecorder` — Records operations for replay
- `SessionReplayer` — Replays with different policies
- `ReplayResult` — Analysis of what would be blocked

### Approval

- `ApprovalGate` — Manages approval requests
- `ApprovalRequest` — A pending approval
- `InMemoryApprovalQueue` — Simple queue implementation

## Framework Integrations

Coming soon: LangChain, OpenAI, Anthropic integrations.

## License

MIT