
agent-safety-layer

Production-grade safety boundaries for AI agents — policies, runtime limits, execution tracing, replay, and human-in-the-loop approval.

PyPI version Python 3.9+ License: MIT

Why?

AI agents can do dangerous things — delete files, drop databases, send emails, make API calls. This library provides guardrails:

  • Block dangerous operations before they execute
  • Sandbox file and network access to allowed paths/hosts
  • Trace everything for debugging and auditing
  • Replay sessions to test policy changes safely
  • Human approval gates for sensitive operations

Installation

pip install agent-safety-layer

Quick Start

from agent_safety_layer import SafetyLayer, Policy, PathBoundary, NetworkBoundary

# Create a safety layer with policies and boundaries
safety = SafetyLayer(
    policies=[
        # Block dangerous shell commands
        Policy.block_pattern(r"rm\s+-rf\s+/", "No recursive delete from root"),
        Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE in production"),
        
        # Require approval for emails
        Policy.require_approval("send_email", timeout=300),
    ],
    boundaries=[
        # Only allow file access to these paths
        PathBoundary(allowed_paths=["/tmp", "/home/user/workspace"]),
        
        # Only allow these API hosts
        NetworkBoundary(allowed_hosts=["api.openai.com", "api.anthropic.com"]),
    ],
)

# Use the decorator to guard functions
@safety.guard
def execute_command(cmd: str) -> str:
    import subprocess
    return subprocess.run(cmd, shell=True, capture_output=True).stdout.decode()

# This works fine
execute_command("ls -la /tmp")

# This raises SafetyViolation
execute_command("rm -rf /")  # Blocked by policy!

Features

Policy-Based Blocking

Define rules for what operations should be blocked, warned, or audited:

from agent_safety_layer import Policy, PolicyAction, PolicyResult

# Block by pattern
Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE")

# Warn on pattern (logs but doesn't block)
Policy.warn_pattern(r"sudo", "Warning: using sudo")

# Audit pattern (just records)
Policy.audit_pattern(r"SELECT.*FROM", "Auditing DB queries")

# Custom policy logic
def check_cost(operation: str, context: dict):
    if context.get("estimated_cost", 0) > 100:
        return PolicyResult(
            action=PolicyAction.BLOCK,
            policy_name="cost_limit",
            reason="Operation exceeds cost limit",
        )
    return None

Policy.custom("cost_check", check_cost)
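The pattern policies take ordinary regular expressions. As a quick illustration of what the `block_pattern` rule above would and wouldn't match (plain `re`, not the library's internals — the library may add flags such as case-insensitivity):

```python
import re

# The same pattern passed to Policy.block_pattern above
pattern = re.compile(r"DROP\s+TABLE")

assert pattern.search("DROP TABLE users") is not None   # would be blocked
assert pattern.search("SELECT * FROM users") is None    # would pass through
```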

Runtime Boundaries

Restrict what resources your agent can access:

from agent_safety_layer import (
    PathBoundary,
    NetworkBoundary, 
    TimeBoundary,
    ResourceBoundary,
)

# File system sandboxing
PathBoundary(
    allowed_paths=["/tmp", "/home/user/workspace"],
    blocked_paths=["/etc", "/var"],
    block_patterns=["*.exe", "*.dll"],
)

# Network access control
NetworkBoundary(
    allowed_hosts=["api.openai.com", "*.anthropic.com"],
    blocked_hosts=["localhost", "127.0.0.1"],
    blocked_ports=[22, 23, 3389],  # SSH, Telnet, RDP
    allow_private_ips=False,
)

# Execution time limits
TimeBoundary(
    max_execution_time=60.0,  # Per operation
    max_total_time=3600.0,    # Total session time
)

# Resource limits
ResourceBoundary(
    max_memory_mb=1024,
    max_cpu_percent=80,
    max_operations=1000,
)
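The `*.anthropic.com` entry above suggests host lists accept shell-style wildcards. A plain-`fnmatch` sketch of that matching semantics (an assumption about how the library interprets host patterns, not its actual code):

```python
from fnmatch import fnmatch

# Hypothetical allow-list using the same entries as the example above
ALLOWED_HOSTS = ["api.openai.com", "*.anthropic.com"]

def host_allowed(host: str) -> bool:
    # Shell-style wildcard match against each allowed pattern
    return any(fnmatch(host, pattern) for pattern in ALLOWED_HOSTS)

assert host_allowed("api.anthropic.com")      # matches *.anthropic.com
assert not host_allowed("evil.example.com")   # matches nothing
```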

Execution Tracing

Record everything for debugging and auditing:

from agent_safety_layer import SafetyLayer, TraceExporter

safety = SafetyLayer(trace=True)

with safety.session(name="my_session") as session:
    data = session.execute("read_file", lambda: read_file("/tmp/data.txt"))
    session.execute("process_data", lambda: process(data))
    session.log("Processing complete")

# Export trace
trace = session.finish()
print(TraceExporter.to_summary(trace))
TraceExporter.to_file(trace, "trace.json")

Output:

Trace: my_session (abc-123)
Started: 2024-01-15T10:30:00
Ended: 2024-01-15T10:30:05
Duration: 5000.00ms
Entries: 3
Errors: 0
Blocked: 0

Operations:
  ✓ read_file (50.0ms)
  ✓ process_data (4900.0ms)
  ✓ Processing complete (0.0ms)

Session Replay

Record sessions and replay with different policies:

from agent_safety_layer import SafetyLayer, SessionRecorder, SessionReplayer, Policy

# Record a session
safety = SafetyLayer()
with safety.session(record=True) as session:
    session.execute("op1", lambda: do_thing_1())
    session.execute("op2", lambda: do_thing_2())
    session.execute("rm -rf /tmp/test", lambda: cleanup())

recording = session.get_recording()
recording.save("session.json")

# Replay with stricter policies
replayer = SessionReplayer(policies=[
    Policy.block_pattern(r"rm\s+-rf", "No rm -rf allowed"),
])

result = replayer.replay(recording)
print(f"Blocked: {result.blocked_operations}/{result.total_operations}")
print(f"Would block: {result.blocked_details}")

# Compare policy sets
results = replayer.compare_policies(recording, {
    "permissive": [],
    "moderate": [Policy.warn_pattern(r"rm", "Warning on rm")],
    "strict": [Policy.block_pattern(r"rm", "Block all rm")],
})

for name, result in results.items():
    print(f"{name}: {result.block_rate}% blocked")

Human-in-the-Loop Approval

Gate sensitive operations on human approval:

from agent_safety_layer import SafetyLayer, ApprovalGate, Policy
import threading

# Set up approval gate
gate = ApprovalGate(
    default_timeout=300,  # 5 minutes
    on_request=lambda r: print(f"Approval needed: {r.operation}"),
)

safety = SafetyLayer(
    policies=[
        Policy.require_approval("send_email", timeout=60),
        Policy.require_approval("delete_user", timeout=300),
    ],
    approval_gate=gate,
)

# In another thread, handle approvals interactively
def approval_handler():
    while True:
        for request in gate.get_pending():
            if input(f"Approve {request.operation}? (y/n) ") == "y":
                gate.approve(request.id, responder="admin")
            else:
                gate.deny(request.id, responder="admin", message="Not allowed")

threading.Thread(target=approval_handler, daemon=True).start()

# Agent code
with safety.session() as session:
    # This will block until approved or timeout
    session.execute(
        "send_email",
        lambda: send_email(to="user@example.com", body="Hello!"),
        context={"operation_type": "send_email"},
    )

Convenience Function

For common setups:

from agent_safety_layer import create_safety_layer

safety = create_safety_layer(
    block_dangerous_commands=True,   # rm -rf, DROP TABLE, etc.
    block_production_access=True,    # Block *prod* database access
    allowed_paths=["/tmp", "/home/user"],
    allowed_hosts=["api.openai.com"],
    enable_tracing=True,
    enable_approval=False,
)

API Reference

SafetyLayer

Main class that ties everything together.

SafetyLayer(
    policies: List[Policy] = None,
    boundaries: List[Boundary] = None,
    approval_gate: ApprovalGate = None,
    trace: bool = True,
    raise_on_violation: bool = True,
    on_violation: Callable = None,
)

Methods:

  • check(operation, context) — Check if operation is allowed
  • guard — Decorator to guard functions
  • session(name, record) — Context manager for traced sessions
  • add_policy(policy) / remove_policy(name)
  • add_boundary(boundary) / remove_boundary(name)
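
Conceptually, `guard` appears to follow a check-then-execute pattern: evaluate the operation against the configured policies first, raise `SafetyViolation` if one blocks it, and only then call through. A plain-Python toy of that pattern (an illustration, not the library's implementation):

```python
import functools
import re

# Toy pattern list standing in for configured policies
BLOCKED = [re.compile(r"rm\s+-rf\s+/")]

class SafetyViolation(Exception):
    pass

def toy_guard(fn):
    @functools.wraps(fn)
    def wrapper(cmd: str):
        # Check the operation against every blocking pattern first
        for pattern in BLOCKED:
            if pattern.search(cmd):
                raise SafetyViolation(f"blocked: {cmd}")
        return fn(cmd)  # only execute once all checks pass
    return wrapper

@toy_guard
def run(cmd: str) -> str:
    return f"ran {cmd}"

assert run("ls /tmp") == "ran ls /tmp"
try:
    run("rm -rf /")
except SafetyViolation:
    pass  # dangerous command was rejected before execution
```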

Policy

Factory methods for creating policies:

  • Policy.block_pattern(pattern, reason) — Block matching operations
  • Policy.warn_pattern(pattern, reason) — Warn on matches
  • Policy.audit_pattern(pattern, reason) — Just record matches
  • Policy.require_approval(op_type, timeout) — Require human approval
  • Policy.custom(name, check_fn) — Custom logic

Boundaries

  • PathBoundary(allowed_paths, blocked_paths, allow_patterns, block_patterns)
  • NetworkBoundary(allowed_hosts, blocked_hosts, allowed_ports, blocked_ports, allow_private_ips)
  • TimeBoundary(max_execution_time, max_total_time)
  • ResourceBoundary(max_memory_mb, max_cpu_percent, max_open_files, max_operations)

Tracing

  • Tracer — Records operations
  • Trace — Container for trace entries
  • TraceExporter — Export to JSON, files, summaries

Replay

  • SessionRecorder — Records operations for replay
  • SessionReplayer — Replays with different policies
  • ReplayResult — Analysis of what would be blocked

Approval

  • ApprovalGate — Manages approval requests
  • ApprovalRequest — A pending approval
  • InMemoryApprovalQueue — Simple queue implementation

Framework Integrations

Coming soon: LangChain, OpenAI, Anthropic integrations.

License

MIT
