
agent-safety-layer

Production-grade safety boundaries for AI agents — policies, runtime limits, execution tracing, replay, and human-in-the-loop approval.

PyPI version Python 3.9+ License: MIT

Why?

AI agents can do dangerous things — delete files, drop databases, send emails, make API calls. This library provides guardrails:

  • Block dangerous operations before they execute
  • Sandbox file and network access to allowed paths/hosts
  • Trace everything for debugging and auditing
  • Replay sessions to test policy changes safely
  • Human approval gates for sensitive operations

Installation

pip install agent-safety-layer

Quick Start

from agent_safety_layer import SafetyLayer, Policy, PathBoundary, NetworkBoundary

# Create a safety layer with policies and boundaries
safety = SafetyLayer(
    policies=[
        # Block dangerous shell commands
        Policy.block_pattern(r"rm\s+-rf\s+/", "No recursive delete from root"),
        Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE in production"),
        
        # Require approval for emails
        Policy.require_approval("send_email", timeout=300),
    ],
    boundaries=[
        # Only allow file access to these paths
        PathBoundary(allowed_paths=["/tmp", "/home/user/workspace"]),
        
        # Only allow these API hosts
        NetworkBoundary(allowed_hosts=["api.openai.com", "api.anthropic.com"]),
    ],
)

# Use the decorator to guard functions
@safety.guard
def execute_command(cmd: str) -> str:
    import subprocess
    return subprocess.run(cmd, shell=True, capture_output=True).stdout.decode()

# This works fine
execute_command("ls -la /tmp")

# This raises SafetyViolation
execute_command("rm -rf /")  # Blocked by policy!

Features

Policy-Based Blocking

Define rules for what operations should be blocked, warned, or audited:

from agent_safety_layer import Policy, PolicyAction, PolicyResult

# Block by pattern
Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE")

# Warn on pattern (logs but doesn't block)
Policy.warn_pattern(r"sudo", "Warning: using sudo")

# Audit pattern (just records)
Policy.audit_pattern(r"SELECT.*FROM", "Auditing DB queries")

# Custom policy logic
def check_cost(operation: str, context: dict):
    if context.get("estimated_cost", 0) > 100:
        return PolicyResult(
            action=PolicyAction.BLOCK,
            policy_name="cost_limit",
            reason="Operation exceeds cost limit",
        )
    return None

Policy.custom("cost_check", check_cost)
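The pattern policies take ordinary regular expressions. As a quick illustration of what the `block_pattern` rule above would and wouldn't match (plain `re`, not the library's internals — the library may add flags such as case-insensitivity):

```python
import re

# The same pattern passed to Policy.block_pattern above
pattern = re.compile(r"DROP\s+TABLE")

assert pattern.search("DROP TABLE users") is not None   # would be blocked
assert pattern.search("SELECT * FROM users") is None    # would pass through
```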

Runtime Boundaries

Restrict what resources your agent can access:

from agent_safety_layer import (
    PathBoundary,
    NetworkBoundary, 
    TimeBoundary,
    ResourceBoundary,
)

# File system sandboxing
PathBoundary(
    allowed_paths=["/tmp", "/home/user/workspace"],
    blocked_paths=["/etc", "/var"],
    block_patterns=["*.exe", "*.dll"],
)

# Network access control
NetworkBoundary(
    allowed_hosts=["api.openai.com", "*.anthropic.com"],
    blocked_hosts=["localhost", "127.0.0.1"],
    blocked_ports=[22, 23, 3389],  # SSH, Telnet, RDP
    allow_private_ips=False,
)

# Execution time limits
TimeBoundary(
    max_execution_time=60.0,  # Per operation
    max_total_time=3600.0,    # Total session time
)

# Resource limits
ResourceBoundary(
    max_memory_mb=1024,
    max_cpu_percent=80,
    max_operations=1000,
)
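The `*.anthropic.com` entry above suggests host lists accept shell-style wildcards. A plain-`fnmatch` sketch of that matching semantics (an assumption about how the library interprets host patterns, not its actual code):

```python
from fnmatch import fnmatch

# Hypothetical allow-list using the same entries as the example above
ALLOWED_HOSTS = ["api.openai.com", "*.anthropic.com"]

def host_allowed(host: str) -> bool:
    # Shell-style wildcard match against each allowed pattern
    return any(fnmatch(host, pattern) for pattern in ALLOWED_HOSTS)

assert host_allowed("api.anthropic.com")      # matches *.anthropic.com
assert not host_allowed("evil.example.com")   # matches nothing
```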

Execution Tracing

Record everything for debugging and auditing:

from agent_safety_layer import SafetyLayer, TraceExporter

safety = SafetyLayer(trace=True)

with safety.session(name="my_session") as session:
    data = session.execute("read_file", lambda: read_file("/tmp/data.txt"))
    session.execute("process_data", lambda: process(data))
    session.log("Processing complete")

# Export trace
trace = session.finish()
print(TraceExporter.to_summary(trace))
TraceExporter.to_file(trace, "trace.json")

Output:

Trace: my_session (abc-123)
Started: 2024-01-15T10:30:00
Ended: 2024-01-15T10:30:05
Duration: 5000.00ms
Entries: 3
Errors: 0
Blocked: 0

Operations:
  ✓ read_file (50.0ms)
  ✓ process_data (4900.0ms)
  ✓ Processing complete (0.0ms)

Session Replay

Record sessions and replay with different policies:

from agent_safety_layer import SafetyLayer, SessionRecorder, SessionReplayer, Policy

# Record a session
safety = SafetyLayer()
with safety.session(record=True) as session:
    session.execute("op1", lambda: do_thing_1())
    session.execute("op2", lambda: do_thing_2())
    session.execute("rm -rf /tmp/test", lambda: cleanup())

recording = session.get_recording()
recording.save("session.json")

# Replay with stricter policies
replayer = SessionReplayer(policies=[
    Policy.block_pattern(r"rm\s+-rf", "No rm -rf allowed"),
])

result = replayer.replay(recording)
print(f"Blocked: {result.blocked_operations}/{result.total_operations}")
print(f"Would block: {result.blocked_details}")

# Compare policy sets
results = replayer.compare_policies(recording, {
    "permissive": [],
    "moderate": [Policy.warn_pattern(r"rm", "Warning on rm")],
    "strict": [Policy.block_pattern(r"rm", "Block all rm")],
})

for name, result in results.items():
    print(f"{name}: {result.block_rate}% blocked")

Human-in-the-Loop Approval

Gate sensitive operations on human approval:

from agent_safety_layer import SafetyLayer, ApprovalGate, Policy
import threading

# Set up approval gate
gate = ApprovalGate(
    default_timeout=300,  # 5 minutes
    on_request=lambda r: print(f"Approval needed: {r.operation}"),
)

safety = SafetyLayer(
    policies=[
        Policy.require_approval("send_email", timeout=60),
        Policy.require_approval("delete_user", timeout=300),
    ],
    approval_gate=gate,
)

# In another thread, handle approvals interactively
def approval_handler():
    while True:
        for request in gate.get_pending():
            if input(f"Approve {request.operation}? (y/n) ") == "y":
                gate.approve(request.id, responder="admin")
            else:
                gate.deny(request.id, responder="admin", message="Not allowed")

threading.Thread(target=approval_handler, daemon=True).start()

# Agent code
with safety.session() as session:
    # This will block until approved or timeout
    session.execute(
        "send_email",
        lambda: send_email(to="user@example.com", body="Hello!"),
        context={"operation_type": "send_email"},
    )

Convenience Function

For common setups:

from agent_safety_layer import create_safety_layer

safety = create_safety_layer(
    block_dangerous_commands=True,   # rm -rf, DROP TABLE, etc.
    block_production_access=True,    # Block *prod* database access
    allowed_paths=["/tmp", "/home/user"],
    allowed_hosts=["api.openai.com"],
    enable_tracing=True,
    enable_approval=False,
)

API Reference

SafetyLayer

Main class that ties everything together.

SafetyLayer(
    policies: List[Policy] = None,
    boundaries: List[Boundary] = None,
    approval_gate: ApprovalGate = None,
    trace: bool = True,
    raise_on_violation: bool = True,
    on_violation: Callable = None,
)

Methods:

  • check(operation, context) — Check if operation is allowed
  • guard — Decorator to guard functions
  • session(name, record) — Context manager for traced sessions
  • add_policy(policy) / remove_policy(name)
  • add_boundary(boundary) / remove_boundary(name)
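
Conceptually, `guard` appears to follow a check-then-execute pattern: evaluate the operation against the configured policies first, raise `SafetyViolation` if one blocks it, and only then call through. A plain-Python toy of that pattern (an illustration, not the library's implementation):

```python
import functools
import re

# Toy pattern list standing in for configured policies
BLOCKED = [re.compile(r"rm\s+-rf\s+/")]

class SafetyViolation(Exception):
    pass

def toy_guard(fn):
    @functools.wraps(fn)
    def wrapper(cmd: str):
        # Check the operation against every blocking pattern first
        for pattern in BLOCKED:
            if pattern.search(cmd):
                raise SafetyViolation(f"blocked: {cmd}")
        return fn(cmd)  # only execute once all checks pass
    return wrapper

@toy_guard
def run(cmd: str) -> str:
    return f"ran {cmd}"

assert run("ls /tmp") == "ran ls /tmp"
try:
    run("rm -rf /")
except SafetyViolation:
    pass  # dangerous command was rejected before execution
```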

Policy

Factory methods for creating policies:

  • Policy.block_pattern(pattern, reason) — Block matching operations
  • Policy.warn_pattern(pattern, reason) — Warn on matches
  • Policy.audit_pattern(pattern, reason) — Just record matches
  • Policy.require_approval(op_type, timeout) — Require human approval
  • Policy.custom(name, check_fn) — Custom logic

Boundaries

  • PathBoundary(allowed_paths, blocked_paths, allow_patterns, block_patterns)
  • NetworkBoundary(allowed_hosts, blocked_hosts, allowed_ports, blocked_ports, allow_private_ips)
  • TimeBoundary(max_execution_time, max_total_time)
  • ResourceBoundary(max_memory_mb, max_cpu_percent, max_open_files, max_operations)

Tracing

  • Tracer — Records operations
  • Trace — Container for trace entries
  • TraceExporter — Export to JSON, files, summaries

Replay

  • SessionRecorder — Records operations for replay
  • SessionReplayer — Replays with different policies
  • ReplayResult — Analysis of what would be blocked

Approval

  • ApprovalGate — Manages approval requests
  • ApprovalRequest — A pending approval
  • InMemoryApprovalQueue — Simple queue implementation

Framework Integrations

Coming soon: LangChain, OpenAI, Anthropic integrations.

License

MIT
