A runtime and definition-time security guardrail framework for AI agents and developers.

Agent-Safeguard

A lightweight, enterprise-grade architectural guardrail and sandbox library for Python applications that are developed in collaboration with AI agents.


The Problem

AI coding assistants (such as Antigravity, Cursor, Copilot, and Codex) excel at generating localized code but operate within limited context windows. Consequently, they lack a comprehensive understanding of global architectural boundaries. Autonomous modifications frequently bypass type contracts, introduce illegal imports, violate filesystem/network access policies, or create infinite resource lockups.

The Solution

Agent-Safeguard establishes programmatic guardrails that enforce structural and runtime invariants. By combining definition-time AST checks, dynamic socket/file monkeypatching, RAM/CPU constraints, database access filters, and prompt injection guards, it catches boundary violations either when a function is defined or at the moment they occur at runtime.

Crucially, instead of raising generic stack traces, Agent-Safeguard generates structured JSON reports designed to be ingested by LLM agents, enabling closed-loop, automated self-correction.


Installation

Install using pip:

pip install agent-safeguard

The package is installed as agent-safeguard but imported in Python code as agent_shield:

from agent_shield import virtual_fs, guard_prompt

Core Features & Decorators

1. Advanced Sandboxes & Guardrails

  • @virtual_fs(in_memory_write=True, allow_real_read=None) Redirects filesystem writes to an in-memory virtual store. Reads check the virtual state first and fall back to the real disk only if the path is whitelisted in allow_real_read. With the default allow_real_read=None, all real reads are permitted while all writes go to RAM, which suits safe dry-runs.

  • @guard_prompt(scan_input=True, scan_output=False, custom_rules=None) Scans function inputs and outputs for prompt injection and developer mode override signatures (e.g. "ignore previous instructions", "system override", "bypass safety"). Raises PromptInjectionViolationError on match.

  • @restrict_db(read_only=True) Intercepts sqlite3 database connections and restricts execution to read-only queries. If a query contains write/alter keywords (e.g. INSERT, UPDATE, DELETE, DROP, ALTER, CREATE), blocks it and raises DatabaseViolationError.

  • @restrict_env(allow_mutation=False) Prevents modifying or deleting system environment variables (os.environ) during function execution, raising EnvironmentViolationError.
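As a rough illustration of the input-scanning idea behind @guard_prompt, the sketch below uses only the standard library. It is not the library's implementation: scan_inputs and InjectionDetected are hypothetical names, and only the three signatures quoted above are checked.

```python
import functools

# Signatures quoted in the description above; a real rule set is likely larger.
INJECTION_SIGNATURES = (
    "ignore previous instructions",
    "system override",
    "bypass safety",
)


class InjectionDetected(Exception):
    """Hypothetical stand-in for PromptInjectionViolationError."""


def scan_inputs(func):
    """Reject calls whose string arguments contain a known signature."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for value in (*args, *kwargs.values()):
            if not isinstance(value, str):
                continue
            lowered = value.lower()  # case-insensitive match (an assumption)
            for signature in INJECTION_SIGNATURES:
                if signature in lowered:
                    raise InjectionDetected(f"matched signature: {signature!r}")
        return func(*args, **kwargs)

    return wrapper


@scan_inputs
def summarize(text: str) -> str:
    """Toy function standing in for an LLM-facing call."""
    return text[:20]
```

Plain substring matching like this is easy to evade, so it is best read as a picture of where the check sits (before the wrapped function runs), not of how thorough it is.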

2. Architectural Integrity & AST Checks

  • @shield(allowed_imports=..., forbidden_imports=..., allow_unsafe=False, allow_globals=False, max_complexity=None) Enforces function boundary constraints at definition time via static AST analysis:

    • Allowed Imports: Whitelist only specific modules for import inside the function.
    • Forbidden Imports: Blacklist specific modules (e.g. blocking os or sys).
    • Unsafe Execution: Blocks calls to eval() and exec().
    • Globals Usage: Blocks the global keyword to prevent global state pollution.
    • Hardcoded Secrets: Scans constants for API keys (e.g., AWS, OpenAI) or variables named api_key/secret.
    • CPU Lockups: Detects infinite loops with empty bodies (while True: pass).
    • Complexity limit: Restricts the maximum allowed cyclomatic complexity of the function's AST.
    • Runtime Types: Automatically validates function return values against declared type hints (supports generics and union types).
  • @freeze Locks the function source code. Registers a cryptographic SHA-256 hash of the function implementation inside shield_reports/frozen_functions.json. Any unauthorized modifications to the code body will raise a ShieldViolationError on startup.

  • @lock_signature Locks the function's signature. Saves parameter names, ordering, defaults, and type hints in shield_reports/locked_signatures.json to prevent AI from altering the function interface.
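The definition-time checks listed under @shield can be pictured as a walk over the function's AST. The following is a minimal sketch using the standard ast module, assuming the source is available as a string; find_violations is a hypothetical helper, not part of agent_shield, and it covers only the forbidden-import, eval()/exec(), and global checks.

```python
import ast


def find_violations(source: str, forbidden_imports: set[str]) -> list[str]:
    """Collect boundary violations found in the given source code."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # Compare the top-level package, so "os.path" counts as "os".
                if alias.name.split(".")[0] in forbidden_imports:
                    violations.append(f"forbidden import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            if node.module and node.module.split(".")[0] in forbidden_imports:
                violations.append(f"forbidden import: {node.module}")
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
                violations.append(f"unsafe call: {node.func.id}()")
        elif isinstance(node, ast.Global):
            violations.append(f"global statement: {', '.join(node.names)}")
    return violations


SAMPLE = '''
def handler(expr):
    import os
    global counter
    return eval(expr)
'''

for problem in find_violations(SAMPLE, {"os", "sys"}):
    print(problem)
```

Because the check runs on the syntax tree alone, it can flag a violation before the function is ever called, which is what makes definition-time enforcement possible.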

3. Resource & Security Sandboxing

  • @timeout(seconds: float) Enforces a strict execution time limit and raises TimeoutViolationError if it is exceeded. When the function runs in a background thread, where Python's signal handlers are unavailable, the decorator works around that limitation gracefully.

  • @limit_memory(max_mb: float) Monitors RSS memory growth of the process during execution. If the memory delta exceeds the specified limit, injects MemoryViolationError into the main thread.

  • @restrict_network(allowed_hosts: list[str]) Restricts socket-level connections. Monkeypatches socket.connect dynamically and thread-safely. Supports wildcards (e.g. *.stripe.com) and resolves domain IPs automatically.

  • @restrict_fs(allow_read: list[str] = None, allow_write: list[str] = None) Monkeypatches builtins.open and standard file manipulation operations. Prevents path traversal bypasses and whitelists Python interpreter/import folders so package loading remains unimpeded.

  • @no_side_effects(allow_args_mutation=False, allow_globals=False, allow_stdout=False) Enforces function purity. Verifies that the function does not mutate its arguments, modify module-level globals, or print output to the console. Raises SideEffectViolationError on violation.
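To make the socket-level restriction more concrete, here is a simplified sketch of an allowlist with wildcard support, built on fnmatch and a temporary patch of socket.socket.connect. The names host_allowed, allow_only, and NetworkBlocked are hypothetical. Unlike the behavior described for @restrict_network, this sketch is not thread-safe and compares only the literal host passed to connect; it does not resolve domain names to IP addresses.

```python
import contextlib
import fnmatch
import socket


class NetworkBlocked(Exception):
    """Hypothetical error type used only in this sketch."""


def host_allowed(host: str, allowed_hosts: list[str]) -> bool:
    """Return True for an exact entry or a wildcard match such as *.stripe.com."""
    return any(
        fnmatch.fnmatchcase(host.lower(), pattern.lower())
        for pattern in allowed_hosts
    )


@contextlib.contextmanager
def allow_only(allowed_hosts: list[str]):
    """Temporarily replace socket.socket.connect with a checking wrapper."""
    original = socket.socket.connect

    def guarded_connect(self, address):
        # AF_INET and AF_INET6 addresses are tuples whose first item is the host.
        host = address[0] if isinstance(address, tuple) else str(address)
        if not host_allowed(host, allowed_hosts):
            raise NetworkBlocked(f"connection to {host!r} is not allowed")
        return original(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the original method, even if the body raised.
        socket.socket.connect = original
```

Higher-level helpers such as socket.create_connection resolve the hostname first and call connect with an IP address, which is why a production guard also needs the IP resolution step mentioned above.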

4. AI Directives & Semantic Assertions

  • @prompt_inject(instruction: str) Prepends a standardized, high-visibility block containing architectural instructions directly to the function's docstring:

    === AI ASSISTANT ARCHITECTURAL CONSTRAINT ===
    {instruction}
    =============================================
    
  • @prompt_assert(prompt: str) Sends the function source code to the Gemini API (gemini-1.5-flash) at definition time to semantically evaluate whether the implementation satisfies the natural language prompt constraint. Supports registry mocking for offline unit testing.
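The docstring banner described for @prompt_inject can be approximated in a few lines. This is a sketch under the assumption that the banner is plain text placed before the existing docstring; prepend_constraint is a hypothetical name and the exact whitespace the library emits is not known.

```python
def prepend_constraint(instruction: str):
    """Return a decorator that puts a constraint banner ahead of the docstring."""
    banner = (
        "=== AI ASSISTANT ARCHITECTURAL CONSTRAINT ===\n"
        f"{instruction}\n"
        "=============================================\n"
    )

    def decorator(func):
        # Functions without a docstring have __doc__ set to None.
        func.__doc__ = banner + (func.__doc__ or "")
        return func

    return decorator


@prepend_constraint("Do not add network calls to this function.")
def charge_customer():
    """Charge the customer."""


print(charge_customer.__doc__)
```

Since the banner lives in __doc__, any tool that surfaces docstrings to an assistant (hover text, help(), retrieval over source) carries the constraint along with it.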


5. Centrally Configured Guardrails (shield.yaml)

To prevent AI agents from editing or deleting decorators in Python files, you can define your project rules centrally in a shield.yaml file in the project root:

rules:
  - pattern: "my_app.payments.*"
    timeout: 5.0
    restrict_network: ["api.stripe.com"]
    virtual_fs: true
    guard_prompt: true
    restrict_db: true
  - pattern: "my_app.utils.*"
    allowed_imports: ["math", "json"]

Agent-Safeguard hooks into Python's import system (builtins.__import__) and automatically decorates all matching module functions at import time.
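One way to picture the rule lookup is glob-style matching of a function's dotted name against each pattern. The sketch below assumes that matching model and assumes that settings from all matching rules are merged in file order; neither detail is confirmed by this README. The rules are written as a Python list to avoid a YAML dependency, and settings_for is a hypothetical helper.

```python
import fnmatch

# Python equivalent of the shield.yaml example above.
RULES = [
    {
        "pattern": "my_app.payments.*",
        "timeout": 5.0,
        "restrict_network": ["api.stripe.com"],
        "virtual_fs": True,
        "guard_prompt": True,
        "restrict_db": True,
    },
    {
        "pattern": "my_app.utils.*",
        "allowed_imports": ["math", "json"],
    },
]


def settings_for(qualified_name: str, rules: list[dict]) -> dict:
    """Merge the settings of every rule whose pattern matches the dotted name."""
    merged = {}
    for rule in rules:
        if fnmatch.fnmatchcase(qualified_name, rule["pattern"]):
            # Later rules overwrite earlier ones on key conflicts (an assumption).
            merged.update({k: v for k, v in rule.items() if k != "pattern"})
    return merged


print(settings_for("my_app.payments.charge_customer", RULES))
```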

6. Audit Mode (Passive Mode)

Set the environment variable AGENT_SHIELD_PASSIVE=true to enable passive auditing. Under passive mode, rules write structured JSON reports and output console warnings on violations, but do not raise exceptions (excluding interruptive constraints like timeout).
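The passive-mode switch amounts to choosing between warning and raising at the point where a violation is detected. A minimal sketch follows, assuming the variable is compared case-insensitively against "true"; handle_violation and ViolationError are hypothetical names, and writing the JSON report is left out.

```python
import os
import warnings


class ViolationError(Exception):
    """Hypothetical base error used only in this sketch."""


def passive_mode_enabled() -> bool:
    """Check the AGENT_SHIELD_PASSIVE flag at call time, not at import time."""
    return os.environ.get("AGENT_SHIELD_PASSIVE", "").strip().lower() == "true"


def handle_violation(message: str) -> None:
    """Warn in passive mode, raise otherwise."""
    if passive_mode_enabled():
        warnings.warn(message, stacklevel=2)
        return
    raise ViolationError(message)
```

Reading the flag on every call lets a test suite toggle auditing without reloading modules, at the cost of one dictionary lookup per violation.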


JSON Diagnostic Reports

When a constraint is violated, Agent-Safeguard writes a diagnostic report to shield_reports/violation_report.json:

{
  "violation_type": "network_violation",
  "function_name": "charge_customer",
  "file_path": "/Users/safik/PycharmProjects/agent-shield/my_app/payments.py",
  "details": {
    "attempted_host": "unauthorized-api.com",
    "allowed_hosts": ["api.stripe.com"]
  },
  "instruction": "AI Assistant Instruction: The function 'charge_customer' in file '/Users/safik/PycharmProjects/agent-shield/my_app/payments.py' attempted to establish an unauthorized network connection to 'unauthorized-api.com'. Connections are restricted to: api.stripe.com. Please remove this network call or connect to an allowed host."
}

AI agents can read this file in a self-correction loop to rewrite their code automatically.
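On the consuming side, an agent loop only needs to load the report and pull out the instruction field. The sketch below assumes the report layout shown above and the default path shield_reports/violation_report.json; read_instruction is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Optional

REPORT_PATH = Path("shield_reports/violation_report.json")


def read_instruction(report_path: Path = REPORT_PATH) -> Optional[str]:
    """Return the report's 'instruction' text, or None if no report exists."""
    if not report_path.exists():
        return None
    report = json.loads(report_path.read_text(encoding="utf-8"))
    return report.get("instruction")


instruction = read_instruction()
if instruction is None:
    print("No violation report found.")
else:
    # In a self-correction loop this text would be fed back to the agent.
    print(instruction)
```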


Quick Start

Create a shield.yaml in your project root:

rules:
  - pattern: "sandbox_code.*"
    timeout: 0.1
    allow_read: ["/tmp"]

Define your functions, and Agent-Safeguard handles the rest:

# sandbox_code.py
def process_data():
    # Attempting to read an unauthorized file triggers FileSystemViolationError
    with open("/etc/passwd", "r") as f:
        return f.read()
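The check that rejects /etc/passwd in the example above could resemble the following path allowlist, which resolves symlinks and ".." segments before comparing. This is a sketch only: is_read_allowed is a hypothetical helper, and the interpreter and import folders that the library is said to whitelist are not modeled here.

```python
import os


def is_read_allowed(path: str, allow_read: list[str]) -> bool:
    """Return True if the resolved path lies inside one of the allowed roots."""
    # realpath collapses ".." and follows symlinks, defeating simple traversal.
    target = os.path.realpath(path)
    for root in allow_read:
        root_real = os.path.realpath(root)
        try:
            if os.path.commonpath([root_real, target]) == root_real:
                return True
        except ValueError:
            # Raised on Windows when the two paths are on different drives.
            continue
    return False


for candidate in ("/tmp/data.txt", "/etc/passwd", "/tmp/../etc/passwd"):
    print(candidate, is_read_allowed(candidate, ["/tmp"]))
```

Comparing with commonpath instead of a string prefix avoids treating a sibling directory such as /tmpfoo as if it were inside /tmp.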

License

This project is licensed under the Apache License 2.0.
