
Deprecated: use agent-governance-toolkit-cli instead.

Agent Sandbox: Docker-based execution isolation for AI agents


Agent Sandbox

Public Preview — execution isolation for AI agents with policy-driven resource limits, tool proxies, network enforcement, and filesystem checkpointing. Ships three interchangeable backends behind the same SandboxProvider ABC.

Part of the Agent Governance Toolkit.

Providers at a glance

Provider | Isolation primitive | Best for | Extra
DockerSandboxProvider | Hardened OCI container (runc, auto-upgrades to gVisor / Kata) | Local dev, CI, self-hosted runners | agt-sandbox[docker]
HyperLightSandboxProvider | KVM / mshv / WHP micro-VM via hyperlight-sandbox | Sub-millisecond cold start, per-call VM isolation | agt-sandbox[hyperlight]
ACASandboxProvider | Azure Container Apps sandbox (managed) | Production, multi-tenant, no infra to run | agt-sandbox[azure] + the early-access SDK wheel

All three implement the same async + sync API (create_session, execute_code, destroy_session, plus *_async variants) and consume the same PolicyDocument for resource caps, network allowlists, and tool allowlists.
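Because the three providers share one method set, calling code can be written once against that shape. The following is a minimal sketch, not part of agt-sandbox: SandboxLike, run_snippet, and FakeProvider are hypothetical names introduced for illustration, and the fake stands in for a real provider so the example runs with no backend installed.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class SandboxLike(Protocol):
    """Hypothetical structural type covering the three sync methods named above."""

    def create_session(self, agent_id: str) -> Any: ...
    def execute_code(self, agent_id: str, session_id: str, code: str) -> Any: ...
    def destroy_session(self, agent_id: str, session_id: str) -> Any: ...


def run_snippet(provider: SandboxLike, agent_id: str, code: str) -> Any:
    """Create a session, run one snippet, and always tear the session down."""
    handle = provider.create_session(agent_id)
    try:
        return provider.execute_code(agent_id, handle.session_id, code)
    finally:
        provider.destroy_session(agent_id, handle.session_id)


@dataclass
class _Handle:
    session_id: str


class FakeProvider:
    """In-memory stand-in (an assumption for this sketch) that records calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def create_session(self, agent_id: str) -> _Handle:
        self.calls.append("create")
        return _Handle(session_id=f"{agent_id}-session")

    def execute_code(self, agent_id: str, session_id: str, code: str) -> str:
        self.calls.append("execute")
        return f"ran {len(code)} chars in {session_id}"

    def destroy_session(self, agent_id: str, session_id: str) -> None:
        self.calls.append("destroy")


fake = FakeProvider()
result = run_snippet(fake, "agent-1", "print('hello')")
print(result, fake.calls)
```

Swapping FakeProvider for any of the three real providers should leave run_snippet unchanged, which is the point of the shared API; the try/finally ensures a session is destroyed even when execution raises.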

Installation

# Everything (Docker + Hyperlight + policy engine):
pip install "agt-sandbox[full]"

# Pick what you need:
pip install "agt-sandbox[docker]"
pip install "agt-sandbox[hyperlight]"
pip install "agt-sandbox[azure,policy]"

The Azure data-plane SDK ships as an early-access wheel — pin the URL:

pip install https://github.com/microsoft/azure-container-apps/releases/download/python-sdk-v0.1.0b1-early-access/azure_containerapps_sandbox-0.1.0b1-py3-none-any.whl

Quick start (all three providers)

from agent_sandbox import (
    DockerSandboxProvider,
    HyperLightSandboxProvider,
    ACASandboxProvider,
)

# Pick one:
provider = DockerSandboxProvider()
# provider = HyperLightSandboxProvider(backend="wasm")
# provider = ACASandboxProvider(
#     resource_group="my-rg", sandbox_group="agents",
#     region="eastus2", disk="python-3.13",
#     ensure_group_location="eastus2",
# )

handle = provider.create_session("agent-1")
out = provider.execute_code("agent-1", handle.session_id, "print('hello')")
print(out.result.stdout)
provider.destroy_session("agent-1", handle.session_id)

1. DockerSandboxProvider — local hardened containers

Each agent session runs in its own container with capabilities dropped, no privilege escalation, a read-only root filesystem, a non-root user, and no network by default.

import asyncio
from agent_sandbox import (
    DockerSandboxProvider,
    IsolationRuntime,
    SandboxConfig,
)

async def run_agent_task():
    provider = DockerSandboxProvider(
        image="python:3.12-slim",
        runtime=IsolationRuntime.AUTO,   # auto-upgrade to gVisor / Kata
    )
    config = SandboxConfig(
        timeout_seconds=30,
        memory_mb=256,
        cpu_limit=0.5,
        network_enabled=False,
        read_only_fs=True,
    )

    session = await provider.create_session_async("research-agent", config=config)
    try:
        execution = await provider.execute_code_async(
            "research-agent", session.session_id,
            "import json, math; print(json.dumps([math.sqrt(x) for x in range(5)]))",
        )
        print(execution.result.stdout)

        checkpoint = provider.save_state(
            "research-agent", session.session_id, "after-step-1",
        )
        print(f"Checkpoint saved: {checkpoint.image_tag}")
    finally:
        await provider.destroy_session_async("research-agent", session.session_id)

asyncio.run(run_agent_task())

What the Docker sandbox enforces

Control | Default
Linux capabilities | All dropped (--cap-drop=ALL)
Privilege escalation | Blocked (--security-opt=no-new-privileges)
Root filesystem | Read-only
Container user | nobody (UID 65534)
PID limit | 256
Network | Disabled unless explicitly allowed
Runtime | runc (auto-upgrades to gVisor or Kata when available)
State | save_state / restore_state via image commit
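For readers who want to see what these defaults look like at the Docker CLI level, the sketch below builds an equivalent `docker run` argument list. It is an illustration only and makes no claim about how DockerSandboxProvider invokes Docker internally; hardened_docker_args is a hypothetical helper, and the memory and CPU values are borrowed from the SandboxConfig example above.

```python
def hardened_docker_args(image: str, memory_mb: int = 256, cpus: float = 0.5) -> list[str]:
    """Build a `docker run` argv that mirrors the defaults in the table above."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                    # drop every Linux capability
        "--security-opt=no-new-privileges",  # block privilege escalation
        "--read-only",                       # read-only root filesystem
        "--user", "65534:65534",             # nobody
        "--pids-limit", "256",               # cap the number of processes
        "--network", "none",                 # no network unless explicitly allowed
        "--memory", f"{memory_mb}m",
        "--cpus", str(cpus),
        image,
    ]


args = hardened_docker_args("python:3.12-slim")
print(" ".join(args))
```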

2. HyperLightSandboxProvider — micro-VM isolation

Backed by the upstream hyperlight-sandbox runtime. Each session is a fresh micro-VM on KVM (Linux), mshv (Azure HCL), or WHP (Windows) — typical cold start is well under a millisecond. Tools are registered as host functions and invoked synchronously from the guest, gated by the session's policy.tool_allowlist.

from agent_sandbox import HyperLightSandboxProvider

def fetch_arxiv(query: str) -> str:
    return f"<results for {query}>"

provider = HyperLightSandboxProvider(
    backend="wasm",                 # or "hyperlightjs" / "nanvix"
    module="python_guest",          # only meaningful for backend="wasm"
    tools={"fetch_arxiv": fetch_arxiv},
)

if not provider.is_available():
    raise SystemExit(f"Hyperlight unavailable: {provider.unavailable_reason}")

handle = provider.create_session("agent-1")
out = provider.execute_code(
    "agent-1", handle.session_id,
    "print(fetch_arxiv('cs.CL'))",
)
print(out.result.stdout)
provider.destroy_session("agent-1", handle.session_id)

Notes:

  • Each session owns one OS thread that is the sole code path touching its Sandbox — required by the upstream runtime.
  • provider.is_available() probes for a hypervisor. When none is present it returns a falsy result and provider.unavailable_reason describes why (e.g. on a macOS host with no KVM or WHP access).
  • Only tools listed in a session's policy.tool_allowlist are exposed to that session's guest; the rest stay host-side.
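The allowlist gating in the last note can be pictured as a simple filter over the registered tools. This is a sketch under assumptions: exposed_tools and read_secrets are hypothetical names, and the real provider may apply the allowlist differently (for example at call time rather than at registration).

```python
from typing import Callable, Dict, Iterable


def exposed_tools(
    tools: Dict[str, Callable[..., str]], allowlist: Iterable[str]
) -> Dict[str, Callable[..., str]]:
    """Keep only the tools whose names appear in the session's allowlist."""
    allowed = set(allowlist)
    return {name: fn for name, fn in tools.items() if name in allowed}


def fetch_arxiv(query: str) -> str:
    return f"<results for {query}>"


def read_secrets(path: str) -> str:
    # Hypothetical sensitive tool that should never reach the guest.
    return "<secret>"


registered = {"fetch_arxiv": fetch_arxiv, "read_secrets": read_secrets}
guest_tools = exposed_tools(registered, ["fetch_arxiv"])
print(sorted(guest_tools))
```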

3. ACASandboxProvider — Azure Container Apps

Runs each session inside a managed Azure Container Apps sandbox via the early-access azure-containerapps-sandbox Python SDK (complete reference). Same API as the other providers; the rest of your code is unchanged.

pip install "agt-sandbox[azure,policy]"
pip install https://github.com/microsoft/azure-container-apps/releases/download/python-sdk-v0.1.0b1-early-access/azure_containerapps_sandbox-0.1.0b1-py3-none-any.whl

az login   # or use managed identity in hosted compute

from agent_sandbox import ACASandboxProvider

provider = ACASandboxProvider(
    resource_group="my-rg",          # must already exist
    sandbox_group="agents",          # auto-created if ensure_group_location is set
    region="eastus2",                # selects the data-plane endpoint
    subscription_id=None,            # falls back to AZURE_SUBSCRIPTION_ID env var
    disk="python-3.13",              # public disk image with python3 preinstalled
    ensure_group_location="eastus2", # create the sandbox group on first use
)

if not provider.is_available():
    raise SystemExit(f"ACA unavailable: {provider.unavailable_reason}")

handle = provider.create_session("agent-1")
out = provider.execute_code(
    "agent-1", handle.session_id, "print('hello azure')"
)
print(out.result.stdout)
provider.destroy_session("agent-1", handle.session_id)
provider.close()

The provider holds one SandboxGroupClient per (resource_group, sandbox_group) pair and caches the per-sandbox SandboxClient returned by begin_create_sandbox().result(). When a PolicyDocument is supplied, network_allowlist is translated into a fail-closed egress policy (defaultAction: Deny + per-host Allow rules) and applied via SandboxClient.set_egress_policy. Set defaults.network_default: allow in the policy if you explicitly want the SDK's default-allow behaviour.
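The translation described above can be sketched as a pure function. Only defaultAction, Deny, and Allow come from the text; the build_egress_policy name and the "rules", "host", and "action" keys are assumptions made for illustration, and the actual payload accepted by SandboxClient.set_egress_policy may differ.

```python
from typing import Dict, List


def build_egress_policy(network_allowlist: List[str], network_default: str = "deny") -> Dict:
    """Fail closed: anything other than an explicit "allow" becomes Deny."""
    default_action = "Allow" if network_default == "allow" else "Deny"
    # One Allow rule per allowlisted host (key names are hypothetical).
    rules = [{"host": host, "action": "Allow"} for host in network_allowlist]
    return {"defaultAction": default_action, "rules": rules}


egress = build_egress_policy(["api.openai.com", "*.github.com"])
print(egress)
```

Treating every value except "allow" as Deny means a typo or a missing network_default in the policy cannot silently open egress.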

A complete worked example (8 verified branches against live Azure — allow / policy-deny / egress-block / sanity / tool-allowed / tool-denied / remote-execution proof / egress audit) lives at examples/quickstart/aca_sandbox_test.py and reads its policy from examples/quickstart/policies/aca_research_agent.yaml.


Policy-driven configuration

All three providers consume the same agent_os.policies.PolicyDocument. Sandbox resource caps, network allowlists, and tool allowlists are native fields on the schema as of AGT 3.3, so policies live in YAML:

name: research-agent
version: "2"

defaults:
  action: allow
  max_cpu: 1.0
  max_memory_mb: 2048
  timeout_seconds: 90
  network_default: deny

network_allowlist:
  - api.openai.com
  - "*.github.com"

tool_allowlist:
  - fetch_arxiv

rules:
  - name: deny-shell-out
    condition: { field: code, operator: contains, value: subprocess }
    action: deny
    priority: 100
    message: "shell-out blocked by research-agent policy"

from agent_os.policies import PolicyDocument

# Run inside an async function; `provider` is any of the three providers above.
policy = PolicyDocument.from_yaml("policies/aca_research_agent.yaml")
handle = await provider.create_session_async("agent-1", policy=policy)
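To make the rules section of the YAML concrete, here is a small stand-alone evaluator for the deny-shell-out rule. It is a sketch, not the agent_os policy engine: the evaluate function is hypothetical, only the contains operator is handled, and the assumption that higher-priority rules are checked first with defaults.action as the fallback is mine rather than something the text states.

```python
from typing import Dict, Tuple

# The policy above, reduced to the fields this sketch needs.
policy = {
    "defaults": {"action": "allow"},
    "rules": [
        {
            "name": "deny-shell-out",
            "condition": {"field": "code", "operator": "contains", "value": "subprocess"},
            "action": "deny",
            "priority": 100,
            "message": "shell-out blocked by research-agent policy",
        },
    ],
}


def evaluate(policy: Dict, request: Dict) -> Tuple[str, str]:
    """Return (action, message) for a request; assumes highest priority wins."""
    for rule in sorted(policy["rules"], key=lambda r: r["priority"], reverse=True):
        cond = rule["condition"]
        value = str(request.get(cond["field"], ""))
        if cond["operator"] == "contains" and cond["value"] in value:
            return rule["action"], rule.get("message", "")
    # No rule matched: fall back to the policy-wide default.
    return policy["defaults"]["action"], ""


blocked = evaluate(policy, {"code": "import subprocess; subprocess.run(['ls'])"})
allowed = evaluate(policy, {"code": "print('hello')"})
print(blocked, allowed)
```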

License

MIT
