Agent Sandbox: sandbox execution isolation for AI agents

Agent Sandbox

Public Preview — execution isolation for AI agents with policy-driven resource limits, tool proxies, network enforcement, and filesystem checkpointing. Ships three interchangeable backends behind the same SandboxProvider ABC.

Part of the Agent Governance Toolkit.

Providers at a glance

| Provider | Isolation primitive | Best for | Extra |
|---|---|---|---|
| DockerSandboxProvider | Hardened OCI container (runc, auto-upgrades to gVisor / Kata) | Local dev, CI, self-hosted runners | agt-sandbox[docker] |
| HyperLightSandboxProvider | KVM / mshv / WHP micro-VM via hyperlight-sandbox | Sub-millisecond cold start, per-call VM isolation | agt-sandbox[hyperlight] |
| ACASandboxProvider | Azure Container Apps sandbox (managed) | Production, multi-tenant, no infra to run | agt-sandbox[azure] + the early-access SDK wheel |

All three implement the same async + sync API (create_session, execute_code, destroy_session, plus *_async variants) and consume the same PolicyDocument for resource caps, network allowlists, and tool allowlists.
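Because the providers share one call surface, calling code can be written once against that surface. The sketch below is illustrative only: `SandboxLike`, `EchoProvider`, and the small dataclasses are hypothetical stand-ins so the example runs without any backend installed; they are not part of agt-sandbox, and only the three method names and the `result.stdout` / `session_id` attributes come from the examples in this README.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Result:
    stdout: str


@dataclass
class Execution:
    result: Result


@dataclass
class Handle:
    session_id: str


class SandboxLike(Protocol):
    """Structural type covering the three sync calls named above."""

    def create_session(self, agent_id: str) -> Handle: ...
    def execute_code(self, agent_id: str, session_id: str, code: str) -> Execution: ...
    def destroy_session(self, agent_id: str, session_id: str) -> None: ...


class EchoProvider:
    """Hypothetical stand-in: 'executes' code by echoing it back."""

    def __init__(self) -> None:
        self.live: set[str] = set()

    def create_session(self, agent_id: str) -> Handle:
        session_id = f"{agent_id}-session"
        self.live.add(session_id)
        return Handle(session_id)

    def execute_code(self, agent_id: str, session_id: str, code: str) -> Execution:
        return Execution(Result(stdout=f"ran: {code}"))

    def destroy_session(self, agent_id: str, session_id: str) -> None:
        self.live.discard(session_id)


def run_once(provider: SandboxLike, agent_id: str, code: str) -> str:
    """Create a session, run one snippet, and always tear the session down."""
    handle = provider.create_session(agent_id)
    try:
        return provider.execute_code(agent_id, handle.session_id, code).result.stdout
    finally:
        provider.destroy_session(agent_id, handle.session_id)
```

Under these assumptions, `run_once` should accept any of the three real providers unchanged, since it only touches the methods they have in common.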

Installation

# Everything (Docker + Hyperlight + policy engine):
pip install "agt-sandbox[full]"

# Pick what you need:
pip install "agt-sandbox[docker]"
pip install "agt-sandbox[hyperlight]"
pip install "agt-sandbox[azure,policy]"

The Azure data-plane SDK ships as an early-access wheel — pin the URL:

pip install https://github.com/microsoft/azure-container-apps/releases/download/python-sdk-v0.1.0b1-early-access/azure_containerapps_sandbox-0.1.0b1-py3-none-any.whl

Quick start (all three providers)

from agent_sandbox import (
    DockerSandboxProvider,
    HyperLightSandboxProvider,
    ACASandboxProvider,
)

# Pick one:
provider = DockerSandboxProvider()
# provider = HyperLightSandboxProvider(backend="wasm")
# provider = ACASandboxProvider(
#     resource_group="my-rg", sandbox_group="agents",
#     region="eastus2", disk="python-3.13",
#     ensure_group_location="eastus2",
# )

handle = provider.create_session("agent-1")
out = provider.execute_code("agent-1", handle.session_id, "print('hello')")
print(out.result.stdout)
provider.destroy_session("agent-1", handle.session_id)

1. DockerSandboxProvider — local hardened containers

Each agent session runs in its own container with capabilities dropped, no privilege escalation, a read-only root filesystem, a non-root user, and no network by default.

import asyncio
from agent_sandbox import (
    DockerSandboxProvider,
    IsolationRuntime,
    SandboxConfig,
)

async def run_agent_task():
    provider = DockerSandboxProvider(
        image="python:3.12-slim",
        runtime=IsolationRuntime.AUTO,   # auto-upgrade to gVisor / Kata
    )
    config = SandboxConfig(
        timeout_seconds=30,
        memory_mb=256,
        cpu_limit=0.5,
        network_enabled=False,
        read_only_fs=True,
    )

    session = await provider.create_session_async("research-agent", config=config)
    try:
        execution = await provider.execute_code_async(
            "research-agent", session.session_id,
            "import json, math; print(json.dumps([math.sqrt(x) for x in range(5)]))",
        )
        print(execution.result.stdout)

        checkpoint = provider.save_state(
            "research-agent", session.session_id, "after-step-1",
        )
        print(f"Checkpoint saved: {checkpoint.image_tag}")
    finally:
        await provider.destroy_session_async("research-agent", session.session_id)

asyncio.run(run_agent_task())

What the Docker sandbox enforces

| Control | Default |
|---|---|
| Linux capabilities | All dropped (--cap-drop=ALL) |
| Privilege escalation | Blocked (--security-opt=no-new-privileges) |
| Root filesystem | Read-only |
| Container user | nobody (UID 65534) |
| PID limit | 256 |
| Network | Disabled unless explicitly allowed |
| Runtime | runc (auto-upgrades to gVisor or Kata when available) |
| State | save_state / restore_state via image commit |
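For readers who think in terms of the Docker CLI, the controls above map roughly onto standard `docker run` flags. The following is a rough sketch of that mapping, not the provider's actual implementation: the `Limits` dataclass and `docker_run_argv` function are hypothetical names introduced here, and the provider may well configure containers through the Docker API rather than the CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for the defaults listed in the table."""
    memory_mb: int = 256
    cpu_limit: float = 0.5
    pids_limit: int = 256
    network_enabled: bool = False
    read_only_fs: bool = True


def docker_run_argv(image: str, limits: Limits) -> list[str]:
    """Build an approximate `docker run` command line for the controls above."""
    argv = [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                      # drop every Linux capability
        "--security-opt=no-new-privileges",    # block privilege escalation
        "--user", "65534:65534",               # nobody
        f"--pids-limit={limits.pids_limit}",
        f"--memory={limits.memory_mb}m",
        f"--cpus={limits.cpu_limit}",
    ]
    if limits.read_only_fs:
        argv.append("--read-only")             # read-only root filesystem
    if not limits.network_enabled:
        argv.append("--network=none")          # no network by default
    argv.append(image)
    return argv


print(" ".join(docker_run_argv("python:3.12-slim", Limits())))
```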

2. HyperLightSandboxProvider — micro-VM isolation

Backed by the upstream hyperlight-sandbox runtime. Each session is a fresh micro-VM on KVM (Linux), mshv (Azure HCL), or WHP (Windows) — typical cold start is well under a millisecond. Tools are registered as host functions and invoked synchronously from the guest, gated by the session's policy.tool_allowlist.

from agent_sandbox import HyperLightSandboxProvider

def fetch_arxiv(query: str) -> str:
    return f"<results for {query}>"

provider = HyperLightSandboxProvider(
    backend="wasm",                 # or "hyperlightjs" / "nanvix"
    module="python_guest",          # only meaningful for backend="wasm"
    tools={"fetch_arxiv": fetch_arxiv},
)

if not provider.is_available():
    raise SystemExit(f"Hyperlight unavailable: {provider.unavailable_reason}")

handle = provider.create_session("agent-1")
out = provider.execute_code(
    "agent-1", handle.session_id,
    "print(fetch_arxiv('cs.CL'))",
)
print(out.result.stdout)
provider.destroy_session("agent-1", handle.session_id)

Notes:

  • Each session owns one OS thread that is the sole code path touching its Sandbox — required by the upstream runtime.
  • provider.is_available() probes for a hypervisor and returns False if none is present (e.g. on macOS hosts without WHP / KVM passthrough); the cause is then available as provider.unavailable_reason.
  • Only tools listed in a session's policy.tool_allowlist are exposed to that session's guest; the rest stay host-side.
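Two of the notes above, the single owning thread and the per-session tool allowlist, can be illustrated with the standard library alone. This is a minimal sketch of the general pattern, assuming nothing about how the provider implements it internally; `ThreadBoundSession` and `expose_tools` are hypothetical names, and a single-worker `ThreadPoolExecutor` stands in for the dedicated OS thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def expose_tools(tools: dict, allowlist: list[str]) -> dict:
    """Keep only the host functions that the session's allowlist names."""
    return {name: fn for name, fn in tools.items() if name in allowlist}


class ThreadBoundSession:
    """Funnels every tool call through one dedicated worker thread."""

    def __init__(self, tools: dict, allowlist: list[str]) -> None:
        self.guest_tools = expose_tools(tools, allowlist)
        # max_workers=1: a single worker thread services all submissions.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call_tool(self, name: str, *args):
        if name not in self.guest_tools:
            raise PermissionError(f"tool {name!r} is not in the allowlist")
        return self._executor.submit(self.guest_tools[name], *args).result()

    def owner_thread_id(self) -> int:
        """Report the identity of the thread that runs submitted work."""
        return self._executor.submit(threading.get_ident).result()

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

Callers on any thread block on `.result()`, so invocation stays synchronous from their point of view while the work itself always runs on the same worker.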

3. ACASandboxProvider — Azure Container Apps

Runs each session inside a managed Azure Container Apps sandbox via the early-access azure-containerapps-sandbox Python SDK (complete reference). Same API as the other providers; the rest of your code is unchanged.

pip install "agt-sandbox[azure,policy]"
pip install https://github.com/microsoft/azure-container-apps/releases/download/python-sdk-v0.1.0b1-early-access/azure_containerapps_sandbox-0.1.0b1-py3-none-any.whl

az login   # or use managed identity in hosted compute

from agent_sandbox import ACASandboxProvider

provider = ACASandboxProvider(
    resource_group="my-rg",          # must already exist
    sandbox_group="agents",          # auto-created if ensure_group_location is set
    region="eastus2",                # selects the data-plane endpoint
    subscription_id=None,            # falls back to AZURE_SUBSCRIPTION_ID env var
    disk="python-3.13",              # public disk image with python3 preinstalled
    ensure_group_location="eastus2", # create the sandbox group on first use
)

if not provider.is_available():
    raise SystemExit(f"ACA unavailable: {provider.unavailable_reason}")

handle = provider.create_session("agent-1")
out = provider.execute_code(
    "agent-1", handle.session_id, "print('hello azure')"
)
print(out.result.stdout)
provider.destroy_session("agent-1", handle.session_id)
provider.close()

The provider holds one SandboxGroupClient per (resource_group, sandbox_group) pair and caches the per-sandbox SandboxClient returned by begin_create_sandbox().result(). When a PolicyDocument is supplied, network_allowlist is translated into a fail-closed egress policy (defaultAction: Deny + per-host Allow rules) and applied via SandboxClient.set_egress_policy. Set defaults.network_default: allow in the policy if you explicitly want the SDK's default-allow behaviour.
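The allowlist-to-egress translation described above can be pictured as a small pure function. This is a sketch under assumptions: only `defaultAction`, `Deny`, and `Allow` appear in this README, while the `rules`, `host`, and `action` keys and the `build_egress_policy` name are hypothetical and may not match the payload that `SandboxClient.set_egress_policy` actually expects.

```python
def build_egress_policy(network_allowlist: list[str], network_default: str = "deny") -> dict:
    """Translate a policy's network settings into a fail-closed egress document."""
    if network_default == "allow":
        # Explicit opt-in to default-allow behaviour.
        return {"defaultAction": "Allow", "rules": []}
    return {
        "defaultAction": "Deny",  # anything not listed below is refused
        "rules": [{"host": host, "action": "Allow"} for host in network_allowlist],
    }


policy = build_egress_policy(["api.openai.com", "*.github.com"])
print(policy["defaultAction"], [rule["host"] for rule in policy["rules"]])
```

Note that an empty allowlist with the default setting yields a deny-everything policy, which is the fail-closed behaviour the paragraph describes.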

A complete worked example (8 verified branches against live Azure — allow / policy-deny / egress-block / sanity / tool-allowed / tool-denied / remote-execution proof / egress audit) lives at examples/quickstart/aca_sandbox_test.py and reads its policy from examples/quickstart/policies/aca_research_agent.yaml.


Policy-driven configuration

All three providers consume the same agent_os.policies.PolicyDocument. Sandbox resource caps, network allowlists, and tool allowlists are native fields on the schema as of AGT 3.3, so policies live in YAML:

name: research-agent
version: "2"

defaults:
  action: allow
  max_cpu: 1.0
  max_memory_mb: 2048
  timeout_seconds: 90
  network_default: deny

network_allowlist:
  - api.openai.com
  - "*.github.com"

tool_allowlist:
  - fetch_arxiv

rules:
  - name: deny-shell-out
    condition: { field: code, operator: contains, value: subprocess }
    action: deny
    priority: 100
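    message: "shell-out blocked by research-agent policy"

Load the policy and pass it when creating a session. The await call below must run inside an async function, with provider constructed as in the provider sections above:

from agent_os.policies import PolicyDocument

policy = PolicyDocument.from_yaml("policies/aca_research_agent.yaml")
handle = await provider.create_session_async("agent-1", policy=policy)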
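To make the rules section of the YAML concrete, here is a toy evaluator for that shape of policy. It is a sketch only and does not reproduce the real policy engine: it assumes that rules are checked in descending priority order, that the first match wins, and that `defaults.action` applies when nothing matches. Only the `contains` operator from the example is handled, and `evaluate` is a hypothetical name.

```python
def evaluate(policy: dict, context: dict) -> tuple[str, str]:
    """Return (action, message) for a context such as {"code": "..."}."""
    rules = sorted(
        policy.get("rules", []),
        key=lambda rule: rule.get("priority", 0),
        reverse=True,  # assumption: higher priority is evaluated first
    )
    for rule in rules:
        cond = rule["condition"]
        actual = str(context.get(cond["field"], ""))
        if cond["operator"] == "contains" and cond["value"] in actual:
            return rule["action"], rule.get("message", "")
    return policy["defaults"]["action"], ""


research_agent = {
    "defaults": {"action": "allow"},
    "rules": [
        {
            "name": "deny-shell-out",
            "condition": {"field": "code", "operator": "contains", "value": "subprocess"},
            "action": "deny",
            "priority": 100,
            "message": "shell-out blocked by research-agent policy",
        }
    ],
}

print(evaluate(research_agent, {"code": "import subprocess"}))
print(evaluate(research_agent, {"code": "print('hello')"}))
```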

Hardened sandbox image (minimal-PATH)

docker/Dockerfile.sandbox is an opt-in hardened variant of the default python:3.11-slim base. It pins PATH to a single explicit directory (/usr/local/sandbox-bin) containing only the binaries sandboxed code is allowed to invoke, and strips the execute bit off well-known network and infra CLIs (curl, wget, ssh, git, az, aws, gcloud, kubectl, terraform, helm, ansible, apt, dpkg, …) as a second-layer guarantee in case a caller goes through an absolute path.

This closes the gap that issue #2662 identifies: without a pinned PATH, a tool can invoke os.system('az account list') inside the sandbox and the attempt is not blocked or logged by AGT even though the network-egress policy would later refuse the call. The hardened image makes the attempt itself fail with "command not found".
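The two layers, a pinned PATH and a stripped execute bit, can be demonstrated on any POSIX host with `shutil.which`, which accepts an explicit search path. This is an illustrative sketch that uses temporary directories and stub files in place of the real image; `make_stub` and `reachable` are hypothetical helpers, and the behaviour shown assumes a POSIX filesystem.

```python
import os
import shutil
import tempfile


def make_stub(directory: str, name: str, executable: bool) -> str:
    """Create a tiny stand-in binary, with or without the execute bit."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(path, 0o755 if executable else 0o644)
    return path


def reachable(command: str, pinned_path: str) -> bool:
    """True if `command` resolves to something executable under the pinned PATH."""
    return shutil.which(command, path=pinned_path) is not None


with tempfile.TemporaryDirectory() as sandbox_bin, tempfile.TemporaryDirectory() as system_bin:
    make_stub(sandbox_bin, "python3", executable=True)          # on the allow-list
    curl_path = make_stub(system_bin, "curl", executable=False)  # execute bit stripped

    results = {
        "python3 via PATH": reachable("python3", sandbox_bin),
        "curl via PATH": reachable("curl", sandbox_bin),         # layer 1: not on PATH
        "curl via absolute path": reachable(curl_path, sandbox_bin),  # layer 2: not executable
    }

print(results)
```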

# Build with the default allow-list (python3, cat, echo, ls, sleep).
docker build \
  -f agent-sandbox/docker/Dockerfile.sandbox \
  -t agt-sandbox/python-minimal-path:3.11 \
  agent-sandbox/docker

# Build with a custom allow-list — add only what the sandboxed workload
# actually needs. The full allow-list IS the new PATH; any binary not listed
# here is unreachable.
docker build \
  --build-arg ALLOWED_BIN_NAMES="python3 cat echo ls sleep grep sort uniq" \
  -f agent-sandbox/docker/Dockerfile.sandbox \
  -t agt-sandbox/python-minimal-path:3.11 \
  agent-sandbox/docker

Wire the image into DockerSandboxProvider via the existing image argument:

provider = DockerSandboxProvider(image="agt-sandbox/python-minimal-path:3.11")

For security-sensitive deployments, require the hardened image so the provider fails instead of silently falling back to python:3.11-slim when the local image is unavailable:

provider = DockerSandboxProvider(require_hardened_image=True)

Build the image before creating the provider. require_hardened_image=True cannot be combined with a custom image=.

To extend the allow-list permanently (rather than at docker build time), edit the ARG ALLOWED_BIN_NAMES= line in Dockerfile.sandbox and rebuild. The tests/test_docker_sandbox.py::TestMinimalPathSandboxImage smoke tests assert that the default allow-list cannot accidentally regress to include network or infra CLIs.
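The kind of regression check those smoke tests perform can be approximated in a few lines. The sketch below is not the project's test code: `forbidden_in` is a hypothetical helper, and `FORBIDDEN` contains only the CLIs named explicitly in this README, whose own list is elided with "…" and therefore longer in practice.

```python
# Network and infra CLIs named in this README (partial list by construction).
FORBIDDEN = frozenset({
    "curl", "wget", "ssh", "git", "az", "aws", "gcloud",
    "kubectl", "terraform", "helm", "ansible", "apt", "dpkg",
})

# Default allow-list as given in the build instructions above.
DEFAULT_ALLOWED_BIN_NAMES = "python3 cat echo ls sleep"


def forbidden_in(allowed_bin_names: str) -> set[str]:
    """Return any forbidden CLI that appears in a space-separated allow-list."""
    return set(allowed_bin_names.split()) & FORBIDDEN


print(forbidden_in(DEFAULT_ALLOWED_BIN_NAMES))
print(sorted(forbidden_in("python3 cat curl git")))
```

A check like this could run in CI before `docker build`, so a custom `ALLOWED_BIN_NAMES` value that reintroduces a network CLI is caught early.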

License

MIT
