
Linux-native sandboxed code execution for agent workflows using bubblewrap and cgroups

agentsandbox

A small Linux-native sandbox for agent workflows, in the same spirit as the sandbox support in the OpenAI Agents SDK. It uses bubblewrap for isolation and cgroups v2 for resource limits, similar to Codex's open-source Linux sandbox approach. No containers, no VMs, just Linux primitives.

Status: Alpha. Works on Linux with cgroup v2. Tested on Ubuntu 22.04 and 24.04.

Goal of this project

The goal is simple:

  • let an LLM or agent run code without giving it your whole machine
  • keep its work inside /workspace
  • block obvious bad behavior like reading host files, using the network, or spawning too many processes
  • stay small enough that you can use it directly as a Python package

How this works

When you create a sandbox, Sandy:

  1. makes a fresh temporary workspace
  2. starts bubblewrap with isolated namespaces
  3. mounts only a small filesystem view inside the sandbox
  4. makes /workspace and /tmp writable
  5. keeps the network off by default
  6. applies memory and PID limits with cgroups when the environment allows it
  7. deletes the temporary workspace when the sandbox closes

This is broadly similar to the sandboxing approach OpenAI has described for Codex. OpenAI’s public Codex materials say tasks run in isolated cloud sandboxes/containers, and the open-source Codex Linux sandbox docs say bubblewrap is the default filesystem sandbox on Linux. Sandy is not the same implementation, but it follows the same general idea: isolate execution, keep the filesystem tight, and only expose the paths the tool actually needs. Sources: OpenAI, Introducing Codex and openai/codex Linux sandbox README.

One important detail: if you keep using the same sandbox instance, the same /workspace stays there between commands. If you create a new sandbox instance, you get a fresh workspace.
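The steps listed in this section can be pictured as assembling a bubblewrap command line. The sketch below is only an illustration under stated assumptions: `build_bwrap_argv` is a hypothetical helper, not part of the package's API, and the exact flags Sandy passes may differ. The `bwrap` options themselves (`--unshare-*`, `--ro-bind`, `--bind`, `--tmpfs`, `--chdir`, `--die-with-parent`) are real bubblewrap flags.

```python
from pathlib import Path


def build_bwrap_argv(workspace: Path, command: str, network_access: bool = False) -> list[str]:
    """Assemble a bubblewrap command line (hypothetical helper, not the package's API)."""
    argv = ["bwrap", "--unshare-user", "--unshare-pid", "--unshare-ipc", "--die-with-parent"]

    # Expose a small read-only view of the host system directories that exist.
    for path in ("/usr", "/bin", "/lib", "/lib64"):
        if Path(path).exists():
            argv += ["--ro-bind", path, path]

    argv += [
        "--proc", "/proc",                       # fresh /proc for the new PID namespace
        "--dev", "/dev",                         # minimal /dev
        "--tmpfs", "/tmp",                       # writable, private /tmp
        "--bind", str(workspace), "/workspace",  # writable workspace
        "--chdir", "/workspace",
    ]

    # Network stays off unless explicitly requested.
    if not network_access:
        argv.append("--unshare-net")

    argv += ["sh", "-c", command]
    return argv


argv = build_bwrap_argv(Path("/tmp/example-workspace"), "echo hello")
print(" ".join(argv))
```

Building the argument list separately from running it keeps the mount and namespace choices easy to inspect and to unit test without bubblewrap installed.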

Simplest example

This is the smallest useful example:

import asyncio
from agentsandbox import NativeSandbox

async def main():
    async with NativeSandbox() as sandbox:
        await sandbox.execute("printf 'print(1 / 0)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stderr.strip())

        await sandbox.execute("printf 'print(1 + 1)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stdout.strip())

asyncio.run(main())

The same sandbox instance keeps /workspace/main.py between those commands.

More complete example

import asyncio
from agentsandbox import NativeSandbox, SandboxConfig

async def main():
    config = SandboxConfig(
        memory_limit_mb=512,
        max_pids=256,
        timeout_seconds=30,
    )

    async with NativeSandbox(config) as sandbox:
        # run a command
        result = await sandbox.execute("echo hello from the sandbox")
        print(result.stdout)   # hello from the sandbox
        print(result.exit_code)  # 0

        # upload files into the sandbox workspace
        await sandbox.upload_files([
            ("script.sh", b"#!/bin/sh\nwhoami && ls -la"),
        ])

        # run the uploaded script and keep a copy of its output in the workspace
        result = await sandbox.execute("sh /workspace/script.sh | tee /workspace/output.txt")
        print(result.stdout)

        # download files from the workspace
        downloads = await sandbox.download_files(["output.txt"])
        print(downloads[0].content.decode())

asyncio.run(main())

What you get

  • Process isolation via bubblewrap's PID, user, mount, network, and IPC namespaces
  • Resource limits via cgroups v2: memory cap, PID ceiling, CPU weight
  • Network isolation by default (opt-in with network_access=True)
  • Filesystem sandbox: only /workspace and /tmp are writable; /usr, /bin, /lib, /lib64 are read-only bind mounts
  • Timeout enforcement: commands that run too long are killed
  • Output size limits: stdout/stderr are truncated at 256KB so runaway output cannot flood the caller
  • Graceful degradation: if cgroups aren't available, agentsandbox warns and runs without resource limits
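As a rough illustration of the timeout and output-size items in this list, here is a minimal sketch using only the standard library. `run_limited` and `RunResult` are hypothetical names, not the package's API, and this version buffers the full output before truncating it; a production implementation would more likely stream and cut off early.

```python
import asyncio
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    stdout: bytes
    timed_out: bool


async def run_limited(argv: list[str], timeout_seconds: float, max_output_bytes: int) -> RunResult:
    """Run a command, kill it on timeout, and cap the captured stdout (hypothetical helper)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout_seconds)
        timed_out = False
    except asyncio.TimeoutError:
        # The command ran too long: kill it and reap the process.
        proc.kill()
        await proc.wait()
        stdout, timed_out = b"", True
    return RunResult(stdout=stdout[:max_output_bytes], timed_out=timed_out)


result = asyncio.run(
    run_limited([sys.executable, "-c", "print('x' * 1000)"], timeout_seconds=10.0, max_output_bytes=100)
)
print(len(result.stdout), result.timed_out)
```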

Requirements

  • Linux (x86-64 or ARM64)
  • cgroup v2 (standard on modern Linux distros)
  • bubblewrap (apt install bubblewrap on Debian/Ubuntu)
  • Python 3.10+

If you're developing on macOS, run agentsandbox inside Docker or a Linux VM. bubblewrap and cgroup v2 are Linux-only. Some Docker environments expose cgroup v2 but do not delegate writable controllers; in that case agentsandbox warns and runs without memory/PID/CPU limits.
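To check these requirements by hand before installing, something like the following sketch may help. `check_prerequisites` is a hypothetical name and not the package's own detection code; it relies on the fact that a cgroup v2 unified hierarchy exposes a `cgroup.controllers` file at the root of `/sys/fs/cgroup`. It does not check whether controllers are delegated and writable, which is the Docker caveat mentioned above.

```python
import platform
import shutil
from pathlib import Path


def check_prerequisites() -> dict[str, bool]:
    """Report which sandbox prerequisites appear to be present (hypothetical helper)."""
    return {
        "linux": platform.system() == "Linux",
        "bwrap": shutil.which("bwrap") is not None,
        # Present only when the unified (v2) cgroup hierarchy is mounted here.
        "cgroup_v2": Path("/sys/fs/cgroup/cgroup.controllers").is_file(),
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```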

Installation

pip install agentsandbox

For Deep Agents / LangChain usage:

pip install "agentsandbox[langchain]"

For development:

git clone https://github.com/john221wick/sandy.git
cd sandy
pip install -e ".[dev]"

Configuration

SandboxConfig is a frozen dataclass — pass it to NativeSandbox at construction:

config = SandboxConfig(
    memory_limit_mb=512,      # max RAM (default: 512MB)
    max_pids=256,             # max processes (default: 256)
    cpu_shares=100,           # CPU weight (default: 100)
    timeout_seconds=60.0,     # hard timeout (default: 60s)
    max_output_bytes=262144,  # stdout/stderr cap (default: 256KB)
    network_access=False,     # allow outbound network (default: False)
    gpu=False,                # expose GPU (default: False, reserved for v2)
    extra_bind_mounts=[],     # list of (host_path, sandbox_path) tuples
    extra_env={},             # extra environment variables
)
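Because the config is frozen, fields cannot be reassigned after construction; a variant is made by building a new instance. The sketch below shows that behavior with a stand-in dataclass (`ExampleConfig` is hypothetical and only mirrors a few of the fields above), assuming SandboxConfig behaves like any standard frozen dataclass.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExampleConfig:
    """Stand-in for illustration only; not the package's SandboxConfig."""
    memory_limit_mb: int = 512
    timeout_seconds: float = 60.0
    network_access: bool = False


base = ExampleConfig()

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    base.memory_limit_mb = 1024  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace builds a modified copy and leaves the original untouched.
relaxed = replace(base, network_access=True, timeout_seconds=120.0)
print(mutated, base.network_access, relaxed.network_access)
```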

Deep Agents adapter

Sandy exposes a Deep Agents backend directly from the package root:

from deepagents import create_deep_agent
from agentsandbox import Sandbox

backend = Sandbox()
agent = create_deep_agent(
    model="openai:gpt-4.1-mini",
    backend=backend,
)

try:
    result = agent.invoke(
        {
            "messages": [
                {
                    "role": "user",
                    "content": "Write /workspace/hello.py, then run it.",
                }
            ]
        }
    )
finally:
    # release the sandbox even if the agent run raises
    backend.close()

Notes:

  • This targets Deep Agents specifically, not bare ChatModel.invoke(...).
  • Use absolute paths under /workspace.
  • /tmp/... is supported for backend temp-file flows used by Deep Agents.
  • The adapter assumes it is running on Linux or inside Docker where bubblewrap works.

Tests included

The test suite is split into three parts:

  • unit
    • workspace creation, read, write, list, snapshot, restore, and cleanup
    • path traversal checks like ../../etc/passwd
    • symlink escape checks
    • executor command validation, mount flags, env flags, and network flags
    • timeout handling
    • cgroup slice creation, config writing, and PID attachment
    • Linux and bwrap prerequisite detection
    • NativeSandbox lifecycle and timeout forwarding
    • Deep Agents adapter behavior for /workspace, /tmp, invalid paths, and error mapping
  • integration
    • real command execution through bubblewrap
    • current working directory is /workspace
    • network is blocked by default
    • system paths are read-only
    • /workspace is writable
    • file upload, execute, and download flows work
    • timeout handling on real commands
    • Deep Agents adapter can run commands and move files through /workspace and /tmp
  • adversarial
    • fork bomb containment
    • memory bomb containment when cgroup memory limits are available
    • blocked network access with curl and DNS lookups
    • host file access checks like /etc/shadow
    • path traversal attempts from inside the workspace
    • symlink escape attempts
    • blocked privilege escalation with sudo and su
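To illustrate what the path traversal and symlink escape checks in this list are guarding against, here is a minimal sketch of workspace-relative path enforcement. `resolve_in_workspace` is a hypothetical function, not the package's implementation; it resolves symlinks first and then rejects anything that lands outside the workspace root.

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve user_path under workspace, rejecting escapes (hypothetical helper)."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "escape").symlink_to("/")  # symlink pointing outside the workspace

    inside = resolve_in_workspace(workspace, "src/main.py")
    inside_ok = inside.is_relative_to(workspace.resolve())

    results = {}
    for attempt in ("../../etc/passwd", "escape/etc/passwd", "/etc/passwd"):
        try:
            resolve_in_workspace(workspace, attempt)
            results[attempt] = "allowed"
        except ValueError:
            results[attempt] = "rejected"

print(inside_ok, results)
```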

Run them like this:

On macOS or Windows, you can run the unit tests (no bwrap required):

make setup
make unit

To run the full test suite including integration and adversarial tests on macOS, use Docker:

docker build -t agentsandbox-test .
docker run --rm --privileged --cgroupns=private agentsandbox-test

To run just the Deep Agents adapter tests in Docker:

docker build -t agentsandbox-test .
docker run --rm --privileged --cgroupns=private agentsandbox-test \
  pytest -v --tb=short tests/unit/test_langchain_adapter.py tests/integration/test_langchain_adapter.py

The container must run with --privileged --cgroupns=private so bubblewrap and cgroups work inside the container.

Or use the Makefile targets directly:

make lint          # ruff
make typecheck     # mypy
make test          # pytest (all tests)
make unit          # pytest -m "not integration and not adversarial"
make integration   # pytest -m integration
make adversarial   # pytest -m adversarial

Security properties

The sandbox limits what a compromised or malicious command can do:

  • Fork bomb (:(){ :|:& };:): PID limit via cgroups pids.max
  • Memory exhaustion: memory limit via cgroups memory.max
  • Network exfiltration: --unshare-net by default
  • Read host files like /etc/shadow: only a small filesystem view is mounted into the sandbox, system paths are read-only, and only /workspace and /tmp are writable
  • Path traversal like ../../etc/passwd: workspace-relative path enforcement
  • Privilege escalation with sudo or su: dropped capabilities, user namespace isolation, and synthetic passwd/group files

Caveats: This is not a hard security boundary like a VM or a rootless container. It's designed to catch accidental mistakes and naive adversarial prompts. A sufficiently motivated attacker with a kernel exploit or sufficient privileges on the host can escape it. Use appropriately.
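The cgroup entries in the list above come down to writing values into cgroup v2 control files. The sketch below shows the idea, including the warn-and-continue fallback, under stated assumptions: `apply_limits` is a hypothetical helper, and a temporary directory stands in for a real cgroup directory (where `memory.max` and `pids.max` are created by the kernel and a process is attached by writing its PID to `cgroup.procs`).

```python
import tempfile
import warnings
from pathlib import Path


def apply_limits(cgroup_dir: Path, memory_limit_mb: int, max_pids: int) -> bool:
    """Write memory and PID limits; warn and return False if that fails (hypothetical helper)."""
    try:
        # memory.max takes a byte count, pids.max a process count.
        (cgroup_dir / "memory.max").write_text(str(memory_limit_mb * 1024 * 1024))
        (cgroup_dir / "pids.max").write_text(str(max_pids))
    except OSError as exc:
        warnings.warn(f"cgroup limits unavailable, running without them: {exc}")
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    fake_cgroup = Path(tmp)  # stand-in for a directory under /sys/fs/cgroup
    applied = apply_limits(fake_cgroup, memory_limit_mb=512, max_pids=256)
    memory_max = (fake_cgroup / "memory.max").read_text()
    pids_max = (fake_cgroup / "pids.max").read_text()

    # A missing or unwritable cgroup directory degrades to "no limits" with a warning.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        degraded = apply_limits(fake_cgroup / "missing", memory_limit_mb=512, max_pids=256)

print(applied, memory_max, pids_max, degraded)
```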

Project layout

agentsandbox/
  detect.py    # prerequisite checks (bwrap, cgroup v2, user namespaces)
  workspace.py # temp directory with snapshot/restore
  cgroup.py    # cgroup v2 slice creation and cleanup
  executor.py  # bubblewrap subprocess management
  config.py    # SandboxConfig dataclass
  sandbox.py   # NativeSandbox (async context manager)
  __init__.py  # public API exports

tests/
  unit/        # mocked tests, run on any OS
  integration/ # real bwrap execution tests
  adversarial/ # escape attempt tests

License

MIT
