Linux-native sandboxed code execution for agent workflows using bubblewrap and cgroups

agentsandbox

A small Linux-native sandbox for agent workflows, in the same spirit as the OpenAI Agents SDK. It uses bubblewrap for isolation and cgroups v2 for resource limits, similar to the approach of Codex's open-source Linux sandbox. No containers, no VMs, just Linux primitives.

Status: Alpha. Works on Linux with cgroup v2. Tested on Ubuntu 22.04 and 24.04.

Goal of this project

The goal is simple:

let an LLM or agent run code without giving it your whole machine
keep its work inside /workspace
block obvious bad behavior like reading host files, using the network, or spawning too many processes
stay small enough that you can use it directly as a Python package

How this works

When you create a sandbox, Sandy:

makes a fresh temporary workspace
starts bubblewrap with isolated namespaces
mounts only a small filesystem view inside the sandbox
makes /workspace and /tmp writable
keeps the network off by default
applies memory and PID limits with cgroups when the environment allows it
deletes the temporary workspace when the sandbox closes
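
The steps above map fairly directly onto a bubblewrap command line. The following is a minimal sketch of how such an argument list could be assembled, not the project's actual executor: the function name `build_bwrap_argv` and the exact set of mounts are assumptions made for illustration. The flags themselves (`--unshare-*`, `--ro-bind`, `--ro-bind-try`, `--bind`, `--tmpfs`, `--proc`, `--dev`, `--chdir`, `--die-with-parent`) are standard bwrap options.

```python
def build_bwrap_argv(workspace: str, command: str, network_access: bool = False) -> list[str]:
    """Assemble a bwrap argv for one shell command (illustrative, hypothetical helper)."""
    argv = [
        "bwrap",
        # Fresh user, PID, IPC, and UTS namespaces for the sandboxed process.
        "--unshare-user", "--unshare-pid", "--unshare-ipc", "--unshare-uts",
        # Kill the sandboxed process if the parent process dies.
        "--die-with-parent",
        # Small read-only view of the host system directories.
        "--ro-bind", "/usr", "/usr",
        "--ro-bind", "/bin", "/bin",
        "--ro-bind", "/lib", "/lib",
        "--ro-bind-try", "/lib64", "/lib64",  # skipped if the host has no /lib64
        "--proc", "/proc",
        "--dev", "/dev",
        # Writable locations: a private /tmp and the per-sandbox workspace.
        "--tmpfs", "/tmp",
        "--bind", workspace, "/workspace",
        "--chdir", "/workspace",
    ]
    if not network_access:
        # Network stays off unless explicitly requested.
        argv.append("--unshare-net")
    # Everything after the options is the command run inside the sandbox.
    argv += ["/bin/sh", "-c", command]
    return argv


argv = build_bwrap_argv("/tmp/example-ws", "echo hello")
```

Building the list as data keeps it easy to unit test without bubblewrap installed; the list would then be handed to a subprocess call on a Linux host.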

This is broadly similar to the sandboxing approach OpenAI has described for Codex. OpenAI’s public Codex materials say tasks run in isolated cloud sandboxes/containers, and the open-source Codex Linux sandbox docs say bubblewrap is the default filesystem sandbox on Linux. Sandy is not the same implementation, but it follows the same general idea: isolate execution, keep the filesystem tight, and only expose the paths the tool actually needs. Sources: OpenAI, Introducing Codex and openai/codex Linux sandbox README.

One important detail: if you keep using the same sandbox instance, the same /workspace stays there between commands. If you create a new sandbox instance, you get a fresh workspace.
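
The workspace lifetime described here can be pictured with a plain temporary directory. This is only a sketch using the standard library; `TempWorkspace` is a hypothetical stand-in and says nothing about how the package's own workspace module is written.

```python
import shutil
import tempfile
from pathlib import Path


class TempWorkspace:
    """Hypothetical stand-in: one temp directory per instance, removed on exit."""

    def __enter__(self) -> Path:
        self.path = Path(tempfile.mkdtemp(prefix="sandbox-ws-"))
        return self.path

    def __exit__(self, *exc_info) -> None:
        # Delete the directory and everything the commands wrote into it.
        shutil.rmtree(self.path, ignore_errors=True)


with TempWorkspace() as first:
    (first / "main.py").write_text("print(1 + 1)\n")
    # Same instance: the file is still there for the next command.
    assert (first / "main.py").exists()

with TempWorkspace() as second:
    # New instance: a fresh, empty directory.
    assert not (second / "main.py").exists()
```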

Simplest example

This is the smallest useful example:

import asyncio
from agentsandbox import NativeSandbox

async def main():
    async with NativeSandbox() as sandbox:
        await sandbox.execute("printf 'print(1 / 0)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stderr.strip())

        await sandbox.execute("printf 'print(1 + 1)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stdout.strip())

asyncio.run(main())

The same sandbox instance keeps /workspace/main.py between those commands.

More complete example

import asyncio
from agentsandbox import NativeSandbox, SandboxConfig

async def main():
    config = SandboxConfig(
        memory_limit_mb=512,
        max_pids=256,
        timeout_seconds=30,
    )

    async with NativeSandbox(config) as sandbox:
        # run a command
        result = await sandbox.execute("echo hello from the sandbox")
        print(result.stdout)   # hello from the sandbox
        print(result.exit_code)  # 0

        # upload files into the sandbox workspace
        await sandbox.upload_files([
            ("script.sh", b"#!/bin/sh\nwhoami > output.txt\nls -la\n"),
        ])

        # run the uploaded script (the working directory is /workspace)
        result = await sandbox.execute("sh /workspace/script.sh")
        print(result.stdout)

        # download the file the script wrote into the workspace
        downloads = await sandbox.download_files(["output.txt"])
        print(downloads[0].content.decode())

asyncio.run(main())

What you get

Process isolation via bubblewrap's PID, user, mount, network, and IPC namespaces
Resource limits via cgroups v2: memory cap, PID ceiling, CPU weight
Network isolation by default (opt-in with network_access=True)
Filesystem sandbox: only /workspace and /tmp are writable; /usr, /bin, /lib, /lib64 are read-only bind mounts
Timeout enforcement: commands that run too long are killed
Output size limits: stdout/stderr are truncated at 256 KB to prevent log exhaustion
Graceful degradation: if cgroups aren't available, agentsandbox warns and runs without resource limits
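
Timeout enforcement and output truncation from the list above can be illustrated with asyncio alone. This is a rough sketch under stated assumptions: `run_with_timeout` and `truncate` are hypothetical names, and the package's real executor may handle these cases differently (for example by also tearing down the whole process tree).

```python
import asyncio
import sys
from typing import Optional

MAX_OUTPUT_BYTES = 256 * 1024  # mirrors the documented 256 KB default


def truncate(data: bytes, limit: int = MAX_OUTPUT_BYTES) -> bytes:
    """Keep at most `limit` bytes of captured output."""
    return data if len(data) <= limit else data[:limit]


async def run_with_timeout(argv: list[str], timeout: float) -> tuple[Optional[int], bytes, bool]:
    """Run argv and return (exit_code, stdout, timed_out); kill the child on timeout."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(proc.communicate(), timeout)
        return proc.returncode, truncate(stdout), False
    except asyncio.TimeoutError:
        proc.kill()        # hard kill once the deadline has passed
        await proc.wait()  # reap the child so it does not linger as a zombie
        return None, b"", True


# Usage: a fast command finishes normally, a slow one is killed.
fast = asyncio.run(run_with_timeout([sys.executable, "-c", "print('hi')"], 10.0))
slow = asyncio.run(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```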

Requirements

Linux (x86-64 or ARM64)
cgroup v2 (standard on modern Linux distros)
bubblewrap (apt install bubblewrap on Debian/Ubuntu)
Python 3.10+

If you're developing on macOS, run agentsandbox inside Docker or a Linux VM. bubblewrap and cgroup v2 are Linux-only. Some Docker environments expose cgroup v2 but do not delegate writable controllers; in that case agentsandbox warns and runs without memory/PID/CPU limits.
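
The fallback described above (warn and continue when controllers are not writable) could look roughly like the sketch below. The helper names `cgroup_v2_available` and `apply_limits` are assumptions for illustration, not the package's API. The file names are the standard cgroup v2 interface files: `cgroup.controllers` exists at the root of a unified hierarchy, and `memory.max`, `pids.max`, and `cpu.weight` hold the limits.

```python
import warnings
from pathlib import Path


def cgroup_v2_available(root: Path = Path("/sys/fs/cgroup")) -> bool:
    """A cgroup v2 (unified) hierarchy exposes cgroup.controllers at its root."""
    return (root / "cgroup.controllers").is_file()


def apply_limits(cgroup_dir: Path, memory_limit_mb: int, max_pids: int, cpu_weight: int) -> bool:
    """Write limits into an existing cgroup directory; return False if that fails."""
    settings = {
        "memory.max": str(memory_limit_mb * 1024 * 1024),  # value is in bytes
        "pids.max": str(max_pids),
        "cpu.weight": str(cpu_weight),  # kernel accepts 1..10000, default 100
    }
    try:
        for name, value in settings.items():
            (cgroup_dir / name).write_text(value)
    except OSError as exc:
        # Typical in containers where controllers are visible but not delegated.
        warnings.warn(f"cgroup limits unavailable ({exc}); running without them")
        return False
    return True
```

A caller would check `cgroup_v2_available()` once, create a per-sandbox cgroup directory, and treat a `False` return from `apply_limits` as "run unrestricted" rather than as a fatal error.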

Installation

pip install agentsandbox

For Deep Agents / LangChain usage:

pip install "agentsandbox[langchain]"

For development:

git clone https://github.com/john221wick/sandy.git
cd sandy
pip install -e ".[dev]"

Configuration

SandboxConfig is a frozen dataclass — pass it to NativeSandbox at construction:

config = SandboxConfig(
    memory_limit_mb=512,      # max RAM (default: 512MB)
    max_pids=256,             # max processes (default: 256)
    cpu_shares=100,           # CPU weight (default: 100)
    timeout_seconds=60.0,     # hard timeout (default: 60s)
    max_output_bytes=262144,  # stdout/stderr cap (default: 256KB)
    network_access=False,     # allow outbound network (default: False)
    gpu=False,                # expose GPU (default: False, reserved for v2)
    extra_bind_mounts=[],     # list of (host_path, sandbox_path) tuples
    extra_env={},             # extra environment variables
)
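
Because the config is frozen, a variant is created as a new object rather than by mutating an existing one. The sketch below shows that behavior on a hypothetical stand-in class, `DemoConfig`, which copies only a few of the documented field names; it assumes nothing about SandboxConfig beyond it being a frozen dataclass.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical stand-in with a few of the documented fields."""

    memory_limit_mb: int = 512
    timeout_seconds: float = 60.0
    network_access: bool = False
    # Mutable defaults need default_factory on a dataclass.
    extra_env: dict = field(default_factory=dict)


base = DemoConfig()
try:
    base.network_access = True  # frozen: attribute assignment is rejected
except FrozenInstanceError:
    pass

# Derive a modified copy instead of changing the original.
online = replace(base, network_access=True, timeout_seconds=120.0)
```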

Deep Agents adapter

Sandy exposes a Deep Agents backend directly from the package root:

from deepagents import create_deep_agent
from agentsandbox import Sandbox

backend = Sandbox()
agent = create_deep_agent(
    model="openai:gpt-4.1-mini",
    backend=backend,
)

result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "Write /workspace/hello.py, then run it.",
            }
        ]
    }
)

backend.close()

Notes:

This targets Deep Agents specifically, not bare ChatModel.invoke(...).
Use absolute paths under /workspace.
/tmp/... is supported for backend temp-file flows used by Deep Agents.
The adapter assumes it is running on Linux or inside Docker where bubblewrap works.

Tests included

The test suite is split into three parts:

unit
- workspace creation, read, write, list, snapshot, restore, and cleanup
- path traversal checks like ../../etc/passwd
- symlink escape checks
- executor command validation, mount flags, env flags, and network flags
- timeout handling
- cgroup slice creation, config writing, and PID attachment
- Linux and bwrap prerequisite detection
- NativeSandbox lifecycle and timeout forwarding
- Deep Agents adapter behavior for /workspace, /tmp, invalid paths, and error mapping
integration
- real command execution through bubblewrap
- current working directory is /workspace
- network is blocked by default
- system paths are read-only
- /workspace is writable
- file upload, execute, and download flows work
- timeout handling on real commands
- Deep Agents adapter can run commands and move files through /workspace and /tmp
adversarial
- fork bomb containment
- memory bomb containment when cgroup memory limits are available
- blocked network access with curl and DNS lookups
- host file access checks like /etc/shadow
- path traversal attempts from inside the workspace
- symlink escape attempts
- blocked privilege escalation with sudo and su

Run them like this:

On macOS or Windows, you can run the unit tests (no bwrap required):

make setup
make unit

To run the full test suite including integration and adversarial tests on macOS, use Docker:

docker build -t agentsandbox-test .
docker run --rm --privileged --cgroupns=private agentsandbox-test

To run just the Deep Agents adapter tests in Docker:

docker build -t agentsandbox-test .
docker run --rm --privileged --cgroupns=private agentsandbox-test \
  pytest -v --tb=short tests/unit/test_langchain_adapter.py tests/integration/test_langchain_adapter.py

The container must run with --privileged --cgroupns=private so bubblewrap and cgroups work inside the container.

Or use the Makefile targets directly:

make lint          # ruff
make typecheck     # mypy
make test          # pytest (all tests)
make unit          # pytest -m "not integration and not adversarial"
make integration   # pytest -m integration
make adversarial   # pytest -m adversarial

Security properties

The sandbox limits what a compromised or malicious command can do:

Fork bomb (:(){ :|:& };:): PID limit via cgroups pids.max
Memory exhaustion: memory limit via cgroups memory.max
Network exfiltration: --unshare-net by default
Reading host files like /etc/shadow: only a small filesystem view is mounted, system paths are read-only, and only /workspace and /tmp are writable
Path traversal like ../../etc/passwd: workspace-relative path enforcement
Privilege escalation with sudo or su: dropped capabilities, user namespace isolation, and synthetic passwd/group files
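
Workspace-relative path enforcement, including the symlink case, can be sketched with pathlib. The function `resolve_in_workspace` is a hypothetical illustration of the idea and not the package's implementation; it relies on `Path.is_relative_to`, available since Python 3.9.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Return the resolved path, or raise ValueError if it leaves the workspace."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks, so both
    # traversal and symlink escapes end up outside `root` and are caught.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```

A check like this runs on the host side before a file is read or written. It is subject to races if the workspace changes between the check and the file operation, which is one reason the mount-level isolation matters as well.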

Caveats: This is not a hard security boundary like a VM or a rootless container. It is designed to catch accidental mistakes and naive adversarial prompts. A sufficiently motivated attacker with a kernel exploit or elevated privileges can escape it. Use appropriately.

Project layout

agentsandbox/
  detect.py    # prerequisite checks (bwrap, cgroup v2, user namespaces)
  workspace.py # temp directory with snapshot/restore
  cgroup.py    # cgroup v2 slice creation and cleanup
  executor.py  # bubblewrap subprocess management
  config.py    # SandboxConfig dataclass
  sandbox.py   # NativeSandbox (async context manager)
  __init__.py  # public API exports

tests/
  unit/        # mocked tests, run on any OS
  integration/ # real bwrap execution tests
  adversarial/ # escape attempt tests

License

MIT

Release history

0.0.2 (Apr 22, 2026), this version
0.0.1 (Apr 22, 2026)
