
Project description

LLM-in-Sandbox

A lightweight framework that connects LLMs to a virtual computer (Docker-based sandbox) to build general-purpose agents.

Features:

  • 🌍 General-purpose: works beyond coding, including scientific reasoning, long-context understanding, video production, travel planning, and more
  • 🐳 Isolated execution environment via Docker containers
  • 🔌 Compatible with OpenAI, Anthropic, and self-hosted servers (vLLM, SGLang, etc.)
  • 📁 Flexible I/O: mount any input files, export any output files
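"Compatible with OpenAI, Anthropic, and self-hosted servers" means any backend speaking the standard `/v1/chat/completions` protocol can drive the agent. A minimal stdlib sketch of that request shape; the URL, model name, and key below are placeholders, and nothing is actually sent:

```python
import json
import urllib.request

# Placeholder endpoint and model name; adjust to your provider or local server.
base_url = "http://localhost:8000/v1"
payload = {
    "model": "qwen3_coder",
    "messages": [{"role": "user", "content": "write a hello world in python"}],
}

# Build (but do not send) the request shape OpenAI-compatible backends accept.
req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer your-api-key",  # omit for unauthenticated local servers
    },
)
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```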

Installation

Requirements: Python 3.10+, Docker

git clone https://github.com/llm-in-sandbox/llm-in-sandbox.git
cd llm-in-sandbox
pip install -e .

Docker Image

The default Docker image (cdx123/llm-in-sandbox:v0.1) will be automatically pulled when you first run the agent. The first run may take a minute to download the image (~400MB), but subsequent runs will start instantly.

Advanced: Build your own image

Modify the Dockerfile and build your own image:

llm-in-sandbox build
# Then use: --docker_image llm-in-sandbox:v0.1

Quick Start

LLM-in-Sandbox works with various LLM providers including OpenAI, Anthropic, and self-hosted servers (vLLM, SGLang, etc.).

Option 1: Cloud / API Services

llm-in-sandbox run \
    --query "write a hello world in python" \
    --llm_name "openai/gpt-5" \
    --llm_base_url "http://your-api-server/v1" \
    --api_key "your-api-key"

Option 2: Self-Hosted Models

Using a local vLLM server for Qwen3-Coder-30B-A3B-Instruct

1. Start vLLM server:

vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --served-model-name qwen3_coder \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --tensor-parallel-size 8

2. Run agent (in a new terminal once server is ready):

llm-in-sandbox run \
    --query "write a hello world in python" \
    --llm_name qwen3_coder \
    --llm_base_url "http://localhost:8000/v1"  \
    --temperature 0.7

Using a local SGLang server for DeepSeek-V3.2-Thinking

1. Start SGLang server:

python3 -m sglang.launch_server \
    --model-path "deepseek-ai/DeepSeek-V3.2" \
    --served-model-name "DeepSeek-V3.2" \
    --trust-remote-code \
    --tp-size 8 \
    --tool-call-parser deepseekv32 \
    --reasoning-parser deepseek-v3 \
    --host 0.0.0.0 \
    --port 5678

2. Run agent (in a new terminal once server is ready):

llm-in-sandbox run \
    --query "write a hello world in python" \
    --llm_name DeepSeek-V3.2 \
    --llm_base_url "http://0.0.0.0:5678/v1" \
    --extra_body '{"chat_template_kwargs": {"thinking": true}}'
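Note that `--extra_body` is parsed as JSON, so booleans must be lowercase (`true`, not Python's `True`). A quick check, plus a way to generate the string safely from a Python dict:

```python
import json

# Valid JSON uses lowercase booleans; json.loads rejects Python-style True.
extra_body = json.loads('{"chat_template_kwargs": {"thinking": true}}')
assert extra_body["chat_template_kwargs"]["thinking"] is True

# Serializing a Python dict avoids the pitfall entirely:
print(json.dumps({"chat_template_kwargs": {"thinking": True}}))
# {"chat_template_kwargs": {"thinking": true}}
```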

Parameters (Common)

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--query` | Task for the agent | required |
| `--llm_name` | Model name | required |
| `--llm_base_url` | API endpoint URL | from `LLM_BASE_URL` env var |
| `--api_key` | API key (not needed for local servers) | from `OPENAI_API_KEY` env var |
| `--input_dir` | Input files folder to mount (optional) | None |
| `--output_dir` | Output folder for results | `./output` |
| `--docker_image` | Docker image to use | `cdx123/llm-in-sandbox:v0.1` |
| `--prompt_config` | Path to prompt template | `./config/general.yaml` |
| `--temperature` | Sampling temperature | 1.0 |
| `--max_steps` | Max conversation turns | 100 |
| `--extra_body` | Extra JSON body for LLM API calls | None |
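The env-var fallbacks for `--llm_base_url` and `--api_key` amount to a flag-first lookup. The `resolve` helper below is hypothetical, written only to illustrate that precedence, not the package's actual code:

```python
import os

def resolve(cli_value, env_var):
    # Hypothetical helper: an explicit CLI flag wins, else fall back to the env var.
    return cli_value if cli_value is not None else os.environ.get(env_var)

os.environ["LLM_BASE_URL"] = "http://localhost:8000/v1"

print(resolve(None, "LLM_BASE_URL"))                     # env var used
print(resolve("http://api.example/v1", "LLM_BASE_URL"))  # flag wins
```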

Run llm-in-sandbox run --help for all available parameters.

Output

Each run creates a timestamped folder:

output/2026-01-16_14-30-00/
├── files/
│   ├── answer.txt      # Final answer
│   └── hello_world.py  # Output file
└── trajectory.json     # Execution history
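Folder names follow the timestamp pattern shown above. The sketch below fakes a finished run in a temp directory to show how you might inspect one afterwards; the trajectory schema here is invented for illustration:

```python
import json
import pathlib
import tempfile
from datetime import datetime

# Recreate the layout above in a temp dir: <YYYY-MM-DD_HH-MM-SS>/files/...
stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
run_dir = pathlib.Path(tempfile.mkdtemp()) / stamp
(run_dir / "files").mkdir(parents=True)
(run_dir / "files" / "answer.txt").write_text("Hello, world!\n")
(run_dir / "trajectory.json").write_text(
    json.dumps([{"role": "user", "content": "write a hello world in python"}])
)

# Inspect the run: final answer plus the recorded conversation.
print((run_dir / "files" / "answer.txt").read_text().strip())
steps = json.loads((run_dir / "trajectory.json").read_text())
print(f"{len(steps)} step(s) recorded")
```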

More Examples

We provide examples across diverse non-coding domains: travel planning, video production, music composition, poster design, and more.

👉 See examples/README.md for the full list.

Contact Us

Daixuan Cheng: daixuancheng6@gmail.com
Shaohan Huang: shaohanh@microsoft.com

Acknowledgment

Our design draws on R2E-Gym, and we reused some of its code. Thanks for the great work!

Download files

Download the file for your platform.

Source Distribution

llm_in_sandbox-0.1.0.tar.gz (26.4 kB)


Built Distribution


llm_in_sandbox-0.1.0-py3-none-any.whl (31.2 kB)


File details

Details for the file llm_in_sandbox-0.1.0.tar.gz.

File metadata

  • Download URL: llm_in_sandbox-0.1.0.tar.gz
  • Size: 26.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for llm_in_sandbox-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9d093b7b85e17a2cdf6f17a41d5bd25f302ef5d9608f92414231b7cf53e43b65
MD5 1535cf086895611841750b4b978d8436
BLAKE2b-256 18a41d510e782faa93dcc531632e6aa25944639e4df2789f5345b3ac157dadcd

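To check a downloaded archive against the SHA256 digest above, a streaming hash with the stdlib suffices. The temp file below is a stand-in; point the function at the real `.tar.gz` to verify it:

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=8192):
    # Stream the file so large archives need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# Stand-in file; use llm_in_sandbox-0.1.0.tar.gz for the real check.
fd, path = tempfile.mkstemp()
os.write(fd, b"demo")
os.close(fd)
print(sha256_of(path))
```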

File details

Details for the file llm_in_sandbox-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llm_in_sandbox-0.1.0-py3-none-any.whl
  • Size: 31.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for llm_in_sandbox-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1919486e15149ee30b7ed992b529ad6e31bff5d86128495ebcef6448014cdca8
MD5 4c57a280f43415d4801f1984384ef731
BLAKE2b-256 8e3a4b849a8864fad1067e175e2b10bad39fae5fa441f88f6359dd8499d204a9

