LLM-in-Sandbox
A lightweight framework that connects LLMs to a virtual computer (Docker-based sandbox) to build general-purpose agents.
Features:
- 🌍 General-purpose: works beyond coding, including scientific reasoning, long-context understanding, video production, travel planning, and more
- 🐳 Isolated execution environment via Docker containers
- 🔌 Compatible with OpenAI, Anthropic, and self-hosted servers (vLLM, SGLang, etc.)
- 📁 Flexible I/O: mount any input files, export any output files
Installation
Requirements: Python 3.10+, Docker
git clone https://github.com/llm-in-sandbox/llm-in-sandbox.git
cd llm-in-sandbox
pip install -e .
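The requirements listed above can be verified before the first run with a short preflight script. This is a minimal sketch, not part of the package: `check_requirements` is a hypothetical helper, and it only confirms that a `docker` executable is on `PATH`, not that the Docker daemon is running.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration; llm-in-sandbox does not ship it.
    """
    problems = []
    # The README asks for Python 3.10 or newer.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    # Only checks that the CLI is installed, not that the daemon is up.
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems


for problem in check_requirements():
    print("missing requirement:", problem)
```

An empty result does not guarantee a working sandbox; it only rules out the two most common setup gaps.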
Docker Image
The default Docker image (cdx123/llm-in-sandbox:v0.1) will be automatically pulled when you first run the agent. The first run may take a minute to download the image (~400MB), but subsequent runs will start instantly.
Advanced: Build your own image
Modify Dockerfile and build your own image:
llm-in-sandbox build
# Then use: --docker_image llm-in-sandbox:v0.1
Quick Start
LLM-in-Sandbox works with various LLM providers including OpenAI, Anthropic, and self-hosted servers (vLLM, SGLang, etc.).
Option 1: Cloud / API Services
llm-in-sandbox run \
--query "write a hello world in python" \
--llm_name "openai/gpt-5" \
--llm_base_url "http://your-api-server/v1" \
--api_key "your-api-key"
Option 2: Self-Hosted Models
Using local vLLM server for Qwen3-Coder-30B-A3B-Instruct
1. Start vLLM server:
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--served-model-name qwen3_coder \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--tensor-parallel-size 8
2. Run agent (in a new terminal once server is ready):
llm-in-sandbox run \
--query "write a hello world in python" \
--llm_name qwen3_coder \
--llm_base_url "http://localhost:8000/v1" \
--temperature 0.7
Using local SGLang server for DeepSeek-V3.2-Thinking
1. Start SGLang server:
python3 -m sglang.launch_server \
--model-path "deepseek-ai/DeepSeek-V3.2" \
--served-model-name "DeepSeek-V3.2" \
--trust-remote-code \
--tp-size 8 \
--tool-call-parser deepseekv32 \
--reasoning-parser deepseek-v3 \
--host 0.0.0.0 \
--port 5678
2. Run agent (in a new terminal once server is ready):
llm-in-sandbox run \
--query "write a hello world in python" \
--llm_name DeepSeek-V3.2 \
--llm_base_url "http://0.0.0.0:5678/v1" \
--extra_body '{"chat_template_kwargs": {"thinking": true}}'
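Writing the `--extra_body` value by hand is error-prone because JSON booleans are lowercase (`true`), unlike Python's `True`. The sketch below generates the argument from a Python dict instead. It assumes the CLI parses `--extra_body` as JSON, as the parameter description suggests; the variable names are illustrative only.

```python
import json
import shlex

# Request-body overrides written as an ordinary Python dict.
extra_body = {"chat_template_kwargs": {"thinking": True}}

# json.dumps converts Python's True into JSON's lowercase true.
payload = json.dumps(extra_body)

# shlex.quote wraps the payload so a POSIX shell passes it as one argument.
flag = "--extra_body " + shlex.quote(payload)
print(flag)
```

The printed text can be pasted at the end of the `llm-in-sandbox run` command in place of a hand-written value.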
Parameters (Common)
| Parameter | Description | Default |
|---|---|---|
| --query | Task for the agent | required |
| --llm_name | Model name | required |
| --llm_base_url | API endpoint URL | from LLM_BASE_URL env var |
| --api_key | API key (not needed for local server) | from OPENAI_API_KEY env var |
| --input_dir | Input files folder to mount (optional) | None |
| --output_dir | Output folder for results | ./output |
| --docker_image | Docker image to use | cdx123/llm-in-sandbox:v0.1 |
| --prompt_config | Path to prompt template | ./config/general.yaml |
| --temperature | Sampling temperature | 1.0 |
| --max_steps | Max conversation turns | 100 |
| --extra_body | Extra JSON body for LLM API calls | None |
Run llm-in-sandbox run --help for all available parameters.
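When the agent is launched from a script, it can help to assemble the command from the parameters listed above rather than concatenating strings. The following is a minimal sketch under that assumption: `build_run_command` and `KNOWN_FLAGS` are hypothetical names, the flag set simply mirrors the common parameters documented here, and the command is only printed, not executed.

```python
import shlex

# Optional flags taken from the common-parameters table in this README.
KNOWN_FLAGS = {
    "llm_base_url", "api_key", "input_dir", "output_dir", "docker_image",
    "prompt_config", "temperature", "max_steps", "extra_body",
}


def build_run_command(query, llm_name, **options):
    """Build an argv list for `llm-in-sandbox run` (hypothetical helper)."""
    argv = ["llm-in-sandbox", "run", "--query", query, "--llm_name", llm_name]
    for name, value in options.items():
        if name not in KNOWN_FLAGS:
            raise ValueError(f"unknown parameter: {name}")
        # Skip None so the CLI falls back to its own default or env var.
        if value is not None:
            argv += [f"--{name}", str(value)]
    return argv


command = build_run_command(
    "write a hello world in python",
    "qwen3_coder",
    llm_base_url="http://localhost:8000/v1",
    temperature=0.7,
    max_steps=None,
)
# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the agent; keeping the arguments as a list avoids shell-quoting problems in the query text.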
Output
Each run creates a timestamped folder:
output/2026-01-16_14-30-00/
├── files/
│ ├── answer.txt # Final answer
│ └── hello_world.py # Output file
└── trajectory.json # Execution history
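Because run folders are named by timestamp, the most recent one sorts last by name. The sketch below locates it and loads the two documented artifacts. It assumes the layout shown above; `latest_run` and `load_run` are hypothetical helpers, and since the schema of `trajectory.json` is not described here, the file is loaded as generic JSON without interpreting it.

```python
import json
from pathlib import Path


def latest_run(output_dir="./output"):
    """Return the newest timestamped run folder, or None if there is none."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    # Names like 2026-01-16_14-30-00 sort chronologically as plain strings.
    runs = sorted((p for p in root.iterdir() if p.is_dir()), key=lambda p: p.name)
    return runs[-1] if runs else None


def load_run(run_dir):
    """Return (answer text, parsed trajectory); missing files yield None."""
    answer_path = Path(run_dir) / "files" / "answer.txt"
    trajectory_path = Path(run_dir) / "trajectory.json"
    answer = answer_path.read_text(encoding="utf-8") if answer_path.exists() else None
    trajectory = None
    if trajectory_path.exists():
        trajectory = json.loads(trajectory_path.read_text(encoding="utf-8"))
    return answer, trajectory


run = latest_run()
if run is None:
    print("no runs found under ./output")
else:
    answer, trajectory = load_run(run)
    print("run folder:", run.name)
    print("final answer:", answer)
```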
More Examples
We provide examples across diverse non-coding domains: travel planning, video production, music composition, poster design, and more.
👉 See examples/README.md for the full list.
Contact Us
Daixuan Cheng: daixuancheng6@gmail.com
Shaohan Huang: shaohanh@microsoft.com
Acknowledgment
Our design draws on R2E-Gym, and we reuse code from it. Thanks for the great work!