Skip to main content

Code Agent execution sandbox - reproducible Docker environments for isolated code task execution and trajectory replay

Project description

AgentSandbox

Code Agent 执行沙箱 - 可复现的 Docker 隔离执行环境 Reproducible Docker sandbox for Code Agent task execution and trajectory replay

PyPI Python 3.10+ License: MIT MCP

快速开始 · CLI 命令 · MCP Server · Knowlyr 生态


GitHub Topics: sandbox, code-agent, docker, execution-environment, trajectory-replay, mcp

为 Code Agent 提供标准化的 Docker 沙箱执行环境,支持代码任务的隔离执行、状态快照与轨迹重放。

核心能力 / Core Capabilities

TaskConfig (repo + commit) → Docker 沙箱 → Agent 工具调用 → 轨迹记录 → 可复现重放

标准工具接口 / Standard Tool Interface

工具 功能 说明
file_read 读取文件 支持行号范围
file_write 写入文件 自动创建目录
shell 执行命令 超时控制
search 搜索代码 正则匹配
git Git 操作 diff, log, status

解决的问题 / Problems Solved

痛点 传统方案 AgentSandbox
隔离性 在宿主机执行,有安全风险 Docker 容器隔离
可复现 环境差异导致结果不一致 固定镜像 + commit
可追踪 操作过程难以记录 完整轨迹记录与重放
资源控制 无限制的资源使用 CPU/内存/超时限制

安装 / Installation

pip install knowlyr-sandbox

可选依赖:

pip install knowlyr-sandbox[mcp]   # MCP 服务器
pip install knowlyr-sandbox[dev]   # 开发工具
pip install knowlyr-sandbox[all]   # 全部功能

快速开始 / Quick Start

CLI 使用 / CLI Usage

# 创建沙箱
knowlyr-sandbox create --repo https://github.com/user/repo --commit abc123

# 在沙箱中执行工具
knowlyr-sandbox exec <sandbox_id> --tool shell --params '{"command": "python -m pytest"}'
输出示例
正在创建沙箱...
  仓库: https://github.com/user/repo
  Commit: abc123
  镜像: python:3.11-slim
✓ 沙箱创建成功: sandbox-a1b2c3
  工作目录: /workspace
  状态: running

执行工具: shell
  命令: python -m pytest
  Exit code: 0
  Output:
    ===== 42 passed, 3 failed =====
# 重置沙箱到初始状态
knowlyr-sandbox reset <sandbox_id>

# 重放执行轨迹
knowlyr-sandbox replay <sandbox_id> trajectory.json

# 列出活跃沙箱
knowlyr-sandbox list
输出示例
活跃沙箱列表:
  ID              状态      镜像                  创建时间
  sandbox-a1b2c3  running   python:3.11-slim     2025-01-15 10:30
  sandbox-d4e5f6  paused    node:18-slim         2025-01-15 11:45
总计: 2 个沙箱

轨迹重放 / Trajectory Replay

轨迹重放是 AgentSandbox 的核心能力之一,支持将 Agent 的执行过程完整回放:

from agentsandbox.replay import replay_trajectory, Trajectory

# 从文件加载轨迹
trajectory = Trajectory.from_dict({
    "steps": [
        {"tool_name": "file_read", "params": {"path": "src/main.py"}},
        {"tool_name": "file_write", "params": {"path": "src/main.py", "content": "..."}},
        {"tool_name": "shell", "params": {"command": "pytest"}},
    ],
    "metadata": {"agent": "claude", "model": "claude-opus-4-20250514"}
})

# 重放
result = replay_trajectory(sandbox, trajectory)
print(f"成功: {result.success}")
print(f"偏离步骤: {result.divergence_step}")

沙箱快照 / Snapshot

# 在任意时刻创建快照
snapshot_id = sandbox.snapshot()

# 重置到初始状态
sandbox.reset()

MCP Server / Claude Integration

在 Claude Desktop / Claude Code 中直接使用。

配置 / Config

添加到 ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "knowlyr-sandbox": {
      "command": "uv",
      "args": ["--directory", "/path/to/agent-sandbox", "run", "python", "-m", "agentsandbox.mcp_server"]
    }
  }
}

可用工具 / Tools

工具 功能
create_sandbox 创建 Docker 沙箱执行环境
execute_tool 在沙箱中执行工具 (5 种标准工具)
reset_sandbox 重置沙箱到初始状态
replay_trajectory 重放 Agent 执行轨迹

使用示例 / Usage Example

用户: 帮我在 https://github.com/user/repo 的 abc123 上创建沙箱并运行测试

Claude: [调用 create_sandbox]
        沙箱已创建: sandbox-xyz

        [调用 execute_tool: shell "pytest tests/"]
        测试结果:
        - 通过: 42
        - 失败: 3
        - 错误: 0

Data Pipeline 生态 / Ecosystem

AgentSandbox 是 Knowlyr 生态的执行环境组件:

graph LR
    Radar["🔍 Radar<br/>情报发现"] --> Recipe["📋 Recipe<br/>逆向分析"]
    Recipe --> Synth["🔄 Synth<br/>数据合成"]
    Recipe --> Label["🏷️ Label<br/>数据标注"]
    Synth --> Check["✅ Check<br/>数据质检"]
    Label --> Check
    Check --> Audit["🔬 Audit<br/>模型审计"]
    Audit --> Hub["🎯 Hub<br/>编排层"]
    Hub --> Sandbox["📦 Sandbox<br/>执行沙箱"]
    Sandbox --> Recorder["📹 Recorder<br/>轨迹录制"]
    Recorder --> Reward["⭐ Reward<br/>过程打分"]
    style Sandbox fill:#0969da,color:#fff,stroke:#0969da

生态项目

项目 说明 仓库
情报 AI Dataset Radar 数据集竞争情报、趋势分析 GitHub
分析 DataRecipe 逆向分析、Schema 提取、成本估算 GitHub
生产 DataSynth LLM 批量合成、种子数据扩充 GitHub
生产 DataLabel 轻量标注工具、多标注员合并 GitHub
质检 DataCheck 规则验证、重复检测、分布分析 GitHub
质检 ModelAudit 蒸馏检测、模型指纹、身份验证 GitHub
Agent AgentSandbox Docker 执行沙箱、轨迹重放 You are here
Agent AgentRecorder 标准化轨迹录制、多框架适配 GitHub
Agent AgentReward 过程级 Reward、Rubric 多维评估 GitHub
编排 TrajectoryHub Pipeline 编排、数据集导出 GitHub

端到端工作流 / End-to-end Flow

# 1. Radar: 发现高价值数据集
knowlyr-radar scan --topic "code-generation"

# 2. DataRecipe: 分析数据集,生成 Schema 和样例
knowlyr-datarecipe deep-analyze tencent/CL-bench -o ./output

# 3. DataSynth: 基于种子数据批量合成
knowlyr-datasynth generate ./output/tencent_CL-bench/ -n 1000

# 4. DataLabel: 生成标注界面,人工标注/校准
knowlyr-datalabel generate ./output/tencent_CL-bench/

# 5. DataCheck: 质量检查
knowlyr-datacheck validate ./output/tencent_CL-bench/

# 6. AgentSandbox: 在沙箱中执行 Code Agent 任务
knowlyr-sandbox create --repo https://github.com/user/repo --commit abc123

# 7. AgentRecorder: 录制 Agent 执行轨迹
knowlyr-recorder record <sandbox_id> -o trajectory.json

# 8. AgentReward: 对轨迹进行过程级打分
knowlyr-reward score trajectory.json --rubric rubric.yaml

# 9. TrajectoryHub: 编排完整流水线
knowlyr-hub run pipeline.yaml

Agent 层 MCP 配置 / Agent Layer MCP Config

{
  "mcpServers": {
    "knowlyr-sandbox": {
      "command": "uv",
      "args": ["--directory", "/path/to/agent-sandbox", "run", "python", "-m", "agentsandbox.mcp_server"]
    },
    "knowlyr-recorder": {
      "command": "uv",
      "args": ["--directory", "/path/to/agent-recorder", "run", "python", "-m", "agentrecorder.mcp_server"]
    },
    "knowlyr-reward": {
      "command": "uv",
      "args": ["--directory", "/path/to/agent-reward", "run", "python", "-m", "agentreward.mcp_server"]
    }
  }
}

命令参考

命令 功能
knowlyr-sandbox create 创建沙箱环境
knowlyr-sandbox exec <id> 在沙箱中执行工具
knowlyr-sandbox reset <id> 重置沙箱到初始状态
knowlyr-sandbox replay <id> <file> 重放 Agent 执行轨迹
knowlyr-sandbox list 列出活跃沙箱

create 选项

选项 说明 默认值
--repo Git 仓库 URL (必填)
--commit 起始 commit SHA (必填)
--language 编程语言 python
--image Docker 镜像 python:3.11-slim
--timeout 超时 (秒) 300
--memory 内存限制 512m
--cpu CPU 限制 1.0

API 使用

from agentsandbox import Sandbox, SandboxConfig
from agentsandbox.config import TaskConfig

# 配置
config = SandboxConfig(
    image="python:3.11-slim",
    timeout=300,
    memory_limit="512m",
)

task = TaskConfig(
    repo_url="https://github.com/user/repo",
    base_commit="abc123",
    test_command="pytest tests/",
)

# 创建沙箱
sandbox = Sandbox.create(config, task)

# 执行工具
result = sandbox.execute_tool("shell", {"command": "python -m pytest"})
print(f"Exit code: {result.exit_code}")
print(f"Output: {result.output}")

# 快照和重置
snapshot_id = sandbox.snapshot()
sandbox.reset()

# 清理
sandbox.close()

SandboxConfig

属性 类型 默认值 说明
image str python:3.11-slim Docker 镜像
timeout int 300 超时 (秒)
memory_limit str 512m 内存限制
cpu_limit float 1.0 CPU 限制
work_dir str /workspace 工作目录
network_enabled bool False 网络访问

TaskConfig

属性 类型 默认值 说明
repo_url str "" Git 仓库 URL
base_commit str "" 起始 commit
test_command str "" 测试命令
language str python 编程语言
setup_commands list [] 初始化命令

ToolResult

属性 类型 说明
output str 标准输出
exit_code int 退出码
error str | None 错误信息
success bool 是否成功 (属性)

项目架构

src/agentsandbox/
├── config.py       # 沙箱和任务配置
├── sandbox.py      # 核心沙箱 (Docker 管理)
├── tools.py        # 标准工具接口 (5 种工具)
├── replay.py       # 轨迹重放
├── cli.py          # CLI 命令行
└── mcp_server.py   # MCP Server (4 工具)

License

MIT


AI Data Pipeline 生态

10 个工具覆盖 AI 数据工程全流程,均支持 CLI + MCP,可独立使用也可组合成流水线。

项目 说明 仓库
情报 AI Dataset Radar 数据集竞争情报、趋势分析 GitHub
分析 DataRecipe 逆向分析、Schema 提取、成本估算 GitHub
生产 DataSynth LLM 批量合成、种子数据扩充 GitHub
生产 DataLabel 轻量标注工具、多标注员合并 GitHub
质检 DataCheck 规则验证、重复检测、分布分析 GitHub
质检 ModelAudit 蒸馏检测、模型指纹、身份验证 GitHub
Agent AgentSandbox Docker 执行沙箱、轨迹重放 You are here
Agent AgentRecorder 标准化轨迹录制、多框架适配 GitHub
Agent AgentReward 过程级 Reward、Rubric 多维评估 GitHub
编排 TrajectoryHub Pipeline 编排、数据集导出 GitHub
graph LR
    A[Radar] --> B[Recipe] --> C[Synth] --> E[Check] --> F[Audit] --> G[Hub]
    B --> D[Label] --> E
    G --> H[Sandbox] --> I[Recorder] --> J[Reward]

为 Code Agent 提供安全、可复现的执行环境

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

knowlyr_sandbox-0.1.2.tar.gz (26.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

knowlyr_sandbox-0.1.2-py3-none-any.whl (25.0 kB view details)

Uploaded Python 3

File details

Details for the file knowlyr_sandbox-0.1.2.tar.gz.

File metadata

  • Download URL: knowlyr_sandbox-0.1.2.tar.gz
  • Upload date:
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for knowlyr_sandbox-0.1.2.tar.gz
Algorithm Hash digest
SHA256 8d2eee57853e2b30fa4fc2e73b6df3ed5662914f5fc8a8600e61917fb929f890
MD5 7cd607f011b6272b6afd8f4d78b70942
BLAKE2b-256 8a002ecdcf16470cc1c6b98e534df9cf7254afb0b6c3baf97261bfd1de647346

See more details on using hashes here.

File details

Details for the file knowlyr_sandbox-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: knowlyr_sandbox-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 25.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for knowlyr_sandbox-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b426b7d55f227e96cd75eab6f038524d1823c05390332a87737f926cca391e2b
MD5 82c3f01b17e179c141217ca04417445b
BLAKE2b-256 dac5086d73614d1b2f1217eb441d382c2ac080239fdcd03b6e80338de5e0bc09

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page