
A local-first session-based sandbox runtime for AI agents.


session-based-sandbox

Local-first sandbox runtime: one HTTP session maps to one temp workdir, run bash or Python steps with timeouts, explicit sandbox_id routing, and DELETE to tear down.

Contributing: see CONTRIBUTING.md. License: MIT. Releases: see CHANGELOG.md. Detailed MVP design for implementers: Implementation specification below.

Quickstart

Requires Python 3.11+.

git clone <repository-url>
cd session-based-sandbox
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -U pip setuptools wheel
pip install -e ".[dev]"

sbs run
# same server: session-based-sandbox run

Then open http://127.0.0.1:8000/docs for interactive API documentation.

To install the published package from PyPI instead:

pip install session-based-sandbox
sbs run

HTTP API (current behavior)

Default base URL: http://127.0.0.1:8000.

POST /sessions

Creates a session and its isolated working directory. An empty JSON object ({}) is an acceptable request body.

Response: {"session_id": "<uuid>"}

POST /sessions/{session_id}/step

JSON body (required fields):

| Field | Description |
| --- | --- |
| sandbox_id | Must equal session_id from the URL (explicit routing). |
| type | "bash" or "python". |
| payload | Bash: {"cmd": "<shell string>"}. Python: {"code": "<source passed to python -c>"}. |

Response: {"output": "...", "error": "...", "exit_code": <int>}
If the step exceeds the configured wall-clock limit, exit_code is 124 and error describes the timeout.

DELETE /sessions/{session_id}

Returns 204 with no body and removes the session and its workdir. Later steps for that id return 404.

curl smoke (single-line friendly)

After sbs run, in another terminal:

BASE=http://127.0.0.1:8000 && SID=$(curl -sS -X POST "$BASE/sessions" | python3 -c "import sys,json; print(json.load(sys.stdin)['session_id'])") && curl -sS -X POST "$BASE/sessions/$SID/step" -H 'Content-Type: application/json' -d "{\"sandbox_id\":\"$SID\",\"type\":\"bash\",\"payload\":{\"cmd\":\"pwd\"}}" && echo && curl -sS -o /dev/null -w "DELETE %{http_code}\n" -X DELETE "$BASE/sessions/$SID"

CLI

Both console scripts call the same uvicorn app:

| Entry point | Notes |
| --- | --- |
| sbs run | Short alias. |
| session-based-sandbox run | Same behavior as sbs. |

Options: --host (default 127.0.0.1), --port (default 8000).

sbs run --help
session-based-sandbox run --host 127.0.0.1 --port 8001

Configuration

| Variable | Meaning |
| --- | --- |
| SBS_STEP_TIMEOUT_SEC | Max seconds per step (default 30, minimum 1). Read when the server process starts; restart the server after changing it. |
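The documented default and minimum can be expressed as a small startup-time parser. This is a sketch under stated assumptions, not the package's code: read_step_timeout is a hypothetical name, and it assumes the value is an integer, that values below 1 are clamped up to 1 rather than rejected, and that unparseable values fall back to the default.

```python
import os
from collections.abc import Mapping

DEFAULT_STEP_TIMEOUT_SEC = 30
MIN_STEP_TIMEOUT_SEC = 1


def read_step_timeout(env: Mapping[str, str] = os.environ) -> int:
    """Resolve the per-step timeout once, when the server process starts."""
    raw = env.get("SBS_STEP_TIMEOUT_SEC")
    if raw is None or not raw.strip():
        return DEFAULT_STEP_TIMEOUT_SEC
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_STEP_TIMEOUT_SEC  # assumption: ignore garbage input
    return max(MIN_STEP_TIMEOUT_SEC, value)  # assumption: clamp instead of rejecting
```

Reading the variable once at startup, rather than per step, is what makes the "restart after changing" note necessary.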

Safety

  • Isolation is temp directories + subprocesses, not containers or VMs. Host resources (CPU, disk, network) can still be affected by malicious or heavy workloads.
  • Steps execute as the same OS user as the server, using host Python and /bin/bash (where available).
  • Timeouts and DELETE attempt to terminate the child process; behavior under concurrent load is best-effort for this MVP.
  • There is no authentication. Prefer binding to 127.0.0.1 and do not expose the API to untrusted networks without a separate auth layer.

Development setup

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Repository scripts used in release checks:

bash scripts/verify_editable_install.sh
bash scripts/verify_cli_entrypoints.sh
bash scripts/verify_package_build.sh
bash scripts/verify_release_ready.sh

Testing

pytest tests/
pytest tests/ -v
pytest tests/ --cov=session_based_sandbox --cov-report=term-missing
pytest tests/system/test_cli_server_entrypoints.py -v

On GitHub, CI runs the same test suite on Python 3.11 and 3.12 for pushes/PRs to main or master (see .github/workflows/ci.yml).


Implementation specification (MVP design)

Project Goal

Build a production-quality MVP for a Session-Based Sandbox Runtime using Python + FastAPI.

This is a local-first execution runtime for AI agents that provides:

  • session-based execution
  • stateful sandbox environments
  • isolated runtime environments
  • explicit execution routing
  • deterministic behavior

This is NOT a full platform.

This is a minimal, robust, extensible open-source tool.

The goal is to let users or AI agents safely execute coding tasks, shell commands, and workflows inside isolated local sandboxes without polluting the host machine.


Product Context

User Scenario

A user wants an AI agent to help with:

  • coding
  • running scripts
  • installing packages
  • debugging
  • data analysis
  • executing terminal commands

But they do NOT want the agent directly operating inside the host machine environment.

They want a safer, simpler, more controllable local runtime.


User Pain Points

Pain Point 1

The user has no technical background.

They do not know what a sandbox is, but they want their AI agent to code for them.


Pain Point 2

The user has a technical background, but existing tools are too complex to configure.

They do not want to spend hours configuring Docker / infra / orchestration.


Pain Point 3

The user has a technical background, but does not want agents running dangerous commands directly on their machine.

They want strong isolation and cleanup.


Pain Point 4

The user notices the agent repeatedly makes the same execution mistakes.

For example:

  • re-running failed scripts
  • repeating broken environment setup
  • retrying commands that already failed

They want session-based state and future persistent memory support.


What SBS Solves

sbs (session-based-sandbox) solves this by providing:

1. Easy Installation

Users can install via common package managers:

pip install session-based-sandbox

and later potentially:

npm install ...
brew install ...

(Phase 1 only requires Python packaging.)


2. Agent-Friendly Usage

Tools like:

  • ClaudeHub
  • Hermes
  • OpenHands
  • other coding agents

can read the SBS skill documentation and learn how to use it.

This makes SBS usable by both:

  • humans
  • AI agents

3. Safe Local Runtime

Users and agents can:

  • create isolated sessions
  • execute python/bash
  • preserve state inside the session
  • destroy the environment after completion

without polluting the host machine.


Core Design Principle

1 Session → 1 Sandbox

This is the most important architecture rule.

Correct Model

1 Session → 1 Sandbox

Meaning:

A single session owns exactly one sandbox.


Definitions

Session

Logical task lifecycle manager.

Responsible for:

  • lifecycle management
  • state management
  • step routing
  • execution history
  • cleanup trigger

Think of it as:

task controller

Sandbox

Actual execution environment.

Responsible for:

  • command execution
  • file isolation
  • subprocess management
  • cwd management
  • runtime isolation

Think of it as:

the actual worker machine

Important Rule

Session ≠ Sandbox

But:

Session owns exactly one Sandbox

which means:

1 Session → 1 Sandbox
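The ownership rule can be sketched as two dataclasses in which a session holds exactly one sandbox as a plain field, with no shared containers between sessions. The class and field names here are hypothetical; the spec assigns these roles to runtime/session.py and sandbox/local.py but does not show their contents.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path


class SessionStatus(Enum):
    ACTIVE = "ACTIVE"
    CLOSED = "CLOSED"


@dataclass
class Sandbox:
    """The worker: owns the working directory that steps execute in."""
    workdir: Path


@dataclass
class Session:
    """The task controller: lifecycle, history, and exactly one Sandbox."""
    session_id: str
    sandbox: Sandbox  # 1 Session → 1 Sandbox
    status: SessionStatus = SessionStatus.ACTIVE
    # default_factory gives every session its own list, so history is never shared.
    history: list[dict] = field(default_factory=list)
```

Keeping the sandbox as a single field, rather than a list or a shared pool, makes the one-to-one rule structural instead of a convention.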

Why This Rule Exists

Because shared environments cause chaos.

Bad example:

Session A:
pip install pandas==1.5

Session B:
pip install pandas==2.2

Result:

both sessions share one environment, so Session B's install replaces the pandas version Session A depends on

No determinism.

No safety.

No traceability.

No cleanup.


Benefits

Strong Isolation

Sessions do not affect each other.


Deterministic Behavior

Each task runs inside its own isolated environment.


Easy Cleanup

DELETE /sessions/{id}

removes the entire environment.


Stateful Execution

Same session can continue previous work.

Example:

packages installed yesterday
are still available today

This is not stateless command execution.

This is stateful runtime execution.


MVP Scope

Build ONLY the minimum production-quality MVP.

Do NOT build a platform.

Do NOT over-engineer.

Do NOT add future features.


Tech Stack

Use:

  • Python 3.11+
  • FastAPI
  • Uvicorn
  • Pydantic
  • Pytest

Optional:

  • asyncio
  • subprocess
  • tempfile
  • pathlib
  • uuid
  • signal
  • shutil
  • logging

Do NOT use:

  • Docker
  • Celery
  • Redis
  • PostgreSQL
  • SQLAlchemy
  • Kubernetes
  • RabbitMQ
  • external infra

Everything must run fully on localhost.


Required Features

Implement ONLY the following.


1. Session Lifecycle

Create Session

Endpoint

POST /sessions

Behavior

Must:

  • create a new session
  • generate unique session_id
  • create exactly one local sandbox
  • create isolated working directory using tempfile.mkdtemp()
  • set session status = ACTIVE

Return

{
  "session_id": "uuid"
}

2. Step Execution

Endpoint

POST /sessions/{session_id}/step

Required Step Schema

Every request MUST include:

{
  "sandbox_id": "session_id",
  "type": "python | bash",
  "payload": {}
}

Validation Rules

Must enforce:

  • sandbox_id is mandatory
  • sandbox_id MUST equal session_id
  • otherwise return validation error

No implicit routing allowed.

Explicit execution target only.
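These rules reduce to a handful of explicit checks. The sketch below uses plain Python so it runs without FastAPI; the actual server presumably expresses this with Pydantic models and FastAPI's validation error response. StepValidationError, validate_step, and the per-type payload-key check are assumptions based on the payload shapes documented above.

```python
from dataclasses import dataclass
from typing import Any

STEP_TYPES = {"python": "code", "bash": "cmd"}  # step type → required payload key


class StepValidationError(ValueError):
    """A request the API should reject with a validation error."""


@dataclass(frozen=True)
class Step:
    sandbox_id: str
    type: str
    payload: dict[str, Any]


def validate_step(session_id: str, body: dict[str, Any]) -> Step:
    sandbox_id = body.get("sandbox_id")
    if not sandbox_id:
        raise StepValidationError("sandbox_id is required")
    if sandbox_id != session_id:
        # No implicit routing: the body must name the same target as the URL.
        raise StepValidationError("sandbox_id must equal session_id")
    step_type = body.get("type")
    if not isinstance(step_type, str) or step_type not in STEP_TYPES:
        raise StepValidationError("type must be 'python' or 'bash'")
    key = STEP_TYPES[step_type]
    payload = body.get("payload")
    if not isinstance(payload, dict) or not isinstance(payload.get(key), str):
        raise StepValidationError(f"payload.{key} must be a string")
    return Step(sandbox_id, step_type, payload)
```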


Supported Step Types

Python

{
  "sandbox_id": "session_id",
  "type": "python",
  "payload": {
    "code": "print(123)"
  }
}

Bash

{
  "sandbox_id": "session_id",
  "type": "bash",
  "payload": {
    "cmd": "ls -la"
  }
}

Execution Requirements

Execution must:

  • run inside that session’s isolated cwd
  • capture stdout
  • capture stderr
  • capture exit_code
  • enforce timeout

If timeout occurs:

  • kill process
  • return timeout error clearly

Return

{
  "output": "...",
  "error": "...",
  "exit_code": 0
}
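These requirements map closely onto subprocess.run, which accepts cwd, capture_output, and timeout, and kills the child when the timeout expires. This is a sketch, not the package's executor: run_step is hypothetical, exit code 124 mirrors the documented timeout code (the same value GNU timeout uses), and returning 127 when the process cannot be started is an assumption borrowed from shell convention.

```python
import subprocess
import sys
from pathlib import Path

TIMEOUT_EXIT_CODE = 124
LAUNCH_FAILURE_EXIT_CODE = 127  # assumption: shell-style "cannot execute" code


def run_step(step_type: str, payload: dict, workdir: Path, timeout: float) -> dict:
    if step_type == "python":
        # Host interpreter, as noted under Safety; the source is passed to `python -c`.
        argv = [sys.executable, "-c", payload["code"]]
    elif step_type == "bash":
        argv = ["/bin/bash", "-c", payload["cmd"]]
    else:
        raise ValueError(f"unsupported step type: {step_type}")
    try:
        proc = subprocess.run(
            argv,
            cwd=workdir,  # run inside the session's isolated cwd
            capture_output=True,  # capture stdout and stderr
            text=True,
            timeout=timeout,  # on expiry, subprocess.run kills the child and re-raises
        )
    except subprocess.TimeoutExpired:
        return {"output": "", "error": f"step timed out after {timeout} seconds",
                "exit_code": TIMEOUT_EXIT_CODE}
    except OSError as exc:  # e.g. /bin/bash missing: a structured error, not a 500
        return {"output": "", "error": f"failed to start step: {exc}",
                "exit_code": LAUNCH_FAILURE_EXIT_CODE}
    return {"output": proc.stdout, "error": proc.stderr, "exit_code": proc.returncode}
```

subprocess.run only kills the direct child; processes that child spawned can outlive it, which the resource-cleanup rules below have to account for.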

3. Close Session

Endpoint

DELETE /sessions/{session_id}

Behavior

Must:

  • mark session CLOSED
  • terminate alive subprocesses
  • delete temp working directory
  • block future execution for this session
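The close sequence can be sketched as follows. Marking the session CLOSED first means any step that checks ensure_active() afterwards is refused before cleanup finishes. LiveSession and SessionClosedError are hypothetical names; the HTTP layer would presumably map a closed or missing session to the 404 described in the API section.

```python
import shutil
import subprocess
from dataclasses import dataclass, field
from pathlib import Path


class SessionClosedError(RuntimeError):
    """Raised when a step targets a session that is no longer ACTIVE."""


@dataclass
class LiveSession:
    workdir: Path
    status: str = "ACTIVE"
    children: list[subprocess.Popen] = field(default_factory=list)

    def ensure_active(self) -> None:
        if self.status != "ACTIVE":
            raise SessionClosedError("session is closed")

    def close(self) -> None:
        self.status = "CLOSED"  # 1. block future execution first
        for proc in self.children:  # 2. terminate alive subprocesses
            if proc.poll() is None:
                proc.kill()
                proc.wait()  # reap so no zombie is left behind
        self.children.clear()
        shutil.rmtree(self.workdir, ignore_errors=True)  # 3. delete the temp workdir
```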

Failure Modes (Must Handle)

Must explicitly handle:


Sandbox Crash

Return structured execution error.


Infinite Loop

Use timeout + force kill.


Closed Session

Execution must be blocked.


Sandbox Isolation

No cross-session shared state.


Resource Cleanup

Must destroy resources after close.

No:

  • orphan subprocesses
  • leaked temp directories

Project Structure

Use exactly this structure:

session-based-sandbox/
│
├── session_based_sandbox/
│   ├── cli.py
│   ├── server.py
│   │
│   ├── runtime/
│   │   ├── runtime.py
│   │   ├── session.py
│   │   ├── executor.py
│   │   ├── router.py
│   │   └── state.py
│   │
│   ├── sandbox/
│   │   └── local.py
│   │
│   └── api/
│       ├── http.py
│       └── ws.py
│
├── tests/
│   ├── unit/
│   ├── integration/
│   ├── system/
│   └── failure_modes/
│
└── pyproject.toml

Logging

Use simple structured logs for:

  • session_created
  • step_received
  • step_started
  • step_finished
  • execution_failed
  • session_closed

Requirements:

  • keep logging simple
  • standard logging only
  • no tracing system
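One way to produce simple structured logs with only the standard logging module is a helper that writes the event name followed by sorted key=value pairs. log_event and this line format are assumptions for illustration; the event names are the ones listed above.

```python
import logging

logger = logging.getLogger("session_based_sandbox")


def log_event(event: str, **fields: object) -> None:
    """Log one line per event, e.g. 'step_finished exit_code=0 session_id=abc'."""
    details = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    message = f"{event} {details}".rstrip()
    logger.info("%s", message)
```

A call such as log_event("session_created", session_id=sid) stays greppable without pulling in a tracing system.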

Testing (Required)

Write real pytest tests.

No placeholder tests.

Tests must actually run.


Required Coverage

Must test:

  • session lifecycle
  • step routing correctness
  • sandbox isolation
  • timeout handling
  • crash handling
  • closed session execution blocked

Installation Requirements

Must support:

pip install -e .

and

pip install session-based-sandbox

Must work as:

  • local editable install
  • normal published package install

CLI Requirements

Must expose both commands:

session-based-sandbox run

and

sbs run

Both must start the same FastAPI server.

Default server:

http://127.0.0.1:8000

pyproject.toml Entry Points

Must define:

[project.scripts]
session-based-sandbox = "session_based_sandbox.cli:run"
sbs = "session_based_sandbox.cli:run"

No wrappers.

No extra launch layers.

Simple and explicit only.


Strong Constraints

Do NOT implement:

  • Docker sandbox
  • WebSocket streaming
  • persistent storage
  • distributed workers
  • tracing UI
  • SDK
  • auth system
  • user system
  • database
  • queue system
  • scheduler
  • background workers

These are future features.

They must be excluded.


Code Quality Rules

Code must be:

  • clean
  • typed
  • readable
  • maintainable
  • minimal
  • testable

Avoid:

  • giant files
  • hidden magic
  • unnecessary inheritance
  • speculative abstractions

Prefer:

  • explicit code
  • small modules
  • simple control flow

Deliverables

Must produce:

  1. Full project code
  2. All required tests
  3. pyproject.toml
  4. CLI runnable entrypoint
  5. Proper package metadata for publishable installation

Must support:

pip install -e .
pip install session-based-sandbox

session-based-sandbox run
sbs run

Server must run at:

http://127.0.0.1:8000

Recommended Development Order

Build in this order:

1. Create project structure
2. pyproject.toml
3. cli.py
4. server.py
5. api/http.py skeleton
6. runtime/state.py
7. runtime/session.py
8. sandbox/local.py
9. runtime/executor.py
10. runtime/router.py
11. runtime/runtime.py
12. tests
13. local install validation
14. CLI validation
15. pytest validation

Final Requirement

This is the most important rule:

Build the MVP exactly.

Do not improve scope.

Do not add platform features.

Do not redesign architecture.

Strictly execute the specification.



Download files


Source Distribution

session_based_sandbox-0.1.0.tar.gz (18.4 kB)

Uploaded Source

Built Distribution


session_based_sandbox-0.1.0-py3-none-any.whl (16.4 kB)

Uploaded Python 3

File details

Details for the file session_based_sandbox-0.1.0.tar.gz.

File metadata

  • Download URL: session_based_sandbox-0.1.0.tar.gz
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for session_based_sandbox-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 2b91b7e3b423d54e6fc7b8c9a3579a6428a03be329a33553b7e92bb29851a116 |
| MD5 | 850cd5a197b301c9d87943f643744b07 |
| BLAKE2b-256 | 2a92abb0a62e739c39409c9fcaf58271c149f487d65386d7c36edf2c686b80b1 |


File details

Details for the file session_based_sandbox-0.1.0-py3-none-any.whl.


File hashes

Hashes for session_based_sandbox-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b655c59fa9b2a37b3edeffb84aa6660bb58d8d003330aa770b22f1eb45fb065c |
| MD5 | 4e36470f359627a48e6b0be589465229 |
| BLAKE2b-256 | 043d35e1fa45065193e77701d52fdbb84a7424c2056a23f47c183cfc86549ca5 |

