Production-grade security sandbox for executing untrusted Python & JavaScript code generated by LLMs using WebAssembly and WASI

🔒 LLM WASM Sandbox

Production-grade security sandbox for executing untrusted Python & JavaScript code generated by LLMs

Python 3.11+ License: MIT Wasmtime

Execute untrusted code safely using WebAssembly sandboxing with multi-layered security:

๐Ÿ” WASM Memory Safety - Bounds-checked execution
๐Ÿ›ก๏ธ WASI Capability-Based I/O - Filesystem isolation
โฑ๏ธ Deterministic Limits - Fuel metering & resource caps
๐Ÿ Python & JavaScript - CPython WASM + QuickJS runtimes


✨ Features

  • 🔒 Production-Grade Security: Multi-layered defense with WASM memory safety, WASI capabilities, and resource limits
  • 🐍 Python Runtime: CPython 3.11 compiled to WASM via WebAssembly Language Runtimes
  • 📜 JavaScript Runtime: QuickJS-NG WASM for secure JavaScript execution
  • 📦 Bundled Runtimes: WASM binaries included in package - no separate downloads needed
  • ⚡ Deterministic Execution: Fuel-based instruction counting prevents runaway code
  • 📦 Package Vendoring: Pure-Python packages available in sandbox via vendor/ directory
  • 💾 Persistent Sessions: UUID-based session IDs with automatic workspace isolation
  • 🗂️ Pluggable Storage: Storage adapter interface with disk and custom backend support
  • 📊 Rich Metrics: Fuel consumption, memory usage, execution time tracking
  • 🎯 Type-Safe API: Pydantic models for policies and results
  • 🔍 Structured Logging: Observable execution events for monitoring
  • 🧹 Session Pruning: Automatic cleanup of old sessions with configurable retention policies
  • 💡 Actionable Error Guidance: Structured error analysis with concrete solutions for common failures
  • 📈 Proactive Fuel Analysis: Automatic budget monitoring with recommendations to prevent OutOfFuel errors

[Figure: LLM WASM Sandbox Architecture]


🚀 Quick Start

Prerequisites

  • Python 3.11+ (Python 3.13+ recommended)
  • uv package manager (recommended) or pip
  • Windows, macOS, or Linux

Installation

From PyPI (Recommended)

# Install the package (includes WASM runtimes)
pip install llm-wasm-sandbox

# That's it! The WASM runtimes (python.wasm and quickjs.wasm) are bundled automatically.

From Source

# Clone the repository
git clone https://github.com/ThomasRohde/llm-wasm-sandbox.git
cd llm-wasm-sandbox

# Install dependencies (uv recommended)
uv sync
# OR with pip
pip install -r requirements.txt

# Download WASM runtimes (required for development)
.\scripts\fetch_wlr_python.ps1   # CPython WASM binary
.\scripts\fetch_quickjs.ps1       # QuickJS WASM binary

Hello World

Python Runtime:

from sandbox import create_sandbox, RuntimeType

sandbox = create_sandbox(runtime=RuntimeType.PYTHON)
result = sandbox.execute("print('Hello from WASM!')")
print(result.stdout)  # "Hello from WASM!"

JavaScript Runtime:

from sandbox import create_sandbox, RuntimeType

sandbox = create_sandbox(runtime=RuntimeType.JAVASCRIPT)
result = sandbox.execute("console.log('Hello from QuickJS!')")
print(result.stdout)  # "Hello from QuickJS!"

Stateful Sessions with Automatic Variable Persistence:

from sandbox import create_sandbox, RuntimeType

# Create session with auto-persist enabled
sandbox = create_sandbox(runtime=RuntimeType.PYTHON, auto_persist_globals=True)

# First execution - set variables
sandbox.execute("counter = 100; data = [1, 2, 3]")

# Second execution - variables automatically restored!
sandbox.execute("print(f'counter={counter}, data={data}')")
# Output: counter=100, data=[1, 2, 3]

# ✅ The JavaScript runtime also supports auto_persist_globals.
# See examples/demo_javascript_stateful.py for JavaScript examples.

Run Demo

# Python demo with comprehensive examples
uv run python examples/demo.py

# JavaScript demo (single execution)
uv run python examples/demo_javascript.py

# JavaScript session demo (stateful execution)
uv run python examples/demo_javascript_session.py

# Session workflow demo (file operations)
uv run python examples/demo_session_workflow.py

๐Ÿ—๏ธ Architecture

Project Structure

llm-wasm-sandbox/
โ”œโ”€โ”€ bin/
โ”‚   โ”œโ”€โ”€ python.wasm               # CPython WASM binary (bundled with package)
โ”‚   โ””โ”€โ”€ quickjs.wasm              # QuickJS WASM binary (bundled with package)
โ”œโ”€โ”€ config/
โ”‚   โ””โ”€โ”€ policy.toml               # Execution policy configuration
โ”œโ”€โ”€ sandbox/
โ”‚   โ”œโ”€โ”€ core/                     # Type-safe foundation
โ”‚   โ”‚   โ”œโ”€โ”€ models.py             # ExecutionPolicy, SandboxResult, RuntimeType
โ”‚   โ”‚   โ”œโ”€โ”€ base.py               # BaseSandbox ABC
โ”‚   โ”‚   โ”œโ”€โ”€ errors.py             # Custom exceptions
โ”‚   โ”‚   โ”œโ”€โ”€ logging.py            # Structured logging
โ”‚   โ”‚   โ”œโ”€โ”€ factory.py            # create_sandbox() factory
โ”‚   โ”‚   โ””โ”€โ”€ storage.py            # Storage adapter interface
โ”‚   โ”œโ”€โ”€ runtimes/                 # Runtime implementations
โ”‚   โ”‚   โ”œโ”€โ”€ python/
โ”‚   โ”‚   โ”‚   โ””โ”€โ”€ sandbox.py        # PythonSandbox
โ”‚   โ”‚   โ””โ”€โ”€ javascript/
โ”‚   โ”‚       โ””โ”€โ”€ sandbox.py        # JavaScriptSandbox
โ”‚   โ”œโ”€โ”€ host.py                   # Wasmtime/WASI wrapper
โ”‚   โ”œโ”€โ”€ policies.py               # Policy loading
โ”‚   โ”œโ”€โ”€ sessions.py               # Session file operations & pruning
โ”‚   โ”œโ”€โ”€ utils.py                  # Utilities
โ”‚   โ””โ”€โ”€ vendor.py                 # Package vendoring
โ”œโ”€โ”€ mcp_server/                  # MCP server implementation
โ”‚   โ”œโ”€โ”€ server.py                 # FastMCP server
โ”‚   โ”œโ”€โ”€ sessions.py               # Workspace session manager
โ”‚   โ”œโ”€โ”€ tools.py                  # MCP tool definitions
โ”‚   โ”œโ”€โ”€ transports.py             # Stdio/HTTP transports
โ”‚   โ””โ”€โ”€ config.py                 # MCP configuration
โ”œโ”€โ”€ workspace/                   # Isolated filesystem (mounted as /app)
โ”‚   โ””โ”€โ”€ <session-id>/            # Per-session workspaces
โ”œโ”€โ”€ vendor/                      # Vendored pure-Python packages
โ”‚   โ””โ”€โ”€ site-packages/
โ”œโ”€โ”€ examples/                    # Demo scripts and examples
โ”‚   โ”œโ”€โ”€ demo.py                  # Comprehensive feature demo
โ”‚   โ”œโ”€โ”€ demo_javascript.py       # JavaScript runtime demo
โ”‚   โ”œโ”€โ”€ demo_javascript_session.py # JavaScript session demo
โ”‚   โ”œโ”€โ”€ demo_session_workflow.py # Session workflow demo
โ”‚   โ””โ”€โ”€ openai_agents/           # OpenAI Agents SDK integrations
โ”œโ”€โ”€ pyproject.toml               # Project metadata & dependencies
โ””โ”€โ”€ README.md                    # This file

🤖 LLM Integration

Integration Flow

Typical usage in an LLM code generation pipeline:

graph LR
    A[LLM generates code] --> B[Create sandbox]
    B --> C[Execute in WASM]
    C --> D[Collect metrics]
    D --> E{Success?}
    E -->|Yes| F[Return results]
    E -->|No| G[Send feedback to LLM]
    G --> A

Example Integration

from sandbox import create_sandbox, ExecutionPolicy, RuntimeType

def execute_llm_code(llm_generated_code: str) -> dict:
    """Execute LLM-generated code with safety boundaries."""
    
    # Configure conservative limits for LLM code
    policy = ExecutionPolicy(
        fuel_budget=500_000_000,      # Fail fast on complex code
        memory_bytes=32 * 1024 * 1024,  # 32 MB limit
        stdout_max_bytes=100_000        # 100 KB output
    )
    
    sandbox = create_sandbox(runtime=RuntimeType.PYTHON, policy=policy)
    result = sandbox.execute(llm_generated_code)
    
    # Use structured error guidance for actionable feedback
    if not result.success:
        error_guidance = result.metadata.get("error_guidance", {})
        return {
            "status": "error",
            "feedback": f"Execution failed: {result.stderr}",
            "error_type": error_guidance.get("error_type", "Unknown"),
            "solutions": error_guidance.get("actionable_guidance", []),
            "related_docs": error_guidance.get("related_docs", [])
        }
    
    # Use fuel analysis for proactive budget recommendations
    fuel_analysis = result.metadata.get("fuel_analysis", {})
    if fuel_analysis.get("status") in ("warning", "critical"):
        return {
            "status": "warning",
            "feedback": fuel_analysis.get("recommendation", "Code complexity high"),
            "fuel_used": result.fuel_consumed,
            "suggested_budget": fuel_analysis.get("suggested_budget")
        }
    
    return {
        "status": "success",
        "output": result.stdout,
        "metrics": {
            "fuel": result.fuel_consumed,
            "duration": result.duration_seconds,
            "memory_pages": result.mem_pages
        }
    }

# Use in LLM loop
# generate_code_from_llm() is a placeholder for your own LLM client call
code = generate_code_from_llm("Calculate fibonacci(10)")
feedback = execute_llm_code(code)
print(feedback)
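
The example above runs a single attempt. The feedback edge in the integration flow (failure goes back to the LLM) could be sketched as a bounded retry loop. This is a minimal sketch, not part of the package: run_with_feedback, fake_generate, and fake_execute are hypothetical names, and the two callables stand in for an LLM client and for a wrapper such as execute_llm_code.

```python
from typing import Callable, Optional


def run_with_feedback(
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], dict],
    task: str,
    max_attempts: int = 3,
) -> dict:
    """Generate code, run it, and feed failures back until success or give up."""
    feedback: Optional[str] = None
    result: dict = {"status": "error", "feedback": "no attempts made"}
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)   # LLM generates (or repairs) code
        result = execute(code)            # sandboxed run, returns a status dict
        result["attempts"] = attempt
        if result.get("status") == "success":
            return result
        feedback = result.get("feedback")  # becomes context for the next attempt
    return result


# Hypothetical stand-ins so the sketch runs without an LLM or the sandbox.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "print(1/0)" if feedback is None else "print(55)"


def fake_execute(code: str) -> dict:
    if "1/0" in code:
        return {"status": "error", "feedback": "ZeroDivisionError: division by zero"}
    return {"status": "success", "output": "55\n"}


outcome = run_with_feedback(fake_generate, fake_execute, "Calculate fibonacci(10)")
print(outcome["status"], outcome["attempts"])
```

Capping the number of attempts keeps a model that never converges from looping forever on the host side, which fuel limits alone do not cover.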

Error Guidance & Fuel Analysis

The sandbox automatically provides structured error analysis and fuel budget recommendations:

Error Guidance (result.metadata["error_guidance"]):

# When execution fails, get actionable solutions
result = sandbox.execute("import nonexistent_package")

if "error_guidance" in result.metadata:
    guidance = result.metadata["error_guidance"]
    print(f"Error Type: {guidance['error_type']}")  # e.g., "MissingVendoredPackage"
    for solution in guidance["actionable_guidance"]:
        print(f"  - {solution}")
    # Output: 
    #   - Add: import sys; sys.path.insert(0, '/data/site-packages')
    #   - Then import vendored packages normally

Fuel Analysis (result.metadata["fuel_analysis"]):

# Monitor fuel usage to prevent OutOfFuel errors
result = sandbox.execute("import openpyxl; wb = openpyxl.Workbook()")

analysis = result.metadata.get("fuel_analysis")
if analysis:  # guard: indexing an absent fuel_analysis would raise KeyError
    print(f"Status: {analysis['status']}")           # efficient/moderate/warning/critical
    print(f"Utilization: {analysis['utilization_percent']:.1f}%")
    print(f"Recommendation: {analysis['recommendation']}")
# Output:
#   Status: warning
#   Utilization: 82.5%
#   Recommendation: Consider increasing fuel budget to 15B for similar workloads
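
To make the four status values concrete, here is a minimal sketch of how a utilization percentage could be mapped to them. The function name and the 50/75/90 thresholds are assumptions for illustration only; the package's actual cut-offs are not stated here.

```python
def classify_fuel_usage(fuel_consumed: int, fuel_budget: int) -> dict:
    """Map fuel utilization to a status label (thresholds are assumed)."""
    if fuel_budget <= 0:
        raise ValueError("fuel_budget must be positive")
    utilization = 100.0 * fuel_consumed / fuel_budget
    if utilization < 50:
        status = "efficient"
    elif utilization < 75:
        status = "moderate"
    elif utilization < 90:
        status = "warning"
    else:
        status = "critical"
    return {"status": status, "utilization_percent": utilization}


report = classify_fuel_usage(8_250_000_000, 10_000_000_000)
print(f"{report['status']} {report['utilization_percent']:.1f}%")
```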

For detailed error types and fuel planning, see docs/ERROR_GUIDANCE.md and docs/FUEL_BUDGETING.md.

Multi-Turn Sessions

Option 1: Automatic Global Variable Persistence (Recommended)

from sandbox import create_sandbox, RuntimeType

# Create session with auto-persist enabled
sandbox = create_sandbox(runtime=RuntimeType.PYTHON, auto_persist_globals=True)

# Turn 1: LLM sets variables (no manual save/load needed!)
result1 = sandbox.execute("""
users = ["Alice", "Bob"]
count = len(users)
print(f"Initialized: {users}")
""")

# Turn 2: Variables automatically restored
result2 = sandbox.execute("""
users.append("Charlie")
count = len(users)
print(f"Updated: {users}, count={count}")
""")
# Output: Updated: ['Alice', 'Bob', 'Charlie'], count=3

Option 2: Manual File Persistence

For complex data structures or explicit control:

from sandbox import create_sandbox, RuntimeType

# Turn 1: LLM creates data file
sandbox = create_sandbox(runtime=RuntimeType.PYTHON)
session_id = sandbox.session_id

result1 = sandbox.execute("""
import json
data = {"users": ["Alice", "Bob"], "count": 2}
with open('/app/data.json', 'w') as f:
    json.dump(data, f)
print("Data saved")
""")

# Turn 2: LLM reads and processes data (same session)
sandbox = create_sandbox(session_id=session_id, runtime=RuntimeType.PYTHON)

result2 = sandbox.execute("""
import json
with open('/app/data.json', 'r') as f:
    data = json.load(f)
data['users'].append('Charlie')
data['count'] = len(data['users'])
print(f"Updated: {data}")
""")

print(result2.stdout)  # "Updated: {'users': ['Alice', 'Bob', 'Charlie'], 'count': 3}"

Session IDs are canonicalized before the workspace is created (IDs may not contain /, \, or ..), and you can enforce UUID-only IDs via allow_non_uuid=False. Vendored packages are mounted read-only at /data/site-packages and shared across all sessions for efficiency. Optional mount_data_dir mounts are also read-only. Host-side logs are cleaned up unless you opt in with ExecutionPolicy(preserve_logs=True).
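
The session ID rules can be illustrated with a small standalone check. This is a sketch under assumptions: validate_session_id is a hypothetical helper, and it assumes offending IDs are rejected with ValueError rather than rewritten; only the allow_non_uuid flag name is taken from the text above.

```python
import uuid


def validate_session_id(session_id: str, allow_non_uuid: bool = True) -> str:
    """Reject IDs that could escape the workspace root; optionally require a UUID."""
    if not session_id or "/" in session_id or "\\" in session_id or ".." in session_id:
        raise ValueError(f"Invalid session ID: {session_id!r}")
    if not allow_non_uuid:
        try:
            parsed = uuid.UUID(session_id)
        except ValueError:
            raise ValueError(f"Session ID must be a UUID: {session_id!r}") from None
        # uuid.UUID also accepts braces and bare hex, so insist on canonical form
        if str(parsed) != session_id.lower():
            raise ValueError(f"Session ID must be a canonical UUID: {session_id!r}")
    return session_id


generated = str(uuid.uuid4())
print(validate_session_id(generated, allow_non_uuid=False) == generated)
```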

Session Management API

Direct session file operations and pruning:

from sandbox import (
    create_sandbox, RuntimeType,
    write_session_file, read_session_file, list_session_files,
    prune_sessions, delete_session_workspace
)

# Create sandbox and write files
sandbox = create_sandbox(runtime=RuntimeType.PYTHON)
write_session_file(sandbox.session_id, "data.json", '{"key": "value"}')

# List all files in session
files = list_session_files(sandbox.session_id)
print(files)  # ['data.json', 'user_code.py']

# Read file content
content = read_session_file(sandbox.session_id, "data.json")

# Prune old sessions (e.g., older than 7 days)
result = prune_sessions(max_age_days=7)
print(f"Deleted {result.deleted_count} sessions, freed {result.bytes_freed} bytes")

# Delete specific session
delete_session_workspace(sandbox.session_id)
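
Age-based pruning can be pictured as a walk over per-session directories. The sketch below is standalone and is not the package's prune_sessions implementation: prune_old_workspaces is a hypothetical helper, and using the directory's modification time as the session age is an assumption.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional


def prune_old_workspaces(
    root: Path, max_age_days: float, now: Optional[float] = None
) -> tuple[int, int]:
    """Delete session directories older than max_age_days.

    Returns (deleted_count, bytes_freed).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    deleted, freed = 0, 0
    for entry in list(root.iterdir()):
        if not entry.is_dir() or entry.stat().st_mtime >= cutoff:
            continue  # keep files and recent sessions
        freed += sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        shutil.rmtree(entry)
        deleted += 1
    return deleted, freed


# Demo on a throwaway directory: one stale session, one fresh session.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old, new = root / "old-session", root / "new-session"
    for session_dir in (old, new):
        session_dir.mkdir()
        (session_dir / "data.json").write_text('{"key": "value"}')
    ten_days_ago = time.time() - 10 * 86400
    os.utime(old, (ten_days_ago, ten_days_ago))  # backdate the stale session
    deleted, freed = prune_old_workspaces(root, max_age_days=7)
    remaining = sorted(p.name for p in root.iterdir())

print(deleted, freed, remaining)
```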

Pluggable Storage Adapters

Customize storage backend for sessions:

from sandbox import create_sandbox, RuntimeType, StorageAdapter
from pathlib import Path

class CustomStorage(StorageAdapter):
    """Custom storage backend (e.g., S3, Azure Blob)."""
    
    def read(self, path: Path) -> bytes:
        # Implement custom read logic
        pass
    
    def write(self, path: Path, content: bytes) -> None:
        # Implement custom write logic
        pass
    
    def delete(self, path: Path) -> None:
        # Implement custom delete logic
        pass
    
    def exists(self, path: Path) -> bool:
        # Implement custom exists check
        pass

# Use custom storage
storage = CustomStorage()
sandbox = create_sandbox(
    runtime=RuntimeType.PYTHON,
    storage_adapter=storage
)
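
As a concrete reference for the four methods, here is a minimal in-memory backend, which can be handy in unit tests. It is a sketch under assumptions: it mirrors the read/write/delete/exists signatures shown above but is written as a plain class so it runs without the package; in real use it would subclass StorageAdapter, and raising FileNotFoundError for missing paths is an assumed convention.

```python
from pathlib import Path


class InMemoryStorage:
    """Dictionary-backed storage with the same four methods as the adapter above."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def read(self, path: Path) -> bytes:
        try:
            return self._blobs[Path(path).as_posix()]
        except KeyError:
            raise FileNotFoundError(str(path)) from None

    def write(self, path: Path, content: bytes) -> None:
        self._blobs[Path(path).as_posix()] = bytes(content)

    def delete(self, path: Path) -> None:
        self._blobs.pop(Path(path).as_posix(), None)  # deleting twice is a no-op

    def exists(self, path: Path) -> bool:
        return Path(path).as_posix() in self._blobs


store = InMemoryStorage()
store.write(Path("session-1/data.json"), b'{"key": "value"}')
print(store.exists(Path("session-1/data.json")))
```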

Agent Integration Examples

For production LLM agent integrations, see the OpenAI Agents SDK integration examples:

Basic Agent (examples/openai_agents/basic_agent.py):

  • Function calling tools for Python/JavaScript execution
  • Structured result handling with security metrics
  • Conservative execution policies for untrusted code

Stateful Agent (examples/openai_agents/stateful_agent.py):

  • Session-based multi-turn conversations
  • File persistence across agent interactions
  • Automatic error recovery and debugging

See the OpenAI Agents integration README for setup instructions and detailed usage patterns.


🔗 MCP Integration

The sandbox now includes Model Context Protocol (MCP) server support for standardized tool use in AI applications. MCP provides a consistent interface for LLM clients like Claude Desktop to securely execute code.

Quick MCP Start

Install from PyPI:

# Install package with bundled WASM runtimes
pip install llm-wasm-sandbox

# Start MCP server (runtimes included)
python -m mcp_server

# Or use uvx for one-off execution (no installation needed)
uvx --from llm-wasm-sandbox llm-wasm-mcp

# Or use the command alias (if installed and in PATH)
llm-wasm-mcp

Configure Claude Desktop:

Add to your Claude Desktop configuration (see Settings → Developer → Edit Config):

{
  "mcpServers": {
    "llm-wasm-sandbox": {
      "command": "python",
      "args": ["-m", "mcp_server"]
    }
  }
}

Development Setup (from source)

Install MCP dependencies:

# Clone the repository
git clone https://github.com/ThomasRohde/llm-wasm-sandbox.git
cd llm-wasm-sandbox

# Install with uv
uv sync

# Fetch WASM binaries (required for development)
.\scripts\fetch_wlr_python.ps1
.\scripts\fetch_quickjs.ps1

Run development MCP server:

# Option 1: Using convenience script (easiest)
.\scripts\run-mcp-dev.ps1

# Option 2: Using the package directly
uv run python -m mcp_server

# Option 3: Using example scripts
uv run python examples/llm_wasm_mcp.py      # Promiscuous mode
uv run python examples/mcp_stdio_example.py  # With security filters

# Note: The installed 'llm-wasm-mcp' command is only available after 'pip install'
# In development, use one of the options above

Claude Desktop configuration for development:

{
  "mcpServers": {
    "llm-wasm-sandbox-dev": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\Users\\YourName\\Projects\\llm-wasm-sandbox",
        "run",
        "python",
        "-m",
        "mcp_server"
      ]
    }
  }
}

Alternative (using example script):

{
  "mcpServers": {
    "llm-wasm-sandbox-dev": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\Users\\YourName\\Projects\\llm-wasm-sandbox",
        "run",
        "python",
        "examples/llm_wasm_mcp.py"
      ]
    }
  }
}

Note: Use --directory instead of cwd to ensure uv resolves dependencies correctly. Replace YourName with your actual username.

MCP Tools

The MCP server exposes these tools to LLM clients:

  • execute_code: Execute Python or JavaScript code securely
  • list_runtimes: Enumerate available language runtimes
  • create_session: Create new execution sessions
    • language: "python" or "javascript"
    • auto_persist_globals (optional): Enable automatic variable persistence across executions
  • destroy_session: Clean up sessions
  • install_package: Install Python packages (Python only)
  • get_workspace_info: Inspect session state
  • reset_workspace: Clear session files

Note: WASM runtimes are bundled with the package, so no separate downloads are needed.

MCP Transports

Stdio Transport (recommended for Claude Desktop):

  • Single workspace per MCP client connection
  • Automatic session management
  • Best for local MCP clients

HTTP Transport (for web/remote clients):

  • Multi-client support with session isolation
  • CORS configuration for web applications
  • Rate limiting and authentication options

Session Management

MCP clients automatically get isolated workspaces with optional automatic variable persistence:

Automatic Global Persistence (Recommended for LLM workflows):

# Create session with auto-persist enabled
await session.call_tool("create_session", {
    "language": "python",
    "auto_persist_globals": True
})

# Variables persist automatically across executions
await session.call_tool("execute_code", {
    "code": "counter = 0; data = []",
    "language": "python"
})

await session.call_tool("execute_code", {
    "code": "counter += 1; data.append('item'); print(counter, data)",
    "language": "python"
})
# Output: 1 ['item']

Manual File Persistence:

# Without auto_persist_globals, use files for state
await session.call_tool("execute_code", {
    "code": "x = 42",
    "language": "python"
})

await session.call_tool("execute_code", {
    "code": "print(x * 2)",  # ❌ NameError: x is not defined
    "language": "python"
})

Security Features

MCP inherits all sandbox security:

  • WASM isolation with memory safety
  • Filesystem restrictions to /app directory
  • Fuel limits preventing infinite loops
  • Input validation on all MCP requests

Examples & Documentation

  • Full Documentation: docs/MCP_INTEGRATION.md - Complete reference
  • Error Guidance: docs/ERROR_GUIDANCE.md - Actionable solutions for common errors
  • Fuel Budgeting: docs/FUEL_BUDGETING.md - Planning fuel budgets for heavy packages
  • Main MCP Server: examples/llm_wasm_mcp.py - Promiscuous security (production-ready)
  • Alternative with Filters: examples/mcp_stdio_example.py - Standard security validation
  • HTTP Example: examples/mcp_http_example.py
  • Claude Desktop Config: examples/mcp_claude_desktop_config.json

Available Python Capabilities

Python Standard Library

The WASM sandbox includes CPython 3.11+ with extensive standard library support:

File & I/O Operations

  • pathlib, os.path - Path manipulation (within /app only)
  • shutil - File copying and directory operations
  • glob - Pattern matching for file search
  • tempfile - Temporary file creation (within /app)

Text & Data Processing

  • re - Regular expressions
  • json - JSON encoding/decoding
  • csv - CSV file reading/writing
  • xml.etree.ElementTree - XML parsing
  • tomllib (3.11+) or tomli - TOML parsing
  • base64, binascii - Binary encoding
  • hashlib - Cryptographic hashing (SHA, MD5, etc.)

Data Structures & Utilities

  • collections - deque, Counter, defaultdict, etc.
  • itertools - Iterator utilities
  • functools - Functional programming tools
  • typing - Type hints and annotations

Date & Time

  • datetime - Date/time manipulation
  • time - Time operations
  • calendar - Calendar utilities

Math & Statistics

  • math - Mathematical functions
  • statistics - Statistical functions
  • random - Random number generation
  • decimal, fractions - Precise numeric types

Text & Strings

  • string - String constants and utilities
  • textwrap - Text wrapping and formatting
  • difflib - Text comparison

Data Compression

  • zipfile - ZIP archive handling
  • gzip, bz2, lzma - Compression formats

Vendored Pure-Python Packages

Pre-installed packages, mounted read-only at /data/site-packages (the sandbox adds this path to sys.path automatically):

Document Processing

  • openpyxl - Read/write Excel (.xlsx) files
  • XlsxWriter - Write Excel files (lighter alternative)
  • PyPDF2 - Read/write/merge PDF files
  • odfpy - OpenDocument Format (.odf, .ods, .odp)
  • mammoth - Convert Word (.docx) to HTML/Markdown

HTTP & Encoding (Note: networking disabled in baseline WASI)

  • certifi, charset-normalizer, idna, urllib3
  • Useful for data encoding/decoding even without network access

Date/Time Extensions

  • python-dateutil - Advanced date parsing and arithmetic

Text Processing

  • tabulate - Pretty-print tables (ASCII, Markdown, HTML)
  • jinja2 + MarkupSafe - Template rendering (⚠️ requires 5B fuel budget)
  • markdown - Markdown to HTML conversion

Data Modeling

  • attrs - Classes without boilerplate

Compatibility

  • six - Python 2/3 compatibility utilities
  • tomli - TOML parser (Python <3.11)

sandbox_utils Library

Shell-like utilities purpose-built for LLM code generation:

File Operations

from sandbox_utils import find, tree, walk, copy_tree, remove_tree

# Find files matching pattern
files = find("*.py", "/app", recursive=True)

# Display directory tree
print(tree("/app", max_depth=3))

# Filtered directory traversal
for path in walk("/app", filter_func=lambda p: p.suffix == ".json"):
    print(path)

Text Processing

from sandbox_utils import grep, sed, head, tail, wc, diff

# Search for pattern in files
matches = grep(r"ERROR", files, regex=True)

# Regex replacement
text = sed(r"foo(\d+)", r"bar\1", "foo123 foo456")

# Read first/last lines
content = head("/app/log.txt", lines=10)

Data Manipulation

from sandbox_utils import group_by, filter_by, sort_by, unique, chunk

# Group items by key
groups = group_by(users, lambda u: u["country"])

# Filter and sort
active = filter_by(users, lambda u: u["active"])
sorted_users = sort_by(active, lambda u: u["created_at"], reverse=True)

Format Conversions

from sandbox_utils import csv_to_json, json_to_csv, xml_to_dict

# Convert CSV to JSON
json_str = csv_to_json("/app/data.csv", output="/app/data.json")

# Parse XML to dict
data = xml_to_dict('<root><item id="1">value</item></root>')

Shell Emulation

from sandbox_utils import ls, cat, touch, mkdir, rm, cp, mv, echo

# List directory
items = ls("/app", long=True)  # Returns list of dicts with metadata

# Concatenate files
content = cat("/app/file1.txt", "/app/file2.txt")

# Create/copy/move files
touch("/app/newfile.txt")
cp("/app/source.txt", "/app/dest.txt")
mv("/app/old.txt", "/app/new.txt")

Security Note: All sandbox_utils functions enforce /app path validation and reject .. traversal attempts.
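
The path rule in the note above can be sketched in a few lines. This is an illustration, not the sandbox_utils source: validate_app_path is a hypothetical name, and resolving relative paths against /app is assumed from the documented behavior. Symlinks are not considered here.

```python
import posixpath

APP_ROOT = "/app"


def validate_app_path(path: str) -> str:
    """Return a normalized path under /app or raise ValueError."""
    # Reject traversal outright instead of trying to resolve it.
    if ".." in path.split("/"):
        raise ValueError(f"Path must be within {APP_ROOT}: {path}")
    absolute = path if path.startswith("/") else posixpath.join(APP_ROOT, path)
    normalized = posixpath.normpath(absolute)
    # "/application" must not pass just because it starts with "/app".
    if normalized != APP_ROOT and not normalized.startswith(APP_ROOT + "/"):
        raise ValueError(f"Path must be within {APP_ROOT}: {path}")
    return normalized


print(validate_app_path("data"))
```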

Usage Examples

Basic Import Pattern

# Vendored packages are automatically available via read-only mount at /data/site-packages
# The sandbox injects sys.path.insert(0, '/data/site-packages') automatically

# Now import vendored packages directly
import openpyxl
from tabulate import tabulate
from sandbox_utils import find, grep

Complete Workflow Example

# Vendored packages are automatically available
from sandbox_utils import find, grep, csv_to_json
from tabulate import tabulate
import json

# Find all CSV files
csv_files = find("*.csv", "/app")

# Search for errors in log files
log_files = find("*.log", "/app")
errors = grep(r"ERROR.*timeout", log_files)

# Convert CSV to JSON and process
for csv_file in csv_files:
    json_file = str(csv_file).replace('.csv', '.json')
    csv_to_json(str(csv_file), output=json_file)
    
# Load and display data
with open('/app/data.json') as f:
    data = json.load(f)
    
table = tabulate(data, headers="keys", tablefmt="markdown")
print(table)

Performance Considerations

Fuel Budget Guidelines (default: 2B instructions for library, 10B for MCP server)

Package Import Fuel Requirements

| Package | First Import Fuel | Subsequent Imports | Notes |
|---|---|---|---|
| Standard Library | | | |
| json, csv, os | <10M | <1M | ✅ Works with default budget |
| pathlib, re, datetime | <10M | <1M | ✅ Works with default budget |
| hashlib, base64 | <10M | <1M | ✅ Works with default budget |
| Lightweight Packages | | | |
| tabulate | ~1.4B | <100M | ✅ Works with default budget |
| markdown | ~1.8B | <100M | ✅ Works with default budget |
| python-dateutil | ~800M | <50M | ✅ Works with default budget |
| attrs | ~500M | <50M | ✅ Works with default budget |
| Heavy Packages | | | |
| jinja2 + MarkupSafe | ~4-5B | <100M | ⚠️ Requires 5B+ fuel budget |
| openpyxl | ~5-7B | <100M | ⚠️ Requires 5B+ fuel budget |
| PyPDF2 | ~5-6B | <100M | ⚠️ Requires 5B+ fuel budget |
| XlsxWriter | ~3-4B | <100M | ⚠️ Requires 5B+ fuel budget |
| mammoth | ~2-3B | <100M | ✅ Works with default budget |
| odfpy | ~2-3B | <100M | ✅ Works with default budget |

sandbox_utils Operations

| Operation | Typical Fuel | Notes |
|---|---|---|
| find() 100 files | ~5M | Linear in file count |
| grep() 1MB text | ~20M | Depends on regex complexity |
| csv_to_json() 10K rows | ~50M | Depends on row size |
| tree() 500 dirs | ~10M | Linear in directory count |
| ls(), cat(), cp() | <5M | Per operation |

Fuel Budget Recommendations

  • Default (2B): Standard library + lightweight packages
  • Medium (5B): Document processing with openpyxl/PyPDF2
  • High (10B): Multiple heavy packages or complex workflows
  • MCP Server: Uses 10B default for better package compatibility

Tips for Efficient Code:

  • Cached imports: After first execution, imports in same session use cached modules
  • Use chunk() for large datasets to process in batches
  • Prefer walk() iterator over find() for very large directories
  • Set higher fuel budgets for document processing: ExecutionPolicy(fuel_budget=5_000_000_000)
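
The budget tiers above could be turned into a small helper that picks a budget before creating the sandbox. This is a sketch only: suggest_fuel_budget is a hypothetical function, the heavy-package set is assumed from the packages this document flags as requiring 5B+, and the returned value would be passed to ExecutionPolicy(fuel_budget=...).

```python
from typing import Iterable

# Packages this document flags as "Requires 5B+ fuel budget" (lowercased).
HEAVY_PACKAGES = {"jinja2", "openpyxl", "pypdf2", "xlsxwriter"}


def suggest_fuel_budget(packages: Iterable[str]) -> int:
    """Pick the default (2B), medium (5B), or high (10B) tier."""
    heavy = {name.lower() for name in packages} & HEAVY_PACKAGES
    if len(heavy) >= 2:
        return 10_000_000_000  # multiple heavy packages
    if heavy:
        return 5_000_000_000   # a single heavy package
    return 2_000_000_000       # standard library and lightweight packages


print(suggest_fuel_budget(["json", "openpyxl"]))
```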

🔒 Security Model

Multi-Layered Defense

This sandbox implements defense-in-depth with multiple security boundaries:

1. WASM Memory Safety

  • Bounds-checked linear memory (no buffer overflows)
  • Validated control flow (no arbitrary jumps)
  • Type-safe execution (strong typing enforced)

2. WASI Capability-Based I/O

  • Preopens only: File access limited to explicitly granted directories
  • No path traversal: .. and absolute paths outside capabilities are denied
  • Descriptor-based: All I/O goes through validated capability descriptors

3. Deterministic Execution Limits

  • Fuel metering: Instruction-count budgets put a hard, deterministic bound on computation (they limit executed instructions, not wall-clock time)
  • OutOfFuel trap: Exhausted budget triggers immediate termination
  • No runaway loops: Infinite loops hit fuel limit automatically
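
Fuel metering is easiest to see in a toy model. The sketch below is conceptual only and does not use Wasmtime: FuelMeter, OutOfFuel, and run_metered are hypothetical names, and each loop iteration is charged one unit, whereas the real runtime charges per WASM instruction.

```python
class OutOfFuel(Exception):
    """Raised when the budget is exhausted (stands in for the WASM trap)."""


class FuelMeter:
    def __init__(self, budget: int) -> None:
        self.remaining = budget

    def consume(self, units: int = 1) -> None:
        if units > self.remaining:
            self.remaining = 0
            raise OutOfFuel("fuel budget exhausted")
        self.remaining -= units


def run_metered(budget: int) -> int:
    """Run a deliberately infinite loop under a budget; return completed iterations."""
    meter = FuelMeter(budget)
    iterations = 0
    try:
        while True:
            meter.consume()  # charge before doing the work
            iterations += 1
    except OutOfFuel:
        return iterations


print(run_metered(1000))
```

Because the budget counts work rather than time, the loop stops after the same number of iterations on any machine, which is what makes the limit deterministic.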

4. Resource Caps

  • Memory limit: WASM linear memory capped at configured size
  • Output limits: Stdout/stderr truncated to prevent DoS
  • No networking: Zero network capabilities (no sockets)
  • No subprocesses: Cannot spawn child processes
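
Output capping can be sketched as a byte-level truncation that never splits a multi-byte character. truncate_output is a hypothetical helper, not the package's implementation; the 100,000-byte figure in the demo mirrors the stdout_max_bytes value used earlier in this document.

```python
def truncate_output(text: str, max_bytes: int) -> tuple[str, bool]:
    """Cap text at max_bytes of UTF-8; return (text, was_truncated)."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text, False
    # errors="ignore" drops a trailing partial character left by the byte cut
    return raw[:max_bytes].decode("utf-8", errors="ignore"), True


capped, was_truncated = truncate_output("a" * 200_000, 100_000)
print(len(capped), was_truncated)
```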

Security Boundaries

| Boundary | Mechanism | Protection |
|---|---|---|
| Memory | WASM bounds checking | Prevents buffer overflows, use-after-free |
| Filesystem | WASI preopens | Restricts access to mounted directories only |
| CPU | Fuel metering | Prevents infinite loops, excessive computation |
| I/O | Capability descriptors | No ambient authority, explicit grants |
| Environment | Variable whitelist | Prevents info leaks, credentials exposure |
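
The environment whitelist row can be illustrated with a simple filter applied before variables are handed to the guest. filter_environment and the variable names in the demo are hypothetical; how the package configures its whitelist is not shown in this document.

```python
from typing import Iterable, Mapping


def filter_environment(env: Mapping[str, str], allowed: Iterable[str]) -> dict[str, str]:
    """Keep only explicitly allowed variables; everything else is dropped."""
    allowed_names = set(allowed)
    return {name: value for name, value in env.items() if name in allowed_names}


host_env = {"LANG": "C.UTF-8", "AWS_SECRET_ACCESS_KEY": "example-secret", "TZ": "UTC"}
guest_env = filter_environment(host_env, allowed=["LANG", "TZ"])
print(sorted(guest_env))
```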

Production Hardening

For production deployments, combine with OS-level security:

import subprocess
import sys

def execute_with_timeout(code: str, timeout_seconds: int = 30) -> str:
    """Execute sandbox in separate process with OS timeout."""

    # {code!r} embeds the untrusted code as a string literal, so the child
    # process only ever passes it to sandbox.execute()
    script = f"""
from sandbox import create_sandbox, RuntimeType
sandbox = create_sandbox(runtime=RuntimeType.PYTHON)
result = sandbox.execute({code!r})
print(result.stdout)
"""

    try:
        result = subprocess.run(
            [sys.executable, "-c", script],  # same interpreter as the parent
            timeout=timeout_seconds,
            capture_output=True,
            text=True
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return "Execution timeout (OS limit)"

Additional Recommendations

  • ๐Ÿณ Containers: Run sandbox in Docker/Podman for additional isolation
  • ๐Ÿ“ฆ cgroups: Use Linux cgroups for CPU/memory limits
  • ๐Ÿ“Š Monitoring: Log all executions with code hashes for audit trails
  • โฑ๏ธ OS Timeouts: Combine fuel limits with OS-level process timeouts
  • ๐Ÿ” Network Isolation: Deploy in network-restricted environments

🔧 Troubleshooting

Common Issues

🚨 python.wasm not found

For PyPI installations: This should not happen - binaries are bundled automatically. Try reinstalling:

pip uninstall llm-wasm-sandbox
pip install llm-wasm-sandbox

For development/source installations: Download the WASM binary

.\scripts\fetch_wlr_python.ps1
.\scripts\fetch_quickjs.ps1

Verify: Check that bin/python.wasm exists and is ~26 MB

🚨 ImportError: wasmtime could not be imported

Solution: Install dependencies

uv sync
# OR
pip install -r requirements.txt

Verify: python -c "import wasmtime; print(wasmtime.__version__)"

🚨 OutOfFuel trap during execution

Cause: Code exceeded instruction budget

Solution: Increase fuel budget or simplify code

policy = ExecutionPolicy(fuel_budget=5_000_000_000)  # Increase limit
sandbox = create_sandbox(runtime=RuntimeType.PYTHON, policy=policy)

🚨 Memory limit errors

Cause: WASM memory cap exceeded

Solution: Increase memory limit

policy = ExecutionPolicy(memory_bytes=256 * 1024 * 1024)  # 256 MB
sandbox = create_sandbox(runtime=RuntimeType.PYTHON, policy=policy)

🚨 FileNotFoundError in guest code

Cause: Path outside preopened directories

Solution: Use /app prefix for all file operations

# ❌ Wrong
open('data.txt', 'r')

# ✅ Correct
open('/app/data.txt', 'r')

🚨 ModuleNotFoundError for package

Cause: Package not vendored or not in sys.path

Solution 1: Use pre-vendored packages

Check if the package is already vendored (see Available Python Capabilities):

import sys
sys.path.insert(0, '/data/site-packages')  # read-only mount; normally injected automatically
import openpyxl  # or any other vendored package

Solution 2: Vendor a new pure-Python package

# Install to vendor directory
uv run python scripts/manage_vendor.py install <package-name>

# Copy to workspace
uv run python scripts/manage_vendor.py copy

Then use in sandboxed code:

import sys
sys.path.insert(0, '/app/site-packages')
import <package-name>

Note: Only pure-Python packages work in WASM. Packages with C/Rust extensions will fail.
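
Before vendoring, an installed package directory can be scanned for compiled extension files as a quick first check. This is a sketch with a hypothetical helper name; it only inspects the package's own files, so a pure-Python package that depends on a native one (as jsonschema does on rpds-py) still needs its dependencies checked the same way.

```python
import tempfile
from pathlib import Path

NATIVE_SUFFIXES = {".so", ".pyd", ".dylib", ".dll"}


def find_native_extensions(package_dir: Path) -> list[str]:
    """Return relative paths of compiled extension files under package_dir."""
    return sorted(
        p.relative_to(package_dir).as_posix()
        for p in package_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in NATIVE_SUFFIXES
    )


# Demo with two throwaway package directories.
with tempfile.TemporaryDirectory() as tmp:
    pure = Path(tmp) / "purepkg"
    native = Path(tmp) / "nativepkg"
    pure.mkdir()
    native.mkdir()
    (pure / "__init__.py").write_text("")
    (native / "__init__.py").write_text("")
    (native / "_speedups.cpython-311-x86_64-linux-gnu.so").write_bytes(b"")
    pure_hits = find_native_extensions(pure)
    native_hits = find_native_extensions(native)

print(pure_hits, native_hits)
```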

🚨 ImportError from vendored package

Cause: Package has native dependencies or missing dependencies

Solution: Check if package is pure-Python

# Test package compatibility
uv run python -c "
from sandbox import create_sandbox, RuntimeType
sandbox = create_sandbox(runtime=RuntimeType.PYTHON)
result = sandbox.execute('''
import sys
sys.path.insert(0, \"/app/site-packages\")
import <package-name>
print(\"Package loaded successfully\")
''')
print(result.stdout if result.success else result.stderr)
"

Known incompatible packages:

  • jsonschema (requires rpds-py Rust extension)
  • python-docx (requires lxml C extension)
  • pdfminer.six (requires cryptography C extension)

Alternatives:

  • For JSON validation: Use manual validation or simpler libraries
  • For .docx: Use mammoth (vendored, pure-Python)
  • For PDF: Use PyPDF2 (vendored, pure-Python)

🚨 High fuel consumption with jinja2 or document packages

Cause: Large packages consume significant fuel on first import

Solution: Increase fuel budget for document processing

from sandbox import create_sandbox, ExecutionPolicy, RuntimeType

policy = ExecutionPolicy(
    fuel_budget=5_000_000_000  # 5B for jinja2, openpyxl, PyPDF2
)
sandbox = create_sandbox(runtime=RuntimeType.PYTHON, policy=policy)

Fuel requirements:

  • jinja2: ~4B instructions (first import)
  • openpyxl: ~3-5B instructions (first import)
  • PyPDF2: ~3B instructions (first import)
  • tabulate, markdown: <2B (works with default)

Note: Subsequent executions in the same session use cached imports (lower fuel).
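The figures above can be turned into a rough budget estimate. The sketch below is only a back-of-the-envelope calculation: estimate_fuel_budget is a hypothetical helper, the 2B default is inferred from the "works with default" remark rather than read from the library, and summing the import costs of several packages is an assumption.

```python
# Approximate first-import costs from the list above (upper end of each range).
FIRST_IMPORT_FUEL = {
    "jinja2": 4_000_000_000,
    "openpyxl": 5_000_000_000,
    "PyPDF2": 3_000_000_000,
}

# Assumption: the default budget is about 2B instructions.
ASSUMED_DEFAULT_BUDGET = 2_000_000_000


def estimate_fuel_budget(packages: list[str], headroom_percent: int = 25) -> int:
    """Sum the known first-import costs and add headroom for the script itself."""
    import_cost = sum(FIRST_IMPORT_FUEL.get(name, 0) for name in packages)
    padded = import_cost + import_cost * headroom_percent // 100
    # Never go below the assumed default, e.g. for packages with no listed cost.
    return max(ASSUMED_DEFAULT_BUDGET, padded)


print(estimate_fuel_budget(["jinja2"]))  # 5000000000
```

The result can be passed as fuel_budget to ExecutionPolicy, as in the example above. Measured fuel consumption from real runs is a better guide than this table once it is available.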

🚨 sandbox_utils path validation errors

Cause: Attempting to access files outside /app or using .. traversal

Examples of errors:

ValueError: Path must be within /app: /etc/passwd
ValueError: Path must be within /app: /app/../etc

Solution: Use absolute paths within /app, or relative paths (which are resolved against /app)

from sandbox_utils import find, ls

# ✅ Correct - absolute path in /app
files = find("*.txt", "/app/data")

# ✅ Correct - relative path (becomes /app/data)
files = find("*.txt", "data")

# ❌ Wrong - outside /app
files = find("*.txt", "/etc")  # Raises ValueError

# ❌ Wrong - traversal attempt
files = find("*.txt", "/app/../etc")  # Raises ValueError
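To see why both failing calls are rejected, it helps to picture the kind of check involved. The sketch below is not the sandbox_utils implementation; validate_app_path is a hypothetical function that mimics the documented behaviour by resolving relative paths against /app, collapsing .. segments, and rejecting anything that ends up outside /app.

```python
import posixpath

APP_ROOT = "/app"


def validate_app_path(path: str) -> str:
    """Return the normalized path, or raise ValueError if it escapes /app."""
    # Relative paths are interpreted relative to /app.
    candidate = path if path.startswith("/") else posixpath.join(APP_ROOT, path)
    # normpath collapses "." and ".." segments, so "/app/../etc" becomes "/etc".
    normalized = posixpath.normpath(candidate)
    if normalized != APP_ROOT and not normalized.startswith(APP_ROOT + "/"):
        raise ValueError(f"Path must be within /app: {path}")
    return normalized


print(validate_app_path("data"))  # /app/data
```

Normalizing before comparing is the important step: a plain prefix check on the raw string would accept /app/../etc.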

Getting Help

  • 🐞 Report bugs: GitHub Issues
  • 📖 Documentation: See inline code comments and docstrings
  • 💡 Examples: Check demo*.py files and tests/ directory

🛠️ Development

Running Tests

# Run all tests with coverage
uv run pytest tests/ -v --cov=sandbox --cov-report=html

# Run specific test file
uv run pytest tests/test_python_sandbox.py -v

# Run tests matching pattern
uv run pytest tests/ -k "session" -v

Code Quality

# Type checking
uv run mypy sandbox/

# Linting and formatting
uv run ruff check sandbox/
uv run ruff format sandbox/

Benchmarking

# Performance benchmarks
uv run python benchmark_performance.py

# Session performance
uv run python benchmark_session_performance.py

🤝 Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/YOUR-USERNAME/llm-wasm-sandbox.git
cd llm-wasm-sandbox

# Install dev dependencies
uv sync

# Fetch WASM binaries
.\scripts\fetch_wlr_python.ps1
.\scripts\fetch_quickjs.ps1

# Run tests to verify setup
uv run pytest tests/ -v

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


🚀 Roadmap

  • JavaScript runtime support (QuickJS WASM)
  • Pluggable storage adapter interface
  • Session pruning and lifecycle management
  • MCP server integration
  • Bundled WASM runtimes in PyPI package
  • Improved async execution support
  • Network sandboxing with explicit socket grants
  • Enhanced metrics and profiling
  • Web-based demo interface
  • Additional runtime support (Ruby, Lua)

Built with ❤️ for secure LLM code execution

Report Bug • Request Feature • Documentation
