Sandboxy

Open-source framework for developing, testing, and benchmarking AI agents in simulated environments.

What is Sandboxy?

Sandboxy provides a local development environment for building and testing AI agent scenarios. Define scenarios in YAML, run them against any LLM, and evaluate the results.

Use cases:

  • Agent Development - Build and iterate on AI agent behaviors locally
  • Evaluation & Testing - Run scenarios against models and score their performance
  • Dataset Benchmarking - Test models against datasets of cases with parallel execution
  • Red-teaming - Test for prompt injection, policy violations, and edge cases

Quick Start

Installation

# Using uv (recommended)
pip install uv
uv pip install sandboxy

# Or with pip
pip install sandboxy

Set up API keys

# Add your API key (OpenRouter gives access to 400+ models)
echo "OPENROUTER_API_KEY=your-key-here" >> .env

Initialize a project

mkdir my-evals && cd my-evals
sandboxy init

This creates:

my-evals/
├── scenarios/     # Your scenario YAML files
├── tools/         # Custom tool definitions
├── agents/        # Agent configurations (optional)
├── datasets/      # Test case datasets
└── runs/          # Output from runs

Run a scenario

# Run with a specific model
sandboxy run scenarios/my_scenario.yml -m openai/gpt-4o

# Compare multiple models
sandboxy run scenarios/my_scenario.yml -m openai/gpt-4o -m anthropic/claude-3.5-sonnet

# Run against a dataset
sandboxy run scenarios/my_scenario.yml --dataset datasets/cases.yml -m openai/gpt-4o
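The dataset schema isn't documented on this page, but conceptually a dataset file holds a list of cases to run the scenario against in parallel. A rough sketch of what `datasets/cases.yml` might contain (field names here are hypothetical; check the project docs for the real schema):

```yaml
# Hypothetical structure, for illustration only
cases:
  - id: capital-france
    input: "What is the capital of France?"
    expected: "Paris"
  - id: capital-japan
    input: "What is the capital of Japan?"
    expected: "Tokyo"
```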

Local development UI

# Start the local dev server with UI
sandboxy open

Opens a browser with a local UI for browsing scenarios, running them, and viewing results.

Writing Scenarios

Scenarios are YAML files that define agent interactions. Sandboxy supports two modes:

Single-turn mode

Use prompt: for simple request/response scenarios without tool use:

id: simple-qa
name: "Simple Q&A"

system_prompt: |
  You are a helpful assistant.

prompt: |
  What is the capital of France?

evaluation:
  max_score: 100
  goals:
    - id: correct_answer
      name: "Correct Answer"
      points: 100
      detection:
        type: agent_contains
        patterns:
          - "Paris"

Agentic mode

Use steps: for multi-turn scenarios with tool support:

id: customer-support
name: "Customer Support Test"
description: "Test how an agent handles a refund request"

system_prompt: |
  You are a customer support agent for TechCo.
  Be helpful but follow company policy.

steps:
  - id: user_request
    action: inject_user
    params:
      content: "I want a refund for my purchase. Order #12345."
  - id: agent_response
    action: await_agent

# Tools are only available in agentic mode (with steps)
tools:
  lookup_order:
    description: "Look up order details"
    actions:
      call:
        params:
          order_id:
            type: string
            required: true
        returns: "Order details for {{order_id}}"

evaluation:
  max_score: 100
  goals:
    - id: acknowledged_request
      name: "Acknowledged Request"
      description: "Agent acknowledged the refund request"
      points: 50
      detection:
        type: agent_contains
        patterns:
          - "refund"

    - id: looked_up_order
      name: "Looked Up Order"
      description: "Agent used the lookup tool"
      points: 50
      detection:
        type: tool_called
        tool: lookup_order

CLI Reference

# Run scenarios
sandboxy run <file.yml> -m <model>           # Run a scenario
sandboxy run <file.yml> -m <model> --runs 5  # Multiple runs
sandboxy run <file.yml> --dataset <data.yml> # Run against dataset

# Development
sandboxy open                    # Start local UI
sandboxy serve                   # API server only (no browser)
sandboxy init                    # Initialize project structure

# Scaffolding
sandboxy new scenario <name>     # Create scenario template
sandboxy new tool <name>         # Create tool library template

# Information
sandboxy list-models             # List available models
sandboxy list-tools              # List available tool libraries
sandboxy info <file.yml>         # Show scenario details

# MCP Integration
sandboxy mcp inspect <command>   # Inspect MCP server tools
sandboxy mcp list                # List known MCP servers

Models

Sandboxy supports 400+ models via OpenRouter, plus direct provider access:

# OpenRouter models (recommended)
sandboxy run scenario.yml -m openai/gpt-4o
sandboxy run scenario.yml -m anthropic/claude-3.5-sonnet
sandboxy run scenario.yml -m google/gemini-pro
sandboxy run scenario.yml -m meta-llama/llama-3-70b

# List available models
sandboxy list-models
sandboxy list-models --search claude
sandboxy list-models --free

MLflow Integration

Export scenario run results to MLflow for experiment tracking and model comparison.

# Install with MLflow support
pip install sandboxy[mlflow]

# Export run to MLflow
sandboxy run scenarios/test.yml -m openai/gpt-4o --mlflow-export

# Custom experiment name
sandboxy run scenarios/test.yml -m openai/gpt-4o --mlflow-export --mlflow-experiment "my-evals"

Or enable in scenario YAML:

id: my-scenario
name: "My Test"

mlflow:
  enabled: true
  experiment: "agent-evals"
  tags:
    team: "support"

system_prompt: |
  ...

Set the MLFLOW_TRACKING_URI environment variable to point the exporter at your MLflow tracking server.

Configuration

Environment variables (in ~/.sandboxy/.env or project .env):

Variable              Description
OPENROUTER_API_KEY    OpenRouter API key (access to 400+ models)
OPENAI_API_KEY        Direct OpenAI access
ANTHROPIC_API_KEY     Direct Anthropic access
MLFLOW_TRACKING_URI   MLflow tracking server URI
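For example, a project `.env` might look like this (values are placeholders):

```shell
# .env -- values are placeholders, use your own keys
OPENROUTER_API_KEY=sk-or-...
OPENAI_API_KEY=sk-...
MLFLOW_TRACKING_URI=http://localhost:5000
```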

Project Structure

sandboxy/
├── sandboxy/           # Python package
│   ├── core/           # Runner, state management
│   ├── scenarios/      # Unified scenario runner
│   ├── datasets/       # Dataset benchmarking
│   ├── agents/         # Agent loading and execution
│   ├── tools/          # Tool loading (YAML tools)
│   ├── providers/      # LLM provider integrations
│   ├── api/            # Local dev API server
│   ├── cli/            # Command-line interface
│   ├── local/          # Local project context
│   └── mcp/            # MCP client integration
└── local-ui/           # Local development UI (React)

Contributing

Contributions welcome! See CONTRIBUTING.md.

License

Apache 2.0 - see LICENSE.

Project details

Latest release: sandboxy 0.0.7, published via Trusted Publishing (publish.yml on sandboxy-ai/sandboxy, twine/6.1.0, CPython/3.13.7).

  • sandboxy-0.0.7.tar.gz (source distribution, 446.7 kB)
    SHA256: d7aff714d6913a22d69eef4fdf3e9ac70ab145f5ba4371ee25f824f310593a39
  • sandboxy-0.0.7-py3-none-any.whl (Python 3 wheel, 275.9 kB)
    SHA256: fc1f5ca5bfaf3434c0fb1036e958bd7d553324e38353f8c4b2f9f48bd83af54d
