
Synth Environments

Synthetic Environments for Long-Horizon Language Agents


A comprehensive framework for building and managing synthetic environments designed specifically for training and evaluating long-horizon language agents.

🎯 Key Features

  • 🔄 Snapshotting & Reproducibility - Full state capture and replay
  • 🏗️ Statefulness First - Built-in state management across environments
  • 🔌 Consistent APIs - Unified interface for all environment types
  • 📊 Observability - Built-in tracing and monitoring
  • 🌐 HTTP Access - RESTful API for remote training and evaluation
  • 📚 Curriculum Learning - Configurable filtering and progression
  • 🛠️ Agent Tools - Simple abstractions for agent-environment interaction

🚀 Quick Start

Installation

pip install synth-env

Basic Usage

from synth_env import Environment

# Create an environment by name
env = Environment("sokoban")

# Run an agent; `agent` is your own policy object with an act(state) method
state = env.reset()
while not env.done:
    action = agent.act(state)
    state = env.step(action)
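The loop above relies on an `agent` object that you supply. To show the reset/step/done protocol in isolation, here is a self-contained sketch with a hypothetical `CountdownEnv` and `RandomAgent`. Neither class is part of synth_env, and the real `Environment` return values may differ (for example, `step` may return more than the new state).

```python
import random


class CountdownEnv:
    """Hypothetical toy environment: the state is a counter that must reach 0."""

    def __init__(self, start: int = 5, max_steps: int = 20):
        self.start = start
        self.max_steps = max_steps
        self.reset()

    def reset(self) -> int:
        # Restore the initial state and clear the episode bookkeeping
        self.state = self.start
        self.steps = 0
        self.done = False
        return self.state

    def step(self, action: int) -> int:
        # Action 1 decrements the counter; action 0 is a no-op
        if action == 1:
            self.state -= 1
        self.steps += 1
        # The episode ends on success or when the step budget runs out
        self.done = self.state == 0 or self.steps >= self.max_steps
        return self.state


class RandomAgent:
    """Hypothetical agent that picks an action uniformly at random."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        return self.rng.choice([0, 1])


def run_episode(env, agent) -> int:
    # Same control flow as the Basic Usage loop; returns the final state
    state = env.reset()
    while not env.done:
        action = agent.act(state)
        state = env.step(action)
    return state
```

Swapping `CountdownEnv` for a real environment should only require matching its `reset`, `step`, and `done` semantics.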

Running Evaluation Scripts

The framework includes ReAct agent evaluation scripts for testing language models on various environments. These scripts report detailed evaluation metrics and shaped rewards that can be used for training.

Prerequisites

  1. Start the synth service on port 8901:

    # In your service directory
    python -m uvicorn main:app --host 0.0.0.0 --port 8901
    
  2. Ensure your model is available (OpenAI, Anthropic, etc.)
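Before launching a script, it can help to confirm that the service from step 1 is accepting connections. The helper below is a minimal sketch, not part of synth_env: `wait_for_service` is a hypothetical name, and it only checks that something is listening on TCP port 8901, not that the service is healthy.

```python
import socket
import time


def wait_for_service(host: str = "127.0.0.1", port: int = 8901,
                     timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the service is not accepting connections yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `if not wait_for_service(port=8901): raise SystemExit("synth service not reachable")` fails fast instead of letting the evaluation script error out mid-run.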

TicTacToe Evaluation

cd Environments
uvpm synth_env.examples.tictactoe.agent_demos.test_tictactoe_react_agent

Features:

  • Tests strategic gameplay against a random opponent
  • Provides win/loss/draw statistics
  • Validates coordinate parsing and legal moves
  • Supports multiple models (gpt-4.1-mini, o3, etc.)
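The coordinate format the TicTacToe agent is expected to emit is not specified here. To illustrate what coordinate parsing, legal-move validation, and a win/loss/draw summary involve, the sketch below assumes a hypothetical row-letter-plus-column-digit scheme (for example `B2`). `parse_move`, `is_legal`, and `summarize` are illustrative names, not the script's actual functions.

```python
from collections import Counter

ROWS = "ABC"  # hypothetical scheme: row letter followed by column digit, e.g. "B2"


def parse_move(text: str) -> tuple[int, int]:
    """Parse a coordinate like 'B2' into zero-based (row, col); raise ValueError if malformed."""
    cleaned = text.strip().upper()
    if len(cleaned) != 2 or cleaned[0] not in ROWS or cleaned[1] not in "123":
        raise ValueError(f"unrecognized coordinate: {text!r}")
    return ROWS.index(cleaned[0]), int(cleaned[1]) - 1


def is_legal(board: list[list[str]], move: tuple[int, int]) -> bool:
    # A move is legal only if the target cell is still empty
    row, col = move
    return board[row][col] == " "


def summarize(outcomes: list[str]) -> dict[str, float]:
    """Turn per-episode outcomes ('win', 'loss', 'draw') into rates."""
    counts = Counter(outcomes)
    total = len(outcomes) or 1  # avoid dividing by zero when there are no episodes
    return {key: counts[key] / total for key in ("win", "loss", "draw")}
```

Rejecting malformed coordinates with an exception, rather than guessing, makes parse failures visible as their own metric instead of silently becoming illegal moves.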

NetHack Evaluation

cd Environments
uvpm synth_env.examples.nethack.agent_demos.test_nethack_react_agent

Features:

  • Comprehensive dungeon exploration evaluation
  • 26+ shaped reward signals for training
  • Balrog scoring system integration
  • Progress bars for multi-trajectory runs
  • Separates relevant vs. irrelevant metrics

Sokoban Evaluation

cd Environments  
uvpm synth_env.examples.sokoban.agent_demos.test_sokoban_react_agent

Features:

  • Classic puzzle-solving evaluation
  • Box-pushing logic validation
  • Step efficiency analysis
  • Multiple difficulty levels
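Box-pushing validation comes down to one rule: the player may push a box one cell only if the cell beyond it is free. The sketch below encodes that rule on sets of coordinates and adds a simple step-efficiency ratio (optimal steps divided by steps taken). Both the representation and the efficiency formula are assumptions for illustration, not the evaluation script's implementation.

```python
Pos = tuple[int, int]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def try_move(walls: set[Pos], boxes: set[Pos], player: Pos, direction: str):
    """Apply one Sokoban move. Returns (new_player, new_boxes), or None if illegal."""
    dr, dc = MOVES[direction]
    target = (player[0] + dr, player[1] + dc)
    if target in walls:
        return None  # cannot walk into a wall
    if target in boxes:
        beyond = (target[0] + dr, target[1] + dc)
        # A box can only be pushed into a cell holding neither a wall nor another box
        if beyond in walls or beyond in boxes:
            return None
        return target, (boxes - {target}) | {beyond}
    return target, boxes


def is_solved(boxes: set[Pos], goals: set[Pos]) -> bool:
    # Solved when every box sits on a goal square
    return boxes == goals


def step_efficiency(optimal_steps: int, steps_taken: int) -> float:
    """Hypothetical metric: 1.0 means the agent matched the shortest solution."""
    return optimal_steps / steps_taken if steps_taken else 0.0
```

Returning `None` for illegal moves, rather than leaving the state unchanged, lets an evaluator count invalid actions separately from wasted but legal ones.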

Configuration

Edit the script configuration at the top of each file:

MODEL_NAME = "gpt-4.1-mini"  # or "o3", "claude-sonnet-4", etc.
NUM_INSTANCES = 5            # Number of test episodes
MAX_TURNS = 100             # Maximum steps per episode  
DIFFICULTY = "beginner"     # Environment-specific difficulty

All scripts provide detailed rubric results, progress metrics, and shaped rewards suitable for reinforcement learning applications.
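The scripts keep these settings as module-level constants. If you run many configurations, one alternative is to gather them into a validated dataclass with environment-variable overrides. The sketch below assumes that pattern; `EvalConfig` and the `SYNTH_EVAL_*` variable names are hypothetical and are not read by the existing scripts.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalConfig:
    # Defaults mirror the constants shown above
    model_name: str = "gpt-4.1-mini"
    num_instances: int = 5
    max_turns: int = 100
    difficulty: str = "beginner"

    def __post_init__(self):
        # Reject values that would make an evaluation run meaningless
        if self.num_instances < 1 or self.max_turns < 1:
            raise ValueError("num_instances and max_turns must be positive")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "EvalConfig":
        """Build a config, letting hypothetical SYNTH_EVAL_* variables override defaults."""
        defaults = cls()
        return cls(
            model_name=env.get("SYNTH_EVAL_MODEL_NAME", defaults.model_name),
            num_instances=int(env.get("SYNTH_EVAL_NUM_INSTANCES", defaults.num_instances)),
            max_turns=int(env.get("SYNTH_EVAL_MAX_TURNS", defaults.max_turns)),
            difficulty=env.get("SYNTH_EVAL_DIFFICULTY", defaults.difficulty),
        )
```

Passing the mapping explicitly, instead of always reading `os.environ`, keeps the config easy to test.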

Development Setup

# Clone repository
git clone https://github.com/your-org/synth-env.git
cd synth-env

# Install dependencies
uv sync

# Run tests
python dev/update_readme_metrics.py --fast

🎮 Supported Environments

| Environment    | Status          | Description                          |
|----------------|-----------------|--------------------------------------|
| Sokoban        | ✅ Stable       | Classic puzzle game for planning     |
| Hendrycks Math | ✅ Stable       | Mathematical reasoning tasks         |
| Crafter        | ✅ Stable       | Minecraft-like survival environment  |
| Verilog        | 🔄 Beta         | Hardware description language tasks |
| Red Team       | 🚧 Development  | Security testing scenarios           |
| SWE-Bench      | 🚧 Development  | Software engineering tasks           |

📖 Documentation

🔧 Development

Health Check

# Check codebase health
python scripts/check_health.py

Testing

# Fast tests (~3 seconds)
python dev/update_readme_metrics.py --fast

# Full test suite
python dev/update_readme_metrics.py

Code Quality

# Format code
ruff format .

# Check linting
ruff check .

# Type checking
uvx ty check

Release

# Increment version and publish
python scripts/release.py

# Dry run
python scripts/release.py --dry-run
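What scripts/release.py does when it increments the version is not described here. For PEP 440 development releases such as 0.1.3.dev4, one plausible rule is sketched next; `bump_dev` is a hypothetical helper, and the real script may bump versions differently.

```python
import re

# Release segment (e.g. 0.1.3) optionally followed by a .devN suffix
_DEV_VERSION = re.compile(r"^(?P<base>\d+(?:\.\d+)*)(?:\.dev(?P<dev>\d+))?$")


def bump_dev(version: str) -> str:
    """Return the next development version for a simple PEP 440 version string."""
    match = _DEV_VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    base, dev = match.group("base"), match.group("dev")
    if dev is not None:
        # Already a dev release: increment N in .devN
        return f"{base}.dev{int(dev) + 1}"
    # Final release: move to the next patch's first dev release,
    # because X.Y.Z.dev0 sorts before X.Y.Z under PEP 440
    parts = base.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts) + ".dev0"
```

Pre-release, post-release, and local version segments are deliberately rejected rather than guessed at.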

Pre-Merge Checklist

Before creating a PR, see dev/pr_checklist.md for the complete checklist.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Special thanks to the research teams at DeepMind, Ragen AI, and other contributors to the environments included in this framework.


⚠️ Development Status: This project is under active development. Environments marked ✅ Stable are production-ready; newer environments may introduce breaking changes.



Download files


Source Distribution

synth_env-0.1.3.dev4.tar.gz (1.0 MB)

Built Distribution


synth_env-0.1.3.dev4-py3-none-any.whl (1.2 MB)

File details

Details for the file synth_env-0.1.3.dev4.tar.gz.

File metadata

  • Download URL: synth_env-0.1.3.dev4.tar.gz
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.8

File hashes

Hashes for synth_env-0.1.3.dev4.tar.gz
Algorithm Hash digest
SHA256 89e38656d8d19096513f2d801bff8e89410304f759fc1c41160b35b4c552eedf
MD5 78256356d612f42ccf79f4532aae0f41
BLAKE2b-256 4f4d20ef3e007023f83658edf2f52d0eda9bb0aa3952e9df1755694cf41a4405


File details

Details for the file synth_env-0.1.3.dev4-py3-none-any.whl.

File metadata

  • Download URL: synth_env-0.1.3.dev4-py3-none-any.whl
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.8

File hashes

Hashes for synth_env-0.1.3.dev4-py3-none-any.whl
Algorithm Hash digest
SHA256 64a3a53afca3694d32f2a0c82ed367fb61d9465129d5b99b4414f5825b1519a8
MD5 22b0a06ba6e5e71152979639b76e7745
BLAKE2b-256 af00c6e5598e8472989ae2e549003e42a00c295377c48d3e9734a39f5313bbc0

