Synth Environments

Synthetic Environments for Long-Horizon Language Agents

A framework for building and managing synthetic environments for training and evaluating long-horizon language agents.

🎯 Key Features

  • 🔄 Snapshotting & Reproducibility - Full state capture and replay
  • 🏗️ Statefulness First - Built-in state management across environments
  • 🔌 Consistent APIs - Unified interface for all environment types
  • 📊 Observability - Built-in tracing and monitoring
  • 🌐 HTTP Access - RESTful API for remote training and evaluation
  • 📚 Curriculum Learning - Configurable filtering and progression
  • 🛠️ Agent Tools - Simple abstractions for agent-environment interaction
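To make the snapshotting and reproducibility idea concrete, here is a minimal, self-contained sketch. It does not use synth-env's API: `CounterEnv`, `snapshot`, and `restore` are hypothetical names chosen for illustration. It only shows the general technique of capturing full state (including RNG state) so that a trajectory can be replayed exactly.

```python
import copy
import random


class CounterEnv:
    """Toy stateful environment; a hypothetical stand-in, not a synth-env class."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.total = 0
        self.steps = 0

    def step(self, action: int) -> int:
        # Each step adds the action plus seeded noise, so the RNG is part of the state.
        self.total += action + self.rng.randint(0, 9)
        self.steps += 1
        return self.total

    def snapshot(self) -> dict:
        # Capture everything needed to resume later, including the RNG state.
        return copy.deepcopy(
            {"rng": self.rng.getstate(), "total": self.total, "steps": self.steps}
        )

    def restore(self, snap: dict) -> None:
        # Roll the environment back to the captured state.
        self.rng.setstate(snap["rng"])
        self.total = snap["total"]
        self.steps = snap["steps"]


env = CounterEnv(seed=42)
env.step(1)
snap = env.snapshot()
first = [env.step(a) for a in (2, 3)]

env.restore(snap)
replayed = [env.step(a) for a in (2, 3)]
print(first == replayed)
```

Because the snapshot includes the random generator's state, replaying the same actions after `restore` yields the same observations as the first pass.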

🚀 Quick Start

Installation

pip install synth-env

Basic Usage

from synth_env import Environment

# Create environment
env = Environment("sokoban")

# Run agent
# Note: `agent` is not defined here; supply your own object with an act(state) method.
state = env.reset()
while not env.done:
    action = agent.act(state)
    state = env.step(action)
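The loop above calls `agent.act(state)` without defining `agent`. As a placeholder, a random policy along the following lines could stand in while wiring things up. This is a sketch under assumptions: `RandomAgent` is a hypothetical class, and the action names are illustrative, not synth-env's actual Sokoban action space.

```python
import random
from typing import Any, Sequence


class RandomAgent:
    """Hypothetical placeholder agent: picks uniformly from a fixed action list."""

    def __init__(self, actions: Sequence[str], seed: int = 0) -> None:
        self.actions = list(actions)
        self.rng = random.Random(seed)

    def act(self, state: Any) -> str:
        # Ignores the state entirely; a real agent would condition on it.
        return self.rng.choice(self.actions)


# The action names below are assumptions for illustration only.
ACTIONS = ["up", "down", "left", "right"]
agent = RandomAgent(ACTIONS)
action = agent.act(state=None)
```

Anything exposing an `act(state)` method would fit the loop the same way, so the random policy can later be swapped for a language-model-backed agent.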

Running Evaluation Scripts

The framework includes ReAct agent evaluation scripts for testing language models on several environments. The scripts report evaluation metrics and shaped rewards that can be used for training.

Prerequisites

  1. Start the synth service on port 8901:

    # In your service directory
    python -m uvicorn main:app --host 0.0.0.0 --port 8901
    
  2. Ensure your model is available (OpenAI, Anthropic, etc.)
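Before launching an evaluation, it can help to confirm that something is actually listening on port 8901. The following is a small sketch using only the Python standard library; `service_is_up` is a hypothetical helper, not part of synth-env, and it checks TCP reachability only, not that the service is healthy.

```python
import socket


def service_is_up(host: str = "127.0.0.1", port: int = 8901, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `service_is_up()` with the defaults checks the local port used in step 1; pass a different `host` if the service runs on another machine.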

TicTacToe Evaluation

cd Environments
uvpm synth_env.examples.tictactoe.agent_demos.test_tictactoe_react_agent

Features:

  • Tests strategic gameplay against random opponent
  • Provides win/loss/draw statistics
  • Validates coordinate parsing and legal moves
  • Supports multiple models (gpt-4.1-mini, o3, etc.)
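As an illustration of what coordinate parsing and legal-move validation can involve, here is a minimal sketch. The "A1" to "C3" move format, the board representation, and the function names are all assumptions made for this example; the actual script may use a different format.

```python
from typing import List, Optional, Tuple

Board = List[List[Optional[str]]]  # 3x3 grid; None marks an empty cell


def parse_move(text: str) -> Tuple[int, int]:
    """Parse a move such as 'B2' into (row, col). The letter+digit format is assumed."""
    move = text.strip().upper()
    if len(move) != 2 or move[0] not in "ABC" or move[1] not in "123":
        raise ValueError(f"unparseable move: {text!r}")
    return "ABC".index(move[0]), "123".index(move[1])


def is_legal(board: Board, row: int, col: int) -> bool:
    """A move is legal only if the target cell is still empty."""
    return board[row][col] is None


board: Board = [[None] * 3 for _ in range(3)]
board[1][1] = "X"  # centre cell already taken

row, col = parse_move(" b2 ")
centre_is_legal = is_legal(board, row, col)
corner_is_legal = is_legal(board, *parse_move("A1"))
```

Separating parsing from legality makes it possible to tell a malformed model output apart from a well-formed but illegal move.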

NetHack Evaluation

cd Environments
uvpm synth_env.examples.nethack.agent_demos.test_nethack_react_agent

Features:

  • Comprehensive dungeon exploration evaluation
  • 26+ shaped reward signals for training
  • BALROG scoring system integration
  • Progress bars for multi-trajectory runs
  • Separates relevant vs. irrelevant metrics

Sokoban Evaluation

cd Environments
uvpm synth_env.examples.sokoban.agent_demos.test_sokoban_react_agent

Features:

  • Classic puzzle-solving evaluation
  • Box-pushing logic validation
  • Step efficiency analysis
  • Multiple difficulty levels
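One simple way to think about step efficiency is to compare the number of steps the agent used with the length of a reference solution. The sketch below is a hypothetical metric for illustration; `step_efficiency` and its definition are assumptions, not necessarily what the evaluation script computes.

```python
def step_efficiency(steps_taken: int, reference_steps: int) -> float:
    """Ratio of a reference solution length to the steps the agent used.

    Hypothetical metric: 1.0 means the agent matched the reference; lower
    values mean extra moves. Capped at 1.0 in case the agent beats the reference.
    """
    if steps_taken <= 0 or reference_steps <= 0:
        raise ValueError("step counts must be positive")
    return min(1.0, reference_steps / steps_taken)


# An agent that needs 40 steps on a puzzle with a 10-step reference solution.
efficiency = step_efficiency(steps_taken=40, reference_steps=10)
```

With these inputs the ratio is 10 / 40 = 0.25, meaning three quarters of the agent's moves were beyond what the reference solution needed.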

Configuration

Edit the script configuration at the top of each file:

MODEL_NAME = "gpt-4.1-mini"  # or "o3", "claude-sonnet-4", etc.
NUM_INSTANCES = 5            # Number of test episodes
MAX_TURNS = 100              # Maximum steps per episode
DIFFICULTY = "beginner"      # Environment-specific difficulty
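These four settings jointly bound the cost of a run. The sketch below groups them into a small validated container and computes the worst-case number of environment steps. `EvalConfig` is a hypothetical helper for illustration; the scripts themselves use plain module-level constants as shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalConfig:
    """Hypothetical container mirroring the four script-level settings."""

    model_name: str = "gpt-4.1-mini"
    num_instances: int = 5
    max_turns: int = 100
    difficulty: str = "beginner"

    def __post_init__(self) -> None:
        # Reject values that would make the run empty or ill-defined.
        if self.num_instances < 1:
            raise ValueError("num_instances must be at least 1")
        if self.max_turns < 1:
            raise ValueError("max_turns must be at least 1")

    @property
    def max_total_steps(self) -> int:
        # Upper bound on environment steps across the whole run.
        return self.num_instances * self.max_turns


config = EvalConfig()
print(config.max_total_steps)
```

With the default values, 5 episodes of at most 100 turns each give an upper bound of 500 environment steps, which is a useful number to keep in mind when estimating model API usage.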

All scripts provide detailed rubric results, progress metrics, and shaped rewards suitable for reinforcement learning applications.

Development Setup

# Clone repository
git clone https://github.com/your-org/synth-env.git
cd synth-env

# Install dependencies
uv sync

# Run tests
python dev/update_readme_metrics.py --fast

🎮 Supported Environments

| Environment | Status | Description |
|---|---|---|
| Sokoban | ✅ Stable | Classic puzzle game for planning |
| Hendrycks Math | ✅ Stable | Mathematical reasoning tasks |
| Crafter | ✅ Stable | Minecraft-like survival environment |
| Verilog | 🔄 Beta | Hardware description language tasks |
| Red Team | 🚧 Development | Security testing scenarios |
| SWE-Bench | 🚧 Development | Software engineering tasks |

📖 Documentation

🔧 Development

Health Check

# Check codebase health
python scripts/check_health.py

Testing

# Fast tests (~3 seconds)
python dev/update_readme_metrics.py --fast

# Full test suite
python dev/update_readme_metrics.py

Code Quality

# Format code
ruff format .

# Check linting
ruff check .

# Type checking
uvx ty check

Release

# Increment version and publish
python scripts/release.py

# Dry run
python scripts/release.py --dry-run

Pre-Merge Checklist

Before creating a PR, see dev/pr_checklist.md for the complete checklist.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Special thanks to the research teams at DeepMind, Ragen AI, and other contributors to the environments included in this framework.


⚠️ Development Status: This project is under active development. While stable environments are production-ready, newer environments may have breaking changes.
