# Synth Environments

**Synthetic Environments for Long-Horizon Language Agents**

A comprehensive framework for building and managing synthetic environments designed specifically for training and evaluating long-horizon language agents.
## 🎯 Key Features
- 🔄 Snapshotting & Reproducibility - Full state capture and replay
- 🏗️ Statefulness First - Built-in state management across environments
- 🔌 Consistent APIs - Unified interface for all environment types
- 📊 Observability - Built-in tracing and monitoring
- 🌐 HTTP Access - RESTful API for remote training and evaluation
- 📚 Curriculum Learning - Configurable filtering and progression
- 🛠️ Agent Tools - Simple abstractions for agent-environment interaction
## 🚀 Quick Start

### Installation

```bash
pip install synth-env
```
### Basic Usage

```python
from synth_env import Environment

# Create environment
env = Environment("sokoban")

# Run agent
state = env.reset()
while not env.done:
    action = agent.act(state)
    state = env.step(action)
```
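The loop above assumes an `agent` object with an `act(state)` method. As a rough sketch of how the pieces fit together, here is a minimal random agent run against a stub environment (both hypothetical stand-ins for illustration, not the real `synth_env` implementation):

```python
import random

class StubEnvironment:
    """Illustrative stand-in mimicking the reset()/step()/done interface
    shown above; it simply terminates after a fixed number of steps."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0
        self.done = False

    def reset(self):
        self.steps = 0
        self.done = False
        return {"steps": self.steps}

    def step(self, action):
        self.steps += 1
        self.done = self.steps >= self.max_steps
        return {"steps": self.steps}

class RandomAgent:
    """Picks uniformly at random from a fixed action set."""

    def __init__(self, actions=("up", "down", "left", "right")):
        self.actions = actions

    def act(self, state):
        return random.choice(self.actions)

agent = RandomAgent()
env = StubEnvironment()
state = env.reset()
while not env.done:
    action = agent.act(state)
    state = env.step(action)

print(state["steps"])  # 5
```

Any object exposing `act(state)` can be dropped into the same loop, which is what lets the ReAct evaluation scripts below swap in language-model agents.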
### Running Evaluation Scripts

The framework includes ReAct agent evaluation scripts for testing language models on various environments. These scripts provide comprehensive metrics and shaped rewards for training.

#### Prerequisites

1. Start the synth service on port 8901:

   ```bash
   # In your service directory
   python -m uvicorn main:app --host 0.0.0.0 --port 8901
   ```

2. Ensure your model is available (OpenAI, Anthropic, etc.)
#### TicTacToe Evaluation

```bash
cd Environments
uvpm synth_env.examples.tictactoe.agent_demos.test_tictactoe_react_agent
```
Features:
- Tests strategic gameplay against random opponent
- Provides win/loss/draw statistics
- Validates coordinate parsing and legal moves
- Supports multiple models (gpt-4.1-mini, o3, etc.)
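As a rough illustration of the coordinate-parsing and legal-move validation such a script performs, here is a hedged sketch (hypothetical helpers, not the actual evaluation code):

```python
def parse_coordinate(text):
    """Parse a model reply like "B2" into (row, col), or None if malformed.

    Hypothetical helper for illustration; the real script's parser may differ.
    """
    text = text.strip().upper()
    if len(text) != 2 or text[0] not in "ABC" or text[1] not in "123":
        return None
    return ("ABC".index(text[0]), "123".index(text[1]))

def is_legal(board, move):
    """A parsed move is legal only if it lands on an empty cell."""
    if move is None:
        return False
    row, col = move
    return board[row][col] == " "

board = [[" "] * 3 for _ in range(3)]
board[1][1] = "X"  # center already taken

print(parse_coordinate("b2"), is_legal(board, parse_coordinate("b2")))  # (1, 1) False
print(parse_coordinate("A1"), is_legal(board, parse_coordinate("A1")))  # (0, 0) True
print(parse_coordinate("Z9"))  # None
```

Rejecting malformed or illegal replies before applying them is what keeps the win/loss/draw statistics meaningful.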
#### NetHack Evaluation

```bash
cd Environments
uvpm synth_env.examples.nethack.agent_demos.test_nethack_react_agent
```
Features:
- Comprehensive dungeon exploration evaluation
- 26+ shaped reward signals for training
- Balrog scoring system integration
- Progress bars for multi-trajectory runs
- Separates relevant vs. irrelevant metrics
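One common way to use many shaped reward signals in training is to fold them into a single scalar via a weighted sum. A minimal sketch of that pattern (the signal names and weights below are made up for illustration, and this is not the script's actual reward set or the Balrog scoring system):

```python
# Hypothetical signal names and weights, for illustration only.
REWARD_WEIGHTS = {
    "depth_reached": 1.0,
    "gold_collected": 0.01,
    "monsters_defeated": 0.5,
    "steps_survived": 0.001,
}

def shaped_reward(signals):
    """Weighted sum of shaped reward signals; unknown signals are ignored."""
    return sum(REWARD_WEIGHTS.get(name, 0.0) * value
               for name, value in signals.items())

step_signals = {"depth_reached": 2, "gold_collected": 35, "steps_survived": 120}
print(round(shaped_reward(step_signals), 4))  # 2.47
```

Separating "relevant" from "irrelevant" metrics, as the script does, amounts to choosing which signals get nonzero weight.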
#### Sokoban Evaluation

```bash
cd Environments
uvpm synth_env.examples.sokoban.agent_demos.test_sokoban_react_agent
```
Features:
- Classic puzzle-solving evaluation
- Box-pushing logic validation
- Step efficiency analysis
- Multiple difficulty levels
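Step efficiency is commonly summarized as the ratio of a reference solution length to the agent's actual step count. A hedged sketch of that metric (the exact definition used by the script is an assumption here):

```python
def step_efficiency(optimal_steps, actual_steps):
    """Ratio in (0, 1]; 1.0 means the agent matched the reference solution."""
    if actual_steps <= 0:
        raise ValueError("actual_steps must be positive")
    return min(1.0, optimal_steps / actual_steps)

print(step_efficiency(12, 12))  # 1.0 (matched the reference solution)
print(step_efficiency(12, 20))  # 0.6 (took 8 extra steps)
```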
#### Configuration

Edit the script configuration at the top of each file:

```python
MODEL_NAME = "gpt-4.1-mini"  # or "o3", "claude-sonnet-4", etc.
NUM_INSTANCES = 5            # Number of test episodes
MAX_TURNS = 100              # Maximum steps per episode
DIFFICULTY = "beginner"      # Environment-specific difficulty
```
All scripts provide detailed rubric results, progress metrics, and shaped rewards suitable for reinforcement learning applications.
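If you would rather not edit the files directly, one common pattern is to let environment variables override such constants. A sketch of that pattern (the variable names are assumptions; the shipped scripts may simply hard-code the values):

```python
import os

# Hypothetical override pattern, for illustration only.
MODEL_NAME = os.environ.get("SYNTH_MODEL_NAME", "gpt-4.1-mini")
NUM_INSTANCES = int(os.environ.get("SYNTH_NUM_INSTANCES", "5"))
MAX_TURNS = int(os.environ.get("SYNTH_MAX_TURNS", "100"))
DIFFICULTY = os.environ.get("SYNTH_DIFFICULTY", "beginner")

print(MODEL_NAME, NUM_INSTANCES, MAX_TURNS, DIFFICULTY)
```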
### Development Setup

```bash
# Clone repository
git clone https://github.com/your-org/synth-env.git
cd synth-env

# Install dependencies
uv sync

# Run tests
python dev/update_readme_metrics.py --fast
```
## 🎮 Supported Environments
| Environment | Status | Description |
|---|---|---|
| Sokoban | ✅ Stable | Classic puzzle game for planning |
| Hendryks Math | ✅ Stable | Mathematical reasoning tasks |
| Crafter | ✅ Stable | Minecraft-like survival environment |
| Verilog | 🔄 Beta | Hardware description language tasks |
| Red Team | 🚧 Development | Security testing scenarios |
| SWE-Bench | 🚧 Development | Software engineering tasks |
## 📖 Documentation
- API Reference - Complete API documentation
- Environment Guide - Detailed environment descriptions
- Contributing - Development setup and guidelines
## 🔧 Development

### Health Check

```bash
# Check codebase health
python scripts/check_health.py
```

### Testing

```bash
# Fast tests (~3 seconds)
python dev/update_readme_metrics.py --fast

# Full test suite
python dev/update_readme_metrics.py
```

### Code Quality

```bash
# Format code
ruff format .

# Check linting
ruff check .

# Type checking
uvx ty check
```

### Release

```bash
# Increment version and publish
python scripts/release.py

# Dry run
python scripts/release.py --dry-run
```
### Pre-Merge Checklist
Before creating a PR, see dev/pr_checklist.md for the complete checklist.
## 🤝 Contributing
We welcome contributions! Please see our Contributing Guide for:
- Development setup
- Code style guidelines
- Testing requirements
- Pull request process
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
Special thanks to the research teams at DeepMind, Ragen AI, and other contributors to the environments included in this framework.
> ⚠️ **Development Status:** This project is under active development. While stable environments are production-ready, newer environments may have breaking changes.