Horizons AI - Advanced Reinforcement Learning Environments

A comprehensive framework for building and managing synthetic environments designed specifically for training and evaluating long-horizon language agents.

🎯 Key Features

  • 🔄 Snapshotting & Reproducibility - Full state capture and replay
  • 🏗️ Statefulness First - Built-in state management across environments
  • 🔌 Consistent APIs - Unified interface for all environment types
  • 📊 Observability - Built-in tracing and monitoring
  • 🌐 HTTP Access - RESTful API for remote training and evaluation
  • 📚 Curriculum Learning - Configurable filtering and progression
  • 🛠️ Agent Tools - Simple abstractions for agent-environment interaction
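
As a rough illustration of what snapshotting and replay mean for a stateful environment, here is a minimal sketch. It does not use the horizons API: `CounterEnv`, `snapshot`, and `restore` are hypothetical names invented for this example, and the approach (deep-copying the object state, including the random number generator) is only one plausible way to get reproducible replays.

```python
import copy
import random


class CounterEnv:
    """Toy stateful environment (hypothetical, not part of horizons)."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.total = 0
        self.steps = 0

    def step(self, action: int) -> int:
        # Add the action plus seeded noise to the running total.
        self.total += action + self.rng.randint(0, 9)
        self.steps += 1
        return self.total

    def snapshot(self) -> dict:
        # Deep-copy everything needed to resume, including the RNG state.
        return copy.deepcopy(self.__dict__)

    def restore(self, snap: dict) -> None:
        # Copy again so the stored snapshot can be restored more than once.
        self.__dict__ = copy.deepcopy(snap)


env = CounterEnv(seed=42)
env.step(1)
snap = env.snapshot()
first = [env.step(a) for a in (2, 3, 4)]
env.restore(snap)
replay = [env.step(a) for a in (2, 3, 4)]
assert first == replay  # replaying from the snapshot repeats the trajectory
```

Capturing the RNG state alongside the counters is what makes the replay deterministic; a snapshot that omitted it would restore the totals but diverge on the next step.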

🚀 Quick Start

Installation

pip install horizons-ai

Basic Usage

import horizons

# Create environment
env = horizons.Environment("sokoban")

# Run agent (`agent` is a user-supplied object with an act(state) method)
state = env.reset()
while not env.done:
    action = agent.act(state)
    state = env.step(action)
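
The loop above leaves `agent` undefined. The following sketch shows one way to fill that gap, assuming only the three members the snippet uses (`reset()`, `step(action)`, and `done`). `StubEnv`, `RandomAgent`, and `run_episode` are hypothetical stand-ins written for this example so that it runs without the package installed; they are not horizons classes.

```python
import random


class StubEnv:
    """Stand-in exposing only reset(), step(), and done (hypothetical)."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0
        self.done = False

    def reset(self) -> int:
        self.t = 0
        self.done = False
        return self.t

    def step(self, action: str) -> int:
        # The episode ends after `horizon` steps, whatever the action is.
        self.t += 1
        self.done = self.t >= self.horizon
        return self.t


class RandomAgent:
    """Picks a uniformly random action and ignores the state."""

    def __init__(self, actions, seed: int = 0) -> None:
        self.actions = list(actions)
        self.rng = random.Random(seed)

    def act(self, state) -> str:
        return self.rng.choice(self.actions)


def run_episode(env, agent, max_steps: int = 100) -> int:
    """Run one episode and return the number of steps taken."""
    state = env.reset()
    steps = 0
    # The step cap guards against environments that never set `done`.
    while not env.done and steps < max_steps:
        state = env.step(agent.act(state))
        steps += 1
    return steps


steps_taken = run_episode(StubEnv(horizon=5), RandomAgent(["up", "down", "left", "right"]))
```

The `max_steps` cap is an addition for safety in long-horizon settings; the original loop runs until the environment reports `done`.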

Running Evaluation Scripts

The framework includes ReAct agent evaluation scripts for testing language models on the TicTacToe, NetHack, and Sokoban environments. Each script reports evaluation metrics and shaped rewards that can be used for training.

Prerequisites

  1. Start the synth service on port 8901:

    # In your service directory
    python -m uvicorn main:app --host 0.0.0.0 --port 8901
    
  2. Ensure your model is available (OpenAI, Anthropic, etc.)
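
Before launching an evaluation it can help to confirm that the service from step 1 is actually listening. The sketch below uses only the Python standard library; the function name `service_is_up` and the default host `127.0.0.1` are assumptions made for this example, and a successful TCP connection only shows that something is listening on the port, not that it is the synth service.

```python
import socket


def service_is_up(host: str = "127.0.0.1", port: int = 8901, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

An evaluation script could call `service_is_up()` first and exit with a clear message when it returns `False`, rather than failing partway through an episode.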

TicTacToe Evaluation

cd Environments
# `uvpm` is not a standard command; it is presumably a shell alias for `uv run python -m`
uvpm synth_env.examples.tictactoe.agent_demos.test_tictactoe_react_agent

Features:

  • Tests strategic gameplay against random opponent
  • Provides win/loss/draw statistics
  • Validates coordinate parsing and legal moves
  • Supports multiple models (gpt-4.1-mini, o3, etc.)
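
To make the coordinate parsing and legal-move checks listed above concrete, here is a minimal sketch. The coordinate format (letters A-C for rows, digits 1-3 for columns) and the function names are assumptions chosen for illustration; the evaluation script may use a different convention.

```python
from typing import Dict, List, Optional, Tuple

Board = List[List[Optional[str]]]  # 3x3 grid holding "X", "O", or None


def parse_coordinate(text: str) -> Tuple[int, int]:
    """Convert a coordinate such as "B2" into zero-based (row, col).

    Assumed format: letters A-C select the row, digits 1-3 the column.
    """
    cleaned = text.strip().upper()
    if len(cleaned) != 2 or cleaned[0] not in "ABC" or cleaned[1] not in "123":
        raise ValueError(f"invalid coordinate: {text!r}")
    return "ABC".index(cleaned[0]), int(cleaned[1]) - 1


def is_legal(board: Board, move: Tuple[int, int]) -> bool:
    """A move is legal when its target cell is empty."""
    row, col = move
    return board[row][col] is None


def tally(outcomes: List[str]) -> Dict[str, int]:
    """Count win/loss/draw outcomes across episodes."""
    counts = {"win": 0, "loss": 0, "draw": 0}
    for outcome in outcomes:
        counts[outcome] += 1
    return counts
```

Rejecting malformed input with `ValueError` keeps parsing failures separate from illegal-but-well-formed moves, which is useful when scoring a language model's output.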

NetHack Evaluation

cd Environments
uvpm synth_env.examples.nethack.agent_demos.test_nethack_react_agent

Features:

  • Comprehensive dungeon exploration evaluation
  • 26+ shaped reward signals for training
  • Balrog scoring system integration
  • Progress bars for multi-trajectory runs
  • Separates relevant vs. irrelevant metrics
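
One common way to combine many shaped reward signals into a single training reward is a weighted sum. The sketch below assumes that approach; it is not taken from the NetHack script, and the signal names in the usage lines are hypothetical. Signals without a weight are treated as irrelevant and reported separately.

```python
from typing import Dict, Tuple


def shaped_reward(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the signals that have a weight; others are ignored."""
    return sum(weights[name] * value for name, value in signals.items() if name in weights)


def split_metrics(
    signals: Dict[str, float], weights: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Separate signals into (relevant, irrelevant) by whether they are weighted."""
    relevant = {k: v for k, v in signals.items() if k in weights}
    irrelevant = {k: v for k, v in signals.items() if k not in weights}
    return relevant, irrelevant


# Hypothetical signal names, for illustration only.
signals = {"depth_reached": 2.0, "gold_collected": 10.0, "turns_idle": 5.0}
weights = {"depth_reached": 1.0, "gold_collected": 0.1}
reward = shaped_reward(signals, weights)
relevant, irrelevant = split_metrics(signals, weights)
```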

Sokoban Evaluation

cd Environments  
uvpm synth_env.examples.sokoban.agent_demos.test_sokoban_react_agent

Features:

  • Classic puzzle-solving evaluation
  • Box-pushing logic validation
  • Step efficiency analysis
  • Multiple difficulty levels
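
The box-pushing rule and the step-efficiency measure mentioned above can be sketched as follows. This is a generic Sokoban rule set written for illustration, not the environment's own implementation; `try_move` and `step_efficiency` are hypothetical names, and defining efficiency as optimal steps divided by steps taken is an assumption.

```python
from typing import Set, Tuple

Pos = Tuple[int, int]  # (row, col)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def try_move(player: Pos, boxes: Set[Pos], walls: Set[Pos], direction: str):
    """Apply one Sokoban move; return (new_player, new_boxes, moved)."""
    dr, dc = MOVES[direction]
    target = (player[0] + dr, player[1] + dc)
    if target in walls:
        return player, boxes, False
    if target in boxes:
        beyond = (target[0] + dr, target[1] + dc)
        # A box can only be pushed into a cell with no wall and no other box.
        if beyond in walls or beyond in boxes:
            return player, boxes, False
        boxes = (boxes - {target}) | {beyond}
    return target, boxes, True


def step_efficiency(optimal_steps: int, steps_taken: int) -> float:
    """Ratio of optimal to actual steps; 1.0 means a shortest solution."""
    return optimal_steps / steps_taken if steps_taken else 0.0
```

`try_move` returns a new box set instead of mutating its argument, so a caller can compare states before and after a move or keep earlier states for replay.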

Configuration

Edit the script configuration at the top of each file:

MODEL_NAME = "gpt-4.1-mini"  # or "o3", "claude-sonnet-4", etc.
NUM_INSTANCES = 5            # Number of test episodes
MAX_TURNS = 100              # Maximum steps per episode
DIFFICULTY = "beginner"      # Environment-specific difficulty

All scripts provide detailed rubric results, progress metrics, and shaped rewards suitable for reinforcement learning applications.
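
For readers who want to drive several runs from code instead of editing constants, the four settings above can be grouped into a small config object. This is a sketch under stated assumptions: `EvalConfig` and `evaluate` are hypothetical names, and the `run_episode` callable stands in for whatever actually plays one episode and reports success.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalConfig:
    """Mirrors the script-level constants; defaults match the values shown above."""

    model_name: str = "gpt-4.1-mini"
    num_instances: int = 5
    max_turns: int = 100
    difficulty: str = "beginner"


def evaluate(config: EvalConfig, run_episode: Callable[[int, EvalConfig], bool]) -> float:
    """Run num_instances episodes and return the fraction that succeeded."""
    results: List[bool] = [run_episode(i, config) for i in range(config.num_instances)]
    return sum(results) / len(results) if results else 0.0


# Usage with a dummy episode runner that succeeds on even-numbered instances.
success_rate = evaluate(EvalConfig(num_instances=4), lambda i, cfg: i % 2 == 0)
```

Making the dataclass frozen keeps a run's settings immutable once evaluation starts, which helps when results are later compared across models or difficulty levels.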

Development Setup

# Clone repository
git clone https://github.com/your-org/synth-env.git
cd synth-env

# Install dependencies
uv sync

# Run tests
python dev/update_readme_metrics.py --fast

🎮 Supported Environments

Environment      Status            Description
Sokoban          ✅ Stable         Classic puzzle game for planning
Hendrycks Math   ✅ Stable         Mathematical reasoning tasks
Crafter          ✅ Stable         Minecraft-like survival environment
Verilog          🔄 Beta           Hardware description language tasks
Red Team         🚧 Development    Security testing scenarios
SWE-Bench        🚧 Development    Software engineering tasks

📖 Documentation

🔧 Development

Health Check

# Check codebase health
python scripts/check_health.py

Testing

# Fast tests (~3 seconds)
python dev/update_readme_metrics.py --fast

# Full test suite
python dev/update_readme_metrics.py

Code Quality

# Format code
ruff format .

# Check linting
ruff check .

# Type checking
uvx ty check

Release

# Increment version and publish
python scripts/release.py

# Dry run
python scripts/release.py --dry-run

Pre-Merge Checklist

Before creating a PR, see dev/pr_checklist.md for the complete checklist.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Special thanks to the research teams at DeepMind, Ragen AI, and other contributors to the environments included in this framework.


⚠️ Development Status: This project is under active development. While stable environments are production-ready, newer environments may have breaking changes.
