Multi-game puzzle server for telnet - LLMs with MCP solvers welcome!

Puzzle Arcade Server

A multi-game puzzle server and LLM reasoning benchmark arcade hosting 24 different logic puzzle types, built using the chuk-protocol-server framework.

Perfect for:

  • 🤖 LLM Agent Testing - Benchmark reasoning capabilities across constraint types
  • 🎯 CP-SAT Education - Learn constraint programming through progressive puzzles
  • 💼 Business Demos - Map puzzle patterns to real scheduling, optimization, and allocation problems
  • 🔧 MCP Tool Integration - Showcase CHUK + constraint solver workflows

Each puzzle demonstrates specific constraint patterns (AllDifferent, Optimization, Connectivity, Boolean SAT, etc.) and maps to business use cases (scheduling, resource allocation, routing, etc.).

Try It Now

A live demo server is running on Fly.io. Try it instantly:

# Connect via Telnet (IPv6)
telnet 2a09:8280:1::b8:79f4:0 8023

# WebSocket connections
ws://puzzle-arcade-server.fly.dev:8025/ws

Once connected, type help to see available games, or sudoku easy to start playing!

Features

  • 24 Puzzle Games with three difficulty levels each (easy, medium, hard)
    • 7 Classic Logic Puzzles - Sudoku, KenKen, Kakuro, Binary, Futoshiki, Nonogram, Logic Grid
    • 7 Advanced CP-SAT Puzzles - Killer Sudoku, Lights Out, Mastermind, Slitherlink, Bridges, Hitori, Shikaku
    • 5 Specialized Constraint Puzzles - Hidato, Tents and Trees, Fillomino, Star Battle, Sokoban
    • 2 Optimization Challenges - Knapsack, Task Scheduler
    • 3 Advanced Reasoning Puzzles - Nurikabe, Einstein's Puzzle, Minesweeper
  • Agent-Friendly Mode - Structured output with clear markers for AI agents and tools
    • Enable with mode agent command
    • Machine-parseable grid format with clear start/end markers
    • Compact output optimized for LLM tool integration
  • Evaluation Harness (puzzle-arcade-eval) - Built-in benchmarking CLI
    • Batch evaluation with configurable episodes
    • Multiple output formats (JSON, CSV, Markdown)
    • Metrics: moves, invalid moves, hints, solve time
    • Reproducible with deterministic seeds
  • Multiple transport protocols:
    • Telnet (port 8023) - Classic telnet protocol
    • TCP (port 8024) - Raw TCP connections
    • WebSocket (port 8025) - Modern WebSocket protocol
    • WebSocket-Telnet (port 8026) - WebSocket with telnet negotiation
  • Interactive menu-driven interface with game selection
  • Hint system for when you're stuck
  • Solution checker and auto-solver for all games
  • Clean ASCII art grids - perfectly aligned for easy parsing
  • Deterministic seeding - Replay any puzzle with the same seed
  • Gymnasium-compatible RL Environment (PuzzleEnv) for training agents
  • Comprehensive test suite (1067 tests, 94% coverage)
  • Modern Python best practices:
    • Pydantic v2 native - All models use ConfigDict for type safety
    • Async native - Full async/await support throughout
    • Type-safe - No dict["key"] patterns, only typed models
    • Enum-based - No magic strings, proper enum constants
  • Modern Python packaging with pyproject.toml
  • Docker and Fly.io deployment ready

Available Games

Classic Logic Puzzles

| Game | Grid Size | Constraint Types | Status |
| --- | --- | --- | --- |
| Sudoku | 9×9 | AllDifferent (rows, cols, boxes) | ✅ Complete |
| KenKen | 4×4 to 6×6 | Arithmetic cages + AllDifferent | ✅ Complete |
| Kakuro | 5×5 to 8×8 | Sum constraints + AllDifferent | ✅ Complete |
| Binary Puzzle | 6×6 to 10×10 | Adjacency limits + Equal counts | ✅ Complete |
| Futoshiki | 4×4 to 6×6 | Inequalities + AllDifferent | ✅ Complete |
| Nonogram | 5×5 to 10×10 | Line sum constraints + Blocks | ✅ Complete |
| Logic Grid | Variable | Category associations + Logic | ✅ Complete |
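The AllDifferent pattern listed for Sudoku can be illustrated with a short standalone check. This is a minimal sketch rather than the server's own validation code: the names all_different and sudoku_units_valid are hypothetical, and using 0 for an empty cell is an assumption made for the example.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]


def all_different(values: Sequence[int]) -> bool:
    """Return True if no filled value repeats (0 is treated as empty)."""
    filled = [v for v in values if v != 0]
    return len(filled) == len(set(filled))


def sudoku_units_valid(grid: Grid) -> bool:
    """Apply AllDifferent to every row, column, and 3x3 box of a 9x9 grid."""
    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[br + r][bc + c] for r in range(3) for c in range(3)]
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    return all(all_different(unit) for unit in rows + cols + boxes)
```

A partially filled grid passes as long as no row, column, or box contains a repeated digit, which matches how a move validator would typically be used mid-game.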

Advanced CP-SAT Puzzles

| Game | Grid Size | Constraint Types | Status |
| --- | --- | --- | --- |
| Killer Sudoku | 9×9 | Linear constraints + AllDifferent + Cages | ✅ Complete |
| Lights Out | 5×5 to 7×7 | Boolean XOR constraints (SAT) | ✅ Complete |
| Mastermind | 4-6 pegs | Deduction + Feedback constraints | ✅ Complete |
| Slitherlink | 5×5 to 10×10 | Global loop + Edge constraints | ✅ Complete |
| Bridges | 7×7 to 11×11 | Connectivity + Degree constraints | ✅ Complete |
| Hitori | 5×5 to 9×9 | AllDifferent + Adjacency + Connectivity | ✅ Complete |
| Shikaku | 6×6 to 10×10 | Area partitioning + Rectangle covering | ✅ Complete |
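To make the "Boolean XOR constraints" entry for Lights Out concrete, here is a small sketch of the usual toggle rule: pressing a cell flips that cell and its orthogonal neighbours. The function name press and the 0/1 list-of-lists grid are assumptions for illustration; the server's own representation may differ.

```python
def press(grid: list[list[int]], row: int, col: int) -> list[list[int]]:
    """Return a new grid after pressing (row, col).

    The pressed cell and its up/down/left/right neighbours are XOR-ed
    with 1 (0 = off, 1 = on). The input grid is left unchanged.
    """
    size = len(grid)
    new_grid = [line[:] for line in grid]
    for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < size and 0 <= c < len(grid[r]):
            new_grid[r][c] ^= 1
    return new_grid
```

Because XOR is its own inverse, pressing the same cell twice restores the grid, and the order of presses does not matter. That is why the puzzle reduces to choosing a set of cells to press exactly once.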

Specialized Constraint Puzzles

| Game | Grid Size | Constraint Types | Status |
| --- | --- | --- | --- |
| Hidato | 5×5 to 9×9 | Sequential adjacency + Hamiltonian path | ✅ Complete |
| Tents and Trees | 6×6 to 10×10 | Bipartite matching + Adjacency avoidance | ✅ Complete |
| Fillomino | 6×6 to 10×10 | Region growth + Self-referential constraints | ✅ Complete |
| Star Battle | 6×6 to 10×10 | Multi-region placement + Adjacency avoidance | ✅ Complete |
| Sokoban | 6×6 to 10×10 | Spatial planning + Irreversible actions (optimization) | ✅ Complete |

Optimization Challenges

| Game | Problem Size | Constraint Types | Status |
| --- | --- | --- | --- |
| Knapsack | 5-12 items | Value maximization + Capacity constraint | ✅ Complete |
| Task Scheduler | 4-8 tasks | Makespan minimization + Dependencies + Resources | ✅ Complete |
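For readers new to the "Value maximization + Capacity constraint" pattern, the following sketch solves a 0/1 knapsack instance with the textbook dynamic program. It is not taken from the server; knapsack_max_value and the (weight, value) tuple layout are hypothetical names chosen for the example.

```python
def knapsack_max_value(items: list[tuple[int, int]], capacity: int) -> int:
    """Best total value for a 0/1 knapsack.

    items is a list of (weight, value) pairs with non-negative integer
    weights; each item may be taken at most once.
    """
    # best[c] = best value achievable with total weight <= c
    best = [0] * (capacity + 1)
    for weight, value in items:
        # Iterate capacities downwards so each item is used at most once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]
```

With 5-12 items, as in the table above, this runs instantly, so an optimal value like this one is a convenient baseline when scoring an agent's answer.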

Advanced Reasoning Puzzles

| Game | Grid Size | Constraint Types | Status |
| --- | --- | --- | --- |
| Nurikabe | 6×6 to 10×10 | Connectivity + Island sizes + No 2×2 blocks | ✅ Complete |
| Einstein's Puzzle | 5 houses × 5 attributes | Multi-attribute deduction + Logic chains | ✅ Complete |
| Minesweeper | 6×6 to 10×10 | Probabilistic reasoning + Safe deduction | ✅ Complete |

Solver Profiles & Business Mapping

Each game includes metadata for constraint types, business analogies, and complexity profiles, making it easy to:

  • Select puzzles by constraint pattern - Need to demonstrate Boolean SAT? → Lights Out
  • Map to business use cases - Task Scheduler → Sprint Planning, Knapsack → Portfolio Selection
  • Benchmark LLM reasoning - Compare model performance across different constraint densities

Example: Query Games by Profile

from puzzle_arcade_server.games import AVAILABLE_GAMES

# Find all optimization problems
optimization_games = [
    name for name, game_class in AVAILABLE_GAMES.items()
    if "optimization" in game_class().constraint_types
]
# → ['knapsack', 'scheduler']

# Find games that model resource allocation
resource_games = [
    name for name, game_class in AVAILABLE_GAMES.items()
    if "resource_allocation" in game_class().business_analogies
]
# → ['scheduler', 'knapsack']

Quick Reference: Constraint Types to Business Problems

| Constraint Pattern | Puzzle Examples | Business Use Cases |
| --- | --- | --- |
| Optimization | Knapsack, Scheduler | Portfolio selection, Sprint planning, Budget allocation |
| Precedence | Scheduler | Project dependencies, Workflow sequencing |
| Sequential Adjacency | Hidato | Path planning, Route sequencing, Tour optimization |
| Hamiltonian Path | Hidato | Traveling salesman, Circuit design |
| Bipartite Matching | Tents and Trees | Job assignment, Resource pairing |
| Region Growth | Fillomino | Territory expansion, Cluster formation |
| Spatial Planning | Sokoban | Warehouse logistics, Movement planning |
| Connectivity | Nurikabe, Slitherlink | Network design, Routing, Zone planning |
| Global Loop | Slitherlink | Circuit design, Path finding |
| Boolean SAT | Lights Out | Feature dependencies, Toggle systems |
| Cage Sums | Killer Sudoku, Kakuro | Team budgets, Grouped constraints |
| AllDifferent | Sudoku, KenKen | Resource uniqueness, Assignment problems |

Quick Start

Prerequisites

  • Python 3.11 or higher
  • UV (recommended) or pip

Installation

From Source (Development)

Using UV (Recommended)

# Clone the repository
git clone https://github.com/chrishayuk/puzzle-arcade-server.git
cd puzzle-arcade-server

# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install development dependencies
make dev-install

# Run the server
make run

Using pip

# Clone the repository
git clone https://github.com/chrishayuk/puzzle-arcade-server.git
cd puzzle-arcade-server

# Install in development mode with dev dependencies
pip install -e ".[dev]"

# Run the server (note: this launcher command still uses uv to run chuk-protocol-server)
PYTHONPATH=. uv run --with chuk-protocol-server chuk-protocol-server server-launcher -c config.yaml

Using Make (All Commands)

# See all available commands
make help

# Development workflow
make dev-install      # Install dev dependencies
make run              # Run the server
make test             # Run tests
make test-cov         # Run tests with coverage report
make check            # Run linting and type checking
make format           # Format code with ruff
make security         # Run security checks

# Docker workflow
make docker-build     # Build Docker image
make docker-run       # Run in Docker container

# Examples
make example-telnet              # Browse games via telnet
make example-telnet-sudoku       # Sudoku demo
make example-telnet-kenken       # KenKen demo
make example-ws                  # WebSocket tour
make example-ws-interactive      # Interactive WebSocket mode

# Deployment
make fly-deploy       # Deploy to Fly.io
make fly-logs         # View Fly.io logs

Docker Setup

Build and run with Docker:

# Using Make
make docker-run

# Or manually
docker build -t puzzle-arcade-server .
docker run -p 8023:8023 -p 8024:8024 -p 8025:8025 -p 8026:8026 puzzle-arcade-server

Connecting to the Server

Local Development

Via Telnet:

telnet localhost 8023

Via Netcat (TCP):

nc localhost 8024

Via WebSocket:

ws://localhost:8025/ws
ws://localhost:8026/ws

Game Menu

When you connect, you'll see the main menu:

==================================================
       WELCOME TO THE PUZZLE ARCADE!
==================================================

CLASSIC LOGIC PUZZLES:
  1) Sudoku          - Classic logic puzzle - fill 9x9 grid with digits 1-9
  2) KenKen          - Arithmetic cage puzzle - combine math and logic
  3) Kakuro          - Crossword math puzzle - fill runs with unique digits that sum to clues
  4) Binary Puzzle   - Fill grid with 0s and 1s - no three in a row, equal counts
  5) Futoshiki       - Inequality number puzzle - fill grid with constraints
  6) Nonogram        - Picture logic puzzle - reveal image from number clues
  7) Logic Grid      - Deductive reasoning puzzle - match attributes using logic

ADVANCED CP-SAT PUZZLES:
  8) Killer Sudoku   - Sudoku + Kakuro - regions must sum to targets
  9) Lights Out      - Toggle lights to turn all off - XOR constraint puzzle
 10) Mastermind      - Code-breaking with logical deduction and feedback
 11) Slitherlink     - Draw a single loop - numbers show edge counts
 12) Bridges         - Connect islands with bridges - satisfy all numbers
 13) Hitori          - Shade cells to eliminate duplicates - no adjacent shading
 14) Shikaku         - Divide grid into rectangles matching areas

SPECIALIZED CONSTRAINT PUZZLES:
 15) Hidato          - Sequential path puzzle - connect numbers adjacently
 16) Tents           - Place tents next to trees - bipartite matching puzzle
 17) Fillomino       - Fill regions with numbers matching region size
 18) Star Battle     - Place stars avoiding adjacency - multi-region placement
 19) Sokoban         - Push boxes to targets - spatial planning puzzle

OPTIMIZATION CHALLENGES:
 20) Knapsack        - Maximize value within capacity constraints
 21) Task Scheduler  - Minimize makespan with dependencies and resources

ADVANCED REASONING PUZZLES:
 22) Nurikabe        - Island and sea puzzle - connectivity constraints
 23) Einstein's Puzzle - Who owns the fish? Multi-attribute deduction
 24) Minesweeper     - Find all mines using logical deduction

Commands:
  <number>  - Select game by number
  <name>    - Select game by name (e.g., 'sudoku')
  help      - Show this menu again
  quit      - Exit the server
==================================================

Agent-Friendly Mode

The server includes a special agent mode designed for AI tools and LLM integration:

Enabling Agent Mode

> mode agent
Output mode set to: agent

Agent Mode Features

Structured Output - Grid data is wrapped with clear start/end markers:

---GAME-START---
GAME: Sudoku
DIFFICULTY: medium
MOVES: 3
---GRID-START---
  | 1 2 3 | 4 5 6 | 7 8 9 |
  -------------------------
1 | . . 3 | . 2 . | 6 . . |
...
---GRID-END---
---GAME-END---

Benefits for AI Agents:

  • Easy parsing with regex: ---GRID-START---(.*?)---GRID-END---
  • Consistent metadata format (GAME, DIFFICULTY, MOVES)
  • No decorative text or banners to filter out
  • Minimal token usage compared to normal mode
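As an illustration of the parsing benefit, the sketch below applies the regex quoted above to a shortened copy of the sample output. The helper name parse_agent_output and the returned dictionary layout are assumptions made for this example, not part of the server.

```python
import re

# Shortened copy of the agent-mode sample shown above.
SAMPLE = """---GAME-START---
GAME: Sudoku
DIFFICULTY: medium
MOVES: 3
---GRID-START---
  | 1 2 3 | 4 5 6 | 7 8 9 |
  -------------------------
1 | . . 3 | . 2 . | 6 . . |
---GRID-END---
---GAME-END---"""

GRID_RE = re.compile(r"---GRID-START---(.*?)---GRID-END---", re.DOTALL)
META_RE = re.compile(r"^(GAME|DIFFICULTY|MOVES): (.*)$", re.MULTILINE)


def parse_agent_output(text: str) -> dict:
    """Split agent-mode output into metadata and raw grid lines."""
    grid_match = GRID_RE.search(text)
    # Strip only newline characters so leading spaces in the header survive.
    grid_lines = (
        grid_match.group(1).strip("\r\n").splitlines() if grid_match else []
    )
    meta = {key.lower(): value.strip() for key, value in META_RE.findall(text)}
    return {"meta": meta, "grid_lines": grid_lines}
```

Calling parse_agent_output(SAMPLE) yields the three metadata fields as strings plus the grid as a list of lines, which an agent can then tokenize per game.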

Switching Modes:

  • mode normal - Human-friendly output (default)
  • mode agent - Machine-parseable structured output
  • mode compact - Reserved for future use

Gymnasium-Compatible RL Environment

The project includes a Gymnasium-compatible environment for training reinforcement learning agents:

Quick Start

import asyncio

from puzzle_arcade_server.gym_env import PuzzleEnv


async def main() -> None:
    # Create environment for any of the 24 games
    env = PuzzleEnv("sudoku", difficulty="easy", seed=42)

    # Reset to start a new episode
    obs, info = await env.reset()

    # Take actions (text commands or tuples)
    obs, reward, terminated, truncated, info = await env.step("place 1 1 5")

    # Or use tuple format
    obs, reward, terminated, truncated, info = await env.step(("place", 1, 1, 5))

    # Get available games
    games = PuzzleEnv.available_games()
    # → ['sudoku', 'kenken', 'minesweeper', ...]


# reset() and step() are coroutines, so run them inside an event loop
asyncio.run(main())

Features

  • All 24 games accessible through unified API
  • Configurable rewards for correct moves, invalid attempts, completion bonuses
  • Hint system with optional budget limits
  • Solver-free mode for pure reasoning benchmarks
  • Efficiency scoring based on optimal step counts
  • Deterministic seeding for reproducible experiments

Observation Space

obs = {
    "game": "sudoku",
    "difficulty": "easy",
    "seed": 42,
    "moves": 5,
    "invalid_moves": 1,
    "hints_used": 2,
    "is_complete": False,
    "grid": [[4, 0, 8, ...], ...]  # Game-specific state
}

Reward Configuration

env = PuzzleEnv("kenken", reward_config={
    "correct_placement": 1.0,      # Reward for valid moves
    "invalid_attempt": -0.5,       # Penalty for invalid moves
    "completion_bonus": 10.0,      # Bonus for solving
    "hint_penalty": -0.1,          # Penalty for using hints
    "efficiency_multiplier": 2.0,  # Scales completion bonus by efficiency
})

Solver Configuration

from puzzle_arcade_server.gym_env import PuzzleEnv
from puzzle_arcade_server.models import SolverConfig

# Solver-free mode (no hints allowed)
config = SolverConfig.solver_free()
env = PuzzleEnv("sudoku", solver_config=config)

# Limited hints
config = SolverConfig(hint_budget=5, hint_penalty=0.1)
env = PuzzleEnv("sudoku", solver_config=config)

Evaluation Harness

The project includes a built-in evaluation harness for benchmarking puzzle-solving agents:

Quick Start

# List all available games
puzzle-arcade-eval --list-games

# Evaluate a specific game (10 episodes, medium difficulty)
puzzle-arcade-eval sudoku -d medium -n 10 -v

# Evaluate all games (5 episodes each)
puzzle-arcade-eval --all -d easy -n 5

# Output as JSON for analysis
puzzle-arcade-eval sudoku -n 20 -o json > results.json

Using Make Targets

make eval           # Quick evaluation (3 episodes per game)
make eval-sudoku    # Evaluate Sudoku (10 episodes)
make eval-all       # Evaluate all games (10 episodes each)
make eval-json      # Output as JSON
make list-games     # List available games

Sample Output

Sudoku Medium Evaluation (10 episodes)
==================================================
Solved:     10/10 (100.0%)
Avg Moves:  45.3
Avg Invalid: 0.0
Avg Time:   12ms

Output Formats

  • text (default) - Human-readable summary
  • json - Structured JSON for programmatic analysis
  • csv - Spreadsheet-compatible format
  • markdown - Documentation-ready tables

Metrics Collected

| Metric | Description |
| --- | --- |
| solved | Whether the puzzle was solved |
| moves_made | Number of valid moves |
| invalid_moves | Number of rejected moves |
| hints_used | Number of hints requested |
| wall_time_ms | Time to solve in milliseconds |
| seed | Puzzle seed for reproducibility |
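The metrics above lend themselves to simple post-processing. The sketch below assumes the JSON output can be loaded as a list of per-episode records keyed by the metric names in the table. That schema is an assumption for illustration and the real output may be shaped differently. The function name summarize is hypothetical.

```python
from statistics import mean


def summarize(episodes: list[dict]) -> dict:
    """Aggregate per-episode metrics into a small summary dict."""
    total = len(episodes)
    if total == 0:
        return {"episodes": 0, "solved": 0, "solve_rate": 0.0}
    solved = sum(1 for ep in episodes if ep["solved"])
    return {
        "episodes": total,
        "solved": solved,
        "solve_rate": solved / total,
        "avg_moves": mean(ep["moves_made"] for ep in episodes),
        "avg_invalid": mean(ep["invalid_moves"] for ep in episodes),
        "avg_hints": mean(ep["hints_used"] for ep in episodes),
        "avg_time_ms": mean(ep["wall_time_ms"] for ep in episodes),
    }
```

Keeping the seed in each record means any outlier episode can be replayed later with the same puzzle.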

Universal Game Commands

All games support these commands:

Starting and Managing Games

  • <number> [difficulty] - Select game by number (e.g., 1 medium)
  • <name> [difficulty] - Select game by name (e.g., sudoku hard)
  • show - Display the current grid
  • mode <normal|agent|compact> - Set output mode
  • help - Show game-specific commands and rules
  • menu - Return to main menu
  • quit - Exit the server

Playing Games

  • place <row> <col> <value> - Place a number/value on the grid
    • Example: place 1 5 7 (places 7 at row 1, column 5)
  • clear <row> <col> - Clear a cell you've filled
  • hint - Get a hint for the next move
  • check - Check your progress
  • solve - Show the solution (ends current game)
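A client or agent that builds these commands often wants to validate them before sending. The following is a minimal sketch of parsing the place command into a typed structure. PlaceCommand and parse_place are hypothetical names, and the server's own command handling is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceCommand:
    row: int
    col: int
    value: int


def parse_place(line: str) -> PlaceCommand | None:
    """Parse 'place <row> <col> <value>'; return None if malformed."""
    parts = line.strip().split()
    if len(parts) != 4 or parts[0].lower() != "place":
        return None
    try:
        row, col, value = (int(part) for part in parts[1:])
    except ValueError:
        return None  # a non-numeric argument
    return PlaceCommand(row, col, value)
```

For example, parse_place("place 1 5 7") returns PlaceCommand(row=1, col=5, value=7), while "place 1 x 7" or "hint" returns None.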

Special Commands (Game-Specific)

  • Logic Grid: connect and exclude commands for associations
  • See in-game help for game-specific commands

Example Gameplay Sessions

Sudoku

> sudoku medium

==================================================
SUDOKU - MEDIUM MODE
==================================================
Fill the grid so that every row, column, and 3x3 box
contains the digits 1-9 without repetition.

Type 'help' for commands or 'hint' for a clue.
==================================================

  | 1 2 3 | 4 5 6 | 7 8 9 |
  -------------------------
1 | . . 3 | . 2 . | 6 . . |
2 | 9 . . | 3 . 5 | . . 1 |
3 | . . 1 | 8 . 6 | 4 . . |
  -------------------------
4 | . . 8 | 1 . 2 | 9 . . |
5 | 7 . . | . . . | . . 8 |
6 | . . 6 | 7 . 8 | 2 . . |
  -------------------------
7 | . . 2 | 6 . 9 | 5 . . |
8 | 8 . . | 2 . 3 | . . 9 |
9 | . . 5 | . 1 . | 3 . . |
  -------------------------
Moves made: 0
==================================================

> hint
Hint: Try placing 4 at row 1, column 1

> place 1 1 4
Number placed successfully!

> check
Puzzle not yet complete. Keep going!
Moves made: 1

KenKen

> kenken easy

==================================================
KENKEN - EASY MODE
==================================================
KENKEN RULES:
- Fill 4x4 grid with 1-4
- No repeats in rows or columns
- Satisfy cage arithmetic constraints
- Operations: + - * /
==================================================

  | 1  | 2  | 3  | 4  |
  +----+----+----+----+
1 | .8+| .  | .3 | .2 |
  +----+----+----+----+
2 | .  | .6+| .  | .3-|
  +----+----+----+----+
3 | .2 | .6+| .8+| .  |
  +----+----+----+----+
4 | .  | .  | .  | .  |
  +----+----+----+----+

Cages:
  8+: (1,1), (1,2), (2,1)
  3: (1,3)
  2: (1,4)
  ...

> place 1 3 3
Number placed successfully!

Architecture

This server is built on the chuk-protocol-server framework, which provides:

  • Multiple transport protocol support (Telnet, TCP, WebSocket, WS-Telnet)
  • Telnet protocol negotiation (IAC, WILL, WONT, DO, DONT)
  • WebSocket handling with ping/pong keepalive
  • Connection management and monitoring
  • Asynchronous I/O with Python asyncio
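Clients that talk to the telnet port over a raw socket will see negotiation bytes mixed into the text. The sketch below shows one way to filter them out on the client side, using the standard telnet command codes. It is an illustration only and says nothing about how chuk-protocol-server handles negotiation internally. The function name strip_telnet_commands is hypothetical.

```python
# Standard telnet command bytes (RFC 854).
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240


def strip_telnet_commands(data: bytes) -> bytes:
    """Remove telnet command sequences, keeping only application data."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != IAC:
            out.append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            break  # dangling IAC at end of buffer; dropped in this sketch
        command = data[i + 1]
        if command == IAC:
            out.append(IAC)  # IAC IAC is an escaped literal 0xFF byte
            i += 2
        elif command in (WILL, WONT, DO, DONT):
            i += 3  # IAC + verb + option byte
        elif command == SB:
            # Skip the subnegotiation block up to and including IAC SE.
            end = data.find(bytes([IAC, SE]), i + 2)
            i = len(data) if end == -1 else end + 2
        else:
            i += 2  # other two-byte commands such as NOP
    return bytes(out)
```

A production client would also buffer sequences split across reads. This sketch assumes each buffer holds complete sequences.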

Game Architecture

Each game is a self-contained module with all logic co-located:

games/
├── _base/              # Base classes
│   ├── game.py         # PuzzleGame ABC
│   └── commands.py     # GameCommandHandler ABC
├── sudoku/
│   ├── __init__.py     # Exports SudokuGame
│   ├── game.py         # Game logic
│   ├── config.py       # SudokuConfig
│   └── commands.py     # Command handler
├── minesweeper/
│   ├── __init__.py
│   ├── game.py
│   └── config.py
└── ... (24 games total)

All games extend the PuzzleGame abstract base class with deterministic seeding:

# Excerpt of the base class (games/_base/game.py)
import random
from abc import ABC, abstractmethod

from puzzle_arcade_server.models import MoveResult

class PuzzleGame(ABC):
    def __init__(self, difficulty: str = "easy", seed: int | None = None):
        self.seed = seed if seed is not None else random.randint(0, 2**32 - 1)
        self._rng = random.Random(self.seed)  # Deterministic RNG
        # ...

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def constraint_types(self) -> list[str]: ...

    @property
    @abstractmethod
    def business_analogies(self) -> list[str]: ...

    @abstractmethod
    async def generate_puzzle(self) -> None: ...

    @abstractmethod
    async def validate_move(self, *args) -> MoveResult: ...

    @abstractmethod
    def is_complete(self) -> bool: ...

    @abstractmethod
    def render_grid(self) -> str: ...
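The seeding approach in the constructor above, a private random.Random(seed) per game instance, is what makes puzzles replayable. The toy sketch below demonstrates the property in isolation. generate_grid is a hypothetical stand-in for a real puzzle generator.

```python
import random


def generate_grid(seed: int, size: int = 4) -> list[list[int]]:
    """Build a toy grid from a private RNG seeded with `seed`."""
    rng = random.Random(seed)  # independent of the module-level RNG
    return [[rng.randint(1, size) for _ in range(size)] for _ in range(size)]


first = generate_grid(42)
random.seed(0)  # disturbing the global RNG has no effect on the result
second = generate_grid(42)
print(first == second)  # prints True
```

Using an instance-level generator instead of the module-level random functions keeps concurrent games from interfering with each other's sequences.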

Handler Architecture

The ArcadeHandler class manages:

  • Menu-driven game selection
  • Command parsing and routing (delegating to game-specific handlers)
  • Grid display with proper formatting
  • Game state management per connection
  • Multi-game support

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/chrishayuk/puzzle-arcade-server.git
cd puzzle-arcade-server

# Install development dependencies (with UV)
make dev-install

# Or with pip
pip install -e ".[dev]"

Testing

The project has comprehensive test coverage (94%, 1067 tests):

# Run all tests
make test

# Run tests with coverage report
make test-cov

# Run tests in watch mode
make test-watch

# View coverage report in browser
make serve-coverage

Coverage by Module

src/puzzle_arcade_server/games/_base/             86%   # Base classes (abstract defaults)
src/puzzle_arcade_server/games/sudoku/            92%   # Sudoku module
src/puzzle_arcade_server/games/kenken/            90%   # KenKen module
src/puzzle_arcade_server/games/minesweeper/       96%   # Minesweeper module
src/puzzle_arcade_server/games/sokoban/           83%   # Sokoban (complex pathfinding)
src/puzzle_arcade_server/games/.../               90%+  # All other games
src/puzzle_arcade_server/gym_env.py               90%   # Gymnasium environment
src/puzzle_arcade_server/models/                  90%+  # Pydantic models
------------------------------------------------------
TOTAL                                              94%  🎯

Most modules meet the 90%+ coverage threshold. The remaining gaps are in abstract base class defaults and complex pathfinding algorithms.

Code Quality

The project follows modern Python best practices with a 9.8/10 compliance score:

Tooling

  • Ruff: Fast linter and formatter (replaces black + flake8)
  • MyPy: Static type checking
  • Pytest: Testing framework with async support
  • Bandit: Security vulnerability scanning

Code Standards

  • ✅ Pydantic v2 Native (10/10) - All models use ConfigDict, zero deprecation warnings
  • ✅ Async Native (9.5/10) - All I/O operations use async/await properly
  • ✅ Type-Safe (10/10) - No dict["key"] patterns, only typed Pydantic models
  • ✅ No Magic Strings (10/10) - All constants use enums or typed constants
  • ✅ Test Coverage (9.5/10) - 94% overall, most files ≥90%

Quality Metrics

  • 1067 tests - All passing ✅
  • 94% coverage - Exceeds 90% threshold ✅
  • Zero linting errors - Clean codebase ✅
  • Full type safety - MyPy passes ✅
  • Deterministic seeding - Reproducible puzzles ✅

# Run all checks (lint + typecheck + test + security)
make check

# Run linter
make lint

# Format code
make format

# Type checking
make typecheck

# Security scanning
make security

Running Example Clients

# Telnet client examples
make example-telnet              # Browse all games
make example-telnet-sudoku       # Sudoku demo
make example-telnet-kenken       # KenKen demo
make example-telnet-interactive  # Interactive mode

# WebSocket client examples
make example-ws                  # Tour all games
make example-ws-sudoku           # Sudoku demo
make example-ws-binary           # Binary puzzle demo
make example-ws-solve            # Solve with hints
make example-ws-interactive      # Interactive mode

CI/CD

The project includes GitHub Actions workflows:

  • test.yml: Runs tests on Ubuntu, Windows, macOS with Python 3.11, 3.12, 3.13
  • publish.yml: Publishes to PyPI on release
  • release.yml: Creates GitHub releases
  • fly-deploy.yml: Auto-deploys to Fly.io on main branch push

Coverage threshold is set to 90% - builds fail if coverage drops below this.

Deployment to Fly.io

Using Make (Recommended)

# Deploy to Fly.io
make fly-deploy

# Check status
make fly-status

# View logs
make fly-logs

Manual Deployment

  1. Install the Fly CLI: https://fly.io/docs/hands-on/install-flyctl/

  2. Log in to Fly:

fly auth login

  3. Create and deploy the app:

# First deployment (creates the app)
fly launch --config fly.toml --now

# Subsequent deployments
fly deploy

  4. Important: Allocate a public IPv6 address for TCP services:

# Allocate IPv6 (free)
fly ips allocate-v6

# Verify IP is allocated
fly ips list

  5. Check the status:

fly status

  6. View logs:

fly logs

  7. Connect to your Puzzle Arcade server:

# Get your app's IPv6 address
fly ips list

# Connect via telnet using IPv6 (free tier)
telnet <your-ipv6> 8023

# WebSocket connections work with hostname
# ws://<your-app>.fly.dev:8025/ws

Note: TCP services (Telnet, raw TCP) require a public IP address on Fly.io. We use IPv6 which is free. IPv4 costs $2/month and is not needed for most users.

Project Structure

puzzle-arcade-server/
├── src/
│   └── puzzle_arcade_server/
│       ├── __init__.py           # Package initialization
│       ├── server.py             # Main arcade handler
│       ├── constants.py          # Game constants
│       ├── models/               # Pydantic models
│       │   ├── __init__.py
│       │   ├── base.py           # GridPosition, MoveResult
│       │   ├── config.py         # Base GameConfig
│       │   ├── enums.py          # DifficultyLevel, GameCommand, etc.
│       │   └── games.py          # Game-specific models (Cage, Task, etc.)
│       └── games/                # Self-contained game modules
│           ├── __init__.py       # AVAILABLE_GAMES registry
│           ├── _base/            # Base classes
│           │   ├── __init__.py
│           │   ├── game.py       # PuzzleGame ABC
│           │   └── commands.py   # GameCommandHandler ABC
│           ├── sudoku/           # Example game module
│           │   ├── __init__.py   # Exports SudokuGame
│           │   ├── game.py       # SudokuGame class
│           │   ├── config.py     # SudokuConfig
│           │   └── commands.py   # SudokuCommandHandler
│           ├── minesweeper/      # Each game is self-contained
│           │   ├── __init__.py
│           │   ├── game.py
│           │   └── config.py
│           └── ... (24 games total)
├── tests/
│   ├── test_puzzle_game.py       # Base class tests
│   ├── test_deterministic_seeding.py  # Seeding tests
│   ├── test_sudoku_game.py       # Sudoku tests
│   ├── test_minesweeper.py       # Minesweeper tests
│   └── ... (tests for all 24 games)
├── examples/
│   ├── simple_client.py          # Telnet client example
│   ├── websocket_client.py       # WebSocket client example
│   └── README.md                 # Example usage guide
├── .github/workflows/            # CI/CD workflows
├── pyproject.toml                # Modern Python project config
├── config.yaml                   # Multi-transport server configuration
├── Dockerfile                    # Docker build instructions
├── fly.toml                      # Fly.io deployment config
├── Makefile                      # Development commands (50+ targets)
└── README.md                     # This file

Key Statistics

  • Test Coverage: 94% overall (1067 tests, all passing)
  • Code Quality Score: 9.8/10 (near perfect compliance)
  • Games Implemented: 24 complete puzzle types
    • 7 Classic Logic Puzzles
    • 7 Advanced CP-SAT Puzzles
    • 5 Specialized Constraint Puzzles
    • 2 Optimization Challenges
    • 3 Advanced Reasoning Puzzles
  • Supported Transports: 4 (Telnet, TCP, WebSocket, WS-Telnet)
  • Agent-Friendly Mode: Structured output for AI tools
  • Gymnasium API: RL-compatible environment for all games
  • Deterministic Seeding: Reproducible puzzles for testing

Use Cases

1. LLM Reasoning Demonstration

Perfect for demonstrating LLM reasoning capabilities:

  1. LLM connects via telnet: telnet localhost 8023
  2. Selects a puzzle: sudoku hard
  3. Receives puzzle in clean ASCII format
  4. Analyzes constraints and generates solution
  5. Submits moves: place 1 5 7
  6. Server validates each move
  7. Puzzle solved! Proof of reasoning capability

2. Constraint Solver Testing

Test the generality of constraint solvers (like MCP solvers):

  • Different puzzle types → Same underlying solver
  • Clean ASCII output → Easy for solver parsing
  • Simple interface → Focus on solving, not UI
  • Pure validation → Server validates, doesn't solve

3. Educational Tool

Learn about constraint satisfaction problems:

  • 24 different puzzle types demonstrating various constraint types:
    • AllDifferent constraints (Sudoku, KenKen, Futoshiki)
    • Arithmetic constraints (KenKen, Kakuro, Killer Sudoku)
    • Boolean/SAT constraints (Lights Out, Binary Puzzle)
    • Loop/Edge constraints (Slitherlink)
    • Deduction constraints (Mastermind, Logic Grid, Einstein's Puzzle)
    • Optimization objectives (Knapsack, Task Scheduler)
    • Temporal reasoning (Task Scheduler)
    • Connectivity constraints (Nurikabe, Slitherlink)
    • Probabilistic reasoning (Minesweeper)
    • And more!
  • Well-documented code showing puzzle generation algorithms
  • Comprehensive tests (1067 tests, 94% coverage) demonstrating validation
  • Deterministic seeding - Reproduce any puzzle for debugging/testing
  • Production-ready - 9.8/10 code quality score
  • Type-safe - Full Pydantic v2 and MyPy compliance
  • Modular architecture - Each game is self-contained in its own folder

Adding New Puzzle Games

  1. Create a new game folder in src/puzzle_arcade_server/games/:

games/
└── my_puzzle/
    ├── __init__.py     # Export the game class
    ├── game.py         # Game logic
    └── config.py       # Game configuration

  2. Create the config in config.py:
from pydantic import Field
from ...models import DifficultyLevel, GameConfig

class MyPuzzleConfig(GameConfig):
    grid_size: int = Field(default=5, description="Grid size")

    @classmethod
    def from_difficulty(cls, difficulty: DifficultyLevel) -> "MyPuzzleConfig":
        sizes = {DifficultyLevel.EASY: 5, DifficultyLevel.MEDIUM: 7, DifficultyLevel.HARD: 9}
        return cls(difficulty=difficulty, grid_size=sizes[difficulty])
  3. Create the game in game.py:
from .._base import PuzzleGame
from ...models import MoveResult
from .config import MyPuzzleConfig

class MyPuzzleGame(PuzzleGame):
    def __init__(self, difficulty: str = "easy", seed: int | None = None):
        super().__init__(difficulty, seed)
        self.config = MyPuzzleConfig.from_difficulty(self.difficulty)
        # Use self._rng for all randomness (deterministic seeding)

    @property
    def name(self) -> str:
        return "My Puzzle"

    @property
    def constraint_types(self) -> list[str]:
        return ["all_different", "sum_constraint"]

    @property
    def business_analogies(self) -> list[str]:
        return ["resource_allocation", "scheduling"]

    async def generate_puzzle(self) -> None:
        # Use self._rng.randint(), self._rng.choice(), etc.
        self.game_started = True

    async def validate_move(self, row: int, col: int, num: int) -> MoveResult:
        # Validate and apply move
        return MoveResult(success=True, message="Number placed!")

    def is_complete(self) -> bool:
        return all(cell != 0 for row in self.grid for cell in row)

    def render_grid(self) -> str:
        # Header line only; a real implementation appends one line per grid row
        return "  | 1 | 2 | 3 |\n"

    def get_stats(self) -> str:
        return f"Moves: {self.moves_made} | Seed: {self.seed}"
  4. Export in __init__.py:
from .game import MyPuzzleGame
__all__ = ["MyPuzzleGame"]
  5. Register in src/puzzle_arcade_server/games/__init__.py:
from .my_puzzle import MyPuzzleGame

AVAILABLE_GAMES = {
    # ... other games
    "mypuzzle": MyPuzzleGame,
}
  1. Add tests in tests/test_my_puzzle_game.py:
from puzzle_arcade_server.games.my_puzzle import MyPuzzleGame

class TestMyPuzzleGame:
    # Async tests need an asyncio pytest plugin (pytest-asyncio is assumed here)
    async def test_deterministic_seeding(self):
        game1 = MyPuzzleGame("easy", seed=12345)
        game2 = MyPuzzleGame("easy", seed=12345)
        await game1.generate_puzzle()
        await game2.generate_puzzle()
        assert game1.render_grid() == game2.render_grid()

    def test_seed_in_stats(self):
        game = MyPuzzleGame("easy", seed=42)
        assert "Seed: 42" in game.get_stats()
  1. Run tests and verify:
make test-cov
make check

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-puzzle)
  3. Make your changes
  4. Run tests and checks (make check)
  5. Ensure coverage stays above 90% (make test-cov)
  6. Commit your changes (git commit -m 'Add amazing puzzle')
  7. Push to the branch (git push origin feature/amazing-puzzle)
  8. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide (enforced by ruff)
  • Add type hints to all functions
  • Write tests for new features (>90% coverage)
  • Update documentation as needed
  • Ensure all grid headers align properly with rows

Troubleshooting

Server won't start

  • Ensure chuk-protocol-server is installed: uv pip install chuk-protocol-server
  • Check ports aren't already in use: lsof -i :8023,8024,8025,8026
  • Verify Python version is 3.11+: python --version
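
Where lsof is not available, the port check can be approximated from Python. This is a rough sketch, not part of the project: `port_in_use` is a hypothetical helper, the port numbers are taken from the lsof command above, and it only probes the IPv4 loopback address, so a listener bound elsewhere would not be detected.

```python
import socket

# Ports taken from the lsof command above.
PORTS = (8023, 8024, 8025, 8026)


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


for port in PORTS:
    state = "in use" if port_in_use(port) else "free"
    print(f"{port}: {state}")
```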

Tests failing

  • Install dev dependencies: make dev-install
  • Clear cache: make clean
  • Check Python version compatibility

Coverage too low

  • Run coverage report: make test-cov
  • View HTML report: make serve-coverage
  • Add tests for uncovered code

Grid alignment issues

  • All grid headers must align with row pipes
  • Use the format "  |" (two spaces, then a pipe) for headers to match the row format "N |"
  • Test visually: make example-telnet-kenken
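
The alignment rule can also be checked in code rather than by eye. The sketch below is illustrative only: `render_grid` and `pipe_columns` are hypothetical helpers, not the project's functions, and they assume single-character cells and at most 9 rows and columns so that every label is one character wide.

```python
def render_grid(grid: list[list[int]]) -> str:
    """Render a small grid with a header whose pipes line up with the rows."""
    size = len(grid)
    # The header starts with two spaces so its first pipe matches "N |".
    header = "  | " + " | ".join(str(c) for c in range(1, size + 1)) + " |"
    lines = [header]
    for number, row in enumerate(grid, start=1):
        # Empty cells (0) are shown as a dot to keep every cell one char wide.
        cells = " | ".join(str(value) if value else "." for value in row)
        lines.append(f"{number} | {cells} |")
    return "\n".join(lines)


def pipe_columns(line: str) -> list[int]:
    """Return the character positions of every pipe in a rendered line."""
    return [index for index, char in enumerate(line) if char == "|"]


text = render_grid([[1, 0, 3], [0, 2, 0], [3, 0, 1]])
print(text)
```

Comparing `pipe_columns` of the header against each row gives a simple assertion that a unit test can run alongside the visual check.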

Roadmap

See ROADMAP.md for the full development roadmap.

Highlights

Benchmarking & Metrics

  • Puzzle complexity metrics (constraint count, variable count, branching factor)
  • Episode model for tracking game sessions
  • Trace logging for offline analysis

Agent Evaluation Tools

  • Batch evaluation harness CLI
  • Solver vs Model comparison mode
  • JSON protocol for structured agent communication

Learning & Curriculum

  • Constraint concept progression graph
  • Tagged puzzle sets for educators
  • Difficulty scaling based on constraint complexity

Ecosystem Integrations

  • MCP native mode for agent frameworks
  • Python client library
  • REST/WebSocket API documentation

UX & Community

  • Interactive web viewer with replay mode
  • Public benchmark packs (versioned, citable)
  • Community leaderboards

License

MIT License - see the main chuk-protocol-server project for details.

Credits

  • Built using the chuk-protocol-server framework
  • Puzzle generation algorithms based on backtracking and constraint propagation
  • Uses modern Python tooling: UV, Ruff, MyPy, Pytest


Ready to test your solver? Connect now and start solving! 🎮
