
AlphaZero implementation for a triangle puzzle game (uses trianglengin).


CI/CD Status - codecov - PyPI version - License: MIT - Python Version

AlphaTriangle


Overview

AlphaTriangle is a project implementing an artificial intelligence agent based on AlphaZero principles to learn and play a custom puzzle game involving placing triangular shapes onto a grid. The agent learns through headless self-play reinforcement learning, guided by Monte Carlo Tree Search (MCTS) and a deep neural network (PyTorch). It uses the trianglengin library for core game logic.

The project includes:

  • An implementation of the MCTS algorithm tailored for the game.
  • A deep neural network (policy and value heads) implemented in PyTorch, featuring convolutional layers and optional Transformer Encoder layers.
  • A reinforcement learning pipeline coordinating parallel self-play (using Ray), data storage, and network training, managed by the alphatriangle.training module.
  • Experiment tracking and visualization using MLflow and TensorBoard.
  • Unit tests for RL components.
  • A command-line interface for running the headless training pipeline.

🎮 The Triangle Puzzle Game Guide 🧩

This project trains an agent to play the game defined by the trianglengin library. Here's a detailed explanation of the game rules:

1. Introduction: Your Mission! 🎯

The goal is to place colorful shapes onto a special triangular grid. By filling up lines of triangles, you make them disappear and score points! Keep placing shapes and clearing lines for as long as possible to get the highest score before the grid fills up and you run out of moves.

2. The Playing Field: The Grid 🗺️

  • Triangle Cells: The game board is a grid made of many small triangles. Some point UP (🔺) and some point DOWN (🔻). They alternate like a checkerboard pattern based on their row and column index (specifically, (row + col) % 2 != 0 means UP).
  • Shape: The grid itself is rectangular overall, but the playable area within it is typically shaped like a triangle or hexagon, wider in the middle and narrower at the top and bottom.
  • Playable Area: You can only place shapes within the designated playable area.
  • Death Zones 💀: Around the edges of the playable area (often at the start and end of rows), some triangles are marked as "Death Zones". You cannot place any part of a shape onto these triangles. They are off-limits! Think of them as the boundaries within the rectangular grid.
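
As a minimal sketch of the orientation rule above, the helper below classifies a cell from its row and column indices. The names `is_up` and `render_orientations` are illustrative and are not taken from trianglengin.

```python
def is_up(row: int, col: int) -> bool:
    """Return True when the cell at (row, col) points UP.

    Uses the rule stated above: (row + col) % 2 != 0 means UP.
    """
    return (row + col) % 2 != 0


def render_orientations(rows: int, cols: int) -> list[str]:
    """Draw each cell as '^' (UP) or 'v' (DOWN) to show the checkerboard pattern."""
    return [
        "".join("^" if is_up(r, c) else "v" for c in range(cols))
        for r in range(rows)
    ]


for text_row in render_orientations(3, 6):
    print(text_row)
# v^v^v^
# ^v^v^v
# v^v^v^
```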

3. Your Tools: The Shapes 🟦🟥🟩

  • Shape Formation: Each shape is a collection of connected small triangles (🔺 and 🔻). They come in different colors and arrangements. Some might be a single triangle, others might be long lines, L-shapes, or more complex patterns.
  • Relative Positions: The triangles within a shape have fixed positions relative to each other. When you move the shape, all its triangles move together as one block.
  • Preview Area: You choose from up to three shapes at any time, shown in a special "preview area". A used slot stays empty until all three shapes have been played (see section 7 for the refill rule).

4. Making Your Move: Placing Shapes 🖱️➡️▦

This is the core action! Here's exactly how to place a shape:

  • Step 4a: Select a Shape: Choose one of the three shapes available in the preview area.
  • Step 4b: Aim on the Grid: Select a target coordinate (row, col) on the main grid. This coordinate represents the anchor point for placing the shape.
  • Step 4c: The Placement Rules (MUST Follow!)
    • 📏 Rule 1: Fit Inside Playable Area: ALL triangles of your chosen shape must land within the playable grid area. No part of the shape can land in a Death Zone 💀.
    • 🧱 Rule 2: No Overlap: ALL triangles of your chosen shape must land on currently empty spaces on the grid. You cannot place a shape on top of triangles that are already filled with color from previous shapes.
    • 📐 Rule 3: Orientation Match! This is crucial!
      • If a part of your shape is an UP triangle (🔺), it MUST land on an UP space (🔺) on the grid.
      • If a part of your shape is a DOWN triangle (🔻), it MUST land on a DOWN space (🔻) on the grid.
      • 🔺➡️🔺 (OK!)
      • 🔻➡️🔻 (OK!)
      • 🔺➡️🔻 (INVALID! ❌)
      • 🔻➡️🔺 (INVALID! ❌)
  • Step 4d: Confirm Placement: If the chosen shape can be placed at the target coordinate according to ALL three rules, the placement is valid. The shape is now placed permanently on the grid! ✨
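
The three placement rules can be sketched as a single check. This is an illustration under stated assumptions, not the trianglengin implementation: a shape is assumed to be a list of (row offset, col offset, points up) triples relative to the anchor, and the grid is assumed to be two sets of coordinates, `playable` (cells outside Death Zones) and `occupied`. All names here are hypothetical.

```python
Coord = tuple[int, int]
Shape = list[tuple[int, int, bool]]  # (row offset, col offset, points up?) - assumed layout


def is_up(row: int, col: int) -> bool:
    """Grid orientation rule from section 2: (row + col) % 2 != 0 means UP."""
    return (row + col) % 2 != 0


def can_place(shape: Shape, anchor: Coord, playable: set[Coord], occupied: set[Coord]) -> bool:
    """Return True only if every triangle of the shape satisfies all three rules."""
    anchor_row, anchor_col = anchor
    for d_row, d_col, points_up in shape:
        cell = (anchor_row + d_row, anchor_col + d_col)
        if cell not in playable:  # Rule 1: inside the playable area
            return False
        if cell in occupied:  # Rule 2: no overlap
            return False
        if is_up(*cell) != points_up:  # Rule 3: orientation match
            return False
    return True


# A 2x4 rectangle whose top corners are Death Zones.
playable = {(r, c) for r in range(2) for c in range(4)} - {(0, 0), (0, 3)}
pair = [(0, 0, False), (0, 1, True)]  # a DOWN triangle with an UP triangle to its right

print(can_place(pair, (1, 1), playable, set()))       # True: all rules hold
print(can_place(pair, (1, 0), playable, set()))       # False: (1, 0) is an UP cell
print(can_place(pair, (0, 0), playable, set()))       # False: (0, 0) is a Death Zone
print(can_place(pair, (1, 1), playable, {(1, 2)}))    # False: (1, 2) is already filled
```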

5. Scoring Points: How You Win! 🏆

You score points in two main ways:

  • Placing Triangles: You get a small number of points for every single small triangle that makes up the shape you just placed. (e.g., placing a 3-triangle shape might give you 3 * tiny_score points).
  • Clearing Lines: This is where the BIG points come from! You get a much larger number of points for every single small triangle that disappears when you clear a line (or multiple lines at once!). See the next section for details!
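
For illustration, the two scoring terms combine as shown below. The per-triangle values are placeholders chosen for this example only; the actual values are defined by trianglengin.

```python
# Placeholder values (assumptions for this sketch, not the engine's real constants).
POINTS_PER_PLACED_TRIANGLE = 1
POINTS_PER_CLEARED_TRIANGLE = 10


def move_score(placed_triangles: int, cleared_triangles: int) -> int:
    """Score one move: a small reward per placed triangle plus a larger one per cleared triangle."""
    return (
        placed_triangles * POINTS_PER_PLACED_TRIANGLE
        + cleared_triangles * POINTS_PER_CLEARED_TRIANGLE
    )


print(move_score(3, 0))  # 3: a 3-triangle shape that clears nothing
print(move_score(3, 8))  # 83: the same shape, clearing 8 triangles
```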

6. Line Clearing Magic! ✨ (The Key to High Scores!)

This is the most exciting part! When you place a shape, the game immediately checks if you've completed any lines. This section explains how the game finds and clears these lines.

  • What Lines Can Be Cleared? There are three types of lines the game looks for:

    • Horizontal Lines ↔️: A straight, unbroken line of filled triangles going across a single row.
    • Diagonal Lines (Top-Left to Bottom-Right) ↘️: An unbroken diagonal line of filled triangles stepping down and to the right.
    • Diagonal Lines (Bottom-Left to Top-Right) ↗️: An unbroken diagonal line of filled triangles stepping up and to the right.
  • How Lines are Found: Pre-calculation of Maximal Lines

    • The Idea: Instead of checking every possible line combination all the time, the game pre-calculates all maximal continuous lines of playable triangles when it starts. A maximal line is the longest possible straight segment of playable triangles (not in a Death Zone) in one of the three directions (Horizontal, Diagonal ↘️, Diagonal ↗️).
    • Tracing: For every playable triangle on the grid, the game traces outwards in each of the three directions to find the full extent of the continuous playable line passing through that triangle in that direction.
    • Storing Maximal Lines: Only the complete maximal lines found are stored. For example, if tracing finds a playable sequence A-B-C-D, only the line (A,B,C,D) is stored, not the sub-segments like (A,B,C) or (B,C,D). These maximal lines represent the potential lines that can be cleared.
    • Coordinate Map: The game also builds a map linking each playable triangle coordinate (r, c) to the set of maximal lines it belongs to. This allows for quick lookup.
  • Defining the Paths (Neighbor Logic): How does the game know which triangle is "next" when tracing? It depends on the current triangle's orientation (🔺 or 🔻) and the direction being traced:

    • Horizontal ↔️:
      • Left Neighbor: (r, c-1) (Always in the same row)
      • Right Neighbor: (r, c+1) (Always in the same row)
    • Diagonal ↘️ (TL-BR):
      • If current is 🔺 (Up): Next is (r+1, c) (Down triangle directly below)
      • If current is 🔻 (Down): Next is (r, c+1) (Up triangle to the right)
    • Diagonal ↗️ (BL-TR):
      • If current is 🔻 (Down): Next is (r-1, c) (Up triangle directly above)
      • If current is 🔺 (Up): Next is (r, c+1) (Down triangle to the right)
  • Visualizing the Paths:

    • Horizontal ↔️:
      ... [🔻][🔺][🔻][🔺][🔻][🔺] ...  (Moves left/right in the same row)

    • Diagonal ↘️ (TL-BR): (Connects via shared horizontal edges)
      ...[🔺]...
      ...[🔻][🔺] ...
      ...     [🔻][🔺] ...
      ...         [🔻] ...
      (Path alternates row/col increments depending on orientation)

    • Diagonal ↗️ (BL-TR): (Connects via shared horizontal edges)
      ...           [🔺]  ...
      ...      [🔺][🔻]   ...
      ... [🔺][🔻]        ...
      ... [🔻]            ...
      (Path alternates row/col increments depending on orientation)
      
  • The "Full Line" Rule: After you place a piece, the game looks at the coordinates (r, c) of the triangles you just placed. Using the pre-calculated map, it finds all the maximal lines that contain any of those coordinates. For each of those maximal lines (that have at least 2 triangles), it checks: "Is every single triangle coordinate in this maximal line now occupied?" If yes, that line is complete! (Note: Single isolated triangles don't count as clearable lines).

  • The Poof! 💨:

    • If placing your shape completes one or MORE maximal lines (of any type, length >= 2) simultaneously, all the triangles in ALL completed lines vanish instantly!
    • The spaces become empty again.
    • You score points for every single triangle that vanished. Clearing multiple lines at once is the best way to rack up points! 🥳
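
The pre-calculation, coordinate map, and "Full Line" check described in this section can be sketched as follows. This is a simplified illustration, not the trianglengin code: the grid is assumed to be a set of playable coordinates, lines shorter than 2 are dropped at storage time, and maximal lines are found by following the forward neighbor rule from every cell that is nobody's forward neighbor. All function names are hypothetical.

```python
from collections import defaultdict

Coord = tuple[int, int]
Line = tuple[Coord, ...]
DIRECTIONS = ("horizontal", "diag_down", "diag_up")


def is_up(r: int, c: int) -> bool:
    return (r + c) % 2 != 0


def next_cell(cell: Coord, direction: str) -> Coord:
    """Forward neighbor of `cell`, following the neighbor logic listed above."""
    r, c = cell
    if direction == "horizontal":
        return (r, c + 1)
    if direction == "diag_down":  # top-left to bottom-right
        return (r + 1, c) if is_up(r, c) else (r, c + 1)
    if direction == "diag_up":  # bottom-left to top-right
        return (r, c + 1) if is_up(r, c) else (r - 1, c)
    raise ValueError(f"unknown direction: {direction}")


def maximal_lines(playable: set[Coord]) -> list[Line]:
    """Collect every maximal line of playable cells that has at least 2 cells."""
    lines: list[Line] = []
    for direction in DIRECTIONS:
        # Map each cell to its forward neighbor when that neighbor is playable.
        successor = {}
        for cell in playable:
            nxt = next_cell(cell, direction)
            if nxt in playable:
                successor[cell] = nxt
        # A cell that is nobody's forward neighbor starts a maximal line.
        for start in sorted(playable - set(successor.values())):
            line = [start]
            while line[-1] in successor:
                line.append(successor[line[-1]])
            if len(line) >= 2:
                lines.append(tuple(line))
    return lines


def build_coord_map(lines: list[Line]) -> dict[Coord, set[Line]]:
    """Link each coordinate to the maximal lines that contain it."""
    coord_map = defaultdict(set)
    for line in lines:
        for cell in line:
            coord_map[cell].add(line)
    return coord_map


def clear_completed(placed: list[Coord], occupied: set[Coord], coord_map) -> set[Coord]:
    """Empty every fully occupied line touching a placed cell; return the cleared cells."""
    candidates = set()
    for cell in placed:
        candidates |= coord_map.get(cell, set())
    cleared = set()
    for line in candidates:
        if all(cell in occupied for cell in line):
            cleared.update(line)
    occupied.difference_update(cleared)
    return cleared


playable = {(r, c) for r in range(2) for c in range(3)}  # tiny 2x3 grid, no Death Zones
coord_map = build_coord_map(maximal_lines(playable))
occupied = {(0, 0), (0, 1)}
placed = [(0, 2)]  # a single triangle that completes row 0
occupied.update(placed)
cleared = clear_completed(placed, occupied, coord_map)
print(sorted(cleared))  # [(0, 0), (0, 1), (0, 2)]
```

Because the cleared cells are collected in a set, a triangle shared by two completed lines is counted once, which matches scoring "every single triangle that vanished".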

7. Getting New Shapes: The Refill 🪄

  • The Trigger: The game only gives you new shapes when a specific condition is met.
  • The Condition: New shapes appear only when all three of your preview slots become empty at the exact same time.
  • How it Happens: This occurs right after you place your last available shape (the third one).
  • The Refill: As soon as the third slot becomes empty, BAM! 🪄 Three brand new, randomly generated shapes instantly appear in the preview slots.
  • Important: If you place a shape and only one or two slots are empty, you do not get new shapes yet. You must use up all three before the refill happens.
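
A minimal sketch of the refill rule, assuming the preview area is a list of three slots where `None` marks an empty slot. The function `use_slot` and the `new_shape` callback are hypothetical names for this example; shapes are represented by plain strings.

```python
from itertools import count
from typing import Callable, Optional

NUM_SLOTS = 3


def use_slot(slots: list[Optional[str]], index: int, new_shape: Callable[[], str]) -> bool:
    """Empty one slot after a placement; refill only when ALL slots are empty.

    Returns True when a refill happened.
    """
    if slots[index] is None:
        raise ValueError("slot is already empty")
    slots[index] = None
    if all(slot is None for slot in slots):
        slots[:] = [new_shape() for _ in range(NUM_SLOTS)]
        return True
    return False


counter = count(1)
make_shape = lambda: f"shape-{next(counter)}"  # stand-in for random shape generation

slots = [make_shape() for _ in range(NUM_SLOTS)]
print(use_slot(slots, 0, make_shape), slots)  # False [None, 'shape-2', 'shape-3']
print(use_slot(slots, 2, make_shape), slots)  # False [None, 'shape-2', None]
print(use_slot(slots, 1, make_shape), slots)  # True ['shape-4', 'shape-5', 'shape-6']
```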

8. The End of the Road: Game Over 😭

So, how does the game end?

  • The Condition: The game is over when you cannot legally place any of the three shapes currently available in your preview slots anywhere on the grid.
  • The Check: After every move (placing a shape and any resulting line clears), and after any potential shape refill, the game checks: "Is there at least one valid spot on the grid for Shape 1? OR for Shape 2? OR for Shape 3?"
  • No More Moves: If the answer is "NO" for all three shapes (meaning none of them can be placed anywhere according to the Placement Rules), then the game immediately ends.
  • Strategy: This means you need to be careful! Don't fill up the grid in a way that leaves no room for the types of shapes you might get later. Always try to keep options open! 🤔
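
The game-over check can be sketched as a search for any legal placement of any remaining shape. As before, this is an illustration under assumptions rather than the engine's code: shapes are lists of (row offset, col offset, points up) triples, the grid is a pair of coordinate sets, empty preview slots are `None`, and the function names are hypothetical.

```python
from typing import Optional

Coord = tuple[int, int]
Shape = list[tuple[int, int, bool]]  # (row offset, col offset, points up?) - assumed layout


def is_up(r: int, c: int) -> bool:
    return (r + c) % 2 != 0


def can_place(shape: Shape, anchor: Coord, playable: set[Coord], occupied: set[Coord]) -> bool:
    """Apply the three placement rules to every triangle of the shape."""
    r0, c0 = anchor
    for dr, dc, points_up in shape:
        cell = (r0 + dr, c0 + dc)
        if cell not in playable or cell in occupied or is_up(*cell) != points_up:
            return False
    return True


def is_game_over(slots: list[Optional[Shape]], playable: set[Coord], occupied: set[Coord]) -> bool:
    """Return True when no remaining shape fits anywhere on the grid."""
    for shape in slots:
        if shape is None:
            continue  # slot already used; it holds no shape to place
        dr0, dc0, _ = shape[0]
        # Any valid placement puts the shape's first triangle on a playable cell,
        # so it is enough to try the anchors derived from those cells.
        for r, c in playable:
            if can_place(shape, (r - dr0, c - dc0), playable, occupied):
                return False
    return True


playable = {(0, 0), (0, 1), (0, 2)}  # one short row: DOWN, UP, DOWN
occupied = {(0, 1)}
pair = [(0, 0, False), (0, 1, True)]  # DOWN triangle with an UP triangle to its right
single_down = [(0, 0, False)]

print(is_game_over([pair, None, None], playable, occupied))         # True: the pair fits nowhere
print(is_game_over([pair, single_down, None], playable, occupied))  # False: the single fits at (0, 0)
```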

Core Technologies

  • Python 3.10+
  • trianglengin: Core game engine (state, actions, rules).
  • PyTorch: For the deep learning model (CNNs, optional Transformers, Distributional Value Head) and training, with CUDA/MPS support.
  • NumPy: For numerical operations, especially state representation (used by trianglengin and features).
  • Ray: For parallelizing self-play data generation and statistics collection across multiple CPU cores/processes.
  • Numba: (Optional, used in features.grid_features) For performance optimization of specific grid calculations.
  • Cloudpickle: For serializing the experience replay buffer and training checkpoints.
  • MLflow: For logging parameters, metrics, and artifacts (checkpoints, buffers) during training runs. Provides the primary web UI dashboard for experiment management.
  • TensorBoard: For visualizing metrics during training (e.g., detailed loss curves). Provides a secondary web UI dashboard, often with faster graph updates.
  • Pydantic: For configuration management and data validation.
  • Typer: For the command-line interface.
  • Pytest: For running unit tests.

Project Structure

.
├── .github/workflows/      # GitHub Actions CI/CD
│   └── ci_cd.yml
├── .alphatriangle_data/    # Root directory for ALL persistent data (GITIGNORED)
│   ├── mlruns/             # MLflow internal tracking data & artifact store (for UI)
│   └── runs/               # Local artifacts per run (checkpoints, buffers, TB logs, configs)
│       └── <run_name>/
│           ├── checkpoints/ # Saved model weights & optimizer states
│           ├── buffers/     # Saved experience replay buffers
│           ├── logs/        # Plain text log files for the run
│           ├── tensorboard/ # TensorBoard log files (scalars, etc.)
│           └── configs.json # Copy of run configuration
├── alphatriangle/          # Source code for the AlphaZero agent package
│   ├── __init__.py
│   ├── cli.py              # CLI logic (train command - headless only)
│   ├── config/             # Pydantic configuration models (MCTS, Model, Train, Persistence)
│   │   └── README.md
│   ├── data/               # Data saving/loading logic (DataManager, Schemas)
│   │   └── README.md
│   ├── features/           # Feature extraction logic (operates on trianglengin.GameState)
│   │   └── README.md
│   ├── mcts/               # Monte Carlo Tree Search (operates on trianglengin.GameState)
│   │   └── README.md
│   ├── nn/                 # Neural network definition and wrapper
│   │   └── README.md
│   ├── rl/                 # RL components (Trainer, Buffer, Worker)
│   │   └── README.md
│   ├── stats/              # Statistics collection actor (StatsCollectorActor)
│   │   └── README.md
│   ├── training/           # Training orchestration (Loop, Setup, Runner)
│   │   └── README.md
│   └── utils/              # Shared utilities and types (specific to AlphaTriangle)
│       └── README.md
├── tests/                  # Unit tests (for alphatriangle components)
│   ├── conftest.py
│   ├── mcts/
│   ├── nn/
│   ├── rl/
│   ├── stats/
│   └── training/
├── .gitignore
├── .python-version
├── LICENSE                 # License file (MIT)
├── MANIFEST.in             # Specifies files for source distribution
├── pyproject.toml          # Build system & package configuration (depends on trianglengin)
├── README.md               # This file
└── requirements.txt        # List of dependencies (includes trianglengin)

Key Modules (alphatriangle)

  • cli: Defines the command-line interface using Typer (only train command, headless). (alphatriangle/cli.py)
  • config: Centralized Pydantic configuration classes (excluding EnvConfig and DisplayConfig). (alphatriangle/config/README.md)
  • features: Contains logic to convert trianglengin.GameState objects into numerical features (StateType). (alphatriangle/features/README.md)
  • nn: Contains the PyTorch nn.Module definition (AlphaTriangleNet) and a wrapper class (NeuralNetwork). (alphatriangle/nn/README.md)
  • mcts: Implements the Monte Carlo Tree Search algorithm (Node, run_mcts_simulations), operating on trianglengin.GameState. (alphatriangle/mcts/README.md)
  • rl: Contains RL components: Trainer (network updates), ExperienceBuffer (data storage, supports PER), and SelfPlayWorker (Ray actor for parallel self-play using trianglengin.GameState). (alphatriangle/rl/README.md)
  • training: Orchestrates the headless training process using TrainingLoop, managing workers, data flow, logging (to console, file, MLflow, TensorBoard), and checkpoints. Includes runner.py for the callable training function. (alphatriangle/training/README.md)
  • stats: Contains the StatsCollectorActor (Ray actor) for asynchronous statistics collection. (alphatriangle/stats/README.md)
  • data: Manages saving and loading of training artifacts (DataManager) using Pydantic schemas and cloudpickle. (alphatriangle/data/README.md)
  • utils: Provides common helper functions and shared type definitions specific to the AlphaZero implementation. (alphatriangle/utils/README.md)

Setup

  1. Clone the repository (for development):
    git clone https://github.com/lguibr/alphatriangle.git
    cd alphatriangle
    
  2. Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    
  3. Install the package (including trianglengin):
    • For users:
      # This will automatically install trianglengin from PyPI if available
      pip install alphatriangle
      # Or install directly from Git (installs trianglengin from PyPI)
      # pip install git+https://github.com/lguibr/alphatriangle.git
      
    • For developers (editable install):
      • First, ensure trianglengin is installed (ideally in editable mode from its own directory if developing both):
        # From the trianglengin directory:
        # pip install -e .
        
      • Then, install alphatriangle in editable mode:
        # From the alphatriangle directory:
        pip install -e .
        # Install dev dependencies (optional, for running tests/linting)
        pip install -e ".[dev]" # Installs dev deps from pyproject.toml (quotes keep shells like zsh from expanding the brackets)
        
    Note: Ensure you have the correct PyTorch version installed for your system (CPU/CUDA/MPS). See pytorch.org. Ray may have specific system requirements.
  4. (Optional) Add data directory to .gitignore: Create or edit the .gitignore file in your project root and add the line:
    .alphatriangle_data/
    

Running the Code (CLI)

Use the alphatriangle command for training:

  • Show Help:
    alphatriangle --help
    
  • Run Training (Headless Only):
    alphatriangle train [--seed 42] [--log-level INFO]
    
  • Interactive Play/Debug (Use trianglengin CLI): Note: Interactive modes are part of the trianglengin library, not this alphatriangle package.
    # Ensure trianglengin is installed
    trianglengin play [--seed 42] [--log-level INFO]
    trianglengin debug [--seed 42] [--log-level DEBUG]
    
  • Monitoring Training (Web Dashboards): This project uses MLflow and TensorBoard to provide web-based dashboards for monitoring. It's recommended to run both concurrently for the best experience:
    • MLflow UI (Experiment Overview & Artifacts): Provides the main dashboard for comparing runs, viewing parameters, high-level metrics, and accessing saved artifacts (checkpoints, buffers). Updates occur as data is logged, but may require a browser refresh for the latest points.
      # Run from the project root directory
      mlflow ui --backend-store-uri file:./.alphatriangle_data/mlruns
      
      Access via http://localhost:5000.
    • TensorBoard (Near Real-Time Graphs): Offers more frequently updated graphs of scalar metrics (losses, rates, etc.) during a run, making it ideal for closely monitoring training progress.
      # Run from the project root directory, pointing to the *specific run's* TB logs
      tensorboard --logdir .alphatriangle_data/runs/<your_run_name>/tensorboard
      # Replace <your_run_name> with the actual name (e.g., train_20240101_120000)
      # You can also point to the parent 'runs' directory to see all runs:
      # tensorboard --logdir .alphatriangle_data/runs
      
      Access via http://localhost:6006.
  • Running Unit Tests (Development):
    pytest tests/
    

Configuration

All major parameters for the AlphaZero agent (MCTS, Model, Training, Persistence) are defined in the Pydantic classes within the alphatriangle/config/ directory. Modify these files to experiment with different settings. Environment configuration (EnvConfig) is defined within the trianglengin library.

Data Storage

All persistent data is stored within the .alphatriangle_data/ directory in the project root.

  • .alphatriangle_data/mlruns/: Managed by MLflow. Contains MLflow's internal tracking data (parameters, metrics) and its own copy of logged artifacts. This is the source for the MLflow UI.
  • .alphatriangle_data/runs/: Managed by DataManager. Contains locally saved artifacts for each run (checkpoints, buffers, TensorBoard logs, configs) before/during logging to MLflow. This directory is used for auto-resuming and direct access to TensorBoard logs during a run.
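
As a small sketch of this layout, the helper below builds the per-run paths shown in the Project Structure tree. The function `run_paths` is hypothetical and is not part of the DataManager API; it only mirrors the directory names documented above.

```python
from pathlib import Path

DATA_ROOT_NAME = ".alphatriangle_data"


def run_paths(project_root: Path, run_name: str) -> dict[str, Path]:
    """Return the documented storage locations for one training run."""
    data_root = project_root / DATA_ROOT_NAME
    run_dir = data_root / "runs" / run_name
    return {
        "mlruns": data_root / "mlruns",          # MLflow tracking data (shared by all runs)
        "checkpoints": run_dir / "checkpoints",  # model weights and optimizer states
        "buffers": run_dir / "buffers",          # experience replay buffers
        "logs": run_dir / "logs",                # plain text logs
        "tensorboard": run_dir / "tensorboard",  # TensorBoard event files
        "configs": run_dir / "configs.json",     # copy of the run configuration
    }


paths = run_paths(Path("project"), "train_20240101_120000")
print(paths["tensorboard"].as_posix())
# project/.alphatriangle_data/runs/train_20240101_120000/tensorboard
```

The `tensorboard` entry is the directory to pass to `tensorboard --logdir` when monitoring a single run.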

Maintainability

This project includes README files within each major alphatriangle submodule. Please keep these READMEs updated when making changes to the code's structure, interfaces, or core logic.
