AlphaZero implementation for a triangle puzzle game (uses trianglengin).
AlphaTriangle
Overview
AlphaTriangle is a project implementing an artificial intelligence agent based on AlphaZero principles to learn and play a custom puzzle game involving placing triangular shapes onto a grid. The agent learns through headless self-play reinforcement learning, guided by Monte Carlo Tree Search (MCTS) and a deep neural network (PyTorch). It uses the trianglengin library for core game logic.
The project includes:
- An implementation of the MCTS algorithm tailored for the game.
- A deep neural network (policy and value heads) implemented in PyTorch, featuring convolutional layers and optional Transformer Encoder layers.
- A reinforcement learning pipeline coordinating parallel self-play (using Ray), data storage, and network training, managed by the `alphatriangle.training` module.
- Experiment tracking and visualization using MLflow and TensorBoard.
- Unit tests for RL components.
- A command-line interface for running the headless training pipeline.
🎮 The Triangle Puzzle Game Guide 🧩
This project trains an agent to play the game defined by the trianglengin library. Here's a detailed explanation of the game rules:
1. Introduction: Your Mission! 🎯
The goal is to place colorful shapes onto a special triangular grid. By filling up lines of triangles, you make them disappear and score points! Keep placing shapes and clearing lines for as long as possible to get the highest score before the grid fills up and you run out of moves.
2. The Playing Field: The Grid 🗺️
- Triangle Cells: The game board is a grid made of many small triangles. Some point UP (🔺) and some point DOWN (🔻). They alternate like a checkerboard pattern based on their row and column index (specifically, `(row + col) % 2 != 0` means UP).
- Shape: The grid itself is rectangular overall, but the playable area within it is typically shaped like a triangle or hexagon, wider in the middle and narrower at the top and bottom.
- Playable Area: You can only place shapes within the designated playable area.
- Death Zones: Around the edges of the playable area (often at the start and end of rows), some triangles are marked as "Death Zones". You cannot place any part of a shape onto these triangles. They are off-limits! Think of them as the boundaries within the rectangular grid.
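The orientation rule above can be sketched in a few lines of Python. This is only an illustration of the `(row + col) % 2 != 0` rule as stated in this guide. The function names are hypothetical and are not taken from `trianglengin`.

```python
def is_up(row: int, col: int) -> bool:
    """Return True when the cell at (row, col) points UP."""
    return (row + col) % 2 != 0


def orientation_row(row: int, cols: int) -> str:
    """Render one grid row as a string of 'U' (up) and 'D' (down) cells."""
    return "".join("U" if is_up(row, col) else "D" for col in range(cols))


if __name__ == "__main__":
    # Print the checkerboard-like orientation pattern for a 3 x 6 grid.
    for row in range(3):
        print(orientation_row(row, 6))
```

Adjacent cells in a row always alternate orientation, and so do vertically adjacent cells, which is what the placement and line-tracing rules later in this guide rely on.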
3. Your Tools: The Shapes 🟦🟥🟩
- Shape Formation: Each shape is a collection of connected small triangles (🔺 and 🔻). They come in different colors and arrangements. Some might be a single triangle, others might be long lines, L-shapes, or more complex patterns.
- Relative Positions: The triangles within a shape have fixed positions relative to each other. When you move the shape, all its triangles move together as one block.
- Preview Area: You will always have three shapes available to choose from at any time. These are shown in a special "preview area".
4. Making Your Move: Placing Shapes 🖱️➡️▦
This is the core action! Here's exactly how to place a shape:
- Step 4a: Select a Shape: Choose one of the three shapes available in the preview area.
- Step 4b: Aim on the Grid: Select a target coordinate `(row, col)` on the main grid. This coordinate represents the anchor point for placing the shape.
- Step 4c: The Placement Rules (MUST Follow!)
  - Rule 1: Fit Inside Playable Area: ALL triangles of your chosen shape must land within the playable grid area. No part of the shape can land in a Death Zone.
  - 🧱 Rule 2: No Overlap: ALL triangles of your chosen shape must land on currently empty spaces on the grid. You cannot place a shape on top of triangles that are already filled with color from previous shapes.
  - Rule 3: Orientation Match! This is crucial!
    - If a part of your shape is an UP triangle (🔺), it MUST land on an UP space (🔺) on the grid.
    - If a part of your shape is a DOWN triangle (🔻), it MUST land on a DOWN space (🔻) on the grid.
    - 🔺➡️🔺 (OK!)
    - 🔻➡️🔻 (OK!)
    - 🔺➡️🔻 (INVALID!)
    - 🔻➡️🔺 (INVALID!)
- Step 4d: Confirm Placement: If the chosen shape can be placed at the target coordinate according to ALL three rules, the placement is valid. The shape is now placed permanently on the grid! ✨
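The three placement rules can be sketched as a single validity check. The `Shape` and `Grid` classes and the `can_place` function below are hypothetical stand-ins written for this guide. They are not the `trianglengin` API, and the real engine may represent shapes and grids differently.

```python
from dataclasses import dataclass, field

Coord = tuple[int, int]


@dataclass(frozen=True)
class Shape:
    # Each triangle is (row_offset, col_offset, is_up) relative to the anchor.
    triangles: tuple[tuple[int, int, bool], ...]


@dataclass
class Grid:
    rows: int
    cols: int
    death: set[Coord] = field(default_factory=set)      # off-limits cells
    occupied: set[Coord] = field(default_factory=set)   # already filled cells


def is_up(row: int, col: int) -> bool:
    return (row + col) % 2 != 0


def can_place(grid: Grid, shape: Shape, row: int, col: int) -> bool:
    """Check Rules 1 to 3 for placing `shape` with its anchor at (row, col)."""
    for d_row, d_col, up in shape.triangles:
        r, c = row + d_row, col + d_col
        # Rule 1: inside the grid and not in a Death Zone.
        if not (0 <= r < grid.rows and 0 <= c < grid.cols):
            return False
        if (r, c) in grid.death:
            return False
        # Rule 2: the target cell must be empty.
        if (r, c) in grid.occupied:
            return False
        # Rule 3: the triangle orientation must match the cell orientation.
        if is_up(r, c) != up:
            return False
    return True
```

For example, a two-triangle shape `Shape(((0, 0, True), (0, 1, False)))` can only be anchored on an UP cell whose right-hand neighbor is playable and empty.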
5. Scoring Points: How You Win!
You score points in two main ways:
- Placing Triangles: You get a small number of points for every single small triangle that makes up the shape you just placed. (e.g., placing a 3-triangle shape might give you 3 * tiny_score points).
- Clearing Lines: This is where the BIG points come from! You get a much larger number of points for every single small triangle that disappears when you clear a line (or multiple lines at once!). See the next section for details!
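As a rough illustration of the two scoring sources, the sketch below adds a per-triangle placement reward to a per-triangle clearing reward. The two constants are placeholder values chosen for this example only. The guide does not state the actual point values, and `score_move` is a hypothetical helper, not part of `trianglengin`.

```python
Coord = tuple[int, int]

# Placeholder values (assumptions): the real point values are not given here.
POINTS_PER_PLACED_TRIANGLE = 1
POINTS_PER_CLEARED_TRIANGLE = 10


def score_move(placed_count: int, cleared: set[Coord]) -> int:
    """Score one move: placed triangles plus every distinct cleared triangle."""
    placement_points = placed_count * POINTS_PER_PLACED_TRIANGLE
    # A set is used so a triangle shared by two cleared lines counts once.
    clearing_points = len(cleared) * POINTS_PER_CLEARED_TRIANGLE
    return placement_points + clearing_points
```

With these placeholder constants, placing a 3-triangle shape that clears nothing scores 3, while the same placement clearing four triangles scores 43.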
6. Line Clearing Magic! ✨ (The Key to High Scores!)
This is the most exciting part! When you place a shape, the game immediately checks if you've completed any lines. This section explains how the game finds and clears these lines.
- What Lines Can Be Cleared? There are three types of lines the game looks for:
  - Horizontal Lines ↔️: A straight, unbroken line of filled triangles going across a single row.
  - Diagonal Lines (Top-Left to Bottom-Right) ↘️: An unbroken diagonal line of filled triangles stepping down and to the right.
  - Diagonal Lines (Bottom-Left to Top-Right) ↗️: An unbroken diagonal line of filled triangles stepping up and to the right.
- How Lines are Found: Pre-calculation of Maximal Lines
  - The Idea: Instead of checking every possible line combination all the time, the game pre-calculates all maximal continuous lines of playable triangles when it starts. A maximal line is the longest possible straight segment of playable triangles (not in a Death Zone) in one of the three directions (Horizontal, Diagonal ↘️, Diagonal ↗️).
  - Tracing: For every playable triangle on the grid, the game traces outwards in each of the three directions to find the full extent of the continuous playable line passing through that triangle in that direction.
  - Storing Maximal Lines: Only the complete maximal lines found are stored. For example, if tracing finds a playable sequence `A-B-C-D`, only the line `(A,B,C,D)` is stored, not the sub-segments like `(A,B,C)` or `(B,C,D)`. These maximal lines represent the potential lines that can be cleared.
  - Coordinate Map: The game also builds a map linking each playable triangle coordinate `(r, c)` to the set of maximal lines it belongs to. This allows for quick lookup.
- Defining the Paths (Neighbor Logic): How does the game know which triangle is "next" when tracing? It depends on the current triangle's orientation (🔺 or 🔻) and the direction being traced:
  - Horizontal ↔️:
    - Left Neighbor: `(r, c-1)` (Always in the same row)
    - Right Neighbor: `(r, c+1)` (Always in the same row)
  - Diagonal ↘️ (TL-BR):
    - If current is 🔺 (Up): Next is `(r+1, c)` (Down triangle directly below)
    - If current is 🔻 (Down): Next is `(r, c+1)` (Up triangle to the right)
  - Diagonal ↗️ (BL-TR):
    - If current is 🔻 (Down): Next is `(r-1, c)` (Up triangle directly above)
    - If current is 🔺 (Up): Next is `(r, c+1)` (Down triangle to the right)
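The neighbor rules and the maximal-line pre-calculation described in this section can be sketched together as follows. This is a minimal sketch under stated assumptions: the playable area is given as a plain set of `(r, c)` coordinates, the direction names `"h"`, `"tl_br"`, and `"bl_tr"` are invented for this example, and the backward steps are simply the inverses of the forward rules listed above. None of these names come from `trianglengin`.

```python
Coord = tuple[int, int]
Line = tuple[Coord, ...]

DIRECTIONS = ("h", "tl_br", "bl_tr")  # hypothetical direction labels


def is_up(r: int, c: int) -> bool:
    return (r + c) % 2 != 0


def next_coord(r: int, c: int, direction: str) -> Coord:
    """Forward step, following the neighbor rules listed above."""
    if direction == "h":
        return (r, c + 1)
    if direction == "tl_br":
        return (r + 1, c) if is_up(r, c) else (r, c + 1)
    if direction == "bl_tr":
        return (r, c + 1) if is_up(r, c) else (r - 1, c)
    raise ValueError(f"unknown direction: {direction}")


def prev_coord(r: int, c: int, direction: str) -> Coord:
    """Backward step: the inverse of next_coord for each direction."""
    if direction == "h":
        return (r, c - 1)
    if direction == "tl_br":
        return (r, c - 1) if is_up(r, c) else (r - 1, c)
    if direction == "bl_tr":
        return (r + 1, c) if is_up(r, c) else (r, c - 1)
    raise ValueError(f"unknown direction: {direction}")


def trace_maximal_line(start: Coord, direction: str, playable: set[Coord]) -> Line:
    """Walk back to the start of the line, then forward to its end."""
    cur = start
    while prev_coord(*cur, direction) in playable:
        cur = prev_coord(*cur, direction)
    line = [cur]
    while next_coord(*cur, direction) in playable:
        cur = next_coord(*cur, direction)
        line.append(cur)
    return tuple(line)


def build_line_map(playable: set[Coord]) -> tuple[set[Line], dict[Coord, set[Line]]]:
    """Collect maximal lines (length >= 2) and map each coordinate to its lines."""
    lines: set[Line] = set()
    for coord in playable:
        for direction in DIRECTIONS:
            line = trace_maximal_line(coord, direction, playable)
            if len(line) >= 2:
                lines.add(line)  # the set removes duplicate traces
    coord_map: dict[Coord, set[Line]] = {}
    for line in lines:
        for coord in line:
            coord_map.setdefault(coord, set()).add(line)
    return lines, coord_map
```

Because every trace first walks back to the line's first cell, tracing from any cell of the same line yields the same tuple, so storing lines in a set keeps only one copy of each maximal line.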
- Visualizing the Paths:
  - Horizontal ↔️: (Moves left/right in the same row)
      ... [🔻][🔺][🔻][🔺][🔻][🔺] ...
  - Diagonal ↘️ (TL-BR): (Connects via shared horizontal edges; path alternates row/col increments depending on orientation)
      ... [🔺] ...
      ... [🔻][🔺] ...
      ...     [🔻][🔺] ...
      ...         [🔻] ...
  - Diagonal ↗️ (BL-TR): (Connects via shared horizontal edges; path alternates row/col increments depending on orientation)
      ...         [🔺] ...
      ...     [🔺][🔻] ...
      ... [🔺][🔻] ...
      ... [🔻] ...
- The "Full Line" Rule: After you place a piece, the game looks at the coordinates `(r, c)` of the triangles you just placed. Using the pre-calculated map, it finds all the maximal lines that contain any of those coordinates. For each of those maximal lines (that have at least 2 triangles), it checks: "Is every single triangle coordinate in this maximal line now occupied?" If yes, that line is complete! (Note: Single isolated triangles don't count as clearable lines.)
- The Poof! 💨:
  - If placing your shape completes one or MORE maximal lines (of any type, length >= 2) simultaneously, all the triangles in ALL completed lines vanish instantly!
  - The spaces become empty again.
  - You score points for every single triangle that vanished. Clearing multiple lines at once is the best way to rack up points! 🥳
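The "Full Line" check and the clearing step can be sketched as one function. It assumes a coordinate-to-lines map of the kind described under "Coordinate Map", with each line stored as a tuple of coordinates. The function name and data layout are hypothetical and only illustrate the rule as described in this guide.

```python
Coord = tuple[int, int]
Line = tuple[Coord, ...]


def clear_completed_lines(
    occupied: set[Coord],
    placed: set[Coord],
    coord_to_lines: dict[Coord, set[Line]],
) -> set[Coord]:
    """Clear every maximal line completed by the latest placement.

    `occupied` must already include the newly placed coordinates. It is
    updated in place, and the set of cleared coordinates is returned.
    """
    # Only lines touching a newly placed triangle can have become complete.
    candidates: set[Line] = set()
    for coord in placed:
        candidates |= coord_to_lines.get(coord, set())

    cleared: set[Coord] = set()
    for line in candidates:
        if len(line) >= 2 and all(coord in occupied for coord in line):
            cleared.update(line)

    # All completed lines vanish together, so remove them in one step.
    occupied -= cleared
    return cleared
```

Completed lines are collected first and removed afterwards, so a triangle shared by two lines can complete both of them in the same move.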
7. Getting New Shapes: The Refill 🪄
- The Trigger: The game only gives you new shapes when a specific condition is met.
- The Condition: New shapes appear only when all three of your preview slots become empty at the exact same time.
- How it Happens: This usually occurs right after you place your last available shape (the third one).
- The Refill: As soon as the third slot becomes empty, BAM! 🪄 Three brand new, randomly generated shapes instantly appear in the preview slots.
- Important: If you place a shape and only one or two slots are empty, you do not get new shapes yet. You must use up all three before the refill happens.
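The refill rule can be sketched as follows. The slot list, the `consume_slot` name, and the `make_shape` callback are assumptions made for this example. How `trianglengin` actually stores and generates shapes is not described here.

```python
from typing import Callable, Optional


def consume_slot(
    slots: list[Optional[str]],
    index: int,
    make_shape: Callable[[], str],
) -> bool:
    """Empty one preview slot; refill all slots only when every slot is empty.

    Returns True when a refill happened.
    """
    if slots[index] is None:
        raise ValueError(f"slot {index} is already empty")
    slots[index] = None
    # No partial refill: new shapes appear only once all slots are empty.
    if all(slot is None for slot in slots):
        slots[:] = [make_shape() for _ in slots]
        return True
    return False
```

Shapes are represented as strings here purely to keep the sketch short. Any shape object would work the same way.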
8. The End of the Road: Game Over 😭
So, how does the game end?
- The Condition: The game is over when you cannot legally place any of the three shapes currently available in your preview slots anywhere on the grid.
- The Check: After every move (placing a shape and any resulting line clears), and after any potential shape refill, the game checks: "Is there at least one valid spot on the grid for Shape 1? OR for Shape 2? OR for Shape 3?"
- No More Moves: If the answer is "NO" for all three shapes (meaning none of them can be placed anywhere according to the Placement Rules), then the game immediately ends.
- Strategy: This means you need to be careful! Don't fill up the grid in a way that leaves no room for the types of shapes you might get later. Always try to keep options open! 🤔
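The game-over check amounts to a brute-force search over every remaining shape and every anchor position. The sketch below assumes a `can_place(shape, row, col)` callback that applies the Placement Rules. Both that callback and `is_game_over` are hypothetical names, not the `trianglengin` API.

```python
from itertools import product
from typing import Callable, Optional, Sequence


def is_game_over(
    slots: Sequence[Optional[object]],
    rows: int,
    cols: int,
    can_place: Callable[[object, int, int], bool],
) -> bool:
    """Return True when no available shape fits at any anchor on the grid."""
    for shape in slots:
        if shape is None:
            continue  # slot already used; it is refilled only when all are empty
        if any(can_place(shape, r, c) for r, c in product(range(rows), range(cols))):
            return False  # at least one legal move exists
    return True
```

The check is meant to run after line clears and after any refill, so at that point at least one slot holds a shape.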
Core Technologies
- Python 3.10+
- trianglengin: Core game engine (state, actions, rules).
- PyTorch: For the deep learning model (CNNs, optional Transformers, Distributional Value Head) and training, with CUDA/MPS support.
- NumPy: For numerical operations, especially state representation (used by `trianglengin` and features).
- Ray: For parallelizing self-play data generation and statistics collection across multiple CPU cores/processes.
- Numba: (Optional, used in `features.grid_features`) For performance optimization of specific grid calculations.
- Cloudpickle: For serializing the experience replay buffer and training checkpoints.
- MLflow: For logging parameters, metrics, and artifacts (checkpoints, buffers) during training runs. Provides the primary web UI dashboard for experiment management.
- TensorBoard: For visualizing metrics during training (e.g., detailed loss curves). Provides a secondary web UI dashboard, often with faster graph updates.
- Pydantic: For configuration management and data validation.
- Typer: For the command-line interface.
- Pytest: For running unit tests.
Project Structure
.
├── .github/workflows/      # GitHub Actions CI/CD
│   └── ci_cd.yml
├── .alphatriangle_data/    # Root directory for ALL persistent data (GITIGNORED)
│   ├── mlruns/             # MLflow internal tracking data & artifact store (for UI)
│   └── runs/               # Local artifacts per run (checkpoints, buffers, TB logs, configs)
│       └── <run_name>/
│           ├── checkpoints/    # Saved model weights & optimizer states
│           ├── buffers/        # Saved experience replay buffers
│           ├── logs/           # Plain text log files for the run
│           ├── tensorboard/    # TensorBoard log files (scalars, etc.)
│           └── configs.json    # Copy of run configuration
├── alphatriangle/          # Source code for the AlphaZero agent package
│   ├── __init__.py
│   ├── cli.py              # CLI logic (train command - headless only)
│   ├── config/             # Pydantic configuration models (MCTS, Model, Train, Persistence)
│   │   └── README.md
│   ├── data/               # Data saving/loading logic (DataManager, Schemas)
│   │   └── README.md
│   ├── features/           # Feature extraction logic (operates on trianglengin.GameState)
│   │   └── README.md
│   ├── mcts/               # Monte Carlo Tree Search (operates on trianglengin.GameState)
│   │   └── README.md
│   ├── nn/                 # Neural network definition and wrapper
│   │   └── README.md
│   ├── rl/                 # RL components (Trainer, Buffer, Worker)
│   │   └── README.md
│   ├── stats/              # Statistics collection actor (StatsCollectorActor)
│   │   └── README.md
│   ├── training/           # Training orchestration (Loop, Setup, Runner)
│   │   └── README.md
│   └── utils/              # Shared utilities and types (specific to AlphaTriangle)
│       └── README.md
├── tests/                  # Unit tests (for alphatriangle components)
│   ├── conftest.py
│   ├── mcts/
│   ├── nn/
│   ├── rl/
│   ├── stats/
│   └── training/
├── .gitignore
├── .python-version
├── LICENSE                 # License file (MIT)
├── MANIFEST.in             # Specifies files for source distribution
├── pyproject.toml          # Build system & package configuration (depends on trianglengin)
├── README.md               # This file
└── requirements.txt        # List of dependencies (includes trianglengin)
Key Modules (alphatriangle)
- `cli`: Defines the command-line interface using Typer (only `train` command, headless). (`alphatriangle/cli.py`)
- `config`: Centralized Pydantic configuration classes (excluding `EnvConfig` and `DisplayConfig`). (`alphatriangle/config/README.md`)
- `features`: Contains logic to convert `trianglengin.GameState` objects into numerical features (`StateType`). (`alphatriangle/features/README.md`)
- `nn`: Contains the PyTorch `nn.Module` definition (`AlphaTriangleNet`) and a wrapper class (`NeuralNetwork`). (`alphatriangle/nn/README.md`)
- `mcts`: Implements the Monte Carlo Tree Search algorithm (`Node`, `run_mcts_simulations`), operating on `trianglengin.GameState`. (`alphatriangle/mcts/README.md`)
- `rl`: Contains RL components: `Trainer` (network updates), `ExperienceBuffer` (data storage, supports PER), and `SelfPlayWorker` (Ray actor for parallel self-play using `trianglengin.GameState`). (`alphatriangle/rl/README.md`)
- `training`: Orchestrates the headless training process using `TrainingLoop`, managing workers, data flow, logging (to console, file, MLflow, TensorBoard), and checkpoints. Includes `runner.py` for the callable training function. (`alphatriangle/training/README.md`)
- `stats`: Contains the `StatsCollectorActor` (Ray actor) for asynchronous statistics collection. (`alphatriangle/stats/README.md`)
- `data`: Manages saving and loading of training artifacts (`DataManager`) using Pydantic schemas and `cloudpickle`. (`alphatriangle/data/README.md`)
- `utils`: Provides common helper functions and shared type definitions specific to the AlphaZero implementation. (`alphatriangle/utils/README.md`)
Setup
- Clone the repository (for development):
      git clone https://github.com/lguibr/alphatriangle.git
      cd alphatriangle
- Create a virtual environment (recommended):
      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install the package (including `trianglengin`):
  - For users:
        # This will automatically install trianglengin from PyPI if available
        pip install alphatriangle
        # Or install directly from Git (installs trianglengin from PyPI)
        # pip install git+https://github.com/lguibr/alphatriangle.git
  - For developers (editable install):
    - First, ensure `trianglengin` is installed (ideally in editable mode from its own directory if developing both):
          # From the trianglengin directory:
          # pip install -e .
    - Then, install `alphatriangle` in editable mode:
          # From the alphatriangle directory:
          pip install -e .
          # Install dev dependencies (optional, for running tests/linting)
          # The quotes keep shells such as zsh from expanding the brackets
          pip install -e ".[dev]"  # Installs dev deps from pyproject.toml
- (Optional) Add data directory to `.gitignore`: Create or edit the `.gitignore` file in your project root and add the line:
      .alphatriangle_data/
Running the Code (CLI)
Use the `alphatriangle` command for training:
- Show Help:
      alphatriangle --help
- Run Training (Headless Only):
      alphatriangle train [--seed 42] [--log-level INFO]
- Interactive Play/Debug (Use `trianglengin` CLI): Note: Interactive modes are part of the `trianglengin` library, not this `alphatriangle` package.
      # Ensure trianglengin is installed
      trianglengin play [--seed 42] [--log-level INFO]
      trianglengin debug [--seed 42] [--log-level DEBUG]
- Monitoring Training (Web Dashboards): This project uses MLflow and TensorBoard to provide web-based dashboards for monitoring. It's recommended to run both concurrently for the best experience:
  - MLflow UI (Experiment Overview & Artifacts): Provides the main dashboard for comparing runs, viewing parameters, high-level metrics, and accessing saved artifacts (checkpoints, buffers). Updates occur as data is logged, but may require a browser refresh for the latest points.
        # Run from the project root directory
        mlflow ui --backend-store-uri file:./.alphatriangle_data/mlruns
    Access via http://localhost:5000.
  - TensorBoard (Near Real-Time Graphs): Offers more frequently updated graphs of scalar metrics (losses, rates, etc.) during a run, making it ideal for closely monitoring training progress.
        # Run from the project root directory, pointing to the *specific run's* TB logs
        tensorboard --logdir .alphatriangle_data/runs/<your_run_name>/tensorboard
        # Replace <your_run_name> with the actual name (e.g., train_20240101_120000)
        # You can also point to the parent 'runs' directory to see all runs:
        # tensorboard --logdir .alphatriangle_data/runs
    Access via http://localhost:6006.
- Running Unit Tests (Development):
      pytest tests/
Configuration
All major parameters for the AlphaZero agent (MCTS, Model, Training, Persistence) are defined in the Pydantic classes within the alphatriangle/config/ directory. Modify these files to experiment with different settings. Environment configuration (EnvConfig) is defined within the trianglengin library.
Data Storage
All persistent data is stored within the .alphatriangle_data/ directory in the project root.
- `.alphatriangle_data/mlruns/`: Managed by MLflow. Contains MLflow's internal tracking data (parameters, metrics) and its own copy of logged artifacts. This is the source for the MLflow UI.
- `.alphatriangle_data/runs/`: Managed by DataManager. Contains locally saved artifacts for each run (checkpoints, buffers, TensorBoard logs, configs) before/during logging to MLflow. This directory is used for auto-resuming and direct access to TensorBoard logs during a run.
Maintainability
This project includes README files within each major alphatriangle submodule. Please keep these READMEs updated when making changes to the code's structure, interfaces, or core logic.