# Hexchess

Hexagonal chess engine (Glinski variant) in Rust with AlphaZero-style self-play training.
## Structure

- `engine/` — Rust engine: board representation (91-cell hex grid, axial coordinates), move generation, MCTS search, and neural network inference
- `training/` — Async distributed AlphaZero loop: self-play workers, continuous trainer, Elo rating service
- `bindings/wasm/` — WASM bindings for browser play (uses tract for inference)
- `bindings/python/` — PyO3 bindings for the training pipeline (uses ONNX Runtime)
- `docs/` — Documentation site with interactive playground (Fumadocs)
## Quick Start

```sh
# Run engine tests
cargo test

# Build Python bindings (for training)
cd bindings/python && maturin develop && cd ../..

# Start training (run in separate terminals)
python -m training worker       # self-play worker (run N of these)
python -m training trainer     # continuous trainer (run 1)
python -m training elo-service  # Elo rating service (run 1)

# Monitor
python -m training status       # cluster status
python -m training progress     # training progress table

# Run documentation site (includes interactive playground)
make docs-dev
```
## Training Pipeline

The pipeline has three async components that communicate through shared storage (`.data/`):

- **Workers** generate self-play games using MCTS + the latest model, flushing `.npz` batches to `.data/training_data/`
- **Trainer** samples uniformly from the most recent positions, trains, exports ONNX, and promotes a new model version every cycle
- **Elo service** runs continuous round-robin matches between model versions to track strength

On first run (no model exists), the trainer bootstraps by training on minimax-supervised imitation data before switching to self-play.
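The worker-to-trainer handoff is just files on disk. Here is a minimal sketch of the worker-side flush, assuming positions are accumulated as parallel arrays; the array names (`boards`, `policies`, `values`) are illustrative, not the project's actual schema:

```python
import numpy as np
from pathlib import Path


def flush_batch(positions, policies, values, out_dir, model_version, batch_idx):
    """Write one self-play batch as a version-tagged .npz file.

    Array names are illustrative, not the project's actual schema.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"sp_v{model_version}_{batch_idx:06d}.npz"
    np.savez_compressed(
        path,
        boards=np.asarray(positions, dtype=np.float32),
        policies=np.asarray(policies, dtype=np.float32),
        values=np.asarray(values, dtype=np.float32),
    )
    return path
```

Because each flush is an independent, atomic-enough file write, the trainer can pick up new batches simply by re-scanning the directory.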
## Configuration

All parameters live in `training/config.py`. Here's how they interact:
### MCTS & Self-Play

| Parameter | Default | Description |
|---|---|---|
| `num_simulations` | 500 | MCTS simulations per move. Higher = stronger play but slower data generation. Directly controls worker throughput (positions/hour). |
| `temperature_threshold` | 60 | Move number after which temperature drops to `temperature_low`. Controls exploration vs. exploitation in self-play games. |
| `temperature_high` | 1.0 | Temperature for early-game moves (before the threshold). Higher = more diverse openings in training data. |
| `temperature_low` | 0.35 | Temperature for late-game moves (after the threshold). Lc0-style floor ensures policy targets retain gradient signal. |
| `dirichlet_alpha` | 0.3 | Dirichlet noise concentration at the MCTS root. Encourages exploration of moves the net hasn't learned yet. |
| `dirichlet_epsilon` | 0.25 | Mixing weight for Dirichlet noise (0 = no noise, 1 = all noise). |
| `worker_batch_size` | 5 | Games per `.npz` flush. Smaller = fresher data available to the trainer sooner, but more filesystem overhead. |
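The temperature parameters form a two-phase schedule over each game. Here is a sketch of how MCTS root visit counts are typically converted into a move distribution in AlphaZero-style self-play (illustrative, not the engine's actual code):

```python
import numpy as np


def select_move(visit_counts, move_number,
                temperature_threshold=60,
                temperature_high=1.0,
                temperature_low=0.35,
                rng=None):
    """Sample a move index from root visit counts under the two-phase
    temperature schedule. A sketch, not the engine's implementation."""
    rng = rng or np.random.default_rng()
    t = temperature_high if move_number < temperature_threshold else temperature_low
    counts = np.asarray(visit_counts, dtype=np.float64)
    probs = counts ** (1.0 / t)      # t < 1 sharpens, t = 1 is proportional
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs
```

Note that `temperature_low = 0.35` still leaves some probability mass on non-best moves, which is what keeps gradient signal in late-game policy targets.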
### Training

| Parameter | Default | Description |
|---|---|---|
| `batch_size` | 256 | Training batch size. Interacts with `replay_buffer_size` — larger batches relative to the buffer = more repeated samples per epoch. |
| `learning_rate` | 0.001 | Adam learning rate. |
| `l2_regularization` | 1e-4 | Weight decay. Prevents overfitting; especially important when the replay buffer is small relative to the number of training steps. |
| `replay_buffer_size` | 5,000,000 | Max positions in the sliding window. The trainer selects the most recent `.npz` files up to this limit and samples uniformly. Larger = more data diversity, but older positions stay longer. |
| `steps_per_cycle` | 5,000 | Training steps before promoting a new model version. Each version = one cycle. Controls how often workers get an updated model. Shorter cycles = faster model turnover but less training per version. |
| `reload_interval` | 1,000 | Re-scan `.data/training_data/` every N steps within a cycle, picking up fresh worker output. Without this, the trainer would use a stale snapshot for the entire cycle. Should be < `steps_per_cycle`. |
| `min_positions_to_start` | 1,000,000 | Bootstrap gate: self-play training won't start until this many positions exist. Prevents training on too little data early on. |
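The sliding window reduces to a file-selection step: take the newest `.npz` files until the position budget runs out. A sketch of that mechanism, assuming each file stores a `boards` array whose first axis is the position count (the array name is an assumption):

```python
import numpy as np
from pathlib import Path


def select_window(data_dir, replay_buffer_size=5_000_000):
    """Return the most recent .npz files whose combined position count
    fits inside the replay buffer limit. Illustrative sketch only."""
    files = sorted(Path(data_dir).glob("*.npz"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    window, total = [], 0
    for f in files:
        with np.load(f) as npz:
            n = npz["boards"].shape[0]
        if total + n > replay_buffer_size:
            break
        window.append(f)
        total += n
    return window, total
```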
### Bootstrap (Imitation)

These only apply to the initial bootstrap phase, when no model exists yet:

| Parameter | Default | Description |
|---|---|---|
| `imitation_depth` | 3 | Minimax search depth for generating imitation targets. Deeper = better targets but much slower generation. |
| `imitation_random_plies` | 8 | Random opening moves per imitation game. Creates position diversity so the net doesn't just memorize one opening. |
| `imitation_temperature` | 200.0 | Softmax temperature for converting centipawn scores to policy targets. Higher = softer policy (more weight on suboptimal moves). |
| `bootstrap_steps` | 50,000 | Training steps for the imitation bootstrap. Must be enough to beat the heuristic evaluator, after which self-play takes over. |
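The `imitation_temperature` conversion is a plain temperature-scaled softmax over per-move centipawn scores. A sketch (not the pipeline's actual code):

```python
import numpy as np


def scores_to_policy(centipawn_scores, imitation_temperature=200.0):
    """Convert minimax centipawn scores for the legal moves into a soft
    policy target. Higher temperature flattens the distribution."""
    s = np.asarray(centipawn_scores, dtype=np.float64) / imitation_temperature
    s -= s.max()            # subtract max for numerical stability
    p = np.exp(s)
    return p / p.sum()
```

At the default temperature of 200, a move that is 100 centipawns worse than the best still receives a meaningful share of the target mass, which keeps the bootstrap policy from collapsing onto single moves.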
### Network Architecture

| Parameter | Default | Description |
|---|---|---|
| `num_residual_blocks` | 6 | Depth of the residual tower. More blocks = more capacity but slower inference (affects worker throughput). |
| `num_filters` | 128 | Width of the convolutional layers. More filters = more capacity but slower inference. |
| `board_channels` | 19 | Input tensor channels (piece planes + metadata). Must match `serialization.rs`. |
| `board_height` / `board_width` | 11 | Hex grid embedded in an 11×11 rectangle. Fixed by the coordinate system. |
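The 91-in-121 embedding follows from the axial coordinate system: a radius-5 hexagon contains 3·5·(5+1) + 1 = 91 cells, and shifting `q` and `r` by 5 places them in an 11×11 square. A quick sanity check (the exact masking convention here is an assumption; the cell count is not):

```python
import numpy as np

R = 5  # hexagon radius: the Glinski board spans q, r in [-5, 5]

# Axial coordinates (q, r) lie on the board iff max(|q|, |r|, |q + r|) <= R.
mask = np.zeros((2 * R + 1, 2 * R + 1), dtype=bool)  # the 11x11 embedding
for q in range(-R, R + 1):
    for r in range(-R, R + 1):
        if max(abs(q), abs(r), abs(q + r)) <= R:
            mask[q + R, r + R] = True

assert int(mask.sum()) == 3 * R * (R + 1) + 1  # 91 playable cells
```

The remaining 30 of the 121 rectangle cells are off-board padding that the convolutional layers simply see as empty.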
## Key Dynamics

**Worker throughput vs. training speed:** Workers produce positions at a rate determined by `num_simulations` and hardware. The trainer consumes them at a rate determined by `steps_per_cycle`, `batch_size`, and GPU speed. If the trainer is much faster than the workers, it overtrains on stale data; if the workers are much faster, excess data ages out of the sliding window before it is ever sampled.
**Buffer size vs. cycle length:** With `replay_buffer_size` = 5M and `steps_per_cycle` = 5000 at `batch_size` = 256, each cycle trains on ~1.28M samples — roughly 25% of the buffer. Positions near the edge of the window get fewer passes than recent ones simply because they'll age out sooner.
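The arithmetic behind that estimate:

```python
batch_size = 256
steps_per_cycle = 5_000
replay_buffer_size = 5_000_000

# Samples drawn per cycle (uniform sampling, with replacement)
samples_per_cycle = batch_size * steps_per_cycle    # 1,280,000
fraction = samples_per_cycle / replay_buffer_size   # 0.256 of the buffer
```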
**Reload interval:** With `reload_interval` = 1000, fresh worker data enters the training distribution 5 times per cycle. This matters when workers are producing data fast — without reloads, the trainer would miss an entire cycle's worth of fresh data.

**Model version turnover:** Every `steps_per_cycle` steps, the trainer exports a new version. Workers poll for updates and switch. The lag between a new version appearing and workers using it depends on how often workers check (currently every `worker_batch_size` games).
## Data Layout

```
.data/
  models/             # best.onnx, best.pt, best.meta.json, v{N}.onnx snapshots
  training_data/      # version-tagged self-play .npz files (sp_v5_*.npz)
  logs/               # trainer.jsonl, worker-*.jsonl
  elo_state.json      # Elo service state (pair results, ratings)
  elo_rankings.jsonl  # Elo ranking history
```