
# Hexchess

Hexagonal chess engine (Glinski variant) in Rust with AlphaZero-style self-play training.

## Structure

- `engine/` — Rust engine: board representation (91-cell hex grid, axial coordinates), move generation, MCTS search, and neural network inference
- `training/` — Async distributed AlphaZero loop: self-play workers, continuous trainer, Elo rating service
- `bindings/wasm/` — WASM bindings for browser play (uses tract for inference)
- `bindings/python/` — PyO3 bindings for the training pipeline (uses ONNX Runtime)
- `docs/` — Documentation site with interactive playground (Fumadocs)

## Quick Start

```sh
# Run engine tests
cargo test

# Build Python bindings (for training)
cd bindings/python && maturin develop && cd ../..

# Start training (run in separate terminals)
python -m training worker      # self-play worker (run N of these)
python -m training trainer     # continuous trainer (run 1)
python -m training elo-service # Elo rating service (run 1)

# Monitor
python -m training status      # cluster status
python -m training progress    # training progress table

# Run documentation site (includes interactive playground)
make docs-dev
```

## Training Pipeline

The pipeline has three async components that communicate through shared storage (`.data/`):

1. **Workers** generate self-play games using MCTS and the latest model, flushing `.npz` batches to `.data/training_data/`
2. **Trainer** samples uniformly from the most recent positions, trains, exports ONNX, and promotes a new model version every cycle
3. **Elo service** runs continuous round-robin matches between model versions to track strength

On first run (no model exists), the trainer bootstraps by training on minimax-supervised imitation data before switching to self-play.

## Configuration

All parameters live in `training/config.py`. Here's how they interact:

### MCTS & Self-Play

| Parameter | Default | Description |
| --- | --- | --- |
| `num_simulations` | 500 | MCTS simulations per move. Higher = stronger play but slower data generation. Directly controls worker throughput (positions/hour). |
| `temperature_threshold` | 60 | Move number after which the temperature drops to `temperature_low`. Controls exploration vs. exploitation in self-play games. |
| `temperature_high` | 1.0 | Temperature for early-game moves (before the threshold). Higher = more diverse openings in the training data. |
| `temperature_low` | 0.35 | Temperature for late-game moves (after the threshold). An Lc0-style floor ensures policy targets retain gradient signal. |
| `dirichlet_alpha` | 0.3 | Dirichlet noise concentration at the MCTS root. Encourages exploration of moves the net hasn't learned yet. |
| `dirichlet_epsilon` | 0.25 | Mixing weight for Dirichlet noise (0 = no noise, 1 = all noise). |
| `worker_batch_size` | 5 | Games per `.npz` flush. Smaller = fresher data available to the trainer sooner, but more filesystem overhead. |
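The temperature schedule and root-noise mixing described in the table can be sketched as follows. The function names are illustrative, not the actual API in `training/config.py`:

```python
import numpy as np

def move_temperature(move_number, threshold=60, high=1.0, low=0.35):
    """High temperature for early moves, Lc0-style floor afterwards (sketch)."""
    return high if move_number < threshold else low

def mix_root_noise(priors, alpha=0.3, epsilon=0.25, rng=None):
    """Blend Dirichlet noise into the root policy priors:
    p' = (1 - epsilon) * p + epsilon * Dir(alpha)."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.dirichlet([alpha] * len(priors))
    return (1 - epsilon) * np.asarray(priors) + epsilon * noise
```

With `epsilon = 0.25`, a quarter of each root prior comes from noise, so even a move the net assigns near-zero probability gets some simulations.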

### Training

| Parameter | Default | Description |
| --- | --- | --- |
| `batch_size` | 256 | Training batch size. Interacts with `replay_buffer_size`: larger batches relative to the buffer mean more repeated samples per epoch. |
| `learning_rate` | 0.001 | Adam learning rate. |
| `l2_regularization` | 1e-4 | Weight decay. Prevents overfitting, especially important when the replay buffer is small relative to the number of training steps. |
| `replay_buffer_size` | 5,000,000 | Max positions in the sliding window. The trainer selects the most recent `.npz` files up to this limit and samples uniformly. Larger = more data diversity, but older positions stay in the window longer. |
| `steps_per_cycle` | 5,000 | Training steps before promoting a new model version (each version = one cycle). Controls how often workers get an updated model: shorter cycles mean faster model turnover but less training per version. |
| `reload_interval` | 1,000 | Re-scan `.data/training_data/` every N steps within a cycle, picking up fresh worker output. Without this, the trainer would use a stale snapshot for the entire cycle. Should be less than `steps_per_cycle`. |
| `min_positions_to_start` | 1,000,000 | Bootstrap gate: self-play training won't start until this many positions exist. Prevents training on too little data early on. |
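The sliding-window selection implied by `replay_buffer_size` amounts to taking the most recent files until the position budget runs out. A hedged sketch, where `select_window` and its inputs are hypothetical (the real trainer reads position counts from the `.npz` files themselves):

```python
def select_window(files_newest_first, file_positions, buffer_size=5_000_000):
    """Pick the most recent files whose total position count fits the buffer.
    `file_positions` maps file name -> position count (a stand-in for
    inspecting each .npz). Sketch only; the real logic may differ."""
    selected, total = [], 0
    for f in files_newest_first:
        n = file_positions[f]
        if total + n > buffer_size and selected:
            break  # window full; everything older ages out
        selected.append(f)
        total += n
    return selected, total
```

Uniform sampling then happens over the positions in `selected`, which is why older files simply disappear from the training distribution once the window slides past them.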

### Bootstrap (Imitation)

These only apply to the initial bootstrap phase when no model exists yet:

| Parameter | Default | Description |
| --- | --- | --- |
| `imitation_depth` | 3 | Minimax search depth for generating imitation targets. Deeper = better targets but much slower generation. |
| `imitation_random_plies` | 8 | Random opening moves per imitation game. Creates position diversity so the net doesn't just memorize one opening. |
| `imitation_temperature` | 200.0 | Softmax temperature for converting centipawn scores to policy targets. Higher = softer policy (more weight on suboptimal moves). |
| `bootstrap_steps` | 50,000 | Training steps for the imitation bootstrap. Must be enough to beat the heuristic evaluator; then self-play takes over. |
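The centipawn-to-policy conversion behind `imitation_temperature` amounts to a softmax over scaled scores. A sketch, assuming a plain softmax (the actual implementation may differ):

```python
import numpy as np

def scores_to_policy(centipawns, temperature=200.0):
    """Convert per-move centipawn scores into a soft policy target.
    Higher temperature flattens the distribution (sketch)."""
    logits = np.asarray(centipawns, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()
```

At the default 200.0, a move 100 centipawns better than an alternative gets only moderately more weight, which keeps the bootstrap targets from collapsing onto the single minimax-best move.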

### Network Architecture

| Parameter | Default | Description |
| --- | --- | --- |
| `num_residual_blocks` | 6 | Depth of the residual tower. More blocks = more capacity but slower inference (affects worker throughput). |
| `num_filters` | 128 | Width of the convolutional layers. More filters = more capacity but slower inference. |
| `board_channels` | 19 | Input tensor channels (piece planes + metadata). Must match `serialization.rs`. |
| `board_height` / `board_width` | 11 | Hex grid embedded in an 11×11 rectangle. Fixed by the coordinate system. |
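For a rough sense of model size, here is a back-of-the-envelope parameter count for the residual tower. This assumes 3×3 convolutions as in standard AlphaZero-style nets (the kernel size is an assumption, and batch norm plus the policy/value heads are excluded):

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights + biases for one k x k convolution."""
    return in_ch * out_ch * k * k + out_ch

def tower_params(channels=19, filters=128, blocks=6):
    """Approximate parameter count for input conv + residual tower
    (two convs per residual block; heads and batch norm omitted)."""
    total = conv_params(channels, filters)               # input conv
    total += blocks * 2 * conv_params(filters, filters)  # residual blocks
    return total
```

Under these assumptions the tower lands under 2M parameters, which is consistent with the emphasis on fast inference for worker throughput.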

## Key Dynamics

**Worker throughput vs. training speed:** Workers produce positions at a rate determined by `num_simulations` and hardware. The trainer consumes them at a rate determined by `steps_per_cycle`, `batch_size`, and GPU speed. If the trainer is much faster than the workers, it overtrains on stale data; if the workers are much faster, data piles up untrained.

**Buffer size vs. cycle length:** With `replay_buffer_size = 5M` and `steps_per_cycle = 5000` at `batch_size = 256`, each cycle trains on ~1.28M samples, roughly 25% of the buffer. Positions near the edge of the window get fewer passes than recent ones simply because they age out sooner.
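That arithmetic, spelled out:

```python
# Per-cycle sample count from the defaults above.
steps_per_cycle = 5_000
batch_size = 256
replay_buffer_size = 5_000_000

samples_per_cycle = steps_per_cycle * batch_size   # 1,280,000 samples
fraction = samples_per_cycle / replay_buffer_size  # ~25.6% of the buffer
```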

**Reload interval:** With `reload_interval = 1000`, fresh worker data enters the training distribution 5 times per cycle. This matters when workers are producing data fast: without reloads, the trainer would miss an entire cycle's worth of fresh data.

**Model version turnover:** Every `steps_per_cycle` steps, the trainer exports a new version. Workers poll for updates and switch. The lag between a new version appearing and workers using it depends on how often workers check (currently every `worker_batch_size` games).

## Data Layout

```
.data/
  models/             # best.onnx, best.pt, best.meta.json, v{N}.onnx snapshots
  training_data/      # version-tagged self-play .npz files (sp_v5_*.npz)
  logs/               # trainer.jsonl, worker-*.jsonl
  elo_state.json      # Elo service state (pair results, ratings)
  elo_rankings.jsonl  # Elo ranking history
```

## Download files

### Source Distribution

`hexchess_zero-0.0.1.tar.gz` (92.9 kB), uploaded via twine/6.1.0 on CPython/3.13.7 using Trusted Publishing.

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `437f0a6b8e3056dc459161383a1250054ac97aa3c20503afdac714d8ac1a9943` |
| MD5 | `614e8b65bd15c998d8edcad878fd3479` |
| BLAKE2b-256 | `43922138e48672368f3a5684099e93b5dd79ad41649787fd1a63e014fc0ba563` |

Provenance: attestation bundles published by `release.yml` on `k15z/hexchess-zero`. Values reflect the state when the release was signed and may no longer be current.

### Built Distribution

`hexchess_zero-0.0.1-cp39-abi3-manylinux_2_28_x86_64.whl` (8.4 MB), for CPython 3.9+ on manylinux (glibc 2.28+) x86-64.

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `a75f4b17fb77a47c4f41a7d788e684c19d811ad36244578b4a0e30e1b05a846b` |
| MD5 | `83895d5285cc4dcf9a96f7dd0cc2f9a2` |
| BLAKE2b-256 | `33bd421c916f50eaad635e1bb4831742eef0c828f9b3d6776f5b250b684c20ba` |

Provenance: attestation bundles published by `release.yml` on `k15z/hexchess-zero`. Values reflect the state when the release was signed and may no longer be current.
