Hexchess Zero

Hexagonal chess engine (Glinski variant) in Rust with AlphaZero-style self-play training.

Install

Prebuilt packages are published to PyPI and npm under the name hexchess-zero:

pip install hexchess-zero       # Python (imports as `hexchess`)
npm install hexchess-zero       # JavaScript / WASM

Python wheels are available for Linux, macOS, and Windows (3.9–3.13); the npm package bundles the WASM binary. See docs/content/docs/usage/ for API documentation, or the published docs site for interactive examples.

Structure

  • engine/ — Rust engine: board representation (91-cell hex grid, axial coordinates), move generation, MCTS search, minimax search, and neural network inference
  • training/ — Async distributed AlphaZero loop: self-play workers, continuous trainer, Elo rating service, dashboard
  • bindings/wasm/ — WASM bindings for browser play (uses tract for inference)
  • bindings/python/ — PyO3 bindings for the training pipeline (uses ONNX Runtime)
  • docs/ — Documentation site with interactive playground (Fumadocs)
  • k8s/ — Kubernetes manifests for the production training cluster

Quick Start

# Run engine tests
make test

# One-time setup: uv sync + build Python bindings
make setup

# Start training (run in separate terminals — or `make docker-up`)
make worker      # self-play worker (run N of these)
make trainer     # continuous trainer (run 1)
make elo         # Elo rating service (scale via replicas)
make dashboard   # status dashboard

# Quick CLI status
make status

# Run documentation site (includes interactive playground)
make docs-dev

Training Pipeline

The pipeline has three asynchronous components that coordinate purely through S3 (DigitalOcean Spaces / Cloudflare R2 / any S3-compatible store). Workers can run anywhere with credentials.

  1. Workers generate self-play games using MCTS + the latest model and flush .npz batches to S3.
  2. Trainer maintains a sliding-window replay buffer over recent self-play data, trains a multi-head network, exports ONNX, and promotes a new model version once enough fresh data has accumulated.
  3. Elo service plays continuous matches between model versions and baselines, persisting per-game results to S3 and rating models with OpenSkill (Weng-Lin / Plackett-Luce).

On first run (no model exists), the trainer bootstraps by training on minimax-supervised imitation data before switching to self-play. The production model consumes a 22-channel side-to-move tensor and predicts five heads: main policy, terminal WDL value, moves-left, short-term value, and an auxiliary opponent-policy target.
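
The startup branch described above can be sketched as follows. This is an illustrative stand-in, not the trainer's actual code: `choose_phase` is a hypothetical name, and a plain dict stands in for the S3 bucket that the real system reads via an S3 client.

```python
# Illustrative sketch: decide between imitation bootstrap and self-play
# on startup, keyed off models/latest.meta.json (per the S3 layout below).
# The dict stands in for the S3 bucket; names here are hypothetical.
import json

def choose_phase(bucket: dict) -> str:
    meta = bucket.get("models/latest.meta.json")
    if meta is None:
        return "bootstrap-imitation"   # no model yet: train on minimax data
    version = json.loads(meta)["version"]
    return f"self-play (from v{version})"
```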

Configuration

All parameters live in training/config.py. Key dials:

MCTS & Self-Play

Parameter Default Description
num_simulations 800 MCTS simulations per move. Higher = stronger play but slower data generation.
temperature_threshold 60 Move number after which temperature drops to temperature_low.
temperature_high 1.0 Temperature for early-game moves (before threshold). Higher = more diverse openings.
temperature_low 0.35 Late-game temperature. Lc0-style floor — anything near zero produced one-hot policy targets in 65–70% of positions, killing gradient signal.
dirichlet_alpha 0.3 Dirichlet noise concentration at the MCTS root.
dirichlet_epsilon 0.25 Mixing weight for Dirichlet noise.
worker_batch_size 2 Games per .npz flush.
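
The temperature schedule above can be sketched as follows: visit counts are converted to a policy via pi_i ∝ N_i^(1/T), with T switching from high to low at the threshold move. The config names mirror `training/config.py`; the function itself is illustrative, not the worker's actual code.

```python
# Illustrative temperature schedule: early moves sample diversely (T=1.0),
# late moves sharpen toward the most-visited child (T=0.35). The exact
# application in the worker may differ.
temperature_threshold = 60
temperature_high = 1.0
temperature_low = 0.35

def policy_from_visits(visits: list[int], move_number: int) -> list[float]:
    t = temperature_high if move_number < temperature_threshold else temperature_low
    powered = [n ** (1.0 / t) for n in visits]
    total = sum(powered)
    return [p / total for p in powered]
```

Note that even the "low" setting of 0.35 keeps the targets soft, per the rationale in the table: a near-zero temperature collapses most targets to one-hot.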

Training

Parameter Default Description
batch_size 256 Training batch size.
learning_rate 5e-4 SGD learning rate after warmup.
l2_regularization 3e-5 Weight decay.
window_c / window_alpha / window_beta 25,000 / 0.75 / 0.4 KataGo-style sublinear replay-window parameters.
steps_per_cycle 3,000 Trainer steps per cycle before reloading metrics, buffers, and promotion state.
reload_interval 1,000 Re-scan S3 every N steps within a cycle to pick up fresh worker output.
max_train_steps_per_new_data 4.0 KataGo-style token bucket: target training passes per new data point. Throttles the trainer when workers fall behind.
min_positions_to_start 200,000 Bootstrap gate: self-play training won't start until this many positions exist.
promote_every_new_positions 300,000 Minimum fresh self-play positions required between promotions.
imitation_mix_start / imitation_mix_end / imitation_mix_decay_end_version 0.3 / 0.0 / 5 Early versions mix in minimax imitation data, then decay to pure self-play.
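
The sublinear replay window can be sketched with one common parameterization of the KataGo scheme (window grows like n^alpha once past `window_c`); the trainer's exact formula may differ, so treat this as illustrative.

```python
# Illustrative KataGo-style sublinear replay window: keep everything early,
# then grow the window sublinearly in cumulative self-play positions.
window_c, window_alpha, window_beta = 25_000, 0.75, 0.4

def replay_window(total_positions: int) -> int:
    n = total_positions
    if n <= window_c:
        return n  # early training: use all data
    scale = 1.0 + window_beta * ((n / window_c) ** window_alpha - 1.0) / window_alpha
    return int(window_c * scale)
```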

Bootstrap (Imitation)

These only apply to the initial bootstrap phase when no model exists yet:

Parameter Default Description
imitation_depth 3 Minimax search depth for generating imitation targets.
imitation_exploration_plies 30 Plies that use softmax sampling (rather than greedy) for opening diversity.
imitation_temperature 200.0 Softmax temperature for converting centipawn scores to policy targets.
imitation_wdl_lambda 0.5 Blend between sigmoid(eval) and final game outcome for the WDL value target.
bootstrap_steps 50,000 Training steps for imitation bootstrap.
bootstrap_learning_rate 0.003 Higher LR for the clean supervised signal (3× self-play LR).
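
The imitation-target construction can be sketched as follows: a softmax over minimax centipawn scores at `imitation_temperature`, and a WDL value target blending a logistic transform of the eval with the final game outcome. The logistic scale (400cp) is an assumption for illustration, not taken from the repo.

```python
# Illustrative bootstrap targets: softmax over centipawn scores for the
# policy, and a lambda-blend of sigmoid(eval) with the game outcome for
# the value head. The 400cp logistic scale is assumed, not from the repo.
import math

imitation_temperature = 200.0
imitation_wdl_lambda = 0.5

def policy_targets(centipawn_scores: list[float]) -> list[float]:
    m = max(centipawn_scores)  # subtract max for numerical stability
    exps = [math.exp((s - m) / imitation_temperature) for s in centipawn_scores]
    z = sum(exps)
    return [e / z for e in exps]

def value_target(eval_cp: float, game_outcome: float, scale: float = 400.0) -> float:
    """Blend eval-derived win probability with the final result, both in [0, 1]."""
    win_prob = 1.0 / (1.0 + math.exp(-eval_cp / scale))
    return imitation_wdl_lambda * win_prob + (1 - imitation_wdl_lambda) * game_outcome
```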

Network Architecture

Parameter Default Description
num_residual_blocks 8 Depth of the residual tower.
num_filters 144 Width of convolutional layers.
se_channels 32 Squeeze-and-excitation bottleneck width.
global_pool_channels 32 KataGo-style global pooling width.
global_pool_blocks (2, 5) Which residual blocks get global pooling.
policy_channels / aux_policy_channels / value_channels 4 / 2 / 32 Conv channels in the main policy, auxiliary policy, and value heads.
board_channels 22 Input tensor channels (piece planes + metadata). Must match serialization.rs.
board_height / board_width 11 / 11 Hex grid embedded in an 11×11 rectangle. Fixed by the coordinate system.
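
As a back-of-the-envelope check on the table, the tower size follows from the width and depth. This sketch assumes two 3×3 convolutions per residual block (the classic AlphaZero block shape) and ignores SE, global-pooling, and head parameters; the actual kernel sizes live in the engine code.

```python
# Rough parameter estimate for the residual tower. Kernel size (3x3) and
# two-convs-per-block are assumptions; BN/SE/head parameters are ignored.
num_residual_blocks, num_filters = 8, 144

conv_params = 3 * 3 * num_filters * num_filters   # one 3x3 conv, no bias
block_params = 2 * conv_params                    # two convs per block
tower_params = num_residual_blocks * block_params
print(f"~{tower_params / 1e6:.1f}M parameters in the tower (convs only)")
```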

Key Dynamics

Worker throughput vs training speed. Workers produce positions at a rate determined by num_simulations and CPU count. The trainer consumes them at a rate determined by steps_per_cycle, batch_size, and GPU speed. The token bucket (max_train_steps_per_new_data) keeps them in lockstep — if the trainer outpaces data production, it sleeps instead of overfitting on stale data.
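
The throttle can be sketched as a token bucket in which each new position adds `max_train_steps_per_new_data` sample credits and each optimizer step spends `batch_size` of them. The accounting below is illustrative; the real trainer tracks this against S3 position counts.

```python
# Illustrative token bucket for max_train_steps_per_new_data: the trainer
# may only step while it has sample credits, and sleeps otherwise.
max_train_steps_per_new_data = 4.0
batch_size = 256

class TokenBucket:
    def __init__(self) -> None:
        self.credits = 0.0

    def add_positions(self, n: int) -> None:
        """Called when fresh self-play positions land in S3."""
        self.credits += n * max_train_steps_per_new_data

    def try_step(self) -> bool:
        """One optimizer step costs batch_size credits; False means sleep."""
        if self.credits < batch_size:
            return False
        self.credits -= batch_size
        return True
```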

Replay window vs cycle length. The trainer no longer uses a fixed-size replay buffer. Instead it computes a KataGo-style sublinear window from the cumulative self-play count, so early training emphasizes recency while later training keeps more historical coverage without growing linearly forever.

Model version turnover. The trainer only promotes after both conditions hold: it has completed a training cycle, and at least promote_every_new_positions new self-play positions have landed since the last promotion. Early versions can also be gated against the incumbent before models/latest.meta.json is advanced.
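
The promotion gate reduces to a two-condition predicate; `should_promote` is an illustrative name, not the trainer's actual function.

```python
# Illustrative promotion gate: promote only after a completed cycle AND
# enough fresh self-play positions since the last promotion.
promote_every_new_positions = 300_000

def should_promote(cycle_complete: bool,
                   positions_now: int,
                   positions_at_last_promote: int) -> bool:
    fresh = positions_now - positions_at_last_promote
    return cycle_complete and fresh >= promote_every_new_positions
```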

S3 Layout

models/
  latest.onnx              # current model for inference
  latest.meta.json         # {"version": N, "timestamp": "...", "positions_at_promote": M}
  checkpoint.pt            # PyTorch training checkpoint
  versions/{N}.onnx        # immutable version snapshots

data/
  selfplay/v{N}/{ts}_{rand}_n{count}.npz   # self-play batches (position count in filename)
  selfplay_traces/v{N}/{game_id}.json      # per-game search trace sidecars
  imitation/{ts}_{rand}_n{count}.npz       # bootstrap minimax batches

state/
  elo.json                 # Elo projection (rebuilt from elo_games/)
  elo_games/{ts}_{rand}.json  # one immutable object per played game (race-free writes)
  trainer_metrics.json     # latest trainer telemetry for the dashboard

heartbeats/
  {hostname}.json          # worker liveness + stats for the dashboard

benchmarks/
  results/{version}.json   # benchmark snapshots consumed by the dashboard

Position counts are encoded in each .npz filename so the trainer can compute buffer size from an S3 LIST without opening any files.
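
Recovering the counts from a LIST response is a one-regex job over the `n{count}` suffix; the helper below is a sketch, not the trainer's actual code.

```python
# Illustrative: sum position counts straight from S3 key names, so buffer
# sizing never needs to download or open an .npz file.
import re

BATCH_KEY = re.compile(r"_n(\d+)\.npz$")

def positions_in(keys: list[str]) -> int:
    total = 0
    for key in keys:
        m = BATCH_KEY.search(key)
        if m:                      # skip non-batch objects (traces, etc.)
            total += int(m.group(1))
    return total
```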

S3 credentials live in .env (gitignored): BUCKET_NAME, ACCESS_KEY, SECRET_KEY, ENDPOINT.

Documentation

Full docs (engine internals, training details, bindings reference, deployment, interactive playground) live in docs/ and are published from the Fumadocs site. Run make docs-dev to view locally.

Agent Workflow

Repository-level agent instructions live in AGENTS.md. Use codex review --uncommitted for local changes or codex review --base <branch> before merging a PR.

The training-health workflow lives in the repo at ai/training-health.md so Codex can follow the same process directly from the workspace.

Download files

Download the file for your platform.

Source Distribution

hexchess_zero-0.0.3.tar.gz (126.8 kB)

Built Distributions


hexchess_zero-0.0.3-cp39-abi3-win_amd64.whl (6.9 MB)

CPython 3.9+, Windows x86-64

hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_x86_64.whl (8.5 MB)

CPython 3.9+, manylinux (glibc 2.28+), x86-64

hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_aarch64.whl (9.8 MB)

CPython 3.9+, manylinux (glibc 2.28+), ARM64

hexchess_zero-0.0.3-cp39-abi3-macosx_11_0_arm64.whl (7.8 MB)

CPython 3.9+, macOS 11.0+, ARM64

File details

Details for the file hexchess_zero-0.0.3.tar.gz.

File metadata

  • Size: 126.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hexchess_zero-0.0.3.tar.gz
Algorithm Hash digest
SHA256 bad2929eb1631025fabf1a76a1eca27cafc23a3f02e5881b2c209c2dd5d46f8d
MD5 7963a5831136f94590e05ffa32a12049
BLAKE2b-256 87bab5717043d4821a75061116d0af350c28739da975c0b696b394ba82b6cb25


Provenance

The following attestation bundles were made for hexchess_zero-0.0.3.tar.gz:

Publisher: release.yml on k15z/hexchess-zero

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hexchess_zero-0.0.3-cp39-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for hexchess_zero-0.0.3-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 93bd5b1ccaf7944fa7992dc1348e49d6648a6fe6a801d00888249683d0c6a795
MD5 ada0c9990da9f45f0f83e3905f96281a
BLAKE2b-256 1b3f77a666c155980848ee7c97831123498c6a8a703b5b89e8cb1f126a2bd10b


Provenance

The following attestation bundles were made for hexchess_zero-0.0.3-cp39-abi3-win_amd64.whl:

Publisher: release.yml on k15z/hexchess-zero


File details

Details for the file hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1cac9634e16a6ad3a911322da9bd9df37c182b8ca9f575ccbfce60fb46eb4f84
MD5 39dcf6cc12872a27dece44bc31baa1f5
BLAKE2b-256 062e29a84d229e46142df29732fc8f97d88c36bc234d3325f733a21510b36067


Provenance

The following attestation bundles were made for hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_x86_64.whl:

Publisher: release.yml on k15z/hexchess-zero


File details

Details for the file hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d1428877b604bdbe4f1e3869d2f93a984fcc301b1c0e0bc8d05d77f5269b26fd
MD5 89b08f577db41325bd71bc62235559d5
BLAKE2b-256 f72c9926ff2366dfd5e592609d328e28c667d52c5acf74a0794f8376e32b4a82


Provenance

The following attestation bundles were made for hexchess_zero-0.0.3-cp39-abi3-manylinux_2_28_aarch64.whl:

Publisher: release.yml on k15z/hexchess-zero


File details

Details for the file hexchess_zero-0.0.3-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for hexchess_zero-0.0.3-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a5f6011a62f1658204351e066140116a62b4343197c414c2cff8d1aad3668411
MD5 52af98ce3cb54d1ba9b442d4d99f26b5
BLAKE2b-256 8ebe05a1482fdcf42b97c9d49677d71768366358d1c8a788d67ba21d446ae185


Provenance

The following attestation bundles were made for hexchess_zero-0.0.3-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on k15z/hexchess-zero

