
BoardGameBench

BoardGameBench is a benchmark for testing LLM move quality across a curriculum of compact deterministic board games. It extends GomokuBench, generalizing the same search-vs-LLM idea from a single game to a multi-game score.

Instead of scoring a model on one game, BoardGameBench runs the same model through several games and reports a normalized aggregate score:

  • win = 0.75 to 1 point, with faster wins scoring closer to 1
  • draw = 0.5 points
  • loss = up to 0.35 points, scaled by how long the LLM survives
  • illegal-move forfeit = 0 points

Each game has its own move horizon, so survival and speed are scored relative to that game's expected length. This means an LLM still earns credit for making a losing game last longer, while a winning LLM is rewarded for closing the game quickly.
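The rubric above can be sketched as a small scoring function. This is a hypothetical reconstruction from the description: the function name and the exact decay curves are assumptions, not the shipped code.

```python
def score_round(outcome: str, moves_played: int, horizon: int) -> float:
    """Map one round to [0, 1] relative to the game's expected move horizon."""
    frac = min(moves_played / horizon, 1.0)
    if outcome == "win":
        # 1.0 for an immediate win, decaying toward the 0.75 floor
        return 1.0 - 0.25 * frac
    if outcome == "draw":
        return 0.5
    if outcome == "loss":
        # partial credit for surviving longer, capped at 0.35
        return 0.35 * frac
    return 0.0  # illegal-move forfeit
```

Summing this over all rounds of all games gives the raw score reported in the results table.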

The current default curriculum follows the first strong multi-game set:

  1. Connect Four
  2. Gomoku 19x19
  3. Breakthrough 6x6
  4. Dots and Boxes 3x3
  5. Othello 6x6
  6. Othello 8x8
  7. Hex 7x7

Each game has exact legal move generation, terminal-state detection, deterministic state updates, and a built-in alpha-beta style search opponent with game-specific evaluation. The engine is intentionally simple and auditable, so every result can be replayed from the saved JSON.
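The built-in opponent is the classic negamax formulation of alpha-beta search. The sketch below shows the style of search described above with a toy game plugged in; the game interface (`legal_moves`/`apply`/`evaluate`) is illustrative, and BoardGameBench's actual engine API may differ.

```python
def negamax(state, depth, alpha, beta, game):
    """Alpha-beta search; returns (score for side to move, best move)."""
    moves = game.legal_moves(state)
    if depth == 0 or not moves:
        return game.evaluate(state), None
    best, best_move = float("-inf"), None
    for move in moves:
        child = game.apply(state, move)
        score, _ = negamax(child, depth - 1, -beta, -alpha, game)
        score = -score
        if score > best:
            best, best_move = score, move
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent will never allow this line
    return best, best_move

class Nim:
    """Toy subtraction game: take 1 or 2 stones; taking the last stone wins."""
    def legal_moves(self, n):
        return [m for m in (1, 2) if m <= n]
    def apply(self, n, m):
        return n - m
    def evaluate(self, n):
        # no stones left: the side to move already lost
        return -1.0 if n == 0 else 0.0

score, move = negamax(4, 10, float("-inf"), float("inf"), Nim())
```

With 4 stones the side to move wins by taking 1, leaving the losing position 3, which the search finds.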

Benchmark Results

Current 10-round-per-game Ollama results:

| Model                     | Games | Wins | Losses | Draws | Forfeits | Raw Score | Normalized | BRI   |
|---------------------------|------:|-----:|-------:|------:|---------:|----------:|-----------:|------:|
| Nemotron 3 33B (Ollama)   |    70 |    2 |     68 |     0 |       38 |   4.91/70 |       7.02 | 118.0 |
| Nemotron 3 Super (Ollama) |    70 |    0 |     70 |     0 |       36 |   3.02/70 |       4.31 |  97.3 |
| Gemma 4 8B (Ollama)       |    70 |    0 |     70 |     0 |       60 |   0.71/70 |       1.01 |  71.2 |

Quick Start

From this folder:

pip install .
boardgamebench list-games
boardgamebench play --game connect_four
boardgamebench benchmark --model-file ./models/example-openai-compatible.json -r 2
boardgamebench report

Or install from PyPI and run directly:

pip install boardgamebench
boardgamebench benchmark --model my-model -r 4

Model Configs

Model configs use the same OpenAI-compatible shape as GomokuBench:

{
  "provider": {
    "openrouter": {
      "name": "OpenRouter",
      "options": {
        "baseURL": "https://openrouter.ai/api/v1",
        "apiKeyEnv": "OPENROUTER_API_KEY"
      },
      "models": {
        "my-model": {
          "name": "My Model",
          "model": "provider/model-id",
          "rate_limit_rpm": 30,
          "timeout_seconds": 120
        }
      }
    }
  }
}

Put configs in models/<name>.json and run:

boardgamebench benchmark --model <name>

or pass a file directly:

boardgamebench benchmark --model-file /path/to/model.json
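A config in the shape above resolves to an endpoint, a key, and per-model settings. The helper below is a hypothetical illustration of that resolution; the function name and returned keys are assumptions, not BoardGameBench's internal loader, though the JSON fields (`baseURL`, `apiKeyEnv`, `timeout_seconds`) mirror the example config.

```python
import os

def resolve_model(config: dict, model_id: str) -> dict:
    """Find model_id under any provider and return request parameters."""
    for provider in config["provider"].values():
        models = provider.get("models", {})
        if model_id in models:
            opts = provider["options"]
            return {
                "base_url": opts["baseURL"],
                # the key itself lives in the named environment variable
                "api_key": os.environ.get(opts["apiKeyEnv"], ""),
                "model": models[model_id]["model"],
                "timeout": models[model_id].get("timeout_seconds", 60),
            }
    raise KeyError(f"model {model_id!r} not found in config")
```

Keeping the API key in an environment variable (rather than the JSON file) means configs can be committed safely.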

Choosing Games

Run the default curriculum:

boardgamebench benchmark --model my-model

By default, this runs 10 rounds per game. Use -r or --rounds to choose a different number:

boardgamebench benchmark --model my-model -r 20

Run a subset:

boardgamebench benchmark --model my-model --games connect_four,breakthrough_6x6,othello_6x6 -r 4

Available game ids:

  • connect_four
  • gomoku_19x19
  • breakthrough_6x6
  • dots_and_boxes_3x3
  • othello_6x6
  • othello_8x8
  • hex_7x7

See GAMES.md for the implemented curriculum and the planned stronger-oracle roadmap for the larger game list.

Outputs

Reports are saved in benchmarks/<model>.json and include:

  • model and provider metadata
  • aggregate score and per-game scores
  • per-round speed/survival scoring details
  • every move by the LLM and engine
  • raw LLM responses
  • final board states
  • a reasoning/API log path under /tmp/boardgamebench

To print a leaderboard table from saved benchmark files:

boardgamebench report
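Because every report is plain JSON, a leaderboard is just an aggregation over benchmarks/*.json. The sketch below assumes top-level `model` and `normalized_score` fields; the real report schema may name them differently.

```python
import json
from pathlib import Path

def leaderboard(bench_dir: str = "benchmarks") -> list[tuple[str, float]]:
    """Collect (model, normalized score) pairs, best first."""
    rows = []
    for path in Path(bench_dir).glob("*.json"):
        report = json.loads(path.read_text())
        rows.append((report["model"], report["normalized_score"]))
    # highest normalized score first
    return sorted(rows, key=lambda r: r[1], reverse=True)
```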

Notes

This repo is a benchmark harness, not a claim that the bundled engines are perfect solvers for every game. The design keeps the oracle interface pluggable so stronger sources can be dropped in later, such as Pascal Pons for Connect Four, Edax for Othello, MoHex/Benzene for Hex, or retrograde/proof databases for solved small variants.
