
A multi-game LLM benchmark for compact deterministic board games.


BoardGameBench

BoardGameBench is a benchmark for testing LLM move quality across a curriculum of compact deterministic board games. It extends GomokuBench, generalizing the same search-vs-LLM idea from a single game to a multi-game score.

Instead of scoring a model on one game, BoardGameBench runs the same model through several games and reports a normalized aggregate score. Each game result is worth:

  • win = 1 point
  • draw = 0.5 points
  • loss or illegal-move forfeit = 0 points
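The point values above can be turned into a normalized score in more than one way, and the exact normalization is not spelled out here. The following is a minimal sketch that assumes each game's score is the mean of its round points and the aggregate is the unweighted mean of the per-game scores. The function name, the outcome labels, and the input shape are illustrative, not the package's API.

```python
from collections import defaultdict

# Points per outcome, as listed above; an illegal-move forfeit scores like a loss.
POINTS = {"win": 1.0, "draw": 0.5, "loss": 0.0, "forfeit": 0.0}


def aggregate_score(results):
    """Combine (game_id, outcome) pairs into an aggregate and per-game scores.

    Assumption: per-game score = mean of round points; aggregate = unweighted
    mean of per-game scores, so both stay in the range 0..1.
    """
    points_by_game = defaultdict(list)
    for game_id, outcome in results:
        points_by_game[game_id].append(POINTS[outcome])
    per_game = {game: sum(pts) / len(pts) for game, pts in points_by_game.items()}
    aggregate = sum(per_game.values()) / len(per_game) if per_game else 0.0
    return aggregate, per_game


# Hypothetical results: two rounds each of two games.
results = [
    ("connect_four", "win"),
    ("connect_four", "draw"),
    ("othello_6x6", "loss"),
    ("othello_6x6", "forfeit"),
]
aggregate, per_game = aggregate_score(results)
print(per_game)
print(aggregate)
```

Averaging per game first keeps a game with many rounds from dominating the aggregate; a benchmark that pools all rounds instead would weight games by round count.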

The default curriculum is currently the first multi-game set, seven games in all:

  1. Connect Four
  2. Gomoku 19x19
  3. Breakthrough 6x6
  4. Dots and Boxes 3x3
  5. Othello 6x6
  6. Othello 8x8
  7. Hex 7x7

Each game has exact legal-move generation, terminal-state detection, deterministic state updates, and a built-in alpha-beta-style search opponent with a game-specific evaluation function. The engine is intentionally simple and auditable, and every result can be replayed from the saved JSON.
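To illustrate what an alpha-beta-style opponent does, here is a minimal negamax search with alpha-beta pruning on a toy subtraction game (take 1 to 3 stones; whoever takes the last stone wins). This is not the bundled engine: the game, the function names, and the exhaustive search depth are all assumptions chosen to keep the sketch small. A real opponent for the curriculum games would typically stop at a depth limit and call a game-specific evaluation instead of searching to the end.

```python
def legal_moves(stones):
    """Exact legal-move generation: take 1, 2, or 3 stones, never more than remain."""
    return [n for n in (1, 2, 3) if n <= stones]


def negamax(stones, alpha=-1, beta=1):
    """Return +1 if the player to move wins with best play, otherwise -1."""
    if stones == 0:
        # Terminal state: the previous player took the last stone, so the mover lost.
        return -1
    best = -1
    for move in legal_moves(stones):
        # The opponent's value is negated, and the search window is flipped.
        value = -negamax(stones - move, -beta, -alpha)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # cutoff: the remaining moves cannot improve the result
    return best


def best_move(stones):
    """Pick the move that leaves the opponent in the worst position."""
    return max(legal_moves(stones), key=lambda move: -negamax(stones - move))


print(negamax(4))    # multiples of 4 are lost for the player to move
print(best_move(7))  # taking 3 leaves the opponent with 4 stones
```

Because move generation and state updates are deterministic, running the same search on the same position always yields the same move, which is what makes saved games replayable.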

Quick Start

From this folder:

pip install .
boardgamebench list-games
boardgamebench play --game connect_four
boardgamebench benchmark --model-file ./models/example-openai-compatible.json -r 2

After installation:

boardgamebench benchmark --model my-model -r 4

Model Configs

Model configs use the same OpenAI-compatible shape as GomokuBench:

{
  "provider": {
    "openrouter": {
      "name": "OpenRouter",
      "options": {
        "baseURL": "https://openrouter.ai/api/v1",
        "apiKeyEnv": "OPENROUTER_API_KEY"
      },
      "models": {
        "my-model": {
          "name": "My Model",
          "model": "provider/model-id",
          "rate_limit_rpm": 30,
          "timeout_seconds": 120
        }
      }
    }
  }
}
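As a reading aid for the config shape above, the sketch below looks up a model key under any provider and collects the settings a client would need. It is an assumption about how such a file could be consumed, not the package's loader: `resolve_model` and the returned key names are hypothetical, and the API key is read from the environment variable named by `apiKeyEnv`.

```python
import json
import os

CONFIG_TEXT = """
{
  "provider": {
    "openrouter": {
      "name": "OpenRouter",
      "options": {
        "baseURL": "https://openrouter.ai/api/v1",
        "apiKeyEnv": "OPENROUTER_API_KEY"
      },
      "models": {
        "my-model": {
          "name": "My Model",
          "model": "provider/model-id",
          "rate_limit_rpm": 30,
          "timeout_seconds": 120
        }
      }
    }
  }
}
"""


def resolve_model(config, model_key):
    """Find model_key under any provider and return the settings needed to call it."""
    for provider_id, provider in config["provider"].items():
        model = provider.get("models", {}).get(model_key)
        if model is None:
            continue
        options = provider["options"]
        return {
            "provider": provider_id,
            "base_url": options["baseURL"],
            # The config stores only the variable name; the key itself stays in the environment.
            "api_key": os.environ.get(options["apiKeyEnv"]),  # None if unset
            "model": model["model"],
            "rate_limit_rpm": model.get("rate_limit_rpm"),
            "timeout_seconds": model.get("timeout_seconds"),
        }
    raise KeyError(f"model {model_key!r} not found in config")


settings = resolve_model(json.loads(CONFIG_TEXT), "my-model")
print(settings["base_url"], settings["model"])
```

Storing the environment variable name rather than the key means the JSON file can be committed or shared without leaking credentials.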

Put configs in models/<name>.json and run:

boardgamebench benchmark --model <name>

or pass a file directly:

boardgamebench benchmark --model-file /path/to/model.json

Choosing Games

Run the default curriculum:

boardgamebench benchmark --model my-model

Run a subset:

boardgamebench benchmark --model my-model --games connect_four,breakthrough_6x6,othello_6x6 -r 4

Available game ids:

  • connect_four
  • gomoku_19x19
  • breakthrough_6x6
  • dots_and_boxes_3x3
  • othello_6x6
  • othello_8x8
  • hex_7x7
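When scripting benchmark runs, it can help to check a comma-separated game list against the ids above before invoking the CLI. The helper below is a small sketch for that purpose; `parse_games` is a hypothetical name, and how the CLI itself validates `--games` is not described here.

```python
# Game ids copied from the list above.
GAME_IDS = (
    "connect_four",
    "gomoku_19x19",
    "breakthrough_6x6",
    "dots_and_boxes_3x3",
    "othello_6x6",
    "othello_8x8",
    "hex_7x7",
)


def parse_games(spec):
    """Split a comma-separated spec and reject ids that are not in GAME_IDS."""
    requested = [part.strip() for part in spec.split(",") if part.strip()]
    unknown = [game for game in requested if game not in GAME_IDS]
    if unknown:
        raise ValueError(f"unknown game ids: {', '.join(unknown)}")
    return requested


print(parse_games("connect_four,breakthrough_6x6,othello_6x6"))
```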

See GAMES.md for the implemented curriculum and the planned stronger-oracle roadmap for the larger game list.

Outputs

Reports are saved in benchmarks/<model>.json and include:

  • model and provider metadata
  • aggregate score and per-game scores
  • every move by the LLM and engine
  • raw LLM responses
  • final board states
  • a reasoning/API log path under /tmp/boardgamebench
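The field names inside a report are not documented here, so a safe first step is to load the file and list its top-level keys before writing any analysis code. The sketch below does only that; `top_level_fields` is a hypothetical helper, and `benchmarks/my-model.json` is just the path pattern above with an example model name filled in.

```python
import json
from pathlib import Path


def top_level_fields(report_path):
    """Return the sorted top-level keys of a saved report."""
    with Path(report_path).open(encoding="utf-8") as handle:
        report = json.load(handle)
    if not isinstance(report, dict):
        raise ValueError("expected the report to be a JSON object")
    return sorted(report)


if __name__ == "__main__":
    path = Path("benchmarks") / "my-model.json"
    if path.exists():  # only print when a report has actually been generated
        print(top_level_fields(path))
```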

Notes

This repo is a benchmark harness, not a claim that the bundled engines are perfect solvers for every game. The design keeps the oracle interface pluggable so that stronger sources can be dropped in later, such as Pascal Pons's solver for Connect Four, Edax for Othello, MoHex/Benzene for Hex, or retrograde/proof databases for solved small variants.
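One way to picture a pluggable oracle is as a small interface that any move source can satisfy. The sketch below is purely illustrative: `Oracle`, `choose_move`, `FirstLegalMoveOracle`, and `play_turn` are hypothetical names and do not describe the package's actual interface. A wrapper around an external solver would implement the same method and translate between its own move format and the harness's.

```python
from typing import Protocol, Sequence


class Oracle(Protocol):
    """Hypothetical interface: anything that can pick a move from a position."""

    def choose_move(self, state: object, legal_moves: Sequence[str]) -> str:
        ...


class FirstLegalMoveOracle:
    """Trivial stand-in oracle, useful only for wiring tests."""

    def choose_move(self, state, legal_moves):
        if not legal_moves:
            raise ValueError("no legal moves available")
        return legal_moves[0]


def play_turn(oracle: Oracle, state, legal_moves):
    """Ask the oracle for a move and reject anything outside the legal set."""
    move = oracle.choose_move(state, legal_moves)
    if move not in legal_moves:
        raise ValueError(f"oracle returned illegal move: {move!r}")
    return move


print(play_turn(FirstLegalMoveOracle(), state=None, legal_moves=["c4", "d4"]))
```

Checking the oracle's answer against the exact legal-move list keeps a misbehaving external engine from corrupting a game record.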
