A chess LLM benchmark scored against a local UCI master engine.

Project description

LLMCheesBench

LLMCheesBench is a BoardGameBench-style LLM benchmark for chess. It scores model move quality against a local UCI master engine, with Stockfish as the default recommendation for a personal PC.

The benchmark is designed around the practical question:

Which LLM can make strong chess moves when judged by a real local chess engine?

Instead of forcing every model through long full games, LLMCheesBench uses a compact position curriculum. Each model receives a FEN, board diagram, legal UCI moves, and legal SAN moves. The model must reply with one legal move. A local engine analyzes the same position, and LLMCheesBench scores the model by centipawn loss:

  • exact engine top move = full credit
  • small centipawn loss = high partial credit
  • large blunder = low or zero credit
  • illegal or unparsable move = forfeit for that position

This makes the benchmark fast, replayable, and suitable for comparing local or API models on one computer.
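The scoring rule above can be sketched roughly as follows. The `score_position` name and the cutoff values are illustrative assumptions, not LLMCheesBench's actual constants:

```python
def score_position(move_is_legal, cp_loss):
    """Map one position's outcome to credit in [0, 1].

    cp_loss is the centipawn loss versus the engine's top move;
    None means the reply was illegal or unparsable (forfeit).
    The 300 cp blunder cutoff is illustrative, not the benchmark's real value.
    """
    if not move_is_legal or cp_loss is None:
        return 0.0                      # forfeit for that position
    if cp_loss <= 0:
        return 1.0                      # matched the engine's top move
    if cp_loss >= 300:
        return 0.0                      # large blunder: no credit
    return 1.0 - cp_loss / 300.0        # linear partial credit in between
```

With a rule like this, a 50 cp inaccuracy still earns most of the credit, while anything past the blunder threshold scores zero, which matches the tiers listed above.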

Why Stockfish

For a personal PC, Stockfish is the best default master engine to use as the oracle: it is free, popular, very strong, UCI-compatible, easy to install, and tunable for your hardware through its Threads and Hash options. LLMCheesBench also accepts other UCI engines, so you can compare against Berserk, Ethereal, Komodo, or Lc0 if you have them installed.

Quick Start

From this folder:

pip install .
brew install stockfish
llmcheesbench list-positions
llmcheesbench engine-best --threads 8 --hash 1024 --movetime 2000
llmcheesbench benchmark --model-file ./models/example-openai-compatible.json --threads 8 --hash 1024 --movetime 2000
llmcheesbench report

The old chessbench command is kept as a compatibility alias, but llmcheesbench is the primary command.

If Stockfish is not in your PATH, pass it directly:

llmcheesbench benchmark --model my-model --engine /path/to/stockfish

or set:

export LLMCHEESBENCH_ENGINE=/path/to/stockfish
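The lookup order implied above (explicit flag, then environment variable, then PATH) might look like the sketch below. `resolve_engine` is a hypothetical helper for illustration, not part of the published API:

```python
import os
import shutil

def resolve_engine(cli_path=None):
    """Resolve the UCI engine binary: --engine flag first, then the
    LLMCHEESBENCH_ENGINE environment variable, then `stockfish` on PATH.
    Hypothetical sketch of the lookup order described above."""
    if cli_path:
        return cli_path
    env_path = os.environ.get("LLMCHEESBENCH_ENGINE")
    if env_path:
        return env_path
    return shutil.which("stockfish")  # None if no engine is installed
```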

Model Configs

Model configs use the same OpenAI-compatible shape as BoardGameBench and GomokuBench:

{
  "provider": {
    "openrouter": {
      "name": "OpenRouter",
      "options": {
        "baseURL": "https://openrouter.ai/api/v1",
        "apiKeyEnv": "OPENROUTER_API_KEY"
      },
      "models": {
        "my-model": {
          "name": "My Model",
          "model": "provider/model-id",
          "rate_limit_rpm": 30,
          "timeout_seconds": 120
        }
      }
    }
  }
}
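Reading that shape back out is straightforward. This parsing sketch assumes only the structure shown above and embeds the same example config:

```python
import json

# The example model config from above, embedded for illustration.
CONFIG = """
{
  "provider": {
    "openrouter": {
      "name": "OpenRouter",
      "options": {
        "baseURL": "https://openrouter.ai/api/v1",
        "apiKeyEnv": "OPENROUTER_API_KEY"
      },
      "models": {
        "my-model": {
          "name": "My Model",
          "model": "provider/model-id",
          "rate_limit_rpm": 30,
          "timeout_seconds": 120
        }
      }
    }
  }
}
"""

def load_models(text):
    """Flatten the provider/models nesting into (provider_key, model_key, config) tuples."""
    data = json.loads(text)
    return [
        (pkey, mkey, mcfg)
        for pkey, provider in data["provider"].items()
        for mkey, mcfg in provider.get("models", {}).items()
    ]
```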

Put configs in models/<name>.json and run:

llmcheesbench benchmark --model <name>

or pass a file directly:

llmcheesbench benchmark --model-file /path/to/model.json

Engine Settings

The most important knobs:

  • --threads: CPU threads for the engine.
  • --hash: engine hash table size in MB.
  • --movetime: milliseconds per position.
  • --depth: optional fixed depth instead of time.
  • --multipv: number of candidate engine lines to save and score.

Example for a stronger desktop run:

llmcheesbench benchmark --model my-model --threads 16 --hash 4096 --movetime 5000 --multipv 8
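With MultiPV enabled, a UCI engine reports each candidate line in an `info` line. The `info` line format below follows the UCI protocol; the parser itself is an illustrative sketch (it handles `score cp` but not `score mate`):

```python
import re

# Matches the multipv rank, centipawn score, and first pv move
# in a UCI "info" line. Mate scores ("score mate N") are not handled.
INFO_RE = re.compile(r"multipv (\d+) .*?score cp (-?\d+) .*?pv (\S+)")

def parse_info(line):
    """Extract (multipv_rank, centipawns, first_pv_move) from a UCI info line,
    or None if the line carries no centipawn-scored pv."""
    m = INFO_RE.search(line)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)
```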

Position Suite

Run the default curriculum:

llmcheesbench benchmark --model my-model

Run a subset:

llmcheesbench benchmark --model my-model --positions opening_center,tactic_pin,endgame_rook

Available position categories:

  • opening
  • tactic
  • middlegame
  • defense
  • endgame
  • mate

Use llmcheesbench list-positions to see the current IDs.
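Position IDs such as `tactic_pin` carry their category as a prefix, so grouping them is simple. The `tactic_fork` ID below is invented for illustration; only the other three appear in this README:

```python
from collections import defaultdict

def by_category(position_ids):
    """Group position IDs by their category prefix (text before the first '_')."""
    groups = defaultdict(list)
    for pid in position_ids:
        groups[pid.split("_", 1)[0]].append(pid)
    return dict(groups)
```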

Outputs

Reports are saved in benchmarks/<model>.json and include:

  • model and provider metadata
  • engine path and CPU/hash/search settings
  • aggregate score and per-category scores
  • every model move and raw response
  • engine top lines with centipawn scores
  • centipawn loss per position
  • a reasoning/API log path under /tmp/llmcheesbench
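A saved report with those fields can be summarized offline. The sketch below assumes a simplified per-position shape (`category` and `score` keys), not the exact schema of benchmarks/<model>.json:

```python
def summarize(results):
    """Compute the aggregate score and per-category means from per-position results.

    `results` is a list of dicts with 'category' and 'score' keys --
    a simplified stand-in for the saved benchmark layout.
    """
    totals, counts = {}, {}
    for r in results:
        cat = r["category"]
        totals[cat] = totals.get(cat, 0.0) + r["score"]
        counts[cat] = counts.get(cat, 0) + 1
    per_category = {c: totals[c] / counts[c] for c in totals}
    aggregate = sum(totals.values()) / max(sum(counts.values()), 1)
    return aggregate, per_category
```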

To print a leaderboard table from saved benchmark files:

llmcheesbench report

To also create an interactive browser replay when running a benchmark:

llmcheesbench benchmark --model my-model --show-web

The command prints a local URL like http://localhost:8765/my-model.html and keeps the replay server running until you press Ctrl-C. To choose a port:

llmcheesbench benchmark --model my-model --show-web --web-port 8765

The web replay starts before the benchmark finishes. Completed positions appear as the run progresses, and unfinished positions stay marked as waiting. Use Compare Both to see the LLM move and the master-AI move highlighted together on the starting board.

To create a replay from an existing JSON result:

llmcheesbench show-web benchmarks/my-model.json

The HTML replay is saved next to the JSON file and can be opened directly in a browser.

Notes

LLMCheesBench is an LLM benchmark, not a replacement for engine-vs-engine testing. If your goal is simply to pick the strongest chess engine for a personal PC, start with Stockfish, then compare against other UCI engines at the same thread, hash, and time controls.

Download files

Download the file for your platform.

Source Distribution

llmcheesbench-0.1.0.tar.gz (25.5 kB)

Built Distribution

llmcheesbench-0.1.0-py3-none-any.whl (27.0 kB)

File details

Details for the file llmcheesbench-0.1.0.tar.gz.

File metadata

  • Download URL: llmcheesbench-0.1.0.tar.gz
  • Upload date:
  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for llmcheesbench-0.1.0.tar.gz:

  • SHA256: 1a0337d133836ca7d0e88bb8980bc1397017950be955c3048ed1c43a6d7ae835
  • MD5: f819674b471a44d8843e8e9134270ab3
  • BLAKE2b-256: 6c6a5d0806a780dc9998e5da729069f61e777029c9e197217667cc3cf9bd57ae


File details

Details for the file llmcheesbench-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llmcheesbench-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 27.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for llmcheesbench-0.1.0-py3-none-any.whl:

  • SHA256: c0f5f5ddbb29e578d08a09ff5beca40be88ea73d1eb77ce20cfdc32d2ed1b732
  • MD5: aaeb1e3fb25ca670265244b01cc104c3
  • BLAKE2b-256: 5910e132fbe17a3e0d79aca114e7be04ad3d8a803b416cb78e930442be63791b

