Wiki Game LLM agent built on AISI Inspect.
Project description
wikigame-agent
An LLM agent that plays the Wikipedia game/Wikiracing (navigate from a starting page to a goal page using only links), built on AISI Inspect.
This started as a port of the Chapter 3.4 LLM Agents exercise from ARENA 3.0 into a self-contained project. The notable changes over the original notebook:
- A custom MediaWiki client with a real User-Agent, exponential backoff retries, and a clear error when the API returns non-JSON. This eliminates the
JSONDecodeErrors that came from Wikipedia silently rate-limiting thewikipediaPyPI package. - A single agent loop with an opt-in
--notesmode for carrying reasoning forward across moves. tools.pywithget_content,move_page, andcheck_path(the last one was unimplemented in the notebook).- A Rich-based per-turn console display so you can watch the game without spinning up the Inspect log viewer.
Install
Two paths, depending on whether you just want to run the agent or also hack on it.
From PyPI (just want to run it)
Pick one of:
uv tool install wikigame-agent # isolated, puts `wikigame` on your PATH
# or
pipx install wikigame-agent # same idea, if you prefer pipx
# or
pip install wikigame-agent # into an existing venv
Then provide an OPENAI_API_KEY. Either export it in your shell:
export OPENAI_API_KEY=sk-...
…or drop a .env file in whatever directory you'll run wikigame from — the CLI auto-loads it. Minimal .env:
OPENAI_API_KEY=sk-...
# Optional — defaults are fine, override if you want:
# INSPECT_EVAL_MODEL=openai/gpt-5.4-nano
# WIKIGAME_USER_AGENT=my-tool (https://example.com/contact)
Run it:
wikigame play "Canada" "Monty Python"
Note:
wikigame viewshells out to theinspectcommand frominspect-ai. Withuv tool installorpipx, that command isn't on your PATH (onlywikigameis). For viewing logs from a tool-style install, either runuvx --from inspect-ai inspect view --log-dir logsdirectly, or do apip installinto a venv so both commands are available.
From source (contributing / hacking)
git clone https://github.com/yarv/wikigame-agent
cd wikigame-agent
uv sync # create venv, install dev deps
cp .env.example .env # then fill in OPENAI_API_KEY
uv run wikigame play "Canada" "Monty Python"
Play a game
uv run wikigame play "Canada" "Monty Python" \
--model openai/gpt-5.4-nano --reasoning-effort medium
(Drop the uv run prefix if you installed from PyPI.)
Options:
--notes— carry a compact textual record of each prior move's reasoning forward across page transitions. Default off; useful on long-form races where the model otherwise re-explores ideas it has already considered.--model openai/gpt-5.4-nano— overridesINSPECT_EVAL_MODEL--reasoning-effort {none|minimal|low|medium|high|xhigh|max}— for o-series and gpt-5 models. The agent relies on the model reasoning before each move; on a reasoning model that means setting this to at leastlow. On the OpenAI gpt-5 family the default isminimal, which produces no useful reasoning and the agent will flounder.--proxy-reasoning— for models without native reasoning (e.g.gpt-4o-mini) or with reasoning effort set tominimal. Splits each move turn into a separate text-only reason call (forcedtool_choice="none") followed by an act call, so the model's CoT shows up in plain text. Roughly doubles per-move model calls, so prefer a reasoning model when possible.--turn-limit 40— max number of moves the agent may make before the run aborts with reasonturn_limit, counted at the game layer. The agent also auto-detects tight cycles (A↔B oscillation, A→B→C→A): on the first detection it gets a one-shot nudge, on the second it stops with reasoncycle.--message-limit 240— hard backstop on Inspect message count; default is set high enough that--turn-limitfires first.--enable-check-path— adds thecheck_pathdry-run tool-v— debug logging
Each move prints a panel like:
╭─ Move 1: Canada -> British Empire ─╮
│ Path: Canada -> British Empire │
╰───────────────────────────────────────╯
…and a final summary panel showing the full path and whether the goal was reached.
View Inspect logs
The CLI writes Inspect logs to ./logs/. To inspect them in the browser:
uv run wikigame view # opens http://localhost:7575
# or equivalently:
uv run inspect view --log-dir logs
Development
make install # uv sync --all-extras + installs pre-commit hooks
make check # ruff lint + format check + pytest (everything CI runs)
make help # list all targets
Tests use respx to mock the MediaWiki API — no network required.
See CONTRIBUTING.md for the full contributor workflow, including the Conventional Commits PR-title convention used to drive automatic version bumps and changelog updates via release-please.
Design notes
The agent only sees the current page. No goal-page summary, no link list — just the title of where it is, the title of where it's going, and (via get_content) the body of the page it's currently on. This mirrors how a human plays and makes results comparable across runs and models.
Self-contained MediaWiki client. The popular wikipedia PyPI package is unmaintained and crashes with JSONDecodeError when Wikipedia rate-limits it (it tries to parse the HTML error page as JSON). wiki_client.py sets a real User-Agent, retries transient failures, raises a clear error on non-JSON responses, and caches pages in-process.
One agent loop, two modes. The agent makes one model call per turn, alternating a forced get_content on each new page with a move_page call (reasoning text and the tool call come back in one response). On a successful move the message history is rebuilt from scratch. Use --proxy-reasoning to split the move turn into a separate reason + act pair for models without native reasoning. Use --notes to additionally carry a compact textual record of each prior move's reasoning across transitions, so the model can see why it picked each prior page rather than just where it ended up.
Layout
src/wikigame_agent/
wiki_client.py # async MediaWiki client (the JSONDecodeError fix lives here)
game.py # WikiGame, WikiGameRules
tools.py # get_content, move_page, check_path
prompts.py # system / on-page / next-step / step
agents.py # wiki_agent (the single agent loop)
display.py # Rich-based turn-by-turn console output
cli.py # `wikigame play ...`, `wikigame view`
config.py # pydantic-settings, reads .env
Credits
Original exercise from ARENA 3.0, Chapter 3.4 (LLM Agents) by Callum McDougall and contributors.
Contributing
See CONTRIBUTING.md. Issues and PRs welcome.
License
Apache License 2.0. Contributions are accepted under the same license.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file wikigame_agent-0.7.1.tar.gz.
File metadata
- Download URL: wikigame_agent-0.7.1.tar.gz
- Upload date:
- Size: 212.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1052dc992777a9e69f6c5683a495446ac956588d82e3df172f73f9d562a85a83
|
|
| MD5 |
2ee80244bc100cdf44d48250317b2fed
|
|
| BLAKE2b-256 |
d0382081e6321b1a2d598ab199e2418578226501a9244e8f136f72f8ec4e8928
|
Provenance
The following attestation bundles were made for wikigame_agent-0.7.1.tar.gz:
Publisher:
publish.yml on yarv/wikigame-agent
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
wikigame_agent-0.7.1.tar.gz -
Subject digest:
1052dc992777a9e69f6c5683a495446ac956588d82e3df172f73f9d562a85a83 - Sigstore transparency entry: 1566741043
- Sigstore integration time:
-
Permalink:
yarv/wikigame-agent@aca2f9c54b2408498272891e85f8a4ac59d55f78 -
Branch / Tag:
refs/tags/wikigame-agent-v0.7.1 - Owner: https://github.com/yarv
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@aca2f9c54b2408498272891e85f8a4ac59d55f78 -
Trigger Event:
release
-
Statement type:
File details
Details for the file wikigame_agent-0.7.1-py3-none-any.whl.
File metadata
- Download URL: wikigame_agent-0.7.1-py3-none-any.whl
- Upload date:
- Size: 30.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
79caee4ea8742343ed1934ae24ebc126c7b9afd8133cea3241d65efbc617a426
|
|
| MD5 |
02c9d3156777655005caddd0b6a663cc
|
|
| BLAKE2b-256 |
ff96d2d46cc812c93c62f815c54409552201c7ffcd8f544f0f15452ad50d4fe3
|
Provenance
The following attestation bundles were made for wikigame_agent-0.7.1-py3-none-any.whl:
Publisher:
publish.yml on yarv/wikigame-agent
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
wikigame_agent-0.7.1-py3-none-any.whl -
Subject digest:
79caee4ea8742343ed1934ae24ebc126c7b9afd8133cea3241d65efbc617a426 - Sigstore transparency entry: 1566741094
- Sigstore integration time:
-
Permalink:
yarv/wikigame-agent@aca2f9c54b2408498272891e85f8a4ac59d55f78 -
Branch / Tag:
refs/tags/wikigame-agent-v0.7.1 - Owner: https://github.com/yarv
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@aca2f9c54b2408498272891e85f8a4ac59d55f78 -
Trigger Event:
release
-
Statement type: