Skip to main content

Wiki Game LLM agent built on AISI Inspect.

Project description

wikigame-agent

CI License: Apache 2.0 Python 3.11+

An LLM agent that plays the Wikipedia game/Wikiracing (navigate from a starting page to a goal page using only links), built on AISI Inspect.

This started as a port of the Chapter 3.4 LLM Agents exercise from ARENA 3.0 into a self-contained project. The notable changes over the original notebook:

  • A custom MediaWiki client with a real User-Agent, exponential backoff retries, and a clear error when the API returns non-JSON. This eliminates the JSONDecodeErrors that came from Wikipedia silently rate-limiting the wikipedia PyPI package.
  • A single agent loop with an opt-in --notes mode for carrying reasoning forward across moves.
  • tools.py with get_content, move_page, and check_path (the last one was unimplemented in the notebook).
  • A Rich-based per-turn console display so you can watch the game without spinning up the Inspect log viewer.

Install

Two paths, depending on whether you just want to run the agent or also hack on it.

From PyPI (just want to run it)

Pick one of:

uv tool install wikigame-agent     # isolated, puts `wikigame` on your PATH
# or
pipx install wikigame-agent        # same idea, if you prefer pipx
# or
pip install wikigame-agent         # into an existing venv

Then provide an OPENAI_API_KEY. Either export it in your shell:

export OPENAI_API_KEY=sk-...

…or drop a .env file in whatever directory you'll run wikigame from — the CLI auto-loads it. Minimal .env:

OPENAI_API_KEY=sk-...
# Optional — defaults are fine, override if you want:
# INSPECT_EVAL_MODEL=openai/gpt-5.4-nano
# WIKIGAME_USER_AGENT=my-tool (https://example.com/contact)

Run it:

wikigame play "Canada" "Monty Python"

Note: wikigame view shells out to the inspect command from inspect-ai. With uv tool install or pipx, that command isn't on your PATH (only wikigame is). For viewing logs from a tool-style install, either run uvx --from inspect-ai inspect view --log-dir logs directly, or do a pip install into a venv so both commands are available.

From source (contributing / hacking)

git clone https://github.com/yarv/wikigame-agent
cd wikigame-agent
uv sync                       # create venv, install dev deps
cp .env.example .env          # then fill in OPENAI_API_KEY
uv run wikigame play "Canada" "Monty Python"

Play a game

uv run wikigame play "Canada" "Monty Python" \
  --model openai/gpt-5.4-nano --reasoning-effort medium

(Drop the uv run prefix if you installed from PyPI.)

Options:

  • --notes — carry a compact textual record of each prior move's reasoning forward across page transitions. Default off; useful on long-form races where the model otherwise re-explores ideas it has already considered.
  • --model openai/gpt-5.4-nano — overrides INSPECT_EVAL_MODEL
  • --reasoning-effort {none|minimal|low|medium|high|xhigh|max} — for o-series and gpt-5 models. The agent relies on the model reasoning before each move; on a reasoning model that means setting this to at least low. On the OpenAI gpt-5 family the default is minimal, which produces no useful reasoning and the agent will flounder.
  • --proxy-reasoning — for models without native reasoning (e.g. gpt-4o-mini) or with reasoning effort set to minimal. Splits each move turn into a separate text-only reason call (forced tool_choice="none") followed by an act call, so the model's CoT shows up in plain text. Roughly doubles per-move model calls, so prefer a reasoning model when possible.
  • --turn-limit 40 — max number of moves the agent may make before the run aborts with reason turn_limit, counted at the game layer. The agent also auto-detects tight cycles (A↔B oscillation, A→B→C→A): on the first detection it gets a one-shot nudge, on the second it stops with reason cycle.
  • --message-limit 240 — hard backstop on Inspect message count; default is set high enough that --turn-limit fires first.
  • --enable-check-path — adds the check_path dry-run tool
  • -v — debug logging

Each move prints a panel like:

╭─ Move 1: Canada  ->  British Empire ─╮
│ Path: Canada -> British Empire        │
╰───────────────────────────────────────╯

…and a final summary panel showing the full path and whether the goal was reached.

View Inspect logs

The CLI writes Inspect logs to ./logs/. To inspect them in the browser:

uv run wikigame view             # opens http://localhost:7575
# or equivalently:
uv run inspect view --log-dir logs

Development

make install        # uv sync --all-extras + installs pre-commit hooks
make check          # ruff lint + format check + pytest (everything CI runs)
make help           # list all targets

Tests use respx to mock the MediaWiki API — no network required.

See CONTRIBUTING.md for the full contributor workflow, including the Conventional Commits PR-title convention used to drive automatic version bumps and changelog updates via release-please.

Design notes

The agent only sees the current page. No goal-page summary, no link list — just the title of where it is, the title of where it's going, and (via get_content) the body of the page it's currently on. This mirrors how a human plays and makes results comparable across runs and models.

Self-contained MediaWiki client. The popular wikipedia PyPI package is unmaintained and crashes with JSONDecodeError when Wikipedia rate-limits it (it tries to parse the HTML error page as JSON). wiki_client.py sets a real User-Agent, retries transient failures, raises a clear error on non-JSON responses, and caches pages in-process.

One agent loop, two modes. The agent makes one model call per turn, alternating a forced get_content on each new page with a move_page call (reasoning text and the tool call come back in one response). On a successful move the message history is rebuilt from scratch. Use --proxy-reasoning to split the move turn into a separate reason + act pair for models without native reasoning. Use --notes to additionally carry a compact textual record of each prior move's reasoning across transitions, so the model can see why it picked each prior page rather than just where it ended up.

Layout

src/wikigame_agent/
  wiki_client.py   # async MediaWiki client (the JSONDecodeError fix lives here)
  game.py          # WikiGame, WikiGameRules
  tools.py         # get_content, move_page, check_path
  prompts.py       # system / on-page / next-step / step
  agents.py        # wiki_agent (the single agent loop)
  display.py       # Rich-based turn-by-turn console output
  cli.py           # `wikigame play ...`, `wikigame view`
  config.py        # pydantic-settings, reads .env

Credits

Original exercise from ARENA 3.0, Chapter 3.4 (LLM Agents) by Callum McDougall and contributors.

Contributing

See CONTRIBUTING.md. Issues and PRs welcome.

License

Apache License 2.0. Contributions are accepted under the same license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wikigame_agent-0.7.1.tar.gz (212.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wikigame_agent-0.7.1-py3-none-any.whl (30.1 kB view details)

Uploaded Python 3

File details

Details for the file wikigame_agent-0.7.1.tar.gz.

File metadata

  • Download URL: wikigame_agent-0.7.1.tar.gz
  • Upload date:
  • Size: 212.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for wikigame_agent-0.7.1.tar.gz
Algorithm Hash digest
SHA256 1052dc992777a9e69f6c5683a495446ac956588d82e3df172f73f9d562a85a83
MD5 2ee80244bc100cdf44d48250317b2fed
BLAKE2b-256 d0382081e6321b1a2d598ab199e2418578226501a9244e8f136f72f8ec4e8928

See more details on using hashes here.

Provenance

The following attestation bundles were made for wikigame_agent-0.7.1.tar.gz:

Publisher: publish.yml on yarv/wikigame-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file wikigame_agent-0.7.1-py3-none-any.whl.

File metadata

  • Download URL: wikigame_agent-0.7.1-py3-none-any.whl
  • Upload date:
  • Size: 30.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for wikigame_agent-0.7.1-py3-none-any.whl
Algorithm Hash digest
SHA256 79caee4ea8742343ed1934ae24ebc126c7b9afd8133cea3241d65efbc617a426
MD5 02c9d3156777655005caddd0b6a663cc
BLAKE2b-256 ff96d2d46cc812c93c62f815c54409552201c7ffcd8f544f0f15452ad50d4fe3

See more details on using hashes here.

Provenance

The following attestation bundles were made for wikigame_agent-0.7.1-py3-none-any.whl:

Publisher: publish.yml on yarv/wikigame-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page