Skip to main content

Wiki Game LLM agent built on AISI Inspect.

Project description

wikigame-agent

CI License: Apache 2.0 Python 3.11+

An LLM agent that plays the Wikipedia game/Wikiracing (navigate from a starting page to a goal page using only links), built on AISI Inspect.

This started as a port of the Chapter 3.4 LLM Agents exercise from ARENA 3.0 into a self-contained project. The notable changes over the original notebook:

  • A custom MediaWiki client with a real User-Agent, exponential backoff retries, and a clear error when the API returns non-JSON. This eliminates the JSONDecodeErrors that came from Wikipedia silently rate-limiting the wikipedia PyPI package.
  • Three agent strategies (basic, react, history) selectable from the CLI.
  • tools.py with get_content, move_page, and check_path (the last one was unimplemented in the notebook).
  • A Rich-based per-turn console display so you can watch the game without spinning up the Inspect log viewer.

Setup

uv sync                       # create venv, install
cp .env.example .env          # then fill in OPENAI_API_KEY etc.

Play a game

uv run wikigame play "Canada" "Monty Python" \
  --model openai/gpt-5.4-nano --reasoning-effort medium

Options:

  • --agent {basic,react,history} — default react
  • --model openai/gpt-5.4-nano — overrides INSPECT_EVAL_MODEL
  • --reasoning-effort {none|minimal|low|medium|high|xhigh|max} — for o-series and gpt-5 models. The react/history agents rely on the model reasoning before each move; on a reasoning model that means setting this to at least low. On the OpenAI gpt-5 family the default is minimal, which produces no useful reasoning and the agent will flounder.
  • --proxy-reasoning — for models without native reasoning (e.g. gpt-4o-mini) or with reasoning effort set to minimal. Splits each move turn into a separate text-only reason call (forced tool_choice="none") followed by an act call, so the model's CoT shows up in plain text. Roughly doubles per-move model calls, so prefer a reasoning model when possible.
  • --message-limit 80 — Inspect aborts the run past this
  • --enable-check-path — adds the check_path dry-run tool
  • -v — debug logging

Each move prints a panel like:

╭─ Move 1: Canada  ->  British Empire ─╮
│ Path: Canada -> British Empire        │
╰───────────────────────────────────────╯

…and a final summary panel showing the full path and whether the goal was reached.

View Inspect logs

The CLI writes Inspect logs to ./logs/. To inspect them in the browser:

uv run wikigame view             # opens http://localhost:7575
# or equivalently:
uv run inspect view --log-dir logs

Development

make install        # uv sync --all-extras + installs pre-commit hooks
make check          # ruff lint + format check + pytest (everything CI runs)
make help           # list all targets

Tests use respx to mock the MediaWiki API — no network required.

See CONTRIBUTING.md for the full contributor workflow, including the Conventional Commits PR-title convention used to drive automatic version bumps and changelog updates via release-please.

Design notes

The agent only sees the current page. No goal-page summary, no link list — just the title of where it is, the title of where it's going, and (via get_content) the body of the page it's currently on. This mirrors how a human plays and makes results comparable across runs and models.

Self-contained MediaWiki client. The popular wikipedia PyPI package is unmaintained and crashes with JSONDecodeError when Wikipedia rate-limits it (it tries to parse the HTML error page as JSON). wiki_client.py sets a real User-Agent, retries transient failures, raises a clear error on non-JSON responses, and caches pages in-process.

Three agents, increasing in sophistication.

  • basic: tool-call loop. Resets message history on every successful move.
  • react: one model call per turn, alternating a forced get_content on each new page with a move_page call (reasoning text and the tool call come back in one response). Use --proxy-reasoning to split the move turn into a separate reason + act pair for models without native reasoning.
  • history: ReAct + carries a compact text record of prior moves across page transitions.

Layout

src/wikigame_agent/
  wiki_client.py   # async MediaWiki client (the JSONDecodeError fix lives here)
  game.py          # WikiGame, WikiGameRules
  tools.py         # get_content, move_page, check_path
  prompts.py       # system / on-page / next-step / step
  agents.py        # basic_agent, react_agent, history_agent
  display.py       # Rich-based turn-by-turn console output
  cli.py           # `wikigame play ...`, `wikigame view`
  config.py        # pydantic-settings, reads .env

Credits

Original exercise from ARENA 3.0, Chapter 3.4 (LLM Agents) by Callum McDougall and contributors.

Contributing

See CONTRIBUTING.md. Issues and PRs welcome.

License

Apache License 2.0. Contributions are accepted under the same license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wikigame_agent-0.6.1.tar.gz (206.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wikigame_agent-0.6.1-py3-none-any.whl (27.4 kB view details)

Uploaded Python 3

File details

Details for the file wikigame_agent-0.6.1.tar.gz.

File metadata

  • Download URL: wikigame_agent-0.6.1.tar.gz
  • Upload date:
  • Size: 206.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for wikigame_agent-0.6.1.tar.gz
Algorithm Hash digest
SHA256 cef79eec332d1a268672051dae47dca3cc8657af75a900ad5b92bb90d86622d3
MD5 1c963613954967428618e5778894dcd3
BLAKE2b-256 a2d5c139534762027e93e8451b87f8ced67dde626250b2c054df9b62ebb4d3e3

See more details on using hashes here.

Provenance

The following attestation bundles were made for wikigame_agent-0.6.1.tar.gz:

Publisher: publish.yml on yarv/wikigame-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file wikigame_agent-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: wikigame_agent-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 27.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for wikigame_agent-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7bafefe38b97346eb497145a369640952edbdb27ea1bd30a5b66ca595ed7dfb2
MD5 5580f375d378f2c0df4828b9f2f09a83
BLAKE2b-256 8fa88c08c16a7a6a8dc017e0111be7b86bbb49cdfb2728a90868857628cdd91e

See more details on using hashes here.

Provenance

The following attestation bundles were made for wikigame_agent-0.6.1-py3-none-any.whl:

Publisher: publish.yml on yarv/wikigame-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page