MCP server providing AI-powered access to Google CodeWiki for open-source repositories

CodeWiki MCP Server

A Model Context Protocol (MCP) server that provides AI-powered access to Google CodeWiki for open-source repositories. Query any GitHub, GitLab, or Bitbucket repository and get instant, AI-generated answers about the codebase — all from your editor.

Features

  • Playwright-powered rendering — CodeWiki is an Angular SPA; all page content is rendered via headless Chromium
  • 4 MCP tools — topics overview, structure (JSON TOC), full/section content, interactive Gemini chat Q&A
  • Shared browser singleton — persistent background event loop with lazy Chromium launch, shared across all tools
  • In-memory caching — TTLCache avoids redundant page renders (wiki pages cached for 5 min)
  • Dual-strategy parser — handles CodeWiki's custom Angular elements (<body-content-section>, <documentation-markdown>) with fallback to standard HTML
  • Pydantic input validation — schema-based validation with clear error messages
  • Structured JSON responses with a {status, code, message, data, meta} envelope
  • URL normalization — accepts owner/repo shorthand or full URLs
  • Environment variable configuration — override timeouts, retries, cache TTL, and limits
  • Multi-transport — stdio (default) or SSE
  • Modular architecture — tools in separate modules, easy to extend
  • Response metadata — timing, char count, attempt info in every response
  • Docker support — Dockerfile with Playwright included

Prerequisites

  • Python 3.10+
  • Playwright Chromium (playwright install chromium)

Installation

Option A — Install as a CLI command (recommended)

pip install .
playwright install chromium

# Now you can run:
codewiki-mcp
codewiki-mcp --sse --port 8080
codewiki-mcp --verbose

Option B — Install dependencies only

pip install mcp pydantic httpx beautifulsoup4 lxml playwright cachetools
playwright install chromium

python -m codewiki_mcp

Option C — Build a standalone .exe

pip install ".[build]"
python build_exe.py
# → dist/codewiki-mcp.exe (Windows) or dist/codewiki-mcp (macOS/Linux)

Note: The .exe still requires Playwright Chromium on the target machine, because every tool renders pages through headless Chromium.

Option D — Docker

docker build -t codewiki-mcp .

# stdio (for MCP clients)
docker run -it --rm codewiki-mcp

# SSE (for HTTP access)
docker run -p 3000:3000 codewiki-mcp --sse --port 3000

# With custom config
docker run -e CODEWIKI_MAX_RETRIES=5 -e CODEWIKI_VERBOSE=true codewiki-mcp

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| CODEWIKI_HARD_TIMEOUT | 60 | Hard timeout per request (seconds) |
| CODEWIKI_HTTPX_TIMEOUT | 30 | HTTP timeout fallback (seconds) |
| CODEWIKI_PAGE_LOAD_TIMEOUT | 30 | Playwright page load timeout (seconds) |
| CODEWIKI_ELEMENT_WAIT_TIMEOUT | 20 | Element wait timeout (seconds) |
| CODEWIKI_RESPONSE_WAIT_TIMEOUT | 45 | Chat response wait timeout (seconds) |
| CODEWIKI_MAX_RETRIES | 2 | Max retry attempts |
| CODEWIKI_RETRY_DELAY | 3 | Delay between retries (seconds) |
| CODEWIKI_RESPONSE_MAX_CHARS | 8000 | Max response character count |
| CODEWIKI_CACHE_TTL | 300 | Page cache TTL (seconds) |
| CODEWIKI_CACHE_MAX_SIZE | 50 | Max pages in cache |
| CODEWIKI_VERBOSE | false | Enable debug logging |
| CODEWIKI_BASE_URL | https://codewiki.google | CodeWiki base URL |
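
As an illustration of how such variables are typically consumed, here is a minimal sketch that reads a subset of them with the documented defaults. The names `Settings`, `load_settings`, `_env_int`, and `_env_bool` are hypothetical, and the fallback-on-invalid-value behavior is an assumption; the actual config.py may differ.

```python
import os
from dataclasses import dataclass


def _env_int(name: str, default: int) -> int:
    """Read an integer variable; fall back to the default when unset or invalid."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default  # assumption: invalid values are ignored, not fatal


def _env_bool(name: str, default: bool) -> bool:
    """Treat 1/true/yes/on (case-insensitive) as True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    hard_timeout: int
    max_retries: int
    cache_ttl: int
    verbose: bool
    base_url: str


def load_settings() -> Settings:
    # Defaults mirror the documented table; only a subset of variables is shown.
    return Settings(
        hard_timeout=_env_int("CODEWIKI_HARD_TIMEOUT", 60),
        max_retries=_env_int("CODEWIKI_MAX_RETRIES", 2),
        cache_ttl=_env_int("CODEWIKI_CACHE_TTL", 300),
        verbose=_env_bool("CODEWIKI_VERBOSE", False),
        base_url=os.environ.get("CODEWIKI_BASE_URL", "https://codewiki.google"),
    )
```

Reading the environment once at startup into an immutable object keeps every tool on the same values for the lifetime of the process.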

MCP Client Setup

VS Code (.vscode/mcp.json):

{
  "servers": {
    "codewiki": {
      "command": "codewiki-mcp"
    }
  }
}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "codewiki": {
      "command": "codewiki-mcp"
    }
  }
}

CLI Options

codewiki-mcp [--stdio | --sse] [--port PORT] [--verbose | -v]

| Flag | Description |
| --- | --- |
| --stdio | Run with stdio transport (default) |
| --sse | Run with SSE transport |
| --port PORT | Port for SSE transport (default: 3000) |
| --verbose, -v | Enable debug logging |

Tools

list_code_wiki_topics

Retrieve the overview and available topics for a repository. Returns the full wiki content as markdown.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repo_url | string | Yes | Full URL or owner/repo shorthand |

Renders the page via Playwright and caches the result for 5 minutes.

read_wiki_structure

Get a JSON list of documentation sections/topics for a repository. Use this to see what's available before reading specific sections.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repo_url | string | Yes | Full URL or owner/repo shorthand |

Renders the page via Playwright and caches the result for 5 minutes.

read_wiki_contents

View full or section-specific documentation. Without section_title, returns the full wiki (may be truncated). With section_title, returns just that section.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repo_url | string | Yes | Full URL or owner/repo shorthand |
| section_title | string | No | Full or partial title of the section to retrieve |

Renders the page via Playwright and caches the result for 5 minutes.
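
To make the partial-title lookup concrete, here is a minimal sketch of one way it could work: an exact case-insensitive match first, then a substring match. The function `find_section` and the `{"title", "content"}` section shape are hypothetical, and the matching rules are an assumption rather than the tool's confirmed behavior.

```python
from typing import Optional


def find_section(sections: list[dict], section_title: str) -> Optional[dict]:
    """Return the first matching section, preferring exact title matches."""
    wanted = section_title.strip().lower()
    if not wanted:
        return None
    # Pass 1: exact title match (case-insensitive).
    for section in sections:
        if section["title"].strip().lower() == wanted:
            return section
    # Pass 2: partial match, so "activation" finds "Extension Activation".
    for section in sections:
        if wanted in section["title"].lower():
            return section
    return None


sections = [
    {"title": "Overview", "content": "..."},
    {"title": "Extension Activation", "content": "..."},
]
match = find_section(sections, "activation")
print(match["title"] if match else "no match")
```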

search_code_wiki

Ask Google CodeWiki a question about an open-source repository. Uses the interactive Gemini-powered chat.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repo_url | string | Yes | Full URL or owner/repo shorthand |
| query | string | Yes | The question to ask |

Opens a new browser context, interacts with the chat panel, and waits for the streamed Gemini response.

Examples:

repo_url: https://github.com/microsoft/vscode-copilot-chat
query: Where are the Allow/Skip buttons implemented?

repo_url: microsoft/vscode
query: How does the extension activation work?
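
Both forms of repo_url shown in these examples can be reduced to one canonical URL before use. The sketch below is one possible normalization; `normalize_repo_url` is a hypothetical name, and mapping bare owner/repo shorthand to GitHub is an assumption (the server's actual rule is not documented here).

```python
import re

# Assumption: bare owner/repo shorthand refers to a GitHub repository.
_SHORTHAND = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def normalize_repo_url(repo_url: str) -> str:
    """Expand owner/repo shorthand and strip trailing slashes from full URLs."""
    value = repo_url.strip().rstrip("/")
    if _SHORTHAND.match(value):
        return f"https://github.com/{value}"
    if value.startswith(("http://", "https://")):
        return value
    raise ValueError(f"Unrecognized repository reference: {repo_url!r}")


print(normalize_repo_url("microsoft/vscode"))
print(normalize_repo_url("https://github.com/microsoft/vscode-copilot-chat/"))
```

Normalizing early also gives the page cache a stable key, so the shorthand and the full URL for the same repository hit the same entry.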

Response Format

All tools return structured JSON:

{
  "status": "ok",
  "data": "The response content...",
  "repo_url": "https://github.com/owner/repo",
  "query": "How does X work?",
  "meta": {
    "elapsed_ms": 450,
    "char_count": 3200,
    "attempt": 1,
    "max_attempts": 2,
    "truncated": false
  }
}
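
A success envelope of this shape could be assembled roughly as follows. This is a sketch only: `ok_response` is a hypothetical helper, and it assumes that char_count reports the length after truncation and that truncation is a plain character cut at CODEWIKI_RESPONSE_MAX_CHARS.

```python
import json
import time


def ok_response(data: str, repo_url: str, started: float,
                attempt: int = 1, max_attempts: int = 2,
                max_chars: int = 8000) -> dict:
    """Build a success envelope, truncating data to max_chars characters."""
    truncated = len(data) > max_chars
    body = data[:max_chars] if truncated else data
    return {
        "status": "ok",
        "data": body,
        "repo_url": repo_url,
        "meta": {
            # started is a time.monotonic() reading taken when the request began
            "elapsed_ms": int((time.monotonic() - started) * 1000),
            "char_count": len(body),  # assumption: counted after truncation
            "attempt": attempt,
            "max_attempts": max_attempts,
            "truncated": truncated,
        },
    }


started = time.monotonic()
response = ok_response("The response content...", "https://github.com/owner/repo", started)
print(json.dumps(response, indent=2))
```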

Error responses:

{
  "status": "error",
  "code": "TIMEOUT",
  "message": "Page took too long to load.",
  "repo_url": "https://github.com/owner/repo",
  "meta": { "elapsed_ms": 60000, "attempt": 2, "max_attempts": 2 }
}

Error codes: VALIDATION, TIMEOUT, DRIVER_ERROR, NO_CONTENT, INPUT_NOT_FOUND, INTERNAL, RETRY_EXHAUSTED
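
The attempt and max_attempts fields, together with the RETRY_EXHAUSTED code, suggest a bounded retry loop. The sketch below shows one way such a loop might look; `call_with_retries` is a hypothetical helper, and it assumes the retry setting counts total attempts (matching the default of 2 and "max_attempts": 2 in the examples). The real server presumably maps specific failures to codes such as TIMEOUT, which this sketch does not attempt.

```python
import time
from typing import Callable


def call_with_retries(fetch: Callable[[], str], max_attempts: int = 2,
                      retry_delay: float = 3.0) -> dict:
    """Call fetch up to max_attempts times and wrap the outcome in an envelope."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    started = time.monotonic()
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            result = {"status": "ok", "data": fetch()}
            break
        except Exception as exc:  # a real server would catch narrower types
            last_error = str(exc)
            if attempt < max_attempts:
                time.sleep(retry_delay)  # wait before the next attempt
    else:
        # The loop finished without a successful break: every attempt failed.
        result = {"status": "error", "code": "RETRY_EXHAUSTED", "message": last_error}
    result["meta"] = {
        "elapsed_ms": int((time.monotonic() - started) * 1000),
        "attempt": attempt,
        "max_attempts": max_attempts,
    }
    return result
```

Recording the attempt number in meta lets a client distinguish a first-try success from one that only succeeded after a retry.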

Project Structure

codewiki_mcp/
├── __init__.py        # Package init + version
├── __main__.py        # python -m entry point
├── browser.py         # Shared Playwright browser singleton + persistent event loop
├── cache.py           # TTLCache for rendered pages
├── config.py          # Env-var-driven configuration + SPA selectors
├── driver.py          # Deprecated Selenium shim (no-op)
├── parser.py          # Playwright renderer + BeautifulSoup section parser
├── server.py          # MCP server setup + CLI
├── types.py           # Pydantic schemas + response models
└── tools/
    ├── __init__.py    # Tool registration
    ├── contents.py    # read_wiki_contents
    ├── search.py      # search_code_wiki (Playwright chat interaction)
    ├── structure.py   # read_wiki_structure
    └── topics.py      # list_code_wiki_topics
tests/
├── conftest.py        # Shared fixtures + sample data
├── test_cache.py      # Cache layer tests
├── test_config.py     # Configuration tests
├── test_parser.py     # Parser + HTML extraction tests
├── test_tools.py      # Server, tools & integration tests
└── test_types.py      # Schema & response tests
Dockerfile             # Docker deployment

Running Tests

pip install -e ".[test]"
pytest tests/ -v

Architecture

v0.3.0 — Playwright-everywhere

CodeWiki is an Angular SPA (<sdlc-agents-root>) — a plain HTTP GET returns an empty body. All page content is rendered client-side, so every tool uses Playwright headless Chromium.

Browser singleton

A persistent background event loop runs in a daemon thread. browser.py provides:

  • _get_browser() — lazily launches a shared Chromium instance
  • run_in_browser_loop(coro) — submits async work to the persistent loop from any sync context
  • fetch_rendered_html(url) — navigates, waits for SPA content markers, returns the rendered HTML

All 4 tools share the same browser instance. Pages are cached in a TTLCache (5 min) so repeated requests skip the render.

SPA-aware parser

CodeWiki uses custom Angular elements instead of standard HTML:

  • <body-content-section> — one per wiki section
  • <documentation-markdown> — rendered markdown inside each section
  • <chat><new-message-form><textarea data-test-id="chat-input"> — the Gemini chat

parser.py implements a dual-strategy section extractor:

  1. CodeWiki SPA — looks for <body-content-section> + <documentation-markdown> elements
  2. Standard HTML fallback — scans h1-h6 headings for non-CodeWiki pages
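
The two strategies can be illustrated with a small standard-library parser. Note the swap: this sketch uses `html.parser` rather than BeautifulSoup + lxml, and `SectionParser` and `extract_sections` are hypothetical names. It also assumes each `<body-content-section>` carries its title in a heading element, which may not match the real markup.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionParser(HTMLParser):
    """Collect sections using either the SPA strategy or the heading fallback."""

    def __init__(self, spa_mode: bool):
        super().__init__()
        self.spa_mode = spa_mode
        self.sections: list[dict] = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        # Strategy 1: each <body-content-section> opens a new section.
        if self.spa_mode and tag == "body-content-section":
            self.sections.append({"title": "", "text": ""})
        if tag in HEADINGS:
            self._in_heading = True
            # Strategy 2: in fallback mode, every heading opens a new section.
            if not self.spa_mode:
                self.sections.append({"title": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # ignore text that appears before the first section
        current = self.sections[-1]
        if self._in_heading and not current["title"]:
            current["title"] = data.strip()
        else:
            current["text"] += data


def extract_sections(html: str) -> list[dict]:
    spa_mode = "<body-content-section" in html
    parser = SectionParser(spa_mode)
    parser.feed(html)
    parser.close()
    return [
        {"title": s["title"], "text": " ".join(s["text"].split())}
        for s in parser.sections
    ]
```

Detecting the custom element up front keeps the fallback from splitting a CodeWiki section at every sub-heading it contains.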

Key components

  • Playwright + shared browser — all page rendering via headless Chromium
  • TTLCache — rendered pages cached for 5 minutes (configurable)
  • BeautifulSoup + lxml — fast HTML parsing with section extraction, TOC, and diagram detection
  • Pydantic schemas validate all inputs before processing
  • Structured responses with JSON envelope and metadata
  • Modular tools — each tool in its own module, registered via register_all_tools()
  • Environment variable configuration — all tunables configurable without code changes
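
To show what the page cache does conceptually, here is a minimal standard-library stand-in for a TTL cache with a size limit. `SimpleTTLCache` is a hypothetical class written for illustration; the project itself uses TTLCache from cachetools, whose eviction details may differ from this sketch.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class SimpleTTLCache:
    """Tiny TTL cache: entries expire after ttl seconds; oldest is evicted when full."""

    def __init__(self, maxsize: int = 50, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: "OrderedDict[str, tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items.pop(key, None)
        if len(self._items) >= self.maxsize:
            self._items.popitem(last=False)  # evict the oldest entry
        self._items[key] = (self._clock() + self.ttl, value)


cache = SimpleTTLCache(maxsize=50, ttl=300)
cache.set("https://github.com/owner/repo", "<html>rendered page</html>")
print(cache.get("https://github.com/owner/repo") is not None)
```

The defaults of 50 entries and 300 seconds correspond to CODEWIKI_CACHE_MAX_SIZE and CODEWIKI_CACHE_TTL.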
