CodeWiki MCP Server

A Model Context Protocol (MCP) server that provides AI-powered access to Google CodeWiki for open-source repositories. Query any GitHub, GitLab, or Bitbucket repository and get instant, AI-generated answers about the codebase — all from your editor.

Features

  • Playwright-powered rendering — CodeWiki is an Angular SPA; all page content is rendered via headless Chromium
  • 4 MCP tools — topics overview, structure (JSON TOC), full/section content, interactive Gemini chat Q&A
  • Structured diagram extraction — detects CodeWiki SPA diagrams, Mermaid blocks, and inline SVGs; parses Graphviz SVGs into entities and relationships
  • Shared browser singleton — persistent background event loop with lazy Chromium launch, shared across all tools
  • In-memory caching — TTLCache avoids redundant page renders (wiki pages cached for 5 min)
  • Dual-strategy parser — handles CodeWiki's custom Angular elements (<body-content-section>, <documentation-markdown>) with fallback to standard HTML
  • Pydantic input validation — schema-based validation with clear error messages
  • Structured JSON responses — {status, code, message, data, meta} envelope
  • URL normalization — accepts owner/repo shorthand or full URLs
  • Environment variable configuration — override timeouts, retries, cache TTL, and limits
  • Multi-transport — stdio (default) or SSE
  • Graceful shutdown — SIGINT/SIGTERM handlers for clean Ctrl+C
  • Modular architecture — tools in separate modules, easy to extend
  • Response metadata — timing, char count, attempt info in every response
  • Docker support — Dockerfile with Playwright included

Prerequisites

  • Python 3.10+
  • Playwright Chromium (playwright install chromium)

Installation

Option A — Install from PyPI (recommended)

pip install codewiki-mcp
playwright install chromium

That's it. The codewiki-mcp command is now available in the active Python environment:

codewiki-mcp                    # stdio (default)
codewiki-mcp --sse --port 8080  # SSE transport
codewiki-mcp --verbose           # debug logging

Option B — Install from source

git clone https://github.com/Cloudmeru/CodeWiki-MCP.git
cd CodeWiki-MCP
pip install .
playwright install chromium

For development (with test dependencies):

pip install -e ".[test]"

Option C — Docker

docker build -t codewiki-mcp .

# stdio (for MCP clients)
docker run -it --rm codewiki-mcp

# SSE (for HTTP access)
docker run -p 3000:3000 codewiki-mcp --sse --port 3000

# With custom config
docker run -e CODEWIKI_MAX_RETRIES=5 -e CODEWIKI_VERBOSE=true codewiki-mcp

Configuration

Environment Variables

Variable                            Default                  Description
CODEWIKI_HARD_TIMEOUT               60                       Hard timeout per request (seconds)
CODEWIKI_HTTPX_TIMEOUT              30                       HTTP timeout fallback (seconds)
CODEWIKI_PAGE_LOAD_TIMEOUT          30                       Playwright page load timeout (seconds)
CODEWIKI_ELEMENT_WAIT_TIMEOUT       20                       Element wait timeout (seconds)
CODEWIKI_RESPONSE_WAIT_TIMEOUT      45                       Chat response wait timeout (seconds)
CODEWIKI_MAX_RETRIES                2                        Max retry attempts
CODEWIKI_RETRY_DELAY                3                        Delay between retries (seconds)
CODEWIKI_RESPONSE_MAX_CHARS         30000                    Max response character count
CODEWIKI_CACHE_TTL                  300                      Page cache TTL (seconds)
CODEWIKI_CACHE_MAX_SIZE             50                       Max pages in cache
CODEWIKI_RESPONSE_INITIAL_DELAY     5                        Initial delay before polling chat response (seconds)
CODEWIKI_RESPONSE_POLL_INTERVAL     2                        Interval between chat response polls (seconds)
CODEWIKI_RESPONSE_STABLE_INTERVAL   2                        Stable response detection interval (seconds)
CODEWIKI_JS_LOAD_DELAY              3                        Delay for JS/SPA loading (seconds)
CODEWIKI_VERBOSE                    false                    Enable debug logging
CODEWIKI_BASE_URL                   https://codewiki.google  CodeWiki base URL
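
As an illustration of how such variables are typically consumed, here is a minimal sketch that reads a few of them with the defaults from the table. The Settings class, the helper names, and the rule that an unparsable value falls back to the default are assumptions made for this example; the actual config.py may behave differently.

```python
import os
from dataclasses import dataclass


def _env_int(name: str, default: int) -> int:
    """Read an integer variable; fall back to the default if unset or invalid."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default  # assumption: bad values are ignored rather than fatal


def _env_bool(name: str, default: bool) -> bool:
    """Treat 1/true/yes/on (case-insensitive) as True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names are not taken from the project.
    hard_timeout: int
    max_retries: int
    cache_ttl: int
    verbose: bool
    base_url: str


def load_settings() -> Settings:
    """Build settings from the environment using the documented defaults."""
    return Settings(
        hard_timeout=_env_int("CODEWIKI_HARD_TIMEOUT", 60),
        max_retries=_env_int("CODEWIKI_MAX_RETRIES", 2),
        cache_ttl=_env_int("CODEWIKI_CACHE_TTL", 300),
        verbose=_env_bool("CODEWIKI_VERBOSE", False),
        base_url=os.environ.get("CODEWIKI_BASE_URL", "https://codewiki.google"),
    )
```

Reading the environment inside load_settings() rather than at import time keeps the values easy to override in tests.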

MCP Client Setup

VS Code (.vscode/mcp.json):

{
  "servers": {
    "codewiki": {
      "type": "stdio",
      "command": "codewiki-mcp"
    }
  }
}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "codewiki": {
      "command": "codewiki-mcp"
    }
  }
}

CLI Options

codewiki-mcp [--stdio | --sse] [--port PORT] [--verbose | -v]

Flag           Description
--stdio        Run with stdio transport (default)
--sse          Run with SSE transport
--port PORT    Port for SSE transport (default: 3000)
--verbose, -v  Enable debug logging
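
The flag set above maps naturally onto an argparse parser with a mutually exclusive transport group. The sketch below is an assumption about how such a CLI could be declared, not a copy of server.py; build_parser and resolve_transport are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; --stdio and --sse exclude each other."""
    parser = argparse.ArgumentParser(prog="codewiki-mcp")
    transport = parser.add_mutually_exclusive_group()
    transport.add_argument("--stdio", action="store_true",
                           help="Run with stdio transport (default)")
    transport.add_argument("--sse", action="store_true",
                           help="Run with SSE transport")
    parser.add_argument("--port", type=int, default=3000,
                        help="Port for SSE transport (default: 3000)")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable debug logging")
    return parser


def resolve_transport(args: argparse.Namespace) -> str:
    """stdio is the default whenever --sse is not given."""
    return "sse" if args.sse else "stdio"


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(resolve_transport(parsed), parsed.port, parsed.verbose)
```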

Tools

list_code_wiki_topics

Retrieve the overview and available topics for a repository. Returns the full wiki content as markdown.

Parameter      Type    Required  Description
repo_url       string  Yes       Full URL or owner/repo shorthand

Renders the page via Playwright and caches the result for 5 minutes (the default CODEWIKI_CACHE_TTL).
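
To make the repo_url shorthand concrete, here is a minimal normalization sketch. It assumes that owner/repo expands to a github.com URL (consistent with the microsoft/vscode example in this README) and that a trailing slash or .git suffix is dropped; normalize_repo_url is a hypothetical name and the real validation rules may differ.

```python
import re

# owner/repo: two path segments made of letters, digits, dot, dash, underscore.
_SHORTHAND = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def normalize_repo_url(value: str, default_host: str = "github.com") -> str:
    """Return a full https URL for either a full URL or owner/repo shorthand."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        url = value
    elif _SHORTHAND.match(value):
        # Assumption: shorthand refers to the default host (GitHub).
        url = f"https://{default_host}/{value}"
    else:
        raise ValueError(f"not a repository URL or owner/repo shorthand: {value!r}")
    url = url.rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url
```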

read_wiki_structure

Get a JSON list of documentation sections/topics for a repository. Use this to see what's available before reading specific sections.

Parameter      Type    Required  Description
repo_url       string  Yes       Full URL or owner/repo shorthand

Renders the page via Playwright and caches the result for 5 minutes (the default CODEWIKI_CACHE_TTL).

read_wiki_contents

View full or section-specific documentation. Without section_title, returns the full wiki (may be truncated). With section_title, returns just that section.

Parameter      Type    Required  Description
repo_url       string  Yes       Full URL or owner/repo shorthand
section_title  string  No        Title (or partial) of a section to retrieve

Renders the page via Playwright and caches the result for 5 minutes (the default CODEWIKI_CACHE_TTL).
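
The "title (or partial)" lookup for section_title can be pictured as follows. This is only a sketch: the section dictionaries, the case-insensitive comparison, and the "exact match first, then substring" order are assumptions, and find_section is a hypothetical helper.

```python
from typing import Optional


def find_section(sections: list[dict], section_title: str) -> Optional[dict]:
    """Return the first section whose title equals or contains section_title."""
    needle = section_title.strip().casefold()
    if not needle:
        return None
    # Prefer an exact (case-insensitive) title match.
    for section in sections:
        if section["title"].casefold() == needle:
            return section
    # Otherwise accept the first title that contains the requested text.
    for section in sections:
        if needle in section["title"].casefold():
            return section
    return None
```

For example, with sections titled "Architecture" and "Architecture Overview", the query "architecture" returns the first, while "overview" returns the second.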

search_code_wiki

Ask Google CodeWiki a question about an open-source repository. Uses the interactive Gemini-powered chat.

Parameter      Type    Required  Description
repo_url       string  Yes       Full URL or owner/repo shorthand
query          string  Yes       The question to ask

Opens a new browser context, interacts with the chat panel, and waits for the streamed Gemini response to finish.

Examples:

repo_url: https://github.com/microsoft/vscode-copilot-chat
query: Where are the Allow/Skip buttons implemented?

repo_url: microsoft/vscode
query: How does the extension activation work?
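
Waiting for a streamed answer usually means polling until the text stops changing. The sketch below shows one way the CODEWIKI_RESPONSE_INITIAL_DELAY, CODEWIKI_RESPONSE_POLL_INTERVAL, CODEWIKI_RESPONSE_STABLE_INTERVAL, and CODEWIKI_RESPONSE_WAIT_TIMEOUT settings could interact. The function name, the read_text callback, and the exact stability rule are assumptions, not the logic in search.py; sleep and clock are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_stable_text(
    read_text: Callable[[], str],
    *,
    initial_delay: float = 5.0,
    poll_interval: float = 2.0,
    stable_interval: float = 2.0,
    timeout: float = 45.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll read_text() until its non-empty result stays unchanged long enough."""
    sleep(initial_delay)  # give the chat time to start streaming
    deadline = clock() + timeout  # assumption: timeout counts from after the delay
    last = ""
    last_change = clock()
    while clock() < deadline:
        current = read_text()
        now = clock()
        if current != last:
            # Still streaming: remember the new text and when it changed.
            last, last_change = current, now
        elif current and now - last_change >= stable_interval:
            return current
        sleep(poll_interval)
    if last:
        return last  # best effort: return the partial answer
    raise TimeoutError("no chat response before the timeout")
```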

Response Format

All tools return structured JSON:

{
  "status": "ok",
  "data": "The response content...",
  "repo_url": "https://github.com/owner/repo",
  "query": "How does X work?",
  "meta": {
    "elapsed_ms": 450,
    "char_count": 3200,
    "attempt": 1,
    "max_attempts": 2,
    "truncated": false
  }
}

Error responses:

{
  "status": "error",
  "code": "TIMEOUT",
  "message": "Page took too long to load.",
  "repo_url": "https://github.com/owner/repo",
  "meta": { "elapsed_ms": 60000, "attempt": 2, "max_attempts": 2 }
}

Error codes: VALIDATION, TIMEOUT, DRIVER_ERROR, NO_CONTENT, INPUT_NOT_FOUND, INTERNAL, RETRY_EXHAUSTED
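
On the client side, the envelope can be unwrapped with a few lines of standard-library code. This is a sketch of a consumer, assuming only the fields shown in the two samples above; CodeWikiError and unwrap are hypothetical names and not part of the package.

```python
import json


class CodeWikiError(Exception):
    """Raised when a tool response has status "error"."""

    def __init__(self, code: str, message: str, meta: dict):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.meta = meta


def unwrap(raw: str) -> str:
    """Return the data field of an ok response, or raise CodeWikiError."""
    envelope = json.loads(raw)
    if envelope.get("status") == "ok":
        return envelope["data"]
    raise CodeWikiError(
        envelope.get("code", "INTERNAL"),  # assumption: missing code means INTERNAL
        envelope.get("message", ""),
        envelope.get("meta", {}),
    )
```

A caller can then branch on error.code, for example retrying on TIMEOUT but surfacing VALIDATION immediately; which codes are worth retrying is left to the caller.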

Project Structure

codewiki_mcp/
├── __init__.py        # Package init + version
├── __main__.py        # python -m entry point
├── browser.py         # Shared Playwright browser singleton + persistent event loop
├── cache.py           # TTLCache for rendered pages
├── config.py          # Env-var-driven configuration + SPA selectors
├── parser.py          # Playwright renderer + BeautifulSoup section parser
├── server.py          # MCP server setup + CLI
├── types.py           # Pydantic schemas + response models
└── tools/
    ├── __init__.py    # Tool registration
    ├── contents.py    # read_wiki_contents
    ├── search.py      # search_code_wiki (Playwright chat interaction)
    ├── structure.py   # read_wiki_structure
    └── topics.py      # list_code_wiki_topics
tests/
├── __init__.py        # Package marker
├── conftest.py        # Shared fixtures + sample data
├── test_cache.py      # Cache layer tests
├── test_config.py     # Configuration tests
├── test_parser.py     # Parser + HTML extraction tests
├── test_tools.py      # Server, tools & integration tests
└── test_types.py      # Schema & response tests
Dockerfile             # Docker deployment

Running Tests

pip install -e ".[test]"
pytest tests/ -v

Architecture

v1.0.0 — Playwright-everywhere

CodeWiki is an Angular SPA (<sdlc-agents-root>) — a plain HTTP GET returns an empty body. All page content is rendered client-side, so every tool uses Playwright headless Chromium.

Browser singleton

A persistent background event loop runs in a daemon thread. browser.py provides:

  • _get_browser() — lazily launches a shared Chromium instance
  • run_in_browser_loop(coro) — submits async work to the persistent loop from any sync context
  • fetch_rendered_html(url) — navigates, waits for SPA content markers, returns the rendered HTML

All 4 tools share the same browser instance. Pages are cached in a TTLCache (5 min) so repeated requests skip the render.
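
The pattern described above (a persistent event loop in a daemon thread, plus a lazily created shared resource) can be sketched with the standard library alone. The names below are hypothetical stand-ins, and a plain object() takes the place of the Chromium instance so the example runs without Playwright; browser.py itself may be organized differently.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional

_loop: Optional[asyncio.AbstractEventLoop] = None
_loop_guard = threading.Lock()
_resource: Optional[object] = None
_resource_lock: Optional[asyncio.Lock] = None
stats = {"launches": 0}  # counts how often the shared resource was created


def _ensure_loop() -> asyncio.AbstractEventLoop:
    """Start the background loop on first use and reuse it afterwards."""
    global _loop
    with _loop_guard:
        if _loop is None:
            _loop = asyncio.new_event_loop()
            threading.Thread(
                target=_loop.run_forever, name="browser-loop", daemon=True
            ).start()
        return _loop


def run_in_background_loop(
    coro: Coroutine[Any, Any, Any], timeout: Optional[float] = None
) -> Any:
    """Submit a coroutine from synchronous code and block for its result."""
    future = asyncio.run_coroutine_threadsafe(coro, _ensure_loop())
    return future.result(timeout=timeout)


async def get_shared_resource() -> object:
    """Create the shared resource once; later callers get the same object."""
    global _resource, _resource_lock
    if _resource_lock is None:
        _resource_lock = asyncio.Lock()  # created on the loop thread
    async with _resource_lock:
        if _resource is None:
            await asyncio.sleep(0)  # stand-in for launching headless Chromium
            _resource = object()
            stats["launches"] += 1
        return _resource
```

Because every coroutine runs on the same loop, the asyncio.Lock is enough to prevent two concurrent callers from launching the resource twice.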

SPA-aware parser

CodeWiki uses custom Angular elements instead of standard HTML:

  • <body-content-section> — one per wiki section
  • <documentation-markdown> — rendered markdown inside each section
  • <chat><new-message-form><textarea data-test-id="chat-input"> — the Gemini chat

parser.py implements a dual-strategy section extractor:

  1. CodeWiki SPA — looks for <body-content-section> + <documentation-markdown> elements
  2. Standard HTML fallback — scans h1-h6 headings for non-CodeWiki pages
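
A dual-strategy extractor of this kind can be sketched as "try the SPA elements first, fall back to headings if nothing was found". The version below uses the standard-library HTMLParser instead of BeautifulSoup to stay dependency-free, keys only on <body-content-section>, and treats the first heading in a section as its title; all of these are simplifying assumptions, as are the class and function names.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _SectionCollector(HTMLParser):
    """Collect {title, content} sections in SPA mode or heading mode."""

    def __init__(self, spa_mode: bool):
        super().__init__()
        self.spa_mode = spa_mode
        self.sections: list[dict] = []
        self._in_heading = False
        self._in_section = False  # inside <body-content-section> (SPA mode)

    def handle_starttag(self, tag, attrs):
        if self.spa_mode and tag == "body-content-section":
            self._in_section = True
            self.sections.append({"title": "", "content": ""})
        elif tag in HEADINGS:
            if self.spa_mode:
                # Only the first heading inside a section becomes its title.
                self._in_heading = self._in_section and not self.sections[-1]["title"]
            else:
                # Fallback: every heading starts a new section.
                self._in_heading = True
                self.sections.append({"title": "", "content": ""})

    def handle_endtag(self, tag):
        if tag == "body-content-section":
            self._in_section = False
        elif tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.sections:
            return
        if self.spa_mode and not self._in_section:
            return  # ignore text outside SPA sections
        key = "title" if self._in_heading else "content"
        current = self.sections[-1]
        current[key] = (current[key] + " " + text).strip()


def extract_sections(html: str) -> list[dict]:
    """Try the SPA strategy first, then the standard heading scan."""
    for spa_mode in (True, False):
        collector = _SectionCollector(spa_mode)
        collector.feed(html)
        collector.close()
        if collector.sections:
            return collector.sections
    return []
```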

Structured diagram extraction

CodeWiki renders diagrams as <code-documentation-diagram-inline> elements containing base64-encoded SVGs. The parser detects three types of diagrams:

  1. CodeWiki SPA diagrams — decodes data:image/svg+xml;base64,... from <image class="image-diagram"> elements, then parses Graphviz SVG structure
  2. Mermaid blocks — captures raw source from <code class="mermaid"> and <div class="mermaid">
  3. Fallback SVGs/images — bare <svg> with <title> or <img> matching diagram patterns

For Graphviz SVGs, it extracts structured graph data:

  • Nodes — <g class="node"> groups → {id, label}
  • Edges — <g class="edge"> groups → {from, to, label}

Diagram summaries (entities + relationships) are placed at the top of tool output so they remain visible even when responses are truncated.
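
The Graphviz step can be illustrated with the standard library. The sketch assumes the usual Graphviz SVG layout: each node or edge is a <g> with a class attribute, a <title> child holding the node name (or "tail->head" for edges), and <text> children holding the visible label. Using the <title> text as the node id is an assumption of this example, and the function names are hypothetical; parser.py may extract these fields differently.

```python
import base64
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
DATA_URI_PREFIX = "data:image/svg+xml;base64,"


def decode_svg_data_uri(uri: str) -> str:
    """Decode a base64 SVG data URI into SVG source text."""
    if not uri.startswith(DATA_URI_PREFIX):
        raise ValueError("not a base64 SVG data URI")
    return base64.b64decode(uri[len(DATA_URI_PREFIX):]).decode("utf-8")


def parse_graphviz_svg(svg_text: str) -> dict:
    """Extract nodes and edges from a namespaced Graphviz SVG document."""
    root = ET.fromstring(svg_text)
    nodes, edges = [], []
    for group in root.iter(f"{SVG_NS}g"):
        kind = group.get("class")
        title = (group.findtext(f"{SVG_NS}title") or "").strip()
        # Visible label: the direct <text> children, joined with spaces.
        label = " ".join(
            t.text.strip() for t in group.findall(f"{SVG_NS}text") if t.text
        )
        if kind == "node":
            nodes.append({"id": title, "label": label or title})
        elif kind == "edge":
            # Edge titles read "tail->head" (directed) or "tail--head".
            separator = "->" if "->" in title else "--"
            source, _, target = title.partition(separator)
            edges.append({"from": source, "to": target, "label": label})
    return {"nodes": nodes, "edges": edges}
```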

Key components

  • Playwright + shared browser — all page rendering via headless Chromium
  • TTLCache — rendered pages cached for 5 minutes (configurable)
  • BeautifulSoup + lxml — fast HTML parsing with section extraction, TOC, and diagram detection
  • Pydantic schemas validate all inputs before processing
  • Structured responses with JSON envelope and metadata
  • Modular tools — each tool in its own module, registered via register_all_tools()
  • Signal handlers — SIGINT/SIGTERM for clean shutdown with browser cleanup
  • Environment variable configuration — all tunables configurable without code changes
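
To show what the page cache does, here is a small standard-library stand-in for a TTL cache with a size limit. It is a sketch only: the class name, the oldest-inserted eviction rule, and the injectable clock are assumptions, and the project's cache.py may rely on a different TTLCache implementation. The defaults mirror CODEWIKI_CACHE_MAX_SIZE (50) and CODEWIKI_CACHE_TTL (300).

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class SimpleTTLCache:
    """Keep at most maxsize entries, each valid for ttl seconds."""

    def __init__(
        self,
        maxsize: int = 50,
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        """Return the cached value, or None if missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop the stale entry
            return None
        return value

    def set(self, key: str, value: str) -> None:
        """Store a value, evicting the oldest entries when the cache is full."""
        self._items.pop(key, None)
        while len(self._items) >= self.maxsize:
            self._items.popitem(last=False)
        self._items[key] = (self._clock() + self.ttl, value)
```

A tool would call get(url) before rendering and set(url, html) afterwards, so repeated requests within the TTL skip the browser entirely.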
