wikigen
LLM-powered wiki generator for any codebase — structured, interlinked Markdown notes that survive context window limits.
Inspired by Karpathy's LLM Wiki concept — wikigen is a general-purpose CLI tool that points at any project directory and generates a rich, interlinked Markdown wiki from your codebase.
Architecture
wikigen/
│
├── cli.py ← Click entry point — routes all 4 commands
│ │
│ ├── config.py ← WikigenConfig dataclasses, YAML load/save
│ │
│ ├── ingester.py ← Full ingest pipeline orchestrator
│ │ ├── collector.py walk · chunk · prioritise source files
│ │ ├── cache.py SHA-256 hash store (.wikigen_cache.json)
│ │ ├── writer.py Markdown output + [[wikilink]] conversion
│ │ ├── backends/ LLM abstraction layer
│ │ │ ├── Claude Anthropic SDK
│ │ │ ├── OpenAI openai SDK (or any compatible endpoint)
│ │ │ └── Ollama local via httpx REST
│ │ └── prompts/ all system + user prompt builders
│ │
│ ├── updater.py ← Incremental re-processing (changed files only)
│ │ └── (reuses collector, cache, backends, prompts)
│ │
│ └── linter.py ← Broken [[WikiLinks]], orphan detection, CI exit code
│
tests/
└── test_wikigen.py 22 tests, zero LLM calls required
Data flow during wikigen ingest:
project/          collector.py       ingester.py          backends/
source files  →   walk + chunk   →   context summary  →   LLM call
                  SHA-256 hash       section plan         (parallel)
                                     page generation
                                           ↓
                                     writer.py  →  wiki/*.md
                                     cache.py   →  .wikigen_cache.json
Use with AI coding agents (Claude Code, Cursor, Copilot, etc.)
wikigen is designed to be invoked directly by coding agents that have shell access. No interactive prompts, no confirmations — every command is fully scriptable.
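When Claude Code is authenticated with an API key, ANTHROPIC_API_KEY is usually already set in its environment (the same key it uses for its own reasoning). In that case wikigen's Claude backend picks it up automatically, with no separate key setup needed inside a Claude Code session. If the variable is not set, export it before running wikigen.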
A Claude Code agent can self-document any project it's working in with a single tool call:
# Agent drops this into the project shell — zero config required
cd /path/to/project
pip install "wikigen-cli[claude]" -q
wikigen init
wikigen ingest
After that, the agent (or you) can run wikigen update after every significant change, keeping the wiki in sync as the codebase evolves. The wiki then becomes persistent structured context the agent can read back in future sessions — surviving the context window limit that would otherwise force it to re-read the whole codebase each time.
Other agents (Cursor, Aider, Copilot Workspace, etc.) work the same way as long as they have an OPENAI_API_KEY or can reach a local Ollama instance:
# OpenAI-backed agent
wikigen --backend openai ingest
# Fully local, no keys
wikigen --backend ollama ingest
All commands exit with code 0 on success and non-zero on error, making them composable in agent tool-call loops and CI pipelines.
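The exit-code contract is what lets an agent or CI script chain commands safely. A minimal sketch of such a loop follows; `run_steps` is a hypothetical helper, not part of wikigen, and stand-in commands replace the real `wikigen` calls so the example runs without wikigen installed.

```python
import subprocess
import sys


def run_steps(steps: list[list[str]]) -> int:
    """Run commands in order; stop at and return the first non-zero exit code."""
    for cmd in steps:
        code = subprocess.run(cmd).returncode
        if code != 0:
            return code
    return 0


# In an agent loop the steps might be [["wikigen", "update"], ["wikigen", "lint"]].
# Stand-in commands are used here so the sketch runs anywhere.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
fail = [sys.executable, "-c", "raise SystemExit(3)"]
result = run_steps([ok, fail, ok])  # stops at the failing step
```

Because the loop stops at the first failure, a failed update never reaches the lint step.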
Why wikigen?
Large codebases often exceed the context window of an LLM. wikigen addresses this by:
- Chunking your entire codebase into LLM-sized windows.
- Using an LLM to synthesise structured wiki pages — not just summaries, but architecture notes, module docs, data-model refs, and more.
- Writing interlinked Markdown so you can navigate your knowledge graph.
- Tracking file hashes so only changed files are re-processed on wikigen update.
The resulting wiki lives next to your code, can be committed to git, and stays fresh as long as wikigen update is run after changes.
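To illustrate the chunking step, here is a rough sketch of overlapping windows. It is not wikigen's actual collector: `chunk_tokens` is a hypothetical name, and whitespace splitting stands in for a real tokenizer. The defaults mirror `chunk_size_tokens` and `chunk_overlap_tokens` from the configuration reference.

```python
from typing import Iterator


def chunk_tokens(tokens: list[str], size: int = 6000, overlap: int = 200) -> Iterator[list[str]]:
    """Yield windows of at most `size` tokens; consecutive windows share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(tokens), step):
        yield tokens[start:start + size]
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input


# Whitespace splitting is only a rough stand-in for a real tokenizer.
words = "a b c d e f g h i j".split()
chunks = list(chunk_tokens(words, size=4, overlap=1))
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, so a definition split across the boundary is still seen whole at least once.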
Installation
# Core (no backend pre-installed)
pip install wikigen-cli
# With Claude (Anthropic) support
pip install "wikigen-cli[claude]"
# With OpenAI support
pip install "wikigen-cli[openai]"
# Everything
pip install "wikigen-cli[all]"
Requires Python ≥ 3.11.
Use without PyPI (local / development)
You don't need to publish to PyPI to use wikigen as a real CLI tool. An editable install reads directly from your local source:
git clone https://github.com/your-org/wikigen
cd wikigen
pip install -e ".[claude]" # registers the `wikigen` command in the active Python environment
wikigen --version # works immediately
Any edits to the source are reflected instantly — no reinstall needed.
Quick start
cd my-project
# 1. Scaffold config
wikigen init
# 2. Set your API key
export ANTHROPIC_API_KEY=sk-ant-...
# 3. Generate wiki
wikigen ingest
# 4. Browse your wiki
ls wiki/
Commands
wikigen init
Creates a wikigen.yaml in the project root. Edit it to configure:
- Which LLM backend to use (claude, openai, or ollama)
- Which files to include/exclude
- What wiki sections to generate
wikigen init
wikigen ingest
Reads the entire codebase and generates the wiki from scratch.
wikigen ingest # normal run
wikigen ingest --force # regenerate even cached pages
wikigen ingest --dry-run # preview what would be generated
wikigen ingest --concurrency 8 # parallel LLM requests
Pipeline:
- Walk project tree → collect source files
- Read priority files (CLAUDE.md, README, schema) → build project context summary
- Ask LLM to plan wiki structure (sections → page titles)
- Generate each page in parallel, injecting relevant source chunks as context
- Write interlinked Markdown to wiki/
- Store SHA-256 hashes in wiki/.wikigen_cache.json
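The parallel page-generation step, and the `--concurrency` flag that bounds it, can be pictured with a semaphore. This is a sketch under assumptions, not wikigen's ingester: `generate_pages` and `fake_llm` are hypothetical names, and the fake coroutine stands in for a real backend call.

```python
import asyncio


async def generate_pages(titles, generate_one, concurrency=8):
    """Run generate_one(title) for every title, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(title):
        async with semaphore:  # blocks while `concurrency` calls are in flight
            return title, await generate_one(title)

    results = await asyncio.gather(*(guarded(t) for t in titles))
    return dict(results)


async def fake_llm(title):
    # Placeholder for a real backend call.
    await asyncio.sleep(0)
    return f"# {title}\n"


pages = asyncio.run(generate_pages(["Home", "Architecture"], fake_llm, concurrency=2))
```

Capping in-flight requests this way keeps a large wiki plan from tripping provider rate limits.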
wikigen update
Re-processes only files that changed since the last run.
wikigen update
wikigen update --dry-run
Detects:
- Changed files (hash mismatch) → re-generates affected wiki pages
- Deleted files → removes cache entries
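Change detection of this kind amounts to comparing current file hashes with the cached ones. The sketch below assumes the cache is a flat mapping of relative path to SHA-256 hex digest; the real layout of `.wikigen_cache.json` may differ, and `file_hash` and `diff_against_cache` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_against_cache(root: Path, cache: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (changed_or_new, deleted) relative paths compared with cached hashes."""
    current = {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in root.rglob("*") if p.is_file()
    }
    # A file counts as changed if it is new or its digest differs from the cache.
    changed = sorted(name for name, digest in current.items() if cache.get(name) != digest)
    # A cached path with no file on disk has been deleted.
    deleted = sorted(name for name in cache if name not in current)
    return changed, deleted
```

Only the paths in the first list would need their wiki pages regenerated; the second list only requires dropping cache entries.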
wikigen lint
Validates all wiki pages for consistency.
wikigen lint # report issues, exit 1 if any found
wikigen lint --fix # auto-fix trivial issues (e.g. add missing front matter stubs)
Checks:
- [[WikiLinks]] that don't resolve to an existing page
- [text](path.md) links pointing to missing files
- Pages that are never linked from anywhere (orphans)
- Missing YAML front matter
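Two of the checks above, broken [[WikiLinks]] and orphan pages, can be sketched over an in-memory mapping of page title to body. This is an illustration, not wikigen's linter: `lint_pages` is a hypothetical name, and treating the index page as exempt from the orphan check is an assumption.

```python
import re

# Capture the link target, stopping at "]", "|" (alias) or "#" (heading anchor).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_pages(pages: dict[str, str], index: str = "Home") -> tuple[dict[str, list[str]], list[str]]:
    """Return (broken links per page, orphan pages) for a title -> body mapping."""
    broken: dict[str, list[str]] = {}
    linked: set[str] = set()
    for title, body in pages.items():
        targets = [m.strip() for m in WIKILINK.findall(body)]
        linked.update(t for t in targets if t != title)  # self-links do not count
        missing = [t for t in targets if t not in pages]
        if missing:
            broken[title] = missing
    orphans = sorted(t for t in pages if t not in linked and t != index)
    return broken, orphans


pages = {
    "Home": "See [[Architecture]] and [[Missing]].",
    "Architecture": "Back to [[Home]].",
    "Lonely": "Nobody links here.",
}
broken, orphans = lint_pages(pages)
```

A CI wrapper would exit non-zero whenever either result is non-empty, matching the behaviour described for wikigen lint.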
Useful in CI:
# .github/workflows/wiki.yml
- run: wikigen lint
Configuration reference (wikigen.yaml)
project_name: "my-project"
backend:
name: "claude" # claude | openai | ollama
model: "claude-sonnet-4-20250514"
api_key_env: "ANTHROPIC_API_KEY"
# base_url: "http://localhost:11434" # for Ollama
max_tokens: 4096
temperature: 0.2
ingestion:
include_patterns: ["**/*"]
exclude_patterns:
- "**/.git/**"
- "**/node_modules/**"
- "**/__pycache__/**"
max_file_size_kb: 256
chunk_size_tokens: 6000
chunk_overlap_tokens: 200
wiki:
sections:
- Overview
- Architecture
- Modules
- Data Models
- API Reference
- Configuration
- Development Guide
index_page: "Home"
link_style: "wikilink" # wikilink ([[Page]]) or markdown ([Page](Page.md))
front_matter: true
Backends
Claude (Anthropic)
pip install "wikigen-cli[claude]"
export ANTHROPIC_API_KEY=sk-ant-...
backend:
name: claude
model: claude-sonnet-4-20250514
api_key_env: ANTHROPIC_API_KEY
OpenAI
pip install "wikigen-cli[openai]"
export OPENAI_API_KEY=sk-...
backend:
name: openai
model: gpt-4o
api_key_env: OPENAI_API_KEY
Also works with any OpenAI-compatible API (Together, Groq, Azure, etc.) by setting base_url.
Ollama (local)
ollama pull llama3
backend:
name: ollama
model: llama3
base_url: "http://localhost:11434"
No API key required. All processing stays on your machine.
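For orientation, a request to a local Ollama server looks roughly like the sketch below. It targets Ollama's `/api/generate` endpoint with streaming disabled and uses the standard library's urllib to stay dependency-free, whereas wikigen itself is described as using httpx. The function names are hypothetical, and this is not wikigen's backend code.

```python
import json
import urllib.request


def build_ollama_request(prompt: str, model: str = "llama3",
                         base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_ollama_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarise this module")` only works while `ollama serve` is running and the model has been pulled.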
Wiki structure
After wikigen ingest, your wiki looks like:
wiki/
├── home.md # index page with full ToC
├── .wikigen_cache.json # hash cache (commit this)
├── architecture/
│ ├── system-overview.md
│ ├── request-lifecycle.md
│ └── data-flow.md
├── modules/
│ ├── auth-module.md
│ └── payment-module.md
├── data-models/
│ ├── user-model.md
│ └── order-model.md
└── ...
Each page has YAML front matter:
---
title: RequestLifecycle
description: How HTTP requests flow through the system.
tags: [architecture, http, middleware]
related: [SystemOverview, AuthModule]
---
And uses [[WikiLinks]] for cross-references (compatible with Obsidian, Foam, Logseq, etc.).
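If `link_style` is set to `markdown`, each [[Page]] reference becomes [Page](Page.md). A small sketch of that conversion is shown here; `wikilinks_to_markdown` is a hypothetical name, support for the Obsidian-style `[[Page|label]]` alias is an assumption, and wikigen's writer may resolve file paths differently.

```python
import re

# Group 1 is the target page; optional group 2 is an alias after "|".
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def wikilinks_to_markdown(text: str) -> str:
    """Rewrite [[Page]] as [Page](Page.md) and [[Page|label]] as [label](Page.md)."""
    def repl(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        return f"[{label}]({target}.md)"
    return WIKILINK.sub(repl, text)


converted = wikilinks_to_markdown("See [[SystemOverview]] and [[AuthModule|auth]].")
```

Plain Markdown links render in any viewer, while wikilinks are shorter and work natively in tools such as Obsidian.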
Global options
wikigen --project-dir /path/to/project ingest
wikigen --wiki-dir /custom/wiki/path ingest
wikigen --backend openai ingest # override config backend
Development
git clone https://github.com/your-org/wikigen
cd wikigen
pip install -e ".[dev]"
# Run tests (no API key needed; the test suite makes no LLM calls)
pytest
# Lint
ruff check wikigen/
mypy wikigen/
Roadmap
- wikigen serve: local web UI for browsing the wiki
- GitHub Actions integration template
- Embeddings-based chunk retrieval for better relevance
- Support for multi-modal (diagrams via GPT-4V / Claude Vision)
- wikigen diff: show what changed between two wiki generations
- MkDocs / Docusaurus export
License
MIT © wikigen contributors