
LLM Wiki — Auto-generate knowledge base from code & docs. CGC code intelligence, Q&A chat, knowledge graph, drift detection.

Project description

llm-wiki

A Karpathy-style LLM wiki compiler, packaged as a proper Python library.

Ingests raw Markdown sources into a structured, interlinked wiki with:

  • Sources / Entities / Concepts hierarchy
  • 4-tier fuzzy dedup for concept pages
  • Async Gemini client with retry + rate-limit handling
  • Robust JSON parser for LLM output (3-tier recovery)
  • Post-processing fix_links for broken wikilinks
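The 3-tier JSON recovery mentioned above is not spelled out here; the following is a minimal, illustrative sketch of what such a strategy typically looks like (the tier order and function name are assumptions, not the library's actual code):

```python
import json
import re


def parse_llm_json(text: str):
    """Parse JSON from LLM output with three fallback tiers (illustrative sketch)."""
    # Tier 1: the output is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Tier 2: strip a markdown code fence (```json ... ```).
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # Tier 3: decode the first {...} or [...] span, ignoring trailing prose.
    start = min((i for i in (text.find("{"), text.find("[")) if i != -1), default=-1)
    if start != -1:
        try:
            obj, _ = json.JSONDecoder().raw_decode(text[start:])
            return obj
        except json.JSONDecodeError:
            pass
    raise ValueError("could not recover JSON from LLM output")
```

Tier 3 uses `raw_decode` so trailing chatter after the JSON value ("hope that helps!") does not break parsing.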

Install

pip install -e ".[dev]"

Quick start

export GEMINI_API_KEY=your-key

# Init vault
mkdir my-wiki && cd my-wiki
mkdir raw wiki
echo "# SkyJoy is a loyalty program" > raw/intro.md

# Ingest
llm-wiki ingest raw/intro.md --vault .

# Query
llm-wiki query "What is SkyJoy?" --vault .

# Fix broken wikilinks
llm-wiki fix-links --vault . --apply

Code-to-Doc (ingest-code)

Generate wiki documentation directly from source code:

# Basic: Python-only call graph
llm-wiki ingest-code ./my-project --vault .

# With CGC: 19-language call graph + operational params extraction
pip install codegraphcontext kuzu
llm-wiki ingest-code ./my-project --vault . --cgc

# Incremental update (only changed modules)
llm-wiki ingest-code ./my-project --vault . --cgc --update

# Serve wiki with Q&A chat
llm-wiki serve --vault . --port 5757

CGC augmentation (--cgc)

When enabled, CodeGraphContext (MIT license) provides:

| Feature            | Without `--cgc`          | With `--cgc`                                  |
|--------------------|--------------------------|-----------------------------------------------|
| Call graph         | Python-only (~200 edges) | 19 languages, 1400+ edges                     |
| Source budget      | 8,000 chars/module       | 16,000 chars/module                           |
| Operational params | Not extracted            | Cron schedules, timeouts, constants, env vars |
| Class hierarchy    | Not available            | Parents, children, methods                    |

CGC auto-indexes on the first run (~2 minutes for 300 files); the index is cached in .cgc_index/.
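To make the "operational params" row above concrete, here is a toy sketch of pattern-based extraction of cron schedules, timeouts, and env vars from source text. The patterns and names are illustrative assumptions; the actual extraction is done via CGC and the LLM, not these regexes:

```python
import re

# Toy patterns for the operational parameters listed in the table above
# (cron schedules, timeouts, env vars). Illustrative only.
PATTERNS = {
    "cron": re.compile(r"['\"]((?:[\d*/,-]+\s+){4}[\d*/,-]+)['\"]"),
    "timeout": re.compile(r"timeout\s*=\s*(\d+)"),
    "env_var": re.compile(r"os\.environ\[['\"](\w+)['\"]\]"),
}


def extract_params(source: str) -> dict[str, list[str]]:
    """Return every match of each operational-parameter pattern."""
    return {name: pat.findall(source) for name, pat in PATTERNS.items()}
```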

Programmatic API

import asyncio
from pathlib import Path

from llm_wiki import GeminiClient, Settings, Vault, ingest_source


async def main():
    settings = Settings(vault_root=Path("./my-wiki"))
    vault = Vault(settings.vault_root)
    llm = GeminiClient(api_key=settings.gemini_api_key)
    try:
        result = await ingest_source(
            source=vault.raw / "intro.md",
            vault=vault,
            llm=llm,
            settings=settings,
        )
        print(f"Ingested: {result.title}")
        print(f"Tokens: {result.input_tokens} in / {result.output_tokens} out")
    finally:
        await llm.close()

asyncio.run(main())
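The retry and rate-limit handling advertised for the Gemini client is not shown above. A generic async retry-with-exponential-backoff pattern of the kind such a client typically uses looks like this (all names here are illustrative, not the library's API):

```python
import asyncio
import random


async def with_retries(call, *, attempts=4, base_delay=0.5):
    """Retry an async call with exponential backoff plus jitter (illustrative)."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff doubles each attempt: 0.5s, 1s, 2s... plus random jitter
            # so concurrent callers don't retry in lockstep.
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, 0.1))
```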

Testing

make install        # pip install -e ".[dev]"
make test           # unit tests (fast, no network)
make test-integration   # real LLM calls (requires GEMINI_API_KEY)
make lint           # ruff + mypy

Architecture

src/llm_wiki/
├── config.py           Settings via pydantic-settings
├── models.py           Pydantic models (IngestResult, Concept, Entity)
├── exceptions.py       Custom exceptions
├── dedup.py            4-tier fuzzy matching
├── json_parser.py      3-tier robust JSON parser
├── fix_links.py        Post-process broken wikilinks
├── llm/
│   ├── base.py         LLMProvider ABC
│   └── gemini.py       Async Gemini client
├── vault/
│   ├── paths.py        Vault path resolution
│   ├── frontmatter.py  YAML frontmatter parsing
│   └── wikilinks.py    Wikilink extraction
├── pipeline/
│   ├── ingest.py       ingest_source() — 1-shot ingest
│   ├── query.py        query_wiki() — Q&A against wiki
│   └── lint.py         lint_vault() — structural + semantic
└── cli.py              Typer CLI
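vault/wikilinks.py is described above only by name; a minimal sketch of the kind of [[wikilink]] extraction it presumably performs (the regex and function name are assumptions):

```python
import re

# Matches [[Target]] and [[Target|display text]] wikilinks,
# capturing only the link target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_wikilinks(markdown: str) -> list[str]:
    """Return the link targets of all [[wikilinks]] in a document."""
    return [m.strip() for m in WIKILINK.findall(markdown)]
```

A fix-links pass can then compare these targets against the pages that actually exist in the vault and rewrite or flag the broken ones.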

License

MIT

