
General-purpose LLM-Wiki CLI and Python library - Build persistent, LLM-maintained knowledge bases


llmwikify

Build persistent, LLM-maintained knowledge bases, based on Karpathy's LLM Wiki Principles

Python 3.10+ | License: MIT | Tests: 199 passing


🎯 What is llmwikify?

llmwikify is a general-purpose LLM-Wiki management tool that helps you build and maintain a persistent knowledge base using LLMs. Unlike RAG systems that rediscover knowledge from scratch on every query, llmwikify incrementally builds and maintains a structured, interlinked wiki that compounds over time.

Core Philosophy

The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read.

Based on Karpathy's LLM Wiki Principles:

  • 📚 Raw sources: your immutable source documents (PDFs, URLs, YouTube videos), all collected into raw/
  • 📝 The wiki: LLM-maintained markdown pages with cross-references
  • ⚙️ The schema: wiki.md, which tells the LLM how to maintain the wiki

✨ Features

🔍 Full-Text Search

  • SQLite FTS5 with Porter stemmer and BM25 ranking
  • Ranked results with highlighted snippets
  • LIKE fallback for FTS5 syntax errors
  • 0.06 seconds for 157 pages
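The search features above can be sketched with nothing but Python's standard sqlite3 module. This is a minimal sketch, not llmwikify's code: the `search` function, the `[`/`]` snippet markers, and the LIKE fallback query are assumptions, and it assumes your SQLite build includes FTS5 (most CPython builds do). The virtual table mirrors the `pages_fts` schema shown under Database Schema.

```python
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an in-memory FTS5 table shaped like the pages_fts schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE pages_fts USING fts5("
        "page_name, content, tokenize='porter unicode61')"
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10):
    """Return (results, used_fallback); each result has page_name and snippet."""
    try:
        rows = conn.execute(
            # snippet() wraps matched terms in [ ]; bm25() is lower for better
            # matches, so ascending order puts the best hit first.
            "SELECT page_name, snippet(pages_fts, 1, '[', ']', '...', 10) "
            "FROM pages_fts WHERE pages_fts MATCH ? "
            "ORDER BY bm25(pages_fts) LIMIT ?",
            (query, limit),
        ).fetchall()
        used_fallback = False
    except sqlite3.OperationalError:
        # Malformed FTS5 syntax (for example an unbalanced quote): retry as a
        # plain substring match. Hypothetical fallback, not llmwikify's query.
        rows = conn.execute(
            "SELECT page_name, substr(content, 1, 80) FROM pages_fts "
            "WHERE content LIKE ? LIMIT ?",
            (f"%{query}%", limit),
        ).fetchall()
        used_fallback = True
    return [{"page_name": n, "snippet": s} for n, s in rows], used_fallback
```

A query with an unbalanced quote, such as `gold"`, makes FTS5 raise `sqlite3.OperationalError`, so it takes the LIKE path instead of surfacing a syntax error to the user.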

🔗 Bidirectional Reference Tracking

  • Automatic [[wikilink]] detection and parsing
  • Section-level granularity ([[Page#section|display]])
  • Inbound/outbound link queries
  • JSON export for Obsidian compatibility
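Section-level links can be parsed with a single regular expression. The pattern, the `WikiLink` tuple, and `parse_wikilinks` below are hypothetical: a sketch of the `[[Page#section|display]]` syntax, not llmwikify's actual parser. Escaped brackets and links inside code spans are not handled.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for [[Page]], [[Page#section]] and [[Page#section|display]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#([^\]|]+))?(?:\|([^\]]+))?\]\]")


class WikiLink(NamedTuple):
    target: str
    section: Optional[str]
    display: Optional[str]


def parse_wikilinks(markdown: str) -> list[WikiLink]:
    """Return every outbound wikilink in a page, in order of appearance."""
    links = []
    for match in WIKILINK_RE.finditer(markdown):
        target, section, display = match.groups()
        links.append(WikiLink(
            target.strip(),
            section.strip() if section else None,
            display.strip() if display else None,
        ))
    return links
```

Each tuple lines up with the `target_page`, `section`, and `display_text` columns of the `page_links` table described under Database Schema; inbound links are the same rows read from the other end.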

🧠 Smart Recommendations

  • Missing page detection (pages that are frequently referenced but don't exist)
  • Orphan page identification (with configurable exclusion rules)
  • Cross-reference opportunities
  • Smart hints for wiki improvement
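Missing-page and orphan detection both fall out of the link graph. A minimal sketch, assuming the graph is already available as a dict mapping each page name to its outbound targets; the `recommend` function and its `min_refs` threshold for "frequently referenced" are hypothetical, and the configurable exclusions described under Configuration are left out.

```python
from collections import Counter


def recommend(pages: dict[str, list[str]], min_refs: int = 2) -> dict:
    """pages maps each existing page to the page names it links to.

    min_refs is a hypothetical cut-off for "frequently referenced".
    """
    existing = set(pages)
    # Count references to targets that have no page of their own.
    missing = Counter(
        target for targets in pages.values() for target in targets
        if target not in existing
    )
    # An orphan is a page that no *other* page links to.
    linked_to = {
        target for source, targets in pages.items() for target in targets
        if target != source
    }
    return {
        "missing_pages": [name for name, n in missing.most_common() if n >= min_refs],
        "orphan_pages": sorted(existing - linked_to),
    }
```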

🔀 Query Knowledge Compounding (v0.12.6+)

  • wiki_synthesize: save query answers as persistent wiki pages
  • Auto-generated Query: {Topic} pages with structured Sources sections
  • Smart duplicate detection with date suffix for multiple runs
  • merge_or_replace parameter: sink, merge, or replace strategies
  • Auto-links to wiki pages and raw sources
  • Auto-logs to log.md with parseable format
  • Answers compound in the knowledge base just like ingested sources
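The page name in the Python API example (`Query: Compare Gold And Copper Mining`) is consistent with title-casing the query, and repeated runs get a date suffix. A hypothetical sketch of that naming step follows; the `(YYYY-MM-DD)` suffix format and the counter for same-day repeats are assumptions, not the project's documented behavior.

```python
from datetime import date
from typing import Optional


def query_page_name(query: str, existing: set[str], today: Optional[date] = None) -> str:
    """Hypothetical naming rule for synthesized query pages."""
    base = f"Query: {query.strip().title()}"
    if base not in existing:
        return base
    # Assumption: duplicates get an ISO date suffix, plus a counter for
    # several runs on the same day.
    day = (today or date.today()).isoformat()
    candidate = f"{base} ({day})"
    n = 2
    while candidate in existing:
        candidate = f"{base} ({day}, {n})"
        n += 1
    return candidate
```

Note that `str.title()` mishandles apostrophes (`"it's".title()` gives `"It'S"`), so the real implementation may capitalize differently.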

🔄 Query Sink (v0.13.0+)

  • Compound answers without creating duplicate pages
  • Pending entries saved to sink/ for later review
  • Urgency tracking: ok / attention (7d+) / aging (14d+) / stale (30d+)
  • Smart suggestions: content gaps, source quality, query patterns
  • Dedup detection flags entries with >70% text similarity
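The urgency buckets and the 70% dedup threshold translate directly into code. In this sketch the thresholds come from the list above, while measuring similarity with `difflib.SequenceMatcher` is an assumption: llmwikify may compare entries differently. Function names are hypothetical.

```python
from datetime import date
from difflib import SequenceMatcher


def urgency(created: date, today: date) -> str:
    """Bucket a pending sink entry by age: ok / attention / aging / stale."""
    age_days = (today - created).days
    if age_days >= 30:
        return "stale"
    if age_days >= 14:
        return "aging"
    if age_days >= 7:
        return "attention"
    return "ok"


def is_duplicate(a: str, b: str, threshold: float = 0.70) -> bool:
    """Assumed metric: SequenceMatcher ratio strictly above the threshold."""
    return SequenceMatcher(None, a, b).ratio() > threshold
```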

📥 Enhanced Ingest (v0.15.0+)

  • Rich metadata: file_type, file_size, word_count, has_images, content_preview
  • No summary returned: this respects the "LLM reads source" principle
  • Auto-collects all sources into raw/ directory
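A sketch of computing the metadata fields listed above for a source whose text has already been extracted. Only the field names come from the list; `ingest_metadata`, the image heuristic (Markdown or HTML image syntax), and the preview length are assumptions. In keeping with the principle above, it returns no summary and leaves interpretation to the LLM.

```python
import re
from pathlib import Path


def ingest_metadata(path: Path, text: str, preview_chars: int = 200) -> dict:
    """Describe a source for the LLM without summarizing it."""
    return {
        "file_type": path.suffix.lstrip(".").lower() or "unknown",
        "file_size": path.stat().st_size,  # bytes on disk
        "word_count": len(text.split()),
        # Assumption: only Markdown or HTML image syntax counts as an image.
        "has_images": bool(re.search(r"!\[[^\]]*\]\(|<img\b", text, re.IGNORECASE)),
        "content_preview": text[:preview_chars],
    }
```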

🧹 Smart Lint with Clues (v0.15.0+)

  • dated_claim (critical): Pages referencing years ≥3 years older than latest raw source
  • topic_overlap (informational): Query pages with ≥85% keyword overlap
  • missing_cross_ref (informational): Concepts mentioned but not wikilinked
  • Two-tier hints: critical (max 3) + informational (max 5)
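The `dated_claim` clue and the two-tier cap can be sketched as follows. Assumptions: years are detected with a simple four-digit regex, the caller passes in the year of the newest raw source, and the function names are hypothetical.

```python
import re

# Assumption: a "year" is any standalone 19xx or 20xx token.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")


def dated_claims(pages: dict[str, str], latest_source_year: int, gap: int = 3) -> list[dict]:
    """Flag pages mentioning a year at least `gap` years before the newest raw source."""
    hints = []
    for name, text in pages.items():
        old_years = sorted({
            int(y) for y in YEAR_RE.findall(text) if latest_source_year - int(y) >= gap
        })
        if old_years:
            hints.append({"page": name, "severity": "critical", "years": old_years})
    return hints


def cap_hints(critical: list, informational: list) -> list:
    """Two-tier cap: at most 3 critical hints followed by at most 5 informational ones."""
    return critical[:3] + informational[:5]
```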

🚀 Performance Optimized

  • Batch inserts with executemany()
  • PRAGMA optimizations (MEMORY journal, OFF synchronous)
  • Progress reporting for large collections
  • ON CONFLICT preserves created_at on page updates
  • 10-20x faster than a naive implementation
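Batch inserts, the two PRAGMAs, and the `created_at`-preserving upsert combine into a short standard-library sketch. It is not llmwikify's code; the table is a trimmed version of the `pages` schema under Database Schema and the function names are hypothetical. The PRAGMAs trade durability for speed: per the SQLite documentation, a crash mid-transaction with `journal_mode = MEMORY`, or a power loss with `synchronous = OFF`, can corrupt the database, which is presumably tolerable here because the index can be rebuilt with `build-index`. The upsert syntax needs SQLite 3.24 or later.

```python
import sqlite3


def open_fast(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Speed over durability: fine for an index that can be rebuilt.
    conn.execute("PRAGMA journal_mode = MEMORY")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " page_name TEXT PRIMARY KEY, file_path TEXT NOT NULL, word_count INTEGER,"
        " created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,"
        " updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


UPSERT = (
    "INSERT INTO pages (page_name, file_path, word_count) VALUES (?, ?, ?) "
    "ON CONFLICT(page_name) DO UPDATE SET "
    " file_path = excluded.file_path, word_count = excluded.word_count,"
    " updated_at = CURRENT_TIMESTAMP"  # created_at is deliberately left untouched
)


def upsert_pages(conn: sqlite3.Connection, rows) -> None:
    """Insert or update many pages in one transaction with executemany()."""
    with conn:
        conn.executemany(UPSERT, rows)
```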

🔧 Pure Tool Design

  • Zero domain assumptions: no hardcoded concepts
  • Configuration-driven: you decide what to exclude via .wiki-config.yaml
  • Universal patterns: date formats, frontmatter markers, directory structures

📦 Zero Core Dependencies

  • Standard library only
  • Optional dependencies for extended functionality:
    • pymupdf: PDF extraction
    • trafilatura: web scraping
    • youtube-transcript-api: YouTube transcripts
    • mcp: MCP server support
    • pyyaml: configuration loading

📦 Installation

Basic Installation (Zero Dependencies)

pip install llmwikify

Full Installation (All Features)

pip install "llmwikify[all]"

Development Installation

git clone https://github.com/sn0wfree/llmwikify.git
cd llmwikify
pip install -e ".[dev]"

🚀 Quick Start

1. Initialize a Wiki

llmwikify init

Output:

Wiki initialized at /path/to/wiki
  raw/     → drop source files here (all sources collected here)
  wiki/    → LLM-maintained wiki pages
  .llmwikify.db → SQLite index
  wiki.md  → conventions and workflows for the LLM
  .wiki-config.yaml.example → configuration template

2. Ingest Sources

# Ingest a PDF (copied to raw/)
llmwikify ingest document.pdf

# Ingest a URL (text saved to raw/)
llmwikify ingest https://example.com/article

# Ingest a YouTube video (transcript saved to raw/)
llmwikify ingest https://youtube.com/watch?v=abc123

All sources are automatically collected into raw/ for centralized management.

3. Build Index

llmwikify build-index

Output:

  Processing: 100/157 (63.7%) - 29591.5 files/sec

Total pages: 157
Total links: 636
Elapsed: 0.06s

4. Search and Query

# Full-text search
llmwikify search "gold mining" -l 10

# Query page references
llmwikify references "Company Name"

# Get smart recommendations
llmwikify recommend

# Get wiki health hints
llmwikify hint

💻 Python API

from llmwikify import Wiki, create_wiki

# Create/open a wiki
wiki = create_wiki("/path/to/wiki")

# Initialize
wiki.init()

# Ingest source (returns data for LLM processing)
result = wiki.ingest_source("document.pdf")
print(f"Source: {result['title']}, saved to: {result['source_raw_path']}")

# Write page (auto-updates index.md)
wiki.write_page("Test Page", "# Test\n\nContent with [[Link]]")

# Search
results = wiki.search("gold mining", limit=10)
for r in results:
    print(f"{r['page_name']}: {r['snippet']}")

# Synthesize query answer (compounds knowledge)
wiki.synthesize_query(
    query="Compare gold and copper mining",
    answer="# Mining Comparison\n\n...",
    source_pages=["Gold Mining", "Copper Mining"],
    raw_sources=["raw/report.pdf"],
)
# → Creates "Query: Compare Gold And Copper Mining" page
# → Auto-adds Sources section with wikilinks and raw links
# → Auto-logs to log.md

# Get references
inbound = wiki.get_inbound_links("Company Page")
outbound = wiki.get_outbound_links("Company Page")

# Get recommendations
recs = wiki.recommend()
print(f"Missing pages: {recs['missing_pages']}")
print(f"Orphan pages: {recs['orphan_pages']}")

# Get smart hints
hints = wiki.hint()
print(f"Total hints: {hints['summary']['total_hints']}")

🗄️ MCP Server (13 Tools)

The MCP server exposes wiki operations as tools for LLMs.

Tool                Description
wiki_init           Initialize wiki directory structure
wiki_ingest         Ingest a source file (auto-collects to raw/)
wiki_write_page     Write/update a wiki page
wiki_read_page      Read a wiki page
wiki_search         Full-text search with snippets
wiki_lint           Health check (broken links, orphan pages)
wiki_status         Get wiki status overview
wiki_log            Append entry to wiki log
wiki_recommend      Get recommendations (missing pages, orphans)
wiki_build_index    Build reference index from all pages
wiki_read_schema    Read wiki.md (schema/conventions)
wiki_update_schema  Update wiki.md with new conventions
wiki_synthesize     Save query answer as wiki page (knowledge compounding)

Quick Start

from llmwikify import Wiki, MCPServer

wiki = Wiki("/path/to/wiki")
server = MCPServer(wiki)  # Auto-reads config from wiki.config["mcp"]
server.serve()            # STDIO transport (default)

See MCP Setup Guide for transport options and configuration.


⚙️ Configuration

.wiki-config.yaml

# Orphan detection exclusions
orphan_detection:
  exclude_patterns:
    - '^\d{4}-\d{2}-\d{2}$'  # Date format (2025-07-31)
    - '^meeting-.*'          # Meeting notes

  exclude_frontmatter:
    - 'redirect_to'          # Redirect pages

  archive_directories:
    - 'archive'
    - 'logs'

# MCP server settings
mcp:
  host: "127.0.0.1"
  port: 8765
  transport: "stdio"  # or "http" or "sse"
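A sketch of how the `orphan_detection` rules could be applied to a single orphan candidate. Assumptions: patterns are matched against the page name with `re.search`, an archive directory matches any directory component of the page's path, and the config has already been loaded (for example with pyyaml's `yaml.safe_load`). `is_excluded` and `CONFIG` are hypothetical names.

```python
import re
from pathlib import PurePosixPath

# Mirrors the orphan_detection block above (assumed already parsed from YAML).
CONFIG = {
    "exclude_patterns": [r"^\d{4}-\d{2}-\d{2}$", r"^meeting-.*"],
    "exclude_frontmatter": ["redirect_to"],
    "archive_directories": ["archive", "logs"],
}


def is_excluded(page_name: str, rel_path: str, frontmatter: dict, cfg: dict = CONFIG) -> bool:
    """True if an orphan candidate should be ignored under the configured rules."""
    if any(re.search(p, page_name) for p in cfg.get("exclude_patterns", [])):
        return True
    if any(key in frontmatter for key in cfg.get("exclude_frontmatter", [])):
        return True
    dirs = PurePosixPath(rel_path).parts[:-1]  # directories only, not the file name
    return any(d in dirs for d in cfg.get("archive_directories", []))
```

Because the YAML patterns are single-quoted, their backslashes reach the regex engine unchanged, which is why the Python raw strings look identical to the config file.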

Design Principle: Zero Domain Assumptions

llmwikify does NOT assume:

  • ❌ "Daily summary" concept
  • ❌ "Company page" concept
  • ❌ Any domain-specific page types

llmwikify provides:

  • ✅ Universal patterns (dates, quarters)
  • ✅ Frontmatter markers (redirect_to)
  • ✅ Directory structures (archive/, logs/)
  • ✅ User-configurable rules

This makes llmwikify truly general-purpose:

  • Mining News Wiki: Dates = daily summaries
  • Personal KB: Dates = journal entries
  • Project Docs: Dates = release notes
  • Research Wiki: Dates = experiment logs

📊 CLI Commands (15 Total)

Command       Description             Example
init          Initialize wiki         llmwikify init
ingest        Ingest PDF/URL/YouTube  llmwikify ingest doc.pdf
write_page    Create/update page      llmwikify write_page Test -c "..."
read_page     Read page               llmwikify read_page Test
search        Full-text search        llmwikify search "gold" -l 10
lint          Health check            llmwikify lint
status        Status overview         llmwikify status
log           Record log entry        llmwikify log ingest doc.pdf
references    Show references         llmwikify references "Agnico"
build-index   Build reference index   llmwikify build-index
export-index  Export JSON             llmwikify export-index -o out.json
batch         Batch ingest            llmwikify batch raw/pdfs/ -l 10
hint          Smart suggestions       llmwikify hint
recommend     Recommendations         llmwikify recommend
serve         Start MCP server        llmwikify serve

🗄️ Database Schema

-- FTS5 full-text search
CREATE VIRTUAL TABLE pages_fts USING fts5(
    page_name, content,
    tokenize='porter unicode61'
);

-- Bidirectional link tracking
CREATE TABLE page_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_page TEXT NOT NULL,
    target_page TEXT NOT NULL,
    section TEXT,
    display_text TEXT,
    file_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Page metadata
CREATE TABLE pages (
    page_name TEXT PRIMARY KEY,
    file_path TEXT NOT NULL,
    content_length INTEGER,
    word_count INTEGER,
    link_count INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
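Inbound and outbound lookups against `page_links` reduce to a `WHERE` on one column or the other. A minimal sketch with hypothetical function names; the two secondary indexes are an assumption, since the schema above does not show any.

```python
import sqlite3


def open_links() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_links (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " source_page TEXT NOT NULL, target_page TEXT NOT NULL,"
        " section TEXT, display_text TEXT, file_path TEXT,"
        " created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )
    # Assumed indexes: one per direction so neither lookup needs a full scan.
    conn.execute("CREATE INDEX idx_links_source ON page_links(source_page)")
    conn.execute("CREATE INDEX idx_links_target ON page_links(target_page)")
    return conn


def outbound(conn: sqlite3.Connection, page: str) -> list[str]:
    """Pages that `page` links to."""
    return [row[0] for row in conn.execute(
        "SELECT DISTINCT target_page FROM page_links WHERE source_page = ? ORDER BY 1",
        (page,))]


def inbound(conn: sqlite3.Connection, page: str) -> list[str]:
    """Pages that link to `page`."""
    return [row[0] for row in conn.execute(
        "SELECT DISTINCT source_page FROM page_links WHERE target_page = ? ORDER BY 1",
        (page,))]
```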

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src/llmwikify

# Run specific module
pytest tests/test_query_flow.py -v

# Run and generate HTML report
pytest --cov=src/llmwikify --cov-report=html

Test Coverage: 199 tests passing as of v0.15.0. The per-file counts below sum to 110, so they appear to predate the v0.15.0 suite.

Test Files

File                Tests  What it covers
test_wiki_core.py   36     Wiki class (init, ingest, pages, schema, lint)
test_query_flow.py  27     Query synthesis (basic, sources, logging, duplicates, full flow)
test_index.py       8      WikiIndex (FTS5, links, export)
test_recommend.py   5      Recommendation engine
test_cli.py         8      CLI commands
test_extractors.py  12     Content extractors
test_llm_client.py  14     LLM client config and JSON parsing

📚 Use Cases

1. Mining News Wiki

orphan_detection:
  exclude_patterns:
    - '^\d{4}-\d{2}-\d{2}$'  # Daily summaries
    - '^weekly-.*'           # Weekly insights
  archive_directories:
    - 'daily'
    - 'analysis'

Result: orphan pages dropped from 89 to 2, eliminating 97.8% of the false positives.

2. Personal Knowledge Base

orphan_detection:
  exclude_patterns:
    - '^book-note-.*'
    - '^course-.*'
  archive_directories:
    - 'journal'
    - 'notes'

3. Project Documentation

orphan_detection:
  exclude_patterns:
    - '^release-.*'
    - '^meeting-.*'
    - '^rfc-.*'
  archive_directories:
    - 'releases'
    - 'meetings'

4. Research Wiki

orphan_detection:
  exclude_patterns:
    - '^experiment-.*'
    - '^paper-note-.*'
  archive_directories:
    - 'experiments'
    - 'papers'

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                   llmwikify Architecture                    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  1. Core Layer                                              │
│     Wiki (wiki.py) - Business logic, synthesize_query       │
│     WikiIndex (index.py) - FTS5 + Reference Tracking        │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                      Extraction Layer                       │
│           text.py │ pdf.py │ web.py │ youtube.py            │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │
               ┌───────────────┴───────────────┐
               ▼                               ▼
 ┌───────────────────────────┐   ┌───────────────────────────┐
 │  CLI (15 commands)        │   │  MCP Server (13 tools)    │
 └───────────────────────────┘   └───────────────────────────┘
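The Extraction Layer implies some routing from a source string to one of its four modules. A hypothetical sketch of that dispatch, assuming routing by URL host and file extension; the host list is illustrative and llmwikify's real logic may differ.

```python
from urllib.parse import urlparse

# Hypothetical host list; module names follow the Extraction Layer box.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def pick_extractor(source: str) -> str:
    """Return the extractor module a source would be routed to."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        return "youtube.py" if host in YOUTUBE_HOSTS else "web.py"
    if source.lower().endswith(".pdf"):
        return "pdf.py"
    return "text.py"  # anything else is treated as a local text file
```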

📖 Documentation


🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Report bugs: GitHub Issues
  2. Fix bugs: submit a PR
  3. Add features: open an issue first to discuss
  4. Improve docs: PRs welcome
  5. Share use cases: add your .wiki-config.yaml to examples/

Development Setup

git clone https://github.com/sn0wfree/llmwikify.git
cd llmwikify
pip install -e ".[dev]"

# Run tests
pytest

# Format and lint code
black src/llmwikify
ruff check src/llmwikify

# Type check
mypy src/llmwikify

📈 Roadmap

✅ v0.15.0 (Released)

  • Enhanced ingest metadata (file_type, file_size, word_count, has_images, content_preview)
  • Clue-based lint detection: dated_claim, topic_overlap, missing_cross_ref
  • Two-tier hint system: critical + informational (max 8 total)
  • Query sink enhancement: merge_or_replace, suggestions, dedup, urgency tracking
  • 199 tests passing

✅ v0.12.0 - v0.14.0 (Completed)

  • ✅ Complete CLI commands (15 total)
  • ✅ Auto-index on page write
  • ✅ Raw source collection (all sources into raw/)
  • ✅ wiki_synthesize: query knowledge compounding cycle
  • ✅ Query sink feature with bidirectional linking
  • ✅ Smart recommendations and hints

v1.0.0 (Roadmap)

  • Web UI (optional)
  • Graph visualization (graphviz/Mermaid)
  • MCP server authentication
  • More extractors (Word, Excel)
  • Incremental index updates
  • Stable API guarantee
  • Production hardening

๐Ÿ™ Acknowledgments

  • llm-wiki-kit: original inspiration and foundational design by Sashank. This project extends the core concepts of LLM-maintained wikis with enhanced CLI tools, MCP server support, query knowledge compounding, and configuration-driven flexibility.
  • Andrej Karpathy: LLM Wiki Principles
  • Obsidian: Markdown wiki platform
  • MCP (Model Context Protocol): LLM integration standard

📄 License

MIT License. See the LICENSE file for details.


📬 Contact


Built with ❤️ based on Karpathy's LLM Wiki Principles



Download files


Source Distribution

llmwikify-0.15.0.tar.gz (96.2 kB)

Uploaded Source

Built Distribution


llmwikify-0.15.0-py3-none-any.whl (51.4 kB)

Uploaded Python 3

File details

Details for the file llmwikify-0.15.0.tar.gz.

File metadata

  • Download URL: llmwikify-0.15.0.tar.gz
  • Size: 96.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for llmwikify-0.15.0.tar.gz
Algorithm Hash digest
SHA256 6fccddc30ce0f9db43b20844c2a575e6350486a30a6b68432cb07954ed3d253f
MD5 33f6ce61d15139c09371b25e4b1deae0
BLAKE2b-256 9761161044e78a082be505a1411052ea1db2dda4ed39dd14a5b31fc5382125b4


File details

Details for the file llmwikify-0.15.0-py3-none-any.whl.

File metadata

  • Download URL: llmwikify-0.15.0-py3-none-any.whl
  • Size: 51.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for llmwikify-0.15.0-py3-none-any.whl
Algorithm Hash digest
SHA256 94c9fc51e11a6fa269fa967066aaf2ee6fb54fc1e222f234e28da7dda17470c2
MD5 4485963389d44658f7a3b05425026887
BLAKE2b-256 c9acd4097ca541921f68f595e4518e362b22e8175e1e6b3c41c35edf438cee18

