General-purpose LLM-Wiki CLI and Python library - Build persistent, LLM-maintained knowledge bases
llmwikify
Build persistent, LLM-maintained knowledge bases, based on Karpathy's LLM Wiki Principles
What is llmwikify?
llmwikify is a general-purpose LLM-Wiki management tool that helps you build and maintain a persistent knowledge base using LLMs. Unlike RAG systems that rediscover knowledge from scratch on every query, llmwikify incrementally builds and maintains a structured, interlinked wiki that compounds over time.
Core Philosophy
The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read.
Based on Karpathy's LLM Wiki Principles:
- Raw sources: Your immutable source documents (PDFs, URLs, YouTube videos), all collected into raw/
- The wiki: LLM-maintained markdown pages with cross-references
- The schema: wiki.md, which tells the LLM how to maintain the wiki
Features
Full-Text Search
- SQLite FTS5 with Porter stemmer and BM25 ranking
- Ranked results with highlighted snippets
- LIKE fallback for FTS5 syntax errors
- Index build: 0.06 seconds for 157 pages
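The search behaviour listed above can be sketched with the standard library alone. This is a minimal illustration, not llmwikify's actual implementation: the search() helper, the snippet markers, and the sample rows are assumptions, and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3

def search(conn, query, limit=10):
    """Return (page_name, snippet) rows ranked by BM25, falling back to LIKE."""
    try:
        return conn.execute(
            "SELECT page_name, snippet(pages_fts, 1, '[', ']', '...', 8) "
            "FROM pages_fts WHERE pages_fts MATCH ? "
            "ORDER BY bm25(pages_fts) LIMIT ?",
            (query, limit),
        ).fetchall()
    except sqlite3.OperationalError:
        # FTS5 rejected the query syntax (e.g. an unbalanced quote),
        # so fall back to a plain substring match.
        return conn.execute(
            "SELECT page_name, substr(content, 1, 80) FROM pages_fts "
            "WHERE content LIKE ? LIMIT ?",
            (f"%{query}%", limit),
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE pages_fts USING fts5("
    "page_name, content, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO pages_fts (page_name, content) VALUES (?, ?)",
    [
        ("Gold Mining", "Gold mines in Ontario expanded production."),
        ("Copper", "Copper prices fell this quarter."),
        ("Rumours", 'The "motherlode" claim is unverified.'),
    ],
)

stemmed_hits = search(conn, "mine")           # Porter stemming also matches "mines"
fallback_hits = search(conn, '"motherlode')   # unbalanced quote -> LIKE fallback
```

Lower bm25() values indicate better matches in SQLite, which is why the query sorts in ascending order.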
Bidirectional Reference Tracking
- Automatic [[wikilink]] detection and parsing
- Section-level granularity ([[Page#section|display]])
- Inbound/outbound link queries
- JSON export for Obsidian compatibility
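As a rough illustration of the [[Page#section|display]] syntax, the sketch below parses wikilinks with a regular expression. The pattern and the parse_wikilinks() helper are assumptions made for this example; the library's own parser may handle more edge cases.

```python
import re

# [[Target]], [[Target#Section]], [[Target|Display]] or [[Target#Section|Display]]
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]")

def parse_wikilinks(text):
    """Return one dict per wikilink found in text."""
    links = []
    for match in WIKILINK.finditer(text):
        target, section, display = match.groups()
        links.append({
            "target": target.strip(),
            "section": section.strip() if section else None,
            "display": display.strip() if display else None,
        })
    return links

links = parse_wikilinks("See [[Gold Mining#Costs|cost notes]] and [[Copper]].")
```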
Smart Recommendations
- Missing page detection (pages that are frequently referenced but do not exist)
- Orphan page identification (with intelligent exclusion)
- Cross-reference opportunities
- Smart hints for wiki improvement
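Missing-page and orphan detection both reduce to counting inbound links. The sketch below shows the idea under simple assumptions: recommend_pages() and the min_refs threshold are hypothetical, and the configurable exclusion rules are left out.

```python
from collections import Counter

def recommend_pages(pages, links, min_refs=2):
    """pages: set of existing page names; links: iterable of (source, target)."""
    inbound = Counter(target for _, target in links)
    # Referenced at least min_refs times, but no page with that name exists.
    missing = sorted(t for t, n in inbound.items() if t not in pages and n >= min_refs)
    # Existing pages that nothing links to.
    orphans = sorted(p for p in pages if inbound[p] == 0)
    return {"missing_pages": missing, "orphan_pages": orphans}

recs = recommend_pages(
    pages={"Index", "Gold Mining", "Copper"},
    links=[("Index", "Gold Mining"), ("Gold Mining", "Agnico"), ("Copper", "Agnico")],
)
```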
Query Knowledge Compounding (v0.12.6+)
- wiki_synthesize: Save query answers as persistent wiki pages
- Auto-generated "Query: {Topic}" pages with structured Sources sections
- Smart duplicate detection with date suffix for multiple runs
- merge_or_replace parameter: sink, merge, or replace strategies
- Auto-links to wiki pages and raw sources
- Auto-logs to log.md with parseable format
- Answers compound in the knowledge base just like ingested sources
Query Sink (v0.13.0+)
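One way to picture the "Query: {Topic}" naming and the date suffix for repeated runs is the sketch below. The query_page_name() helper and the exact suffix format are assumptions for illustration; only the title-cased page name matches the example given in the Python API section.

```python
from datetime import date

def query_page_name(query, existing_pages, today=None):
    """Build a 'Query: {Topic}' title, adding a date suffix on a name clash."""
    base = "Query: " + query.strip().title()
    if base not in existing_pages:
        return base
    today = today or date.today()
    # Hypothetical suffix format; the real tool may format this differently.
    return f"{base} ({today.isoformat()})"

first = query_page_name("Compare gold and copper mining", set())
second = query_page_name(
    "Compare gold and copper mining", {first}, today=date(2025, 7, 31)
)
```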
- Compound answers without creating duplicate pages
- Pending entries saved to sink/ for later review
- Urgency tracking: ok / attention (7d+) / aging (14d+) / stale (30d+)
- Smart suggestions: content gaps, source quality, query patterns
- Dedup detection flags entries with >70% text similarity
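The urgency buckets and the 70% similarity threshold can be expressed in a few lines. This is a sketch under assumptions: the function names are hypothetical, and difflib.SequenceMatcher stands in for whatever similarity measure llmwikify actually uses.

```python
from difflib import SequenceMatcher

def urgency(age_days):
    """Map the age of a pending sink entry to an urgency label."""
    if age_days >= 30:
        return "stale"
    if age_days >= 14:
        return "aging"
    if age_days >= 7:
        return "attention"
    return "ok"

def is_duplicate(text_a, text_b, threshold=0.70):
    """Flag two entries whose text similarity exceeds the threshold."""
    return SequenceMatcher(None, text_a, text_b).ratio() > threshold

labels = [urgency(days) for days in (0, 7, 14, 30)]
```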
Enhanced Ingest (v0.15.0+)
- Rich metadata: file_type, file_size, word_count, has_images, content_preview
- No summary returned: respects the "LLM reads source" principle
- Auto-collects all sources into raw/ directory
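The metadata fields above are descriptive rather than interpretive, which a short sketch makes concrete. The source_metadata() helper, the preview length, and the Markdown-image heuristic for has_images are assumptions, not the library's code.

```python
import tempfile
from pathlib import Path

def source_metadata(path, text, preview_chars=200):
    """Describe an extracted source without summarizing it."""
    p = Path(path)
    return {
        "file_type": p.suffix.lstrip(".").lower(),
        "file_size": p.stat().st_size,             # bytes on disk
        "word_count": len(text.split()),
        "has_images": "![" in text,                # assumption: Markdown image syntax
        "content_preview": text[:preview_chars],   # raw excerpt, not a summary
    }

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.MD"
    body = "Gold output rose.\n\n![chart](chart.png)\n"
    sample.write_bytes(body.encode("utf-8"))
    meta = source_metadata(sample, body)
```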
Smart Lint with Clues (v0.15.0+)
- dated_claim (critical): Pages referencing years ≥3 years older than latest raw source
- topic_overlap (informational): Query pages with ≥85% keyword overlap
- missing_cross_ref (informational): Concepts mentioned but not wikilinked
- Two-tier hints: critical (max 3) + informational (max 5)
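Two of these clues lend themselves to a compact sketch. The helpers below are hypothetical: the year regex and the overlap formula (shared keywords divided by the smaller keyword set) are assumptions, since the README does not say how llmwikify computes either value.

```python
import re

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def dated_claims(page_text, latest_source_year, min_gap=3):
    """Years in the page that are at least min_gap years older than the newest source."""
    years = {int(y) for y in YEAR.findall(page_text)}
    return sorted(y for y in years if latest_source_year - y >= min_gap)

def keyword_overlap(title_a, title_b):
    """Share of the smaller keyword set that also appears in the other set."""
    a, b = set(title_a.lower().split()), set(title_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

flagged = dated_claims("Output peaked in 2019 and again in 2023.", latest_source_year=2024)
overlap = keyword_overlap("compare gold and copper mining", "gold and copper mining costs")
is_topic_overlap = overlap >= 0.85
```

With these inputs only 2019 is flagged, and the two query titles share four of five keywords, which stays below the 85% threshold.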
Performance Optimized
- Batch inserts with executemany()
- PRAGMA optimizations (journal_mode = MEMORY, synchronous = OFF)
- Progress reporting for large collections
- ON CONFLICT preserves created_at on page updates
- 10-20x faster than naive implementation
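The combination of batched writes, relaxed PRAGMAs, and an upsert that leaves created_at alone can be sketched as follows. The table here is a trimmed-down, hypothetical version of the pages table, and the statement needs SQLite 3.24 or newer for ON CONFLICT ... DO UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trade durability for speed; reasonable for an index that can be rebuilt.
conn.execute("PRAGMA journal_mode = MEMORY")
conn.execute("PRAGMA synchronous = OFF")
conn.execute(
    "CREATE TABLE pages ("
    "page_name TEXT PRIMARY KEY, word_count INTEGER, "
    "created_at TEXT, updated_at TEXT)"
)

UPSERT = """
INSERT INTO pages (page_name, word_count, created_at, updated_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(page_name) DO UPDATE SET
    word_count = excluded.word_count,
    updated_at = excluded.updated_at  -- created_at is deliberately not touched
"""

def index_pages(rows):
    """Write all rows in one transaction with a single executemany() call."""
    with conn:
        conn.executemany(UPSERT, rows)

index_pages([
    ("Gold Mining", 120, "2025-01-01", "2025-01-01"),
    ("Copper", 80, "2025-01-01", "2025-01-01"),
])
index_pages([("Gold Mining", 150, "2025-02-01", "2025-02-01")])

row = conn.execute(
    "SELECT word_count, created_at, updated_at FROM pages WHERE page_name = ?",
    ("Gold Mining",),
).fetchone()
```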
Pure Tool Design
- Zero domain assumptions: No hardcoded concepts
- Configuration-driven: You decide what to exclude via .wiki-config.yaml
- Universal patterns: Date formats, frontmatter markers, directory structures
Zero Core Dependencies
- Standard library only
- Optional dependencies for extended functionality:
  - pymupdf: PDF extraction
  - trafilatura: Web scraping
  - youtube-transcript-api: YouTube transcripts
  - mcp: MCP server support
  - pyyaml: Configuration loading
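A standard-library-only core with optional extras is commonly built by probing for each module before using it. The sketch below shows that pattern; the OPTIONAL_MODULES mapping and available_features() are assumptions for illustration, using the import names these packages install (pymupdf imports as fitz, pyyaml as yaml).

```python
from importlib.util import find_spec

# Feature -> importable module name (the PyPI name can differ from the import name).
OPTIONAL_MODULES = {
    "PDF extraction": "fitz",                       # pymupdf
    "Web scraping": "trafilatura",
    "YouTube transcripts": "youtube_transcript_api",
    "MCP server support": "mcp",
    "Configuration loading": "yaml",                # pyyaml
}

def available_features():
    """Report which optional features can be enabled in this environment."""
    return {
        feature: find_spec(module) is not None
        for feature, module in OPTIONAL_MODULES.items()
    }

features = available_features()
```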
Installation
Basic Installation (Zero Dependencies)
pip install llmwikify
Full Installation (All Features)
pip install "llmwikify[all]"
Development Installation
git clone https://github.com/sn0wfree/llmwikify.git
cd llmwikify
pip install -e ".[dev]"
Quick Start
1. Initialize a Wiki
llmwikify init
Output:
Wiki initialized at /path/to/wiki
  raw/ - drop source files here (all sources collected here)
  wiki/ - LLM-maintained wiki pages
  .llmwikify.db - SQLite index
  wiki.md - conventions and workflows for the LLM
  .wiki-config.yaml.example - configuration template
2. Ingest Sources
# Ingest a PDF (copied to raw/)
llmwikify ingest document.pdf
# Ingest a URL (text saved to raw/)
llmwikify ingest https://example.com/article
# Ingest a YouTube video (transcript saved to raw/)
llmwikify ingest https://youtube.com/watch?v=abc123
All sources are automatically collected into raw/ for centralized management.
3. Build Index
llmwikify build-index
Output:
Processing: 100/157 (63.7%) - 29591.5 files/sec
Total pages: 157
Total links: 636
Elapsed: 0.06s
4. Search and Query
# Full-text search
llmwikify search "gold mining" -l 10
# Query page references
llmwikify references "Company Name"
# Get smart recommendations
llmwikify recommend
# Get wiki health hints
llmwikify hint
Python API
from llmwikify import Wiki, create_wiki
from pathlib import Path
# Create/open a wiki
wiki = create_wiki("/path/to/wiki")
# Initialize
wiki.init()
# Ingest source (returns data for LLM processing)
result = wiki.ingest_source("document.pdf")
print(f"Source: {result['title']}, saved to: {result['source_raw_path']}")
# Write page (auto-updates index.md)
wiki.write_page("Test Page", "# Test\n\nContent with [[Link]]")
# Search
results = wiki.search("gold mining", limit=10)
for r in results:
    print(f"{r['page_name']}: {r['snippet']}")
# Synthesize query answer (compounds knowledge)
wiki.synthesize_query(
    query="Compare gold and copper mining",
    answer="# Mining Comparison\n\n...",
    source_pages=["Gold Mining", "Copper Mining"],
    raw_sources=["raw/report.pdf"],
)
# -> Creates "Query: Compare Gold And Copper Mining" page
# -> Auto-adds Sources section with wikilinks and raw links
# -> Auto-logs to log.md
# Get references
inbound = wiki.get_inbound_links("Company Page")
outbound = wiki.get_outbound_links("Company Page")
# Get recommendations
recs = wiki.recommend()
print(f"Missing pages: {recs['missing_pages']}")
print(f"Orphan pages: {recs['orphan_pages']}")
# Get smart hints
hints = wiki.hint()
print(f"Total hints: {hints['summary']['total_hints']}")
MCP Server (13 Tools)
The MCP server exposes wiki operations as tools for LLMs.
| Tool | Description |
|---|---|
| wiki_init | Initialize wiki directory structure |
| wiki_ingest | Ingest a source file (auto-collects to raw/) |
| wiki_write_page | Write/update a wiki page |
| wiki_read_page | Read a wiki page |
| wiki_search | Full-text search with snippets |
| wiki_lint | Health check (broken links, orphan pages) |
| wiki_status | Get wiki status overview |
| wiki_log | Append entry to wiki log |
| wiki_recommend | Get recommendations (missing pages, orphans) |
| wiki_build_index | Build reference index from all pages |
| wiki_read_schema | Read wiki.md (schema/conventions) |
| wiki_update_schema | Update wiki.md with new conventions |
| wiki_synthesize | Save query answer as wiki page (knowledge compounding) |
Quick Start
from llmwikify import Wiki, MCPServer
wiki = Wiki("/path/to/wiki")
server = MCPServer(wiki) # Auto-reads config from wiki.config["mcp"]
server.serve() # STDIO transport (default)
See MCP Setup Guide for transport options and configuration.
Configuration
.wiki-config.yaml
# Orphan detection exclusions
orphan_detection:
  exclude_patterns:
    - '^\d{4}-\d{2}-\d{2}$'   # Date format (2025-07-31)
    - '^meeting-.*'           # Meeting notes
  exclude_frontmatter:
    - 'redirect_to'           # Redirect pages
  archive_directories:
    - 'archive'
    - 'logs'
# MCP server settings
mcp:
  host: "127.0.0.1"
  port: 8765
  transport: "stdio"          # or "http" or "sse"
Design Principle: Zero Domain Assumptions
llmwikify does NOT assume:
- "Daily summary" concept
- "Company page" concept
- Any domain-specific page types
llmwikify provides:
- Universal patterns (dates, quarters)
- Frontmatter markers (redirect_to)
- Directory structures (archive/, logs/)
- User-configurable rules
This makes llmwikify truly general-purpose:
- Mining News Wiki: Dates = daily summaries
- Personal KB: Dates = journal entries
- Project Docs: Dates = release notes
- Research Wiki: Dates = experiment logs
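To make the configuration-driven exclusions concrete, the sketch below applies the three rule types from .wiki-config.yaml to a single page. It is an illustration only: is_excluded() is a hypothetical helper, the config is written as a plain dict so no YAML parser is needed, and it assumes all three rule lists sit under orphan_detection.

```python
import re
from pathlib import PurePosixPath

CONFIG = {
    "orphan_detection": {
        "exclude_patterns": [r"^\d{4}-\d{2}-\d{2}$", r"^meeting-.*"],
        "exclude_frontmatter": ["redirect_to"],
        "archive_directories": ["archive", "logs"],
    }
}

def is_excluded(page_name, file_path, frontmatter, config=CONFIG):
    """Return True if a page should never be reported as an orphan."""
    rules = config["orphan_detection"]
    # 1. Page name matches a user-supplied pattern (dates, meeting notes, ...).
    if any(re.search(pattern, page_name) for pattern in rules["exclude_patterns"]):
        return True
    # 2. Page carries a frontmatter marker such as redirect_to.
    if any(key in frontmatter for key in rules["exclude_frontmatter"]):
        return True
    # 3. Page lives inside an archive directory.
    directories = PurePosixPath(file_path).parts[:-1]
    return any(name in directories for name in rules["archive_directories"])

daily = is_excluded("2025-07-31", "wiki/2025-07-31.md", {})
regular = is_excluded("Gold Mining", "wiki/Gold Mining.md", {})
```

Nothing in the function knows what a date-named page means; the same rule covers daily summaries, journal entries, or release notes depending on the wiki.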
CLI Commands (15 Total)
| Command | Description | Example |
|---|---|---|
| init | Initialize wiki | llmwikify init |
| ingest | Ingest PDF/URL/YouTube | llmwikify ingest doc.pdf |
| write_page | Create/update page | llmwikify write_page Test -c "..." |
| read_page | Read page | llmwikify read_page Test |
| search | Full-text search | llmwikify search "gold" -l 10 |
| lint | Health check | llmwikify lint |
| status | Status overview | llmwikify status |
| log | Record log entry | llmwikify log ingest doc.pdf |
| references | Show references | llmwikify references "Agnico" |
| build-index | Build reference index | llmwikify build-index |
| export-index | Export JSON | llmwikify export-index -o out.json |
| batch | Batch ingest | llmwikify batch raw/pdfs/ -l 10 |
| hint | Smart suggestions | llmwikify hint |
| recommend | Recommendations | llmwikify recommend |
| serve | Start MCP server | llmwikify serve |
Database Schema
-- FTS5 full-text search
CREATE VIRTUAL TABLE pages_fts USING fts5(
    page_name, content,
    tokenize='porter unicode61'
);
-- Bidirectional link tracking
CREATE TABLE page_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_page TEXT NOT NULL,
    target_page TEXT NOT NULL,
    section TEXT,
    display_text TEXT,
    file_path TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Page metadata
CREATE TABLE pages (
    page_name TEXT PRIMARY KEY,
    file_path TEXT NOT NULL,
    content_length INTEGER,
    word_count INTEGER,
    link_count INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
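Because every link is stored once as a (source_page, target_page) row, inbound and outbound queries are the same lookup from opposite sides. The sketch below runs against the page_links table defined above; the helper functions and sample rows are assumptions, not the WikiIndex API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_links (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        source_page TEXT NOT NULL,
        target_page TEXT NOT NULL,
        section TEXT,
        display_text TEXT,
        file_path TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.executemany(
    "INSERT INTO page_links (source_page, target_page, section) VALUES (?, ?, ?)",
    [
        ("Index", "Gold Mining", None),
        ("Copper", "Gold Mining", "Costs"),
        ("Gold Mining", "Agnico", None),
    ],
)

def inbound_links(page):
    """Pages that link to the given page."""
    rows = conn.execute(
        "SELECT source_page FROM page_links WHERE target_page = ? ORDER BY source_page",
        (page,),
    )
    return [row[0] for row in rows]

def outbound_links(page):
    """Pages that the given page links to."""
    rows = conn.execute(
        "SELECT target_page FROM page_links WHERE source_page = ? ORDER BY target_page",
        (page,),
    )
    return [row[0] for row in rows]

inbound = inbound_links("Gold Mining")    # ["Copper", "Index"]
outbound = outbound_links("Gold Mining")  # ["Agnico"]
```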
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=src/llmwikify
# Run specific module
pytest tests/test_query_flow.py -v
# Run and generate HTML report
pytest --cov=src/llmwikify --cov-report=html
Test Coverage: 110 tests, all passing
Test Files
| File | Tests | Coverage |
|---|---|---|
| test_wiki_core.py | 36 | Wiki class (init, ingest, pages, schema, lint) |
| test_query_flow.py | 27 | Query synthesis (basic, sources, logging, duplicates, full flow) |
| test_index.py | 8 | WikiIndex (FTS5, links, export) |
| test_recommend.py | 5 | Recommendation engine |
| test_cli.py | 8 | CLI commands |
| test_extractors.py | 12 | Content extractors |
| test_llm_client.py | 14 | LLM client config and JSON parsing |
Use Cases
1. Mining News Wiki
orphan_detection:
  exclude_patterns:
    - '^\d{4}-\d{2}-\d{2}$'   # Daily summaries
    - '^weekly-.*'            # Weekly insights
  archive_directories:
    - 'daily'
    - 'analysis'
Results: orphan pages reduced from 89 to 2 (97.8% false positive elimination)
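The 97.8% figure follows directly from the two counts, as this small check shows (variable names are illustrative only):

```python
before, after = 89, 2
eliminated = (before - after) / before   # 87 of 89 reported orphans were false positives
summary = f"{eliminated:.1%}"            # "97.8%"
```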
2. Personal Knowledge Base
orphan_detection:
  exclude_patterns:
    - '^book-note-.*'
    - '^course-.*'
  archive_directories:
    - 'journal'
    - 'notes'
3. Project Documentation
orphan_detection:
  exclude_patterns:
    - '^release-.*'
    - '^meeting-.*'
    - '^rfc-.*'
  archive_directories:
    - 'releases'
    - 'meetings'
4. Research Wiki
orphan_detection:
  exclude_patterns:
    - '^experiment-.*'
    - '^paper-note-.*'
  archive_directories:
    - 'experiments'
    - 'papers'
Architecture
┌─────────────────────────────────────────────────────────────┐
│                   llmwikify Architecture                    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 1. Core Layer                                               │
│   Wiki (wiki.py)        - Business logic, synthesize_query  │
│   WikiIndex (index.py)  - FTS5 + Reference Tracking         │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │
┌─────────────────────────────────────────────────────────────┐
│ Extraction Layer                                            │
│   text.py │ pdf.py │ web.py │ youtube.py                    │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │
               ┌───────────────┴───────────────┐
               ▼                               ▼
   ┌────────────────────────┐      ┌────────────────────────┐
   │   CLI (15 commands)    │      │ MCP Server (13 tools)  │
   └────────────────────────┘      └────────────────────────┘
Documentation
- Configuration Guide: Detailed configuration options
- Chinese Config Guide: Configuration guide in Chinese
- LLM Wiki Principles: Karpathy's original vision
- Reference Tracking Guide: How references work
- MCP Setup Guide: MCP server configuration
- Migration Guide: Version migration notes
- Architecture: Technical architecture
Contributing
Contributions are welcome! Here's how you can help:
- Report bugs: GitHub Issues
- Fix bugs: Submit a PR
- Add features: Open an issue first to discuss
- Improve docs: PRs welcome
- Share use cases: Add your .wiki-config.yaml to examples/
Development Setup
git clone https://github.com/sn0wfree/llmwikify.git
cd llmwikify
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/llmwikify
ruff check src/llmwikify
# Type check
mypy src/llmwikify
Roadmap
v0.15.0 (Released)
- Enhanced ingest metadata (file_type, file_size, word_count, has_images, content_preview)
- Clue-based lint detection: dated_claim, topic_overlap, missing_cross_ref
- Two-tier hint system: critical + informational (max 8 total)
- Query sink enhancement: merge_or_replace, suggestions, dedup, urgency tracking
- 199 tests passing
v0.12.0 - v0.14.0 (Completed)
- Complete CLI commands (15 total)
- Auto-index on page write
- Raw source collection (all sources into raw/)
- wiki_synthesize: Query knowledge compounding cycle
- Query sink feature with bidirectional linking
- Smart recommendations and hints
v1.0.0 (Roadmap)
- Web UI (optional)
- Graph visualization (graphviz/Mermaid)
- MCP server authentication
- More extractors (Word, Excel)
- Incremental index updates
- Stable API guarantee
- Production hardening
Acknowledgments
- llm-wiki-kit: Original inspiration and foundational design by Sashank. This project extends the core concepts of LLM-maintained wikis with enhanced CLI tools, MCP server support, query knowledge compounding, and configuration-driven flexibility.
- Andrej Karpathy: LLM Wiki Principles
- Obsidian: Markdown wiki platform
- MCP (Model Context Protocol): LLM integration standard
License
MIT License. See the LICENSE file for details.
Contact
- GitHub: @sn0wfree
- Email: linlu1234567@sina.com
- Discussions: GitHub Discussions
Built with ❤️, based on Karpathy's LLM Wiki Principles