Toolkit for building agent-maintained Obsidian-style wikis — link management, linting, document conversion, and agent coordination
agent-wiki
Toolkit for building LLM-maintained wikis. Handles the plumbing — link management, linting, file operations, document conversion, and agent coordination — so LLMs focus on content.
Based on the LLM Wiki pattern by Andrej Karpathy: instead of RAG (re-deriving knowledge on every query), the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files. The wiki is a compounding artifact: cross-references are already there, contradictions already flagged, synthesis already current. You curate sources and ask questions; the LLM does the bookkeeping.
Install
pip install agent-wiki
Quick Start
from agent_wiki import WikiRoot
wiki = WikiRoot("/path/to/wiki")
# Health check — find broken links, missing backlinks, orphan pages
issues = wiki.lint()
# Move a page and auto-update every [[link]] across the wiki
wiki.move("topics/Old Name.md", "topics/New Name.md")
# Convert a PDF to structured markdown with images
wiki.convert_pdf("paper.pdf", "processed/paper.md", max_dpi=150)
# Find all pages linking to a topic
wiki.backlinks("Sand Injectites")
# Search for a concept across all pages
wiki.find_references("polygonal fault")
# Wiki statistics
wiki.stats()
CLI
Every operation is also available from the command line — designed for AI agents calling via Bash.
# Lint
agent-wiki --root wiki/ lint
agent-wiki --root wiki/ lint --json # structured output for agents
# File operations (auto-updates all links)
agent-wiki --root wiki/ move old.md new.md
agent-wiki --root wiki/ merge source.md target.md
agent-wiki --root wiki/ rename "Old Name" "New Name"
# Document conversion
agent-wiki --root wiki/ convert pdf paper.pdf processed/paper.md --max-dpi 150
# Search
agent-wiki --root wiki/ backlinks "Page Name"
agent-wiki --root wiki/ find-references "some concept"
agent-wiki --root wiki/ stats
Use --json on any command for machine-readable output.
Features
Link Management
Obsidian-style [[wiki-links]] with full support for [[target|display text]]. The library builds a link graph, resolves links by filename stem (Obsidian's shortest-unique-path matching), and rewrites links automatically when files move.
from agent_wiki.links import parse_links, find_backlinks, rewrite_links
links = parse_links("See [[Topic A]] and [[Topic B|related topic]]")
# → [WikiLink(target="Topic A", ...), WikiLink(target="Topic B", display="related topic", ...)]
rewrite_links(text, "Old Name", "New Name")
# → updates [[Old Name]] → [[New Name]], preserves [[Old Name|display]] → [[New Name|display]]
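To make the parsing and rewriting behaviour above concrete, here is a minimal standalone sketch built on a regular expression. It does not call agent_wiki and is not its implementation; the names WIKILINK, extract_targets, and rewrite_target are hypothetical and exist only for this illustration.

```python
import re

# Matches [[target]], [[target|display]] and [[target#anchor|display]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(#[^\[\]|]*)?(\|[^\[\]]*)?\]\]")


def extract_targets(text: str) -> list[str]:
    """Return the target page name of every wiki-link in text."""
    return [m.group(1).strip() for m in WIKILINK.finditer(text)]


def rewrite_target(text: str, old: str, new: str) -> str:
    """Point links at `new` instead of `old`, keeping anchor and display text."""
    def swap(m: re.Match) -> str:
        if m.group(1).strip() != old:
            return m.group(0)  # unrelated link, leave untouched
        return f"[[{new}{m.group(2) or ''}{m.group(3) or ''}]]"
    return WIKILINK.sub(swap, text)


sample = "See [[Old Name]], [[Old Name|the old page]] and [[Other#Intro]]."
print(extract_targets(sample))
print(rewrite_target(sample, "Old Name", "New Name"))
```

The pattern would also match the link inside an Obsidian embed such as ![[figure.png]], so a fuller version would need to decide how embeds are treated.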
Linting
Automated wiki health checks that catch real problems:
| Check | Severity | What it detects |
|---|---|---|
| Broken links | error | [[Target]] where no file matches |
| Broken images | error | `![alt](path)` where the image file doesn't exist |
| Broken anchors | warning | [[Page#Section]] where the heading doesn't exist in the target |
| Broken URLs | warning | Malformed external URLs; optionally validates via HTTP (--check-urls) |
| Parent chain | error | Topic page with missing or broken parent: frontmatter chain |
| Missing frontmatter | error | Required YAML fields missing per page type (Quartz-compatible) |
| Missing backlinks | warning | Topic lists source but source doesn't link back |
| Orphan pages | warning | Pages with zero inbound links |
| Missing sections | warning | Topic without ## Sources, source without ## Topics |
| Dispute chronology | warning | Disputed claims with dates out of order (supports callout syntax) |
| Split candidates | info | Pages exceeding 500 lines |
| Structure mismatch | warning | Folder missing its hub page |
issues = wiki.lint()
for issue in issues:
    print(f"[{issue.severity.value}] {issue.file}: {issue.message}")
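Two of the checks in the table, broken links and orphan pages, can be sketched in a few lines over an in-memory mapping of page paths to text. This is an illustration of the idea under simplifying assumptions (links resolve by filename stem, stems are unique, self-links are ignored), not the library's linter; check_links and the sample pages are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Captures the target of [[target]], [[target|display]] or [[target#anchor]].
LINK = re.compile(r"\[\[([^\[\]|#]+)")


def check_links(pages: dict[str, str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (broken links, orphan pages) for a {relative path: text} mapping."""
    # Resolve links by filename stem, e.g. "topics/Sand.md" -> "Sand".
    stems = {PurePosixPath(path).stem: path for path in pages}
    inbound = {path: 0 for path in pages}
    broken = []
    for path, text in pages.items():
        for match in LINK.finditer(text):
            target = match.group(1).strip()
            resolved = stems.get(target)
            if resolved is None:
                broken.append((path, target))
            elif resolved != path:  # self-links do not count as inbound
                inbound[resolved] += 1
    orphans = sorted(p for p, count in inbound.items() if count == 0)
    return broken, orphans


pages = {
    "index.md": "Start at [[Sand]].",
    "topics/Sand.md": "Related: [[Clay|clay minerals]] and [[index]].",
    "topics/Lonely.md": "No one links here.",
}
broken, orphans = check_links(pages)
print(broken)
print(orphans)
```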
File Operations
Move, rename, or merge pages — all [[wiki-links]] across the entire wiki are updated automatically.
# Rename a page — finds it by name, renames file, updates all references
wiki.rename("Old Topic Name", "New Topic Name")
# Merge two pages — appends content, redirects all links, deletes source
wiki.merge("sources/duplicate.md", "sources/canonical.md")
Document Conversion
PDF to structured markdown using pymupdf4llm — proper headings, paragraphs, tables, and extracted images. Not flat text.
wiki.convert_pdf(
    "paper.pdf",
    "processed/paper.md",
    max_dpi=150,          # image resolution cap
    extract_images=True,  # images saved to img/ subfolder
)
Stubs for .docx, .pptx, .xlsx conversion are included for future implementation.
Kanban Agent Pipeline
A filesystem-based kanban system for coordinating multiple AI agents. No database — the folder structure is the state. Agents communicate through task cards that move between columns.
Columns
kanban/
├── backlog/ # cards waiting for an agent to claim
├── processing/ # claimed by an agent, work in progress
├── review/ # work done, waiting for the next stage
└── done/ # completed and archived
Task Cards
Each card is a lightweight markdown file with YAML frontmatter:
---
agent: reader|writer|reviewer # which agent type should pick this up
model: sonnet|opus # which model to use (default: sonnet)
status: pending|claimed
action: rewrite|merge|move|delete|rename|create # for structural fix cards
source_file: "raw/paper.pdf"
processed_file: "processed/paper.md"
summary_file: "" # filled by reader → path to source page
topic_files: [] # filled by writer → list of topic pages
created: 2024-01-01T00:00:00+00:00
claimed_at: ""
---
## Actions
Specific instructions for the agent (free-form markdown body).
Cards flow through the pipeline by physically moving between folders. An agent "claims" a card by moving it from backlog/ to processing/. The move is a single rename, which is atomic when both folders are on the same filesystem, so two agents cannot claim the same card. When done, the agent moves the card to review/ with the next agent type set.
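The claim step can be illustrated with a plain os.rename: whichever agent renames the card first wins, and any later attempt fails because the source file is gone. This is a sketch of the technique only, assuming both folders live on the same filesystem; try_claim is a hypothetical helper, not the library's claim function.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def try_claim(card: Path, processing_dir: Path) -> Optional[Path]:
    """Move card into processing_dir; return its new path, or None if already taken."""
    destination = processing_dir / card.name
    try:
        os.rename(card, destination)  # a single rename: it either happens or it doesn't
    except FileNotFoundError:
        return None  # another agent moved the card first
    return destination


with tempfile.TemporaryDirectory() as tmp:
    backlog = Path(tmp, "backlog")
    processing = Path(tmp, "processing")
    backlog.mkdir()
    processing.mkdir()
    card = backlog / "paper.md"
    card.write_text("---\nagent: reader\nstatus: pending\n---\n")

    first = try_claim(card, processing)   # succeeds
    second = try_claim(card, processing)  # card is no longer in backlog/
    print(first is not None, second is None)
```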
The Four-Agent Pipeline
raw/paper.pdf
→ kanban_process: converts PDF to markdown, creates card (agent: reader)
→ Reader (Sonnet): reads processed markdown, writes source page
→ card moves to review (agent: writer)
→ Writer (Sonnet): reads source page, writes/updates topic pages
→ card moves to review (agent: reviewer)
→ Reviewer (Opus): checks quality, tree placement, citations
├── approve → done/
├── send back → backlog/ (agent: writer) with rewrite instructions
└── escalate → backlog/ (agent: writer, model: opus) for complex rewrites
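The three reviewer outcomes in the diagram amount to a small routing table. The sketch below models them with a hypothetical Card dataclass and route_review function; the decision names "approve", "send_back", and "escalate" are labels chosen here for illustration, not identifiers from the library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Card:
    column: str
    agent: str
    model: str = "sonnet"


def route_review(card: Card, decision: str) -> Card:
    """Return the card's next state for a given reviewer decision."""
    if decision == "approve":
        return replace(card, column="done")
    if decision == "send_back":
        # Back to a writer, keeping whatever model the card already had.
        return replace(card, column="backlog", agent="writer")
    if decision == "escalate":
        # Complex rewrite: back to a writer, but on the stronger model.
        return replace(card, column="backlog", agent="writer", model="opus")
    raise ValueError(f"unknown decision: {decision!r}")


in_review = Card(column="review", agent="reviewer")
print(route_review(in_review, "approve"))
print(route_review(in_review, "send_back"))
print(route_review(in_review, "escalate"))
```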
After all per-article reviews, a final structure audit (Opus reviewer) examines the full wiki tree and creates fix cards:
Structure audit
→ Creates cards: merge duplicates, move misplaced pages, create missing articles
→ Writer agents execute the fixes using agent-wiki CLI (move/merge/rename)
Orchestrator Role
The orchestrator (typically an Opus-level agent or the main Claude Code session) coordinates the pipeline but never writes or reviews content itself:
- Runs kanban_process to convert new PDFs
- Spawns reader agents (max 5 parallel)
- Spawns writer agents (respects model: field on cards)
- Spawns reviewer agents (always Opus)
- Spawns structure audit reviewer
- Executes fix cards
- Updates index.md, log.md, runs lint
Python API
from agent_wiki.kanban import create_card, claim, complete, list_cards
# Batch: scan for new files, convert, create task cards — one call
cards = wiki.kanban_process(input_dir="raw/")
# Agent claims work (atomic move)
card = claim("kanban/backlog/paper.md", "kanban/processing/")
# Agent finishes — move to next stage
complete("kanban/processing/paper.md", "kanban/review/", agent="writer")
# List cards filtered by column and agent type
cards = list_cards("kanban/", column="backlog", agent="reader")
# Recover cards stuck in processing (agent crashed)
from agent_wiki.kanban import recover_stale
recover_stale("kanban/processing/", "kanban/backlog/", max_age_minutes=30)
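One plausible way to decide that a card is stuck is to compare its claimed_at frontmatter field against the current time. The sketch below does only that check on the card text; it assumes claimed_at holds an ISO 8601 timestamp with a UTC offset, as in the card example, and it may not match the criterion recover_stale actually uses. CLAIMED_AT and is_stale are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

# Captures the value of a non-empty `claimed_at:` frontmatter line.
CLAIMED_AT = re.compile(r'^claimed_at:\s*"?([^"\n]+)"?\s*$', re.MULTILINE)


def is_stale(card_text: str, now: datetime, max_age_minutes: int = 30) -> bool:
    """True if the card was claimed more than max_age_minutes before `now`."""
    match = CLAIMED_AT.search(card_text)
    if match is None:
        return False  # never claimed, or the timestamp is empty
    claimed = datetime.fromisoformat(match.group(1).strip())
    return now - claimed > timedelta(minutes=max_age_minutes)


claimed_card = "---\nagent: reader\nclaimed_at: 2024-01-01T00:00:00+00:00\n---\n"
unclaimed_card = '---\nagent: reader\nclaimed_at: ""\n---\n'
now = datetime(2024, 1, 1, 0, 45, tzinfo=timezone.utc)

print(is_stale(claimed_card, now))                      # claimed 45 minutes ago
print(is_stale(claimed_card, now, max_age_minutes=60))  # within a 60 minute limit
print(is_stale(unclaimed_card, now))
```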
Project Layout
agent-wiki init creates the full project structure. The wiki vault (wiki/) is an Obsidian vault that can be published directly with Quartz.
my-wiki-project/
├── raw/ # source PDFs (immutable — never modified by agents)
│ └── completed/ # originals moved here after processing
├── wiki/ # ← the Obsidian vault / Quartz site (WikiRoot)
│ ├── index.md # root page — links all top-level topics
│ ├── log.md # chronological activity log
│ ├── processed/ # auto-generated markdown from PDFs (with images)
│ │ └── img/ # extracted figures, organized by paper
│ ├── sources/ # one page per ingested paper
│ │ └── Topic Area/ # grouped by top-level topic
│ └── topics/ # synthesized knowledge pages (the core)
│ └── Topic Area/ # folder = breadcrumb level in Quartz
│ ├── Topic Area.md # hub page (shares folder name)
│ └── Sub Topic.md # child topic
├── kanban/ # task cards for agent coordination
│ ├── backlog/ # waiting for an agent
│ ├── processing/ # claimed, work in progress
│ ├── review/ # done, waiting for next stage
│ └── done/ # archived
├── instructions/
│ ├── agents/ # agent prompts
│ │ ├── reader.md # extracts findings from processed papers
│ │ ├── writer.md # synthesizes topic pages from source pages
│ │ ├── orchestrator.md # coordinates the pipeline
│ │ └── reviewer.md # audits quality and wiki structure
│ ├── kanban-design.md # pipeline spec
│ └── lint.md # audit checklist
├── docs/
│ └── publishing.md # Quartz publishing guide
├── .claude/commands/ # slash commands for Claude Code
│ ├── ingest.md # /ingest — full pipeline
│ ├── lint.md # /lint — health check
│ └── review.md # /review — structure audit
├── CLAUDE.md # project instructions for Claude
└── SETUP.md # first-run setup guide
# Initialize a new project
agent-wiki init my-project --name "My Research Wiki"
Legal mode
For adversarial-litigation case files (Brazilian PROJUDI PDFs and similar multi-document court files), use --mode legal:
agent-wiki init smith-v-jones \
--mode legal \
--party "Smith" --party "Jones" \
--case "1:23-cv-00001"
This adds a critic agent to the kanban pipeline (anti-framing review, admissions against interest, timeline conflicts), a two-dates discipline (event date + statement date on every assertion), and filing-party labeling (every citation names which party filed the document). The generated CLAUDE.md codifies an analytical stance that leans toward the structurally weaker party.
Legal mode adds these CLI helpers:
agent-wiki convert pdf-paged raw/case.pdf wiki/processed/case.md # paged conversion with ## Page N
agent-wiki legal peca-index wiki/processed/case.md # extract sub-document inventory
agent-wiki legal assign-parties wiki/peca-index-case.md # heuristic filing-party labels
agent-wiki legal cards wiki/peca-index-case.md # one reader card per peça
agent-wiki legal wikilinks-to-md wiki/ # Typora compatibility
Run agent-wiki init <path> --mode legal --help for all flags. The case stem, party names, and case numbers are baked into the generated CLAUDE.md; pass nothing to get <Party A> / <Party B> / <case number> placeholders you fill in by hand.
The Pattern
The LLM Wiki pattern has three layers:
- Raw sources — your curated documents. Immutable. The LLM reads but never modifies.
- The wiki — LLM-generated markdown. Source pages, topic pages, cross-references. The LLM owns this entirely.
- The schema — instructions that tell the LLM how the wiki is structured and what workflows to follow.
Four operations:
- Ingest — process a new source into the wiki. Reader creates a source page, writer synthesizes topic pages, reviewer audits quality and structure.
- Review — audit the wiki tree. Find duplicates, misplaced topics, missing articles. Create fix cards for writers.
- Lint — health-check the wiki. Find broken links, broken images, orphan pages, contradictions, missing cross-references.
- Query — answer questions from the wiki. Good answers get filed back as new pages.
The human curates sources and directs exploration. The LLM does the bookkeeping.
Citation Traceability
Every claim in a topic page links to the specific subsection of the source page it came from:
[[Source Page#Reservoir Properties|Author et al. (2014)]]
Each source page statement links back to the original paper text:
- Clean sand porosity 26-30% [[Processed Filename#Results|p.Results]]
This creates a two-hop traceability chain: topic page → source page subsection → original paper section.
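The two-hop chain can be followed mechanically: parse the citation on the topic page, open the named section of the source page, and read the citation found there. The sketch below does this over an in-memory dict of pages. All names (Citation, parse_citations, section_body, trace) are hypothetical, the topic sentence is invented sample text, and the heading handling is deliberately simplistic.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    page: str
    section: Optional[str]
    label: Optional[str]


CITATION = re.compile(r"\[\[([^\[\]|#]+)(?:#([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]")


def parse_citations(text: str) -> list[Citation]:
    """Split every [[Page#Section|label]] link into its three parts."""
    return [Citation(m.group(1).strip(), m.group(2), m.group(3))
            for m in CITATION.finditer(text)]


def section_body(text: str, heading: str) -> str:
    """Return the lines under the given heading, up to the next heading."""
    body, inside = [], False
    for line in text.splitlines():
        if line.startswith("#"):
            inside = line.lstrip("#").strip() == heading
            continue
        if inside:
            body.append(line)
    return "\n".join(body)


def trace(pages: dict[str, str], topic: str) -> list[tuple[str, str]]:
    """Follow topic -> source subsection -> original paper section."""
    hops = []
    for cite in parse_citations(pages[topic]):
        if cite.section is None:
            continue
        body = section_body(pages.get(cite.page, ""), cite.section)
        for origin in parse_citations(body):
            hops.append((f"{cite.page}#{cite.section}",
                         f"{origin.page}#{origin.section}"))
    return hops


pages = {
    "Topic": "Porosity is high [[Source Page#Reservoir Properties|Author et al. (2014)]].",
    "Source Page": (
        "## Reservoir Properties\n"
        "- Clean sand porosity 26-30% [[Processed Filename#Results|p.Results]]\n"
    ),
}
print(trace(pages, "Topic"))
```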
Quartz Publishing
The wiki is designed for static publishing with Quartz. All pages use Quartz-compatible frontmatter (title, description, tags), Obsidian wikilinks, and callout syntax. Breadcrumbs are generated from the folder hierarchy — no hardcoded navigation. See docs/publishing.md after init.
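As a rough picture of how a folder hierarchy maps to breadcrumbs, the sketch below derives a trail from a page path, treating a hub page (one that shares its folder's name) as the folder itself. This only illustrates the convention described in the project layout; it is not Quartz's algorithm, and the breadcrumbs function is hypothetical.

```python
from pathlib import PurePosixPath


def breadcrumbs(page_path: str) -> list[str]:
    """Derive a breadcrumb trail from a page's path relative to the vault."""
    path = PurePosixPath(page_path)
    crumbs = list(path.parent.parts)
    if path.stem != path.parent.name:  # hub pages stand in for their folder
        crumbs.append(path.stem)
    return crumbs


print(breadcrumbs("topics/Topic Area/Sub Topic.md"))
print(breadcrumbs("topics/Topic Area/Topic Area.md"))
```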
For the full LLM Wiki pattern description, see Andrej Karpathy's LLM Wiki gist.
Requirements
- Python 3.12+
- Dependencies: pymupdf, pymupdf4llm, Pillow
License
MIT