Mnexa
A disciplined wiki maintainer for a personal markdown knowledge base. You drop documents into raw/; an LLM reads them and maintains a structured wiki of source / entity / concept pages with cross-references, an index, and a log. You curate; the LLM does the bookkeeping.
Implementation of the pattern in Andrej Karpathy's LLM Wiki gist — read that first; it's the design spec.
Why
Most LLM document tools are RAG: retrieve chunks at query time, generate from the chunks, throw the synthesis away. Mnexa treats the wiki as a persistent, compounding artifact — every ingest updates entity and concept pages once, every query runs against accumulated synthesis instead of re-deriving from raw sources. Open the wiki in Obsidian, Logseq, VS Code, or any markdown editor. The LLM is the maintainer; you are the curator.
Install
Requires Python 3.12+ and uv. Get a Gemini API key at https://aistudio.google.com/apikey.
git clone https://github.com/jiashuoz/mnexa
cd mnexa
uv sync
cp .env.example .env # then paste your GOOGLE_API_KEY
Use
# Create a new vault
uv run mnexa init ~/my-vault
# Drop a source into raw/, then ingest
cp some-paper.pdf ~/my-vault/raw/
cd ~/my-vault
uv run --project /path/to/mnexa mnexa ingest raw/some-paper.pdf
# Ask the wiki a question
uv run --project /path/to/mnexa mnexa query "what does this paper claim?"
# Audit the wiki
uv run --project /path/to/mnexa mnexa lint
Vault layout
my-vault/
├── .git/
├── .gitignore # ignores .mnexa/ and .env
├── .mnexa/ # Mnexa local state (lint reports)
├── CLAUDE.md # the schema — edit §6 to customize
├── raw/ # immutable source documents
└── wiki/
├── index.md # categorized table of contents
├── log.md # append-only activity log
├── sources/ # one page per ingested document
├── entities/ # people, orgs, products, places
└── concepts/ # ideas, techniques, recurring topics
Every successful ingest is a git commit. Free undo, free history, free diff.
How it works
Ingest is a two-stage pipeline:
- Analyze — LLM reads the source plus the schema, index, and obviously-related existing pages. Produces a structured analysis (entities, concepts, claims, contradictions). Internal scratch.
- Generate — LLM emits FILE blocks for the new/updated wiki pages. Mnexa parses, validates paths and frontmatter, substring-verifies that every "..." source-quote marker appears verbatim in the source, then atomically writes and commits.
The substring verifier is the anti-hallucination floor. If the LLM invents a biographical detail not present in the source, the marker check fails and the ingest aborts with no on-disk changes.
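The check itself is just substring membership. A minimal sketch, assuming the markers are plain quoted spans in the generated page (the actual marker syntax and function names in Mnexa may differ):

```python
# Sketch of a substring-grounding check; the quote-marker regex and the
# function name are assumptions, not Mnexa's actual implementation.
import re

QUOTE_MARKER = re.compile(r'"([^"]+)"')  # quoted spans that must come from the source

def ungrounded_quotes(page_text: str, source_text: str) -> list[str]:
    """Return every quoted marker that is not a verbatim substring of the source."""
    return [q for q in QUOTE_MARKER.findall(page_text) if q not in source_text]

# If this list is non-empty, the ingest aborts and nothing is written to disk.
```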
Query is a single LLM call against index.md + the top-N pages by keyword overlap, streamed to stdout with inline [[wikilink]] citations. Logged to log.md.
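For illustration, keyword-overlap page selection can be as simple as the sketch below; the actual scoring, tokenization, and value of N in Mnexa are not documented here, so treat the details as assumptions:

```python
# Hypothetical top-N page selection by keyword overlap; scoring details are assumptions.
from pathlib import Path

def top_pages(question: str, wiki_dir: Path, n: int = 8) -> list[Path]:
    """Rank wiki pages by how many distinct query words they contain."""
    words = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    scored: list[tuple[int, Path]] = []
    for page in wiki_dir.rglob("*.md"):
        text = page.read_text(encoding="utf-8").lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, page))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in scored[:n]]
```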
Lint runs deterministic checks first (broken links, frontmatter, index/wiki sync, orphans, ungrounded pages, slug style), then one LLM call for semantic checks (contradictions, stale claims, missing pages, slug typos). Output: .mnexa/lint/<timestamp>.md.
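As an example of the deterministic tier, a broken-wikilink check needs nothing but the filesystem; the names and return shape below are illustrative, not Mnexa's:

```python
# Illustrative deterministic lint check: find [[wikilinks]] with no matching page.
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def broken_wikilinks(wiki_dir: Path) -> list[tuple[Path, str]]:
    """Return (page, target) pairs where [[target]] resolves to no .md file."""
    slugs = {p.stem for p in wiki_dir.rglob("*.md")}
    return [
        (page, target.strip())
        for page in wiki_dir.rglob("*.md")
        for target in WIKILINK.findall(page.read_text(encoding="utf-8"))
        if target.strip() not in slugs
    ]
```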
LLM
Provider-agnostic via a small LLMClient protocol. v0 ships Google Gemini (default gemini-3-flash-preview). Set MNEXA_MODEL to any gemini-* model; set MNEXA_PROVIDER to override the auto-inference. Adding Anthropic or OpenAI is ~80 lines plus an extras entry — not shipped because no one needs it yet.
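The protocol is small enough that a plausible shape fits in a few lines; the method names and signatures here are a guess, not the shipped interface:

```python
# Guess at the LLMClient protocol; method names and signatures are assumptions.
from collections.abc import Iterator
from typing import Protocol

class LLMClient(Protocol):
    def complete(self, system: str, prompt: str) -> str:
        """One-shot completion (ingest analyze/generate, lint)."""
        ...

    def stream(self, system: str, prompt: str) -> Iterator[str]:
        """Streaming completion (query output to stdout)."""
        ...
```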
Status
| Command / feature | Status |
|---|---|
| mnexa init | ✅ |
| mnexa ingest | ✅ (.md, .txt, .pdf, .docx) |
| mnexa query | ✅ |
| mnexa lint | ✅ |
| mnexa lint --fix | not yet (v0.1) |
| save query answer as wiki page | not yet (v0.1) |
| Anthropic / OpenAI providers | not yet |
Develop
uv sync --all-extras
uv run pytest # 45 tests
uv run ruff check .
uv run pyright # strict
Prompts live as files in src/mnexa/prompts/ and load via importlib.resources. Edit them, rerun, iterate.
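Loading a prompt that way is a one-liner; the filename below is a made-up example, not necessarily a real prompt file:

```python
from importlib.resources import files

# "ingest_analyze.md" is a hypothetical name used only for illustration.
prompt = files("mnexa.prompts").joinpath("ingest_analyze.md").read_text(encoding="utf-8")
```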
Design notes
- Pure markdown is the canonical store. No SQLite, no vector index, no FTS5. Karpathy's gist argues index.md is enough at moderate scale; we believe it until measurements say otherwise.
- Two-stage ingest is borrowed from nashsu/llm_wiki; the deterministic-then-LLM lint tier is borrowed from SamurAIGPT/llm-wiki-agent. The substring-grounding verifier is novel — neither reference project does it.
- Atomic-ish writes via stage-then-rename (see the sketch after this list), rolled back with git checkout HEAD -- on failure. The git commit is the durability barrier.
- Gemini context caching is a no-op at our schema size (~3k tokens, below the threshold). The protocol still expresses intent so other providers can honor it.
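The stage-then-rename half of that is the usual temp-file-plus-rename pattern; a minimal sketch, with the helper name and temp-file naming assumed:

```python
# Sketch of stage-then-rename; helper name and staging suffix are assumptions.
import os
from pathlib import Path

def staged_write(path: Path, content: str) -> None:
    """Write to a sibling temp file, then rename it over the target in one step."""
    tmp = path.with_name(path.name + ".staged")
    tmp.write_text(content, encoding="utf-8")
    os.replace(tmp, path)  # rename is atomic on POSIX within one filesystem
```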
Download files
File details
Details for the file mnexa-0.0.1.tar.gz.
File metadata
- Download URL: mnexa-0.0.1.tar.gz
- Upload date:
- Size: 95.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1854439d2a45a5cd7f64e1c87abc02f9c9c8fa0e142b2fbbdedbae84a8ff7d15 |
| MD5 | 9102f4e39867d0f4127f975a4c4b80d7 |
| BLAKE2b-256 | 18ff8d684319257f4b3609c46ff91b0317d5c1560976b45123f8ef183a775abc |
File details
Details for the file mnexa-0.0.1-py3-none-any.whl.
File metadata
- Download URL: mnexa-0.0.1-py3-none-any.whl
- Upload date:
- Size: 33.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3b7f3cd43c3b4ab0ea52138788a842bac2fdcfddf0008896b0cb0f1ae8828f4 |
| MD5 | 2f32ecf89c5d6426d62013e18affd0ae |
| BLAKE2b-256 | f9fab5dc0d63877288337c42e7772f0c47830c72c5eaecf334c5da8f6f8ec987 |