
Mnexa

A disciplined wiki maintainer for a personal markdown knowledge base. You drop documents into raw/; an LLM reads them and maintains a structured wiki of source / entity / concept pages with cross-references, an index, and a log. You curate; the LLM does the bookkeeping.

An implementation of the pattern in Andrej Karpathy's LLM Wiki gist — read that first; it's the design spec.

Why

Most LLM document tools are RAG: retrieve chunks at query time, generate from the chunks, throw the synthesis away. Mnexa treats the wiki as a persistent, compounding artifact — every ingest updates entity and concept pages once, every query runs against accumulated synthesis instead of re-deriving from raw sources. Open the wiki in Obsidian, Logseq, VS Code, or any markdown editor. The LLM is the maintainer; you are the curator.

Install

Requires Python 3.12+ and uv. Get a Gemini API key at https://aistudio.google.com/apikey.

git clone https://github.com/jiashuoz/mnexa
cd mnexa
uv sync
cp .env.example .env   # then paste your GOOGLE_API_KEY

Use

# Create a new vault
uv run mnexa init ~/my-vault

# Drop a source into raw/, then ingest
cp some-paper.pdf ~/my-vault/raw/
cd ~/my-vault
uv run --project /path/to/mnexa mnexa ingest raw/some-paper.pdf

# Ask the wiki a question
uv run --project /path/to/mnexa mnexa query "what does this paper claim?"

# Audit the wiki
uv run --project /path/to/mnexa mnexa lint

Vault layout

my-vault/
├── .git/
├── .gitignore                  # ignores .mnexa/ and .env
├── .mnexa/                     # Mnexa local state (lint reports)
├── CLAUDE.md                   # the schema — edit §6 to customize
├── raw/                        # immutable source documents
└── wiki/
    ├── index.md                # categorized table of contents
    ├── log.md                  # append-only activity log
    ├── sources/                # one page per ingested document
    ├── entities/               # people, orgs, products, places
    └── concepts/               # ideas, techniques, recurring topics

Every successful ingest is a git commit. Free undo, free history, free diff.

How it works

Ingest is a two-stage pipeline:

  1. Analyze — LLM reads the source plus the schema, index, and obviously-related existing pages. Produces a structured analysis (entities, concepts, claims, contradictions). Internal scratch.
  2. Generate — LLM emits FILE blocks for the new/updated wiki pages. Mnexa parses, validates paths and frontmatter, substring-verifies that every ⟦"..."⟧ source-quote marker appears verbatim in the source, then atomically writes and commits.
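The generate stage can be sketched as a parse-and-gate: extract each FILE block, refuse any path that escapes wiki/, and only then hand the pages to the writer. The delimiter syntax used here (`=== FILE: … ===` / `=== END ===`) is an assumption for illustration; the real format lives in Mnexa's prompts.

```python
import re

# Assumed delimiter format, not Mnexa's actual one.
FILE_RE = re.compile(r"=== FILE: (?P<path>\S+) ===\n(?P<body>.*?)\n=== END ===", re.S)

def parse_file_blocks(llm_output: str) -> dict[str, str]:
    """Extract path -> page-content pairs, refusing paths that escape wiki/."""
    pages: dict[str, str] = {}
    for m in FILE_RE.finditer(llm_output):
        path = m.group("path")
        if ".." in path or not path.startswith("wiki/"):
            raise ValueError(f"refusing path outside wiki/: {path}")
        pages[path] = m.group("body")
    return pages
```

Path validation before any write is what lets the whole ingest abort cleanly: nothing touches disk until every block has passed.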

The substring verifier is the anti-hallucination floor. If the LLM invents a biographical detail not present in the source, the marker check fails and the ingest aborts with no on-disk changes.
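The verbatim check needs nothing fancier than substring search over the extracted markers. A minimal sketch (function name is illustrative):

```python
import re

# Matches the source-quote marker form described above.
MARKER_RE = re.compile(r"⟦\"(.*?)\"⟧")

def verify_grounding(page: str, source_text: str) -> list[str]:
    """Return every quoted marker in `page` that does NOT appear verbatim in the source."""
    return [q for q in MARKER_RE.findall(page) if q not in source_text]
```

A non-empty return means at least one ungrounded quote, and the ingest aborts.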

Query is a single LLM call against index.md + the top-N pages by keyword overlap, streamed to stdout with inline [[wikilink]] citations. Logged to log.md.
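The top-N selection can be as simple as bag-of-words overlap between the question and each page. A sketch under that assumption (the stopword list and function names are invented):

```python
import re
from pathlib import Path

STOPWORDS = {"the", "a", "an", "of", "what", "does", "this", "is"}

def tokens(text: str) -> set[str]:
    """Lowercase word set, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def top_pages(question: str, wiki: Path, n: int = 5) -> list[Path]:
    """Rank wiki pages by keyword overlap with the question; drop zero-overlap pages."""
    q = tokens(question)
    scored = [(len(q & tokens(p.read_text())), p) for p in wiki.rglob("*.md")]
    return [p for score, p in sorted(scored, key=lambda t: -t[0]) if score > 0][:n]
```

No vector index needed: the pages are already synthesis, so crude retrieval over good material beats fine retrieval over raw chunks.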

Lint runs deterministic checks first (broken links, frontmatter, index/wiki sync, orphans, ungrounded pages, slug style), then one LLM call for semantic checks (contradictions, stale claims, missing pages, slug typos). Output: .mnexa/lint/<timestamp>.md.
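One of the deterministic checks, broken-wikilink detection, might look like this — a sketch assuming links resolve by file stem, which may be simpler than Mnexa's real resolver:

```python
import re
from pathlib import Path

# Capture the target of [[target]], [[target|alias]], or [[target#heading]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")

def broken_links(wiki: Path) -> list[tuple[Path, str]]:
    """Find wikilinks whose target matches no page stem in the vault."""
    slugs = {p.stem for p in wiki.rglob("*.md")}
    out: list[tuple[Path, str]] = []
    for page in wiki.rglob("*.md"):
        for target in WIKILINK_RE.findall(page.read_text()):
            if target.strip() not in slugs:
                out.append((page, target.strip()))
    return out
```

Running the cheap deterministic tier first means the single LLM call only has to judge what code can't.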

LLM

Provider-agnostic via a small LLMClient protocol. v0 ships Google Gemini (default gemini-3-flash-preview). Set MNEXA_MODEL to any gemini-* model; set MNEXA_PROVIDER to override the auto-inference. Adding Anthropic or OpenAI is ~80 lines plus an extras entry — not shipped because no one needs it yet.
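The protocol shape might look roughly like this; the method name `complete` and its signature are illustrative assumptions, not Mnexa's actual interface:

```python
from typing import Protocol

class LLMClient(Protocol):
    """Minimal provider surface: one completion call, prompts in, text out."""
    def complete(self, system: str, user: str) -> str: ...

class EchoClient:
    """Stub provider satisfying the protocol structurally; handy in tests."""
    def complete(self, system: str, user: str) -> str:
        return user

def run(client: LLMClient, prompt: str) -> str:
    """Pipeline code depends only on the protocol, never on a concrete provider."""
    return client.complete("You are the wiki maintainer.", prompt)
```

Because `Protocol` is structural, a new provider needs no inheritance, just a matching method — which is why the ~80-line estimate is plausible.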

Status

mnexa init                        ✅
mnexa ingest                      ✅  (.md, .txt, .pdf, .docx)
mnexa query                       ✅
mnexa lint                        ✅
mnexa lint --fix                  not yet (v0.1)
save query answer as wiki page    not yet (v0.1)
Anthropic / OpenAI providers      not yet

Develop

uv sync --all-extras
uv run pytest         # 45 tests
uv run ruff check .
uv run pyright        # strict

Prompts live as files in src/mnexa/prompts/ and load via importlib.resources. Edit them, rerun, iterate.
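Loading a packaged prompt is a one-liner with importlib.resources; a sketch, with the default package name taken from the description above and the file naming assumed:

```python
from importlib import resources

def load_prompt(name: str, package: str = "mnexa.prompts") -> str:
    """Read a prompt template that ships inside the installed package.

    resources.files works whether the package is installed as loose files
    or inside a zip, unlike path arithmetic on __file__.
    """
    return resources.files(package).joinpath(name).read_text()
```

Because prompts are package data rather than hard-coded strings, editing a file and rerunning is the whole iteration loop.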

Design notes

  • Pure markdown is the canonical store. No SQLite, no vector index, no FTS5. Karpathy's gist argues index.md is enough at moderate scale; we believe it until measurements say otherwise.
  • Two-stage ingest is borrowed from nashsu/llm_wiki; the deterministic-then-LLM lint tier is borrowed from SamurAIGPT/llm-wiki-agent. The substring-grounding verifier is novel — neither reference project does it.
  • Atomic-ish writes via stage-then-rename + git checkout HEAD -- rollback on failure. The git commit is the durability barrier.
  • Gemini context caching is a no-op at our schema size (~3k tokens, below the threshold). The protocol still expresses intent so other providers can honor it.
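The stage-then-rename trick relies on `os.replace` being atomic within one filesystem. A sketch of the write half (the `git checkout HEAD --` rollback is a separate step; names here are illustrative):

```python
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, content: str) -> None:
    """Stage to a temp file in the target's directory, then rename over it.

    os.replace is atomic on POSIX within a filesystem: a crash mid-write
    leaves either the old page or the new one, never a half-written file.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Staging in the same directory matters: `os.replace` across filesystems falls back to copy semantics and loses atomicity.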
