cite-citadel — an LLM-maintained, fully-cited personal wiki in the Open Knowledge Format, fed by a coding-agent CLI you already have logged in, with an MCP search server. Every fact is cited to its source; nothing is invented.

cite-citadel

A fortress of cited knowledge. An LLM-maintained, fully-cited personal wiki — every fact is attested to its source, nothing is invented.

An LLM-maintained personal wiki in Google's Open Knowledge Format (OKF), with an MCP server so an AI can search and read it — a KISS, pure-Python 3.12 take on Andrej Karpathy's LLM-Wiki pattern.

Drop arbitrary files into raw/, in any sub-folder: markdown, code, JSON/CSV, PDF, PowerPoint/Word/Excel (.pptx/.docx/.xlsx and legacy .ppt/.doc/.xls), even images. One agentic CLI session per source folds it into a cross-linked OKF wiki under wiki/, routing each fact to the page it best fits and splitting or merging pages as the corpus grows, rather than making one page per file. Office files have their text extracted automatically; images are read visually; a file too big for one context window is folded in over several passes; the same document in two formats (report.pdf + report.pptx) is ingested once; and any source that can't be ingested is recorded, with the reason, in wiki/sources/index.md. Every fact is cited back to its raw/ source, and the model uses only what is in raw/. An AI client then queries the synthesized wiki over MCP instead of re-reading your notes.

The CLI is citadel; the PyPI package is cite-citadel. The wiki/ directory is the database — no SQLite, no vector store. Ingest runs through a coding-agent CLI you already have (claude, copilot, or gemini), so it uses your existing subscription and needs no API key — that usage is under your account and your provider's terms (see License & third-party tools).

Three guarantees that hold as the wiki grows (full rules in citadel/rules/schema.md):

  • Stays organized — ingest merges, splits, and deletes pages by fit; it never piles up one page per raw file.
  • Links keep working — merges/renames repoint inbound cross-links; any dangling link fails citadel lint / citadel check.
  • Honest provenance — raw facts are restated faithfully and cite their source as [^sN]. A fact the model adds from its own knowledge must be labeled [^llmN], never disguised as a raw citation.
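
The link guarantee can be pictured with a small check. This is a minimal sketch, not citadel's own lint implementation: it assumes cross-links are ordinary relative Markdown links to .md files, and the name find_dangling_links is hypothetical.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [Robusta](../objects/robusta.md).
# Assumption: cross-links are plain relative links; external URLs are skipped.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_dangling_links(wiki_root: Path) -> list[tuple[Path, str]]:
    """Return (page, target) pairs whose relative .md target does not exist."""
    dangling = []
    for page in sorted(wiki_root.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            path_part = target.split("#", 1)[0]  # drop any #anchor
            if "://" in target or not path_part.endswith(".md"):
                continue  # external link or non-page target
            if not (page.parent / path_part).resolve().exists():
                dangling.append((page.relative_to(wiki_root), target))
    return dangling
```

An empty result means every relative page link resolves; for real validation the project's own citadel lint / citadel check remain the authority.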

Install

uv add cite-citadel            # add to a project
uv tool install cite-citadel   # or install a global `citadel` CLI
pip install cite-citadel       # or plain pip

Quickstart

Ingest runs through a coding-agent CLI you already have — no API key, just your existing subscription.

  1. citadel init my-wiki && cd my-wiki — scaffolds the workspace (the citadel.toml marker, a .env, and empty raw/ + wiki/).
  2. Fill in the generated .env. At minimum set the coding-agent CLI to shell out to — CITADEL_LLM_CLI=claude | copilot | gemini — which must be installed and logged in (no API key needed); optionally pin a model with CITADEL_INGEST_MODEL. Every other knob is documented inline in that same file.
  3. Drop any text-bearing files into raw/ — markdown, code, PDF, Office, images, in any sub-folder.
  4. citadel ingest — one agent session per source folds it into the cross-linked, cited wiki.
  5. Use it: citadel search "caffeine" (also read, status, doctor, curate, view, lint, check, tags) from the shell, or citadel serve to expose the wiki to any AI over MCP. Everything the MCP server offers, the CLI offers too, so an AI without MCP access can drive citadel through equivalent shell commands.

Contributing? Run from a checkout: uv sync, then the portable uv run python -m citadel <subcommand> (identical on Linux/macOS/Windows and needs no .exe — on Windows, antivirus can quarantine uv's generated citadel.exe).

How it works

Three layers (Karpathy's split; citadel/rules/schema.md has the authoritative rules, which the ingest agent reads — referenced by path — every run):

  1. raw/ — immutable sources; ingest reads but never edits them.
  2. wiki/ — the LLM-owned OKF bundle: markdown pages with YAML frontmatter, routed by kind into concepts/, objects/, systems/, persons/, organizations/, projects/, abbreviations/, misc/, densely cross-linked, each fact carrying a citation. The reserved index.md, log.md, and sources/index.md are generated, not authored.
  3. citadel/rules/ — the schema/rules layer: schema.md (the format contract) + core.md (agent behavior) + per-lifecycle tasks/, per-file-type formats/, and agent-judged genres/ briefs. Editing them changes how the wiki is built with no code change. The rules live in the package so a pip install carries them; the repo-root SCHEMA.md/AGENT_INGEST.md are just pointer stubs.

Per-fact provenance is the load-bearing rule. Every factual sentence ends with a GitHub-Flavored Markdown footnote, defined in a trailing ## Sources section that links to the originating raw/ file:

Robusta has about twice the caffeine of Arabica.[^s1]

## Sources

[^s1]: [raw/coffee-guide.md](../../raw/coffee-guide.md) — coffee guide (ingested 2026-06-30)

This renders on GitHub, is trivially greppable, and needs zero custom tooling. A claim that can't be cited is dropped, never invented; conflicting sources produce a > [!CONTRADICTION] callout. The wiki/ folder also opens as-is as an Obsidian vault.
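
Because the citations are plain footnote markers, a few lines of Python can audit a page. The following is a sketch, not citadel's validator: it assumes markers look like [^sN] or [^llmN] and that definitions start a line, and the function name citation_report is hypothetical.

```python
import re

# Any footnote marker of the form [^s1] or [^llm1].
CITE_RE = re.compile(r"\[\^((?:s|llm)\d+)\]")
# A definition is a marker at the start of a line followed by a colon.
DEF_RE = re.compile(r"^\[\^((?:s|llm)\d+)\]:", re.MULTILINE)


def citation_report(page_text: str) -> dict[str, set[str]]:
    """Compare the footnotes a page cites with the ones it defines."""
    defined = set(DEF_RE.findall(page_text))
    # Remove the definition markers so only in-text citations remain.
    body = DEF_RE.sub("", page_text)
    cited = set(CITE_RE.findall(body))
    return {"undefined": cited - defined, "unused": defined - cited}


page = """Robusta has about twice the caffeine of Arabica.[^s1]

## Sources

[^s1]: [raw/coffee-guide.md](../../raw/coffee-guide.md)
"""
report = citation_report(page)
```

For the example page both sets are empty; a marker cited without a definition would appear under "undefined".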

Test corpora

Three synthetic corpora live under corpora/, each ingestible on its own or all together. The showcase is corpora/beverages/, a deliberately overlapping coffee + tea corpus of 10 files in mixed styles (reference, prose, lab notes, FAQ, brand blog) with facts that repeat, contradict each other, or appear in only one place, plus one deliberately false sourced claim. Two more corpora stress the hardest guarantees. corpora/counterfactual-atlas/ is a coherent fictional world whose facts contradict reality; it is graded on whether those facts appear as stated and cited, never corrected. corpora/project-history/ is a three-year programme ingested in dated waves that drives reconcile / delete / force and grades temporal supersession, German→English, and attributed opinions.

Each corpus ships a hidden answer key at .claude/skills/verify-corpus/<name>/ground-truth.md (outside the corpus, so the ingest agent can never see it). The parameterized verify-corpus skill (verify-corpus <name>|all) ingests a corpus into a throwaway sandbox and grades the result against that key — an end-to-end test of the three guarantees.

See the result without running anything. Browse the generated showcase wiki on GitHub at corpora/beverages/wiki/index.md — GitHub renders the OKF pages natively, so the [^sN] citations, cross-links, glossary, and > [!CONTRADICTION] callouts all show inline. For the richer, interactive view — the cross-link graph, tags, and the cited raw sources embedded — open the live demo at markusneusinger.github.io/cite-citadel, the offline single-file viewer regenerated from the showcase wiki on every push.

MCP server

citadel serve exposes eight tools over stdio: seven read-only tools (wiki_search, wiki_read, wiki_index, wiki_sources, wiki_tags, wiki_validate, wiki_lint) and wiki_ingest, the only mutating one. Each carries MCP behavior annotations (readOnlyHint etc.) so a client can tell the readers from the mutating tool. Every MCP tool has a CLI counterpart (citadel read, citadel index, citadel sources, citadel lint, …), so an AI without MCP access can do everything through the CLI. Wire it into an MCP client (e.g. Claude Desktop):

{
  "mcpServers": {
    "citadel": {
      "command": "citadel",
      "args": ["serve"],
      "env": { "CITADEL_LLM_CLI": "claude", "CITADEL_INGEST_MODEL": "sonnet" }
    }
  }
}

An AI can then wiki_index() to orient, wiki_search(...) to find pages, and wiki_read(...) to pull full cited context — answering from your synthesized wiki instead of re-retrieving documents.

Reference

License & third-party tools

cite-citadel is released under the MIT License.

Not affiliated. cite-citadel is an independent project — not affiliated with, endorsed by, or sponsored by Anthropic, GitHub/Microsoft, or Google. "Claude", "GitHub Copilot", and "Gemini" are their respective owners' trademarks, named only to identify the user-supplied CLI. Full disclaimer: NOTICE.md.

Bring your own CLI — your account, your provider's terms. Ingest runs your authenticated coding-agent CLI under your account, and that usage is governed by that provider's terms, not by cite-citadel: Anthropic Consumer Terms / Commercial Terms, the GitHub Copilot product-specific terms, and the Gemini Code Assist / Gemini API terms. cite-citadel calls the official binary only — it does not proxy, store, or transmit your credentials. Honest caveat: heavy, unattended, or CI ingest against a consumer subscription may hit rate limits or a provider's automated-use expectations — for that scale prefer the tier the provider designates for programmatic use.

Your wiki is yours. The providers assign output rights to you, and cite-citadel claims nothing over wiki/ content — publish the generated wiki freely.
