cite-citadel — an LLM-maintained, fully-cited personal wiki in the Open Knowledge Format, fed by a coding-agent CLI you already have logged in, with an MCP search server. Every fact is cited to its source; nothing is invented.
cite-citadel
A fortress of cited knowledge. An LLM-maintained, fully-cited personal wiki — every fact is attested to its source, nothing is invented.
An LLM-maintained personal wiki in Google's Open Knowledge Format (OKF), with an MCP server so an AI can search and read it — a KISS, pure-Python 3.12 take on Andrej Karpathy's LLM-Wiki pattern.
Drop arbitrary files into raw/ (markdown, code, JSON/CSV, PDF, PowerPoint/Word/Excel —
.pptx/.docx/.xlsx and legacy .ppt/.doc/.xls — even images, in any sub-folder). One
agentic CLI session per source folds it into a cross-linked OKF wiki under wiki/ — routing each
fact to the page it best fits and splitting/merging pages as the corpus grows, rather than making
one page per file. Office files have their text extracted automatically; images are read visually;
a file too big for one context window is folded in over several passes; the same document in two
formats (report.pdf + report.pptx) is ingested once; and any source that can't be ingested is
recorded (with the reason) in wiki/sources/index.md. Every fact is cited back to its raw/
source, and the model uses only what is in raw/. An AI client then queries the synthesized
wiki over MCP instead of re-reading your notes.
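As a rough illustration of the file handling described above, the sketch below sorts files found under raw/ into the treatments the text names: text extraction for Office files, visual reading for images, and everything else. The names `classify_source` and `plan_raw_folder`, the category labels, and the set of image extensions are assumptions made for this example; they are not taken from citadel's code.

```python
from pathlib import Path

# Office extensions named in the text above (modern and legacy).
OFFICE_EXTS = {".pptx", ".docx", ".xlsx", ".ppt", ".doc", ".xls"}
# Assumed image extensions; the text only says "images".
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def classify_source(path: Path) -> str:
    """Return a hypothetical handling label for one raw/ file."""
    ext = path.suffix.lower()
    if ext in OFFICE_EXTS:
        return "extract-text"
    if ext in IMAGE_EXTS:
        return "read-visually"
    return "other"


def plan_raw_folder(raw_dir: Path) -> dict[str, list[str]]:
    """Walk raw/ recursively and group relative file paths by label."""
    plan: dict[str, list[str]] = {}
    for path in sorted(raw_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(raw_dir).as_posix()
            plan.setdefault(classify_source(path), []).append(rel)
    return plan
```

Calling `plan_raw_folder(Path("raw"))` in a workspace gives a quick overview of what an ingest run would have to deal with, without invoking any agent.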
The CLI is citadel; the PyPI package is cite-citadel. The wiki/ directory is the
database — no SQLite, no vector store. Ingest runs through a coding-agent CLI you already have
(claude, copilot, or gemini), so it uses your existing subscription and needs no API key —
that usage is under your account and your provider's terms (see
License & third-party tools).
Three guarantees that hold as the wiki grows (full rules in
citadel/rules/schema.md):
- Stays organized: ingest merges, splits, and deletes pages by fit; it never piles up one page per raw file.
- Links keep working: merges and renames repoint inbound cross-links; any dangling link fails citadel lint / citadel check.
- Honest provenance: raw facts are restated faithfully and cite their source as [^sN]. A fact the model adds from its own knowledge must be labeled [^llmN], never disguised as a raw citation.
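To make the "links keep working" guarantee concrete, here is a minimal sketch of a dangling-link check over a folder of markdown pages. It is not citadel's lint implementation; the function name `find_dangling_links` and the decision to check only inline relative links are assumptions for illustration.

```python
import re
from pathlib import Path

# Inline markdown links: [text](target). Image links (![...]) are skipped.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def find_dangling_links(wiki_dir: Path) -> list[tuple[str, str]]:
    """Return (page, target) pairs whose relative link target does not exist."""
    dangling = []
    for page in sorted(wiki_dir.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if "://" in target or target.startswith(("#", "mailto:")):
                continue  # external or same-page link, not checked here
            file_part = target.split("#", 1)[0]  # drop any heading anchor
            if not (page.parent / file_part).exists():
                dangling.append((page.relative_to(wiki_dir).as_posix(), target))
    return dangling
```

An empty result means every relative link resolves to a file on disk; any entry names the page and the target that would need repointing.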
Install
uv add cite-citadel # add to a project
uv tool install cite-citadel # or install a global `citadel` CLI
pip install cite-citadel # or plain pip
Quickstart
Ingest runs through a coding-agent CLI you already have — no API key, just your existing subscription.
- citadel init my-wiki && cd my-wiki: scaffolds the workspace (the citadel.toml marker, a .env, and empty raw/ + wiki/).
- Fill in the generated .env. At minimum set the coding-agent CLI to shell out to, CITADEL_LLM_CLI=claude | copilot | gemini, which must be installed and logged in (no API key needed); optionally pin a model with CITADEL_INGEST_MODEL. Every other knob is documented inline in that same file.
- Drop any text-bearing files into raw/: markdown, code, PDF, Office, images, in any sub-folder.
- citadel ingest: one agent session per source folds it into the cross-linked, cited wiki.
- Use it: citadel search "caffeine" (also read, status, doctor, curate, view, lint, check, tags) from the shell, or citadel serve to expose the wiki to any AI over MCP. Everything the MCP server offers, the CLI offers too, so an AI without MCP access can drive citadel through equivalent shell commands.
Contributing? Run from a checkout:
uv sync, then the portable uv run python -m citadel <subcommand> (identical on Linux/macOS/Windows and needs no .exe; on Windows, antivirus can quarantine uv's generated citadel.exe).
How it works
Three layers (Karpathy's split; citadel/rules/schema.md has the
authoritative rules, which the ingest agent reads — referenced by path — every run):
- raw/ holds the immutable sources; ingest reads but never edits them.
- wiki/ is the LLM-owned OKF bundle: markdown pages with YAML frontmatter, routed by kind into concepts/, objects/, systems/, persons/, organizations/, projects/, abbreviations/, misc/, densely cross-linked, each fact carrying a citation. The reserved index.md, log.md, and sources/index.md are generated, not authored.
- citadel/rules/ is the schema/rules layer: schema.md (the format contract) + core.md (agent behavior) + per-lifecycle tasks/, per-file-type formats/, and agent-judged genres/ briefs. Editing them changes how the wiki is built with no code change. The rules live in the package so a pip install carries them; the repo-root SCHEMA.md / AGENT_INGEST.md are just pointer stubs.
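Because wiki/ is plain folders of markdown, the layout above can be inspected with a few lines of standard-library Python. The sketch below counts authored pages per section folder and skips the reserved, generated files. The function name `pages_per_section` is hypothetical, and the sketch assumes the reserved files sit exactly at the three paths named above.

```python
from collections import Counter
from pathlib import Path

# Section folders named in the text above.
SECTIONS = ("concepts", "objects", "systems", "persons",
            "organizations", "projects", "abbreviations", "misc")
# Reserved files that are generated rather than authored.
RESERVED = {"index.md", "log.md", "sources/index.md"}


def pages_per_section(wiki_dir: Path) -> Counter:
    """Count authored pages in each section folder, skipping reserved files."""
    counts: Counter = Counter()
    for page in wiki_dir.rglob("*.md"):
        rel = page.relative_to(wiki_dir).as_posix()
        if rel in RESERVED:
            continue
        section = rel.split("/", 1)[0]  # top-level folder of the page
        if section in SECTIONS:
            counts[section] += 1
    return counts
```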
Per-fact provenance is the load-bearing rule. Every factual sentence ends with a GitHub-Flavored
Markdown footnote, defined in a trailing ## Sources section that links to the originating raw/
file:
Robusta has about twice the caffeine of Arabica.[^s1]
## Sources
[^s1]: [raw/coffee-guide.md](../../raw/coffee-guide.md) — coffee guide (ingested 2026-06-30)
This renders on GitHub, is trivially greppable, and needs zero custom tooling. A claim that can't be
cited is dropped, never invented; conflicting sources produce a > [!CONTRADICTION] callout. The
wiki/ folder also opens as-is as an Obsidian vault.
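Since the citations are ordinary footnotes, a consistency check needs nothing beyond regular expressions. The sketch below compares the footnote references in a page with the definitions in that page and reports any mismatch. It is an illustration rather than citadel's own validation; the function name `check_footnotes` is hypothetical, and it covers the [^sN] and [^llmN] label shapes mentioned in this README only.

```python
import re

# In-text reference such as [^s1] or [^llm2]; the lookahead excludes definitions.
REF_RE = re.compile(r"\[\^(s\d+|llm\d+)\](?!:)")
# Definition line such as "[^s1]: ..." at the start of a line.
DEF_RE = re.compile(r"^\[\^(s\d+|llm\d+)\]:", re.MULTILINE)


def check_footnotes(page_text: str) -> dict[str, set[str]]:
    """Compare footnote references in a page against its definitions."""
    refs = set(REF_RE.findall(page_text))
    defs = set(DEF_RE.findall(page_text))
    return {"undefined": refs - defs, "unused": defs - refs}
```

A page passes when both sets are empty: every cited label is defined, and every definition is cited at least once.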
Test corpora
Three synthetic corpora live under corpora/, each ingestible on its own or all
together. The showcase is corpora/beverages/ — a deliberately
overlapping coffee + tea corpus of 10 files in mixed styles (reference, prose, lab notes, FAQ,
brand blog) with facts that repeat, contradict, and hide in one place, plus one deliberately-false
sourced claim. Two more corpora stress the hardest guarantees:
corpora/counterfactual-atlas/ is a coherent fictional world whose
facts contradict reality; it is graded on whether those facts appear as stated and cited, never corrected;
corpora/project-history/ is a three-year programme ingested in dated
waves that drives reconcile / delete / force and grades temporal supersession, German→English,
and attributed opinions.
Each corpus ships a hidden answer key at .claude/skills/verify-corpus/<name>/ground-truth.md
(outside the corpus, so the ingest agent can never see it). The parameterized verify-corpus skill
(verify-corpus <name>|all) ingests a corpus into a throwaway sandbox and grades the result against
that key — an end-to-end test of the three guarantees.
See the result without running anything. Browse the generated showcase wiki on GitHub at
corpora/beverages/wiki/index.md — GitHub renders the OKF pages natively, so the [^sN] citations,
cross-links, glossary, and > [!CONTRADICTION] callouts all show inline. For the richer, interactive
view — the cross-link graph, tags, and the cited raw sources embedded — open the live demo at
markusneusinger.github.io/cite-citadel, the
offline single-file viewer regenerated from the showcase wiki on every push.
MCP server
citadel serve exposes eight tools over stdio: wiki_search, wiki_read, wiki_index,
wiki_sources, wiki_tags, wiki_validate, wiki_lint (read-only), and wiki_ingest (the only
mutating one). Each carries MCP behavior annotations (readOnlyHint etc.) so a client can tell the
readers from the one mutating tool. Every MCP tool has a CLI counterpart — citadel read,
citadel index, citadel sources, citadel lint, … — so an AI without MCP access can do
everything through the CLI. Wire it into an MCP client (e.g. Claude Desktop):
{
  "mcpServers": {
    "citadel": {
      "command": "citadel",
      "args": ["serve"],
      "env": { "CITADEL_LLM_CLI": "claude", "CITADEL_INGEST_MODEL": "sonnet" }
    }
  }
}
An AI can then wiki_index() to orient, wiki_search(...) to find pages, and wiki_read(...) to
pull full cited context — answering from your synthesized wiki instead of re-retrieving documents.
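For readers curious what the orient, search, read sequence looks like on the wire, the sketch below builds the three MCP tools/call requests as JSON-RPC 2.0 messages. The tool names come from the list above; the argument names `query` and `path` and the page path are hypothetical, since this README does not document the tool parameters. A real session also begins with the MCP initialize handshake, which is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids: 1, 2, 3, ...


def tool_call(name: str, **arguments) -> str:
    """Build one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Orient, search, then read. "query", "path", and the page path are
# hypothetical placeholders, not documented citadel parameters.
requests = [
    tool_call("wiki_index"),
    tool_call("wiki_search", query="caffeine"),
    tool_call("wiki_read", path="concepts/caffeine.md"),
]
```

In practice an MCP client library sends these messages for you; the point is only that each step is a single named tool call with a small argument object.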
Reference
- citadel/rules/README.md: index of the rules tree the ingest agent follows, covering schema.md (structure, routing, and provenance rules), core.md (operational behavior), and the tasks/, formats/, and genres/ briefs.
- citadel/templates/env.example: every configuration knob (the citadel init .env template; the repo-root .env.example is a pointer stub).
- docs/karpathy-llm-wiki.md · docs/okf-reference.md: the pattern and the format.
- docs/configuration.md: every CITADEL_* config knob.
- CLAUDE.md: architecture notes for contributors.
- CONTRIBUTING.md · CHANGELOG.md · SECURITY.md
License & third-party tools
cite-citadel is released under the MIT License.
Not affiliated. cite-citadel is an independent project — not affiliated with, endorsed by, or sponsored by Anthropic, GitHub/Microsoft, or Google. "Claude", "GitHub Copilot", and "Gemini" are their respective owners' trademarks, named only to identify the user-supplied CLI. Full disclaimer: NOTICE.md.
Bring your own CLI — your account, your provider's terms. Ingest runs your authenticated coding-agent CLI under your account, and that usage is governed by that provider's terms, not by cite-citadel: Anthropic Consumer Terms / Commercial Terms, the GitHub Copilot product-specific terms, and the Gemini Code Assist / Gemini API terms. cite-citadel calls the official binary only — it does not proxy, store, or transmit your credentials. Honest caveat: heavy, unattended, or CI ingest against a consumer subscription may hit rate limits or a provider's automated-use expectations — for that scale prefer the tier the provider designates for programmatic use.
Your wiki is yours. The providers assign output rights to you, and cite-citadel claims nothing
over wiki/ content — publish the generated wiki freely.