MCP server exposing fast, structured search over lore.kernel.org to LLM developer tools.
kernel-lore-mcp
Free (MIT) MCP server exposing structured search over the Linux kernel mailing list archives at lore.kernel.org to LLM-backed developer tools — Claude Code, Codex, Cursor, Zed, anything else that speaks the Model Context Protocol.
No authentication, ever. No API keys, no OAuth, no login flow. Same anonymous posture on every deployment — local, hosted, everywhere. Every agent that asks us a question is one fewer agent scraping lore directly; fanout-to-one is the value proposition.
Quick start
Install is one command. The first sync is where real time goes — budget honestly depending on what you want to cover:
| Shape | Disk | First-sync wall-clock |
|---|---|---|
| 1–2 small lists (wireguard, xdp-newbies) | ~1 GB | 1–5 min |
| Subsystem slice (lkml + netdev + linux-cifs) | ~25 GB | 15–60 min |
| Full lore (390 shards, every list) | ~100 GB | 4–12 h |
After the cold start, steady-state syncs on the 5-min timer finish in seconds.
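To sanity-check the disk figures above before a first sync, a small script along these lines may help. It is only a sketch: the `FIRST_SYNC_DISK_GB` mapping restates the table, while the shape names, the `has_room` and `check_data_dir` helpers, and the 20% headroom factor are assumptions made for illustration and are not part of kernel-lore-mcp.

```python
import os
import shutil

# Approximate first-sync disk needs in GB, restated from the table above.
# The keys are hypothetical labels chosen for this sketch.
FIRST_SYNC_DISK_GB = {
    "small": 1,       # 1-2 small lists
    "subsystem": 25,  # lkml + netdev + linux-cifs
    "full": 100,      # every list
}


def has_room(shape: str, free_bytes: int, headroom: float = 1.2) -> bool:
    """Return True if free_bytes covers the estimate plus an assumed headroom."""
    needed = FIRST_SYNC_DISK_GB[shape] * 10**9 * headroom
    return free_bytes >= needed


def check_data_dir(path: str, shape: str) -> bool:
    """Check the filesystem that holds path (the directory must already exist)."""
    free = shutil.disk_usage(os.path.expanduser(path)).free
    return has_room(shape, free)
```

For example, `check_data_dir("~/klmcp-data", "subsystem")` reports whether that filesystem has roughly 30 GB free.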
# 1. install — one command, pre-built abi3 wheel, no Rust toolchain required
uv tool install kernel-lore-mcp
# 2. first sync — manifest fetch + gix fetch + ingest in one process
# under one writer lock. Pick a small slice for a first experiment:
export KLMCP_DATA_DIR=~/klmcp-data
mkdir -p "$KLMCP_DATA_DIR"
kernel-lore-sync \
--data-dir "$KLMCP_DATA_DIR" \
--with-over \
--include '/wireguard/*' --include '/linux-cifs/*'
# Drop --include to mirror all ~390 lists. Plan the disk + time.
# 3. confirm freshness + which capabilities are provisioned
kernel-lore-mcp status --data-dir "$KLMCP_DATA_DIR"
# Look at `capabilities`: each over_db / bm25 / path_vocab / embedding /
# maintainers / git_sidecar boolean tells you which tools will actually
# return data on this deployment. While a sync is active, the same
# status output also shows `writer_lock_present`, `sync_active`, and
# the current sync stage.
# 3b. inspect shard/index health; add --heal to repair unborn shard HEADs
# and remove unrecoverable shard repos so the next sync reclones them
kernel-lore-doctor --data-dir "$KLMCP_DATA_DIR"
# 4. verify the MCP surface — zero API cost
git clone --depth 1 https://github.com/mjbommar/kernel-lore-mcp.git
cd kernel-lore-mcp && ./scripts/agentic_smoke.sh local
# PASS: 7/7 tools, 5/5 resource templates, 5/5 prompts (the
# `REQUIRED_*` subset from src/kernel_lore_mcp/_surface_manifest.py;
# the live server registers 25 tools in total).
Then pick your agent and copy its snippet from
docs/mcp/client-config.md. All four
clients (Claude Code, Codex, Cursor, Zed) work over stdio against
the exact same server binary.
Optional capabilities — opt in when you need them
The baseline sync gives you everything a typical query asks for. Three tiers are explicitly opt-in because they cost disk or time and not every deployment wants them:
| Capability | Build | When you want it |
|---|---|---|
| BM25 prose search (`b:` / free text) | `kernel-lore-ingest --rebuild-bm25` | semantic-free text search over prose bodies |
| Semantic embeddings (`lore_nearest`, `lore_similar`) | `kernel-lore-embed --data-dir $KLMCP_DATA_DIR` | "more like this" / free-text → vector ANN |
| Git-sidecar (authoritative `merged` + `picked_up`) | `kernel-lore-build-git-sidecar --repo linux-stable --path /path/to/linux-stable.git` | upgrades `lore_stable_backport_status` + `lore_thread_state` from lore heuristic to git-history truth |
| MAINTAINERS snapshot | drop a `MAINTAINERS` file into `$KLMCP_DATA_DIR` or point `$KLMCP_MAINTAINERS_FILE` at one | `lore_maintainer_profile` declared-vs-observed ownership |
kernel-lore-mcp status reports which are ready via the
capabilities field, and tools that need an un-provisioned tier
return a setup_required error naming the exact command to fix it
(no silent empty results).
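A client-side check of the `capabilities` field could look like the sketch below. The build hints are copied from the table above and the capability names from the status description in the quick start; the shape of the status payload (a dict with a `capabilities` mapping of booleans) and the `missing_tiers` helper are assumptions for illustration, not the documented output format.

```python
# Build hints copied from the opt-in table above.
BUILD_HINTS = {
    "bm25": "kernel-lore-ingest --rebuild-bm25",
    "embedding": "kernel-lore-embed --data-dir $KLMCP_DATA_DIR",
    "git_sidecar": (
        "kernel-lore-build-git-sidecar --repo linux-stable "
        "--path /path/to/linux-stable.git"
    ),
    "maintainers": "drop a MAINTAINERS file into $KLMCP_DATA_DIR",
}


def missing_tiers(status: dict) -> dict[str, str]:
    """Map each un-provisioned opt-in capability to its build hint.

    Assumes status looks like {"capabilities": {"bm25": True, ...}}.
    """
    caps = status.get("capabilities", {})
    return {
        name: hint
        for name, hint in BUILD_HINTS.items()
        if not caps.get(name, False)
    }
```

An empty result means every opt-in tier in the table is provisioned on this deployment.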
Install from source
Contributing? Building a custom binary?
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | \
sh -s -- -y --default-toolchain stable
git clone https://github.com/mjbommar/kernel-lore-mcp.git
cd kernel-lore-mcp
uv sync
uv run maturin develop --release
cargo build --release --bin kernel-lore-sync --bin kernel-lore-ingest --bin kernel-lore-doctor
./target/release/kernel-lore-sync --data-dir "$KLMCP_DATA_DIR" --with-over
./target/release/kernel-lore-doctor --data-dir "$KLMCP_DATA_DIR"
Going bigger
Want fuller coverage? Drop the --include flags to mirror all ~390
lists (~100+ GB first run).
Want a production-grade systemd deployment (a single klmcp-sync.timer
replacing the pre-v0.2.0 grokmirror + ingest pair)?
See docs/ops/runbook.md, §1 onwards.
Status — v0.3.2 (2026-04-22)
Current release: v0.3.2, the follow-on patch after the hosted
readiness and same-box sync hardening line. The focus is better bug
workflow ergonomics and safer query scoping: a first-class
lore_fix_status tool, indexed trailer-reference correlation for
syzbot / lore / Fixes: joins, and human-readable since / until
bounds across both tools and lore_search.
Shipped:
- Ingest pipeline: gix + mail-parser + metadata / over.db / trigram / BM25 / embedding tiers. Incremental; dangling-OID safe; single-writer flock.
- `kernel-lore-sync`: one Rust binary that internalized the legacy `grokmirror` + separate-ingest two-process chain. HTTPS manifest fetch, gix smart-HTTP clone-or-fetch (rayon-fanned across shards), ingest, tid rebuild, generation bump, all under one writer lock so there's no trigger/debounce race.
- Full MCP surface: 25 tools (search, primitives, sampling-backed summarize/classify/explain, authoritative `merged`/`picked_up` verdicts via git-sidecar, `lore_corpus_stats` for coverage transparency, `lore_author_footprint` for address-mention search), 5 RFC-6570 resource templates, 2 static resources (`blind-spots://coverage`, `stats://coverage`), 5 slash-command prompts, populated KWIC snippets, freshness marker + capability booleans on every response.
- HMAC-signed pagination cursors live on `lore_search`, `lore_patch_search`, `lore_regex`, `lore_activity`, `lore_author_footprint`. Query-scoped, tamper-detected.
- stdio + Streamable HTTP transports; no SSE.
- `/status` + `/metrics` (Prometheus) with `freshness_ok` + per-tier `capabilities` flags so clients distinguish "no results" from "feature not provisioned."
- systemd units for hosted deploy; 5-min `klmcp-sync.timer` cadence (`docs/ops/update-frequency.md`).
- Live-tested against real `claude --print` and `codex exec` every commit via `scripts/agentic_smoke.sh`.
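To illustrate what "query-scoped, tamper-detected" means for the HMAC-signed cursors, here is a minimal sketch of the general technique. It is not the project's implementation: the token layout, the `SECRET` key, and the `encode_cursor` / `decode_cursor` names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-not-for-production"  # hypothetical signing key


def _sign(payload: bytes, query: str) -> bytes:
    # Mixing the query into the MAC makes the cursor valid for that query only.
    return hmac.new(SECRET, query.encode() + b"\0" + payload, hashlib.sha256).digest()


def encode_cursor(query: str, offset: int) -> str:
    """Return an opaque cursor: 32-byte MAC followed by a JSON payload."""
    payload = json.dumps({"offset": offset}).encode()
    return base64.urlsafe_b64encode(_sign(payload, query) + payload).decode()


def decode_cursor(query: str, cursor: str) -> int:
    """Verify the cursor against the query and return the stored offset."""
    raw = base64.urlsafe_b64decode(cursor.encode())
    mac, payload = raw[:32], raw[32:]
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(mac, _sign(payload, query)):
        raise ValueError("cursor rejected: tampered or issued for another query")
    return json.loads(payload)["offset"]
```

A cursor minted for one query fails verification when replayed against a different query or after any byte of it is altered, so the server never has to trust client-supplied offsets.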
Next: see docs/plans/2026-04-20-v0.3.0-plan.md
— tag close-out, kernel-lore-sync --bootstrap, auto-built path
vocab, CI perf gate, lore_maintainer_graph, thread-state
classifier upgrade.
Deferred past v0.3: trained kernel-specific retrieval model
(docs/research/training-retriever.md),
snapshot-bundle reciprocity, Patchwork state integration, CVE-chain
tool (all planned; see
docs/plans/2026-04-14-best-in-class-kernel-mcp.md).
Why
Linux kernel development lives on ~390 public mailing lists. lei
and b4 work well for humans with terminals, but LLM-backed
developer tools have no equivalent: they can't answer "who touched
fs/smb/server/smbacl.c in the last 90 days, grouped by series,
with trailers" or "has this XDR overflow pattern been reported
before" without being fed curated context by hand.
This project closes that gap. One MCP server over the full corpus, so an agent working on kernel code has the same research surface a senior maintainer has. And because it's all mirrored + indexed once, every agent query is zero HTTP load on lore.kernel.org.
Architecture in one paragraph
Four-tier index plus an embedding tier, purpose-built per query
class: columnar metadata (Arrow/Parquet) for analytical scans;
SQLite over.db (public-inbox pattern) for sub-millisecond
metadata point lookups and predicate scans; trigram (fst +
roaring) for patch/diff content with DFA-only regex confirmation;
BM25 (tantivy) for prose; semantic (HNSW via
instant-distance) for "more like this." Rust core via
PyO3 0.28 does the heavy lifting; Python + FastMCP 3.2 serves
MCP over stdio + Streamable HTTP. Ingestion is incremental from
public-inbox git shards pulled via kernel-lore-sync (gix smart-
HTTP + lore manifest-diff), replacing the pre-v0.2.0 grokmirror
dependency. The
zstd-compressed raw store is the source of truth; all four
tiers rebuild from it.
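The trigram tier's "filter, then confirm" idea can be sketched in a few lines. This is a toy model under stated assumptions: plain Python sets stand in for the fst + roaring structures, Python's backtracking `re` module stands in for a DFA-only regex engine, and the `TrigramIndex` class and its methods are hypothetical names.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return every 3-character window of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, literal: str, pattern: str) -> list[int]:
        """literal: a substring every match must contain.
        pattern: the full regex, run only on surviving candidates."""
        grams = trigrams(literal)
        if grams:
            # A document can match only if it holds every trigram of the literal.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Literal shorter than 3 characters: nothing to filter on.
            candidates = set(self.docs)
        rx = re.compile(pattern)
        return sorted(d for d in candidates if rx.search(self.docs[d]))
```

The index narrows the corpus cheaply, and the regex runs only on the few documents that could possibly match, which is what keeps patch/diff searches fast on a large archive.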
North star: a trained kernel retriever
The Parquet metadata tier captures the training signal for free —
subject/body pairs, series version chains, Fixes: → target SHA,
reply graphs via in_reply_to / references, trailer co-occurrence.
A future phase trains a <200 MB int8-quantized CPU-inferable
retriever on that self-supervised signal. Recipe:
docs/research/training-retriever.md.
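As one example of that self-supervised signal, `Fixes:` trailers follow the kernel convention `Fixes: <abbreviated sha> ("subject of the fixed commit")`, so each one yields a (fixing patch, target commit) pair. The sketch below extracts such pairs from a message body; the `fixes_pairs` helper and its tuple layout are hypothetical and are not taken from the training recipe.

```python
import re

# Matches lines such as: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
FIXES_RE = re.compile(
    r'^Fixes:\s+([0-9a-f]{8,40})\s+\("(.*)"\)\s*$',
    re.MULTILINE | re.IGNORECASE,
)


def fixes_pairs(subject: str, body: str) -> list[tuple[str, str, str]]:
    """Return (patch subject, target sha, target subject) per Fixes: trailer."""
    return [
        (subject, sha.lower(), target)
        for sha, target in FIXES_RE.findall(body)
    ]
```

Each tuple links the text of a fix to the commit it repairs, which is the kind of positive pair a retriever can be trained on without manual labels.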
Documentation
- `CLAUDE.md`: authoritative project state + non-negotiable product constraints
- `CHANGELOG.md`: release history
- `CONTRIBUTING.md`: dev loop, PR discipline
- `SECURITY.md`: disclosure posture
- `docs/ops/runbook.md`: local dev (§0A) + hosted deploy (§1+)
- `docs/ops/update-frequency.md`: 5-min cadence policy + fanout-to-one cost analysis
- `docs/ops/production-hardening.md`: threat model, cost-class caps, capability flags, systemd layout
- `docs/ops/public-launch-checklist.md`: pre-launch hosted-box gate (shard health, metrics, harness, log readability)
- `docs/mcp/client-config.md`: copy-paste snippets for Claude Code, Codex, Cursor, Zed
- `docs/mcp/transport-auth.md`: transport + why no auth
- `docs/architecture/`: design rationale
- `docs/plans/2026-04-20-v0.3.0-plan.md`: active release plan
- `docs/plans/2026-04-14-best-in-class-kernel-mcp.md`: 6-month roadmap (north star)
- `docs/research/`: dated investigations that fed the plan
License
MIT. See LICENSE.
Data from lore.kernel.org is re-hosted under the same terms as
lore itself (public archive). Attribution preserved in every
response. Redaction policy: LEGAL.md.