MCP server exposing fast, structured search over lore.kernel.org to LLM developer tools.

kernel-lore-mcp

Free (MIT) MCP server exposing structured search over the Linux kernel mailing list archives at lore.kernel.org to LLM-backed developer tools — Claude Code, Codex, Cursor, Zed, anything else that speaks the Model Context Protocol.

No authentication, ever. No API keys, no OAuth, no login flow. Same anonymous posture on every deployment — local, hosted, everywhere. Every agent that asks us a question is one fewer agent scraping lore directly; fanout-to-one is the value proposition.

Quick start

Install is one command. The first sync is where real time goes — budget honestly depending on what you want to cover:

Shape | Disk | First-sync wall-clock
1–2 small lists (wireguard, xdp-newbies) | ~1 GB | 1–5 min
Subsystem slice (lkml + netdev + linux-cifs) | ~25 GB | 15–60 min
Full lore (390 shards, every list) | ~100 GB | 4–12 h

After the cold start, steady-state syncs on the 5-minute timer finish in seconds.

# 1. install — one command, pre-built abi3 wheel, no Rust toolchain required
uv tool install kernel-lore-mcp

# 2. first sync — manifest fetch + gix fetch + ingest in one process
#    under one writer lock. Pick a small slice for a first experiment:
export KLMCP_DATA_DIR=~/klmcp-data
mkdir -p "$KLMCP_DATA_DIR"
kernel-lore-sync \
    --data-dir "$KLMCP_DATA_DIR" \
    --with-over \
    --include '/wireguard/*' --include '/linux-cifs/*'
# Drop --include to mirror all ~390 lists. Plan the disk + time.

# 3. (optional, recommended) build the path-mention index. Tiny, fast.
python -c 'from kernel_lore_mcp import _core; \
           print(_core.rebuild_path_vocab("'"$KLMCP_DATA_DIR"'"))'

# 4. confirm freshness + which capabilities are provisioned
kernel-lore-mcp status --data-dir "$KLMCP_DATA_DIR"
# Look at `capabilities`: each over_db / bm25 / path_vocab / embedding /
# maintainers / git_sidecar boolean tells you which tools will actually
# return data on this deployment.

# 4b. inspect shard/index health; add --heal to repair unborn shard HEADs
#     and remove unrecoverable shard repos so the next sync reclones them
kernel-lore-doctor --data-dir "$KLMCP_DATA_DIR"

# 5. verify the MCP surface — zero API cost
git clone --depth 1 https://github.com/mjbommar/kernel-lore-mcp.git
cd kernel-lore-mcp && ./scripts/agentic_smoke.sh local
# PASS: 7/7 tools, 5/5 resource templates, 5/5 prompts (the
# `REQUIRED_*` subset from src/kernel_lore_mcp/_surface_manifest.py;
# the live server registers 24 tools in total).

Then pick your agent and copy its snippet from docs/mcp/client-config.md. All four clients (Claude Code, Codex, Cursor, Zed) work over stdio against the exact same server binary.

Optional capabilities — opt in when you need them

The baseline sync gives you everything a typical query asks for. The capabilities below are explicitly opt-in because they cost disk or time and not every deployment wants them:

Capability | Build | When you want it
BM25 prose search (b: / free text) | kernel-lore-ingest --rebuild-bm25 | semantic-free text search over prose bodies
Semantic embeddings (lore_nearest, lore_similar) | kernel-lore-embed --data-dir $KLMCP_DATA_DIR | "more like this" / free-text → vector ANN
Git-sidecar (authoritative merged + picked_up) | kernel-lore-build-git-sidecar --repo linux-stable --path /path/to/linux-stable.git | upgrades lore_stable_backport_status + lore_thread_state from lore heuristic to git-history truth
MAINTAINERS snapshot | drop a MAINTAINERS file into $KLMCP_DATA_DIR or point $KLMCP_MAINTAINERS_FILE at one | lore_maintainer_profile declared-vs-observed ownership

kernel-lore-mcp status reports which are ready via the capabilities field, and tools that need an un-provisioned tier return a setup_required error naming the exact command to fix it (no silent empty results).
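To make the capability-to-command mapping concrete, here is a minimal sketch of a client-side check. It assumes the `capabilities` field can be read as a mapping of flag name to boolean, using the flag names from the quick-start comment (`bm25`, `embedding`, `git_sidecar`, `maintainers`). The helper name `missing_setup` and the payload shape are hypothetical. The commands are copied from the table above.

```python
# Hypothetical helper, not part of kernel-lore-mcp: given capability booleans,
# list the setup step for every opt-in capability that is not provisioned.
# The dict shape is an assumption; the commands come from the table above.

SETUP_COMMANDS = {
    "bm25": "kernel-lore-ingest --rebuild-bm25",
    "embedding": "kernel-lore-embed --data-dir $KLMCP_DATA_DIR",
    "git_sidecar": (
        "kernel-lore-build-git-sidecar --repo linux-stable "
        "--path /path/to/linux-stable.git"
    ),
    "maintainers": (
        "drop a MAINTAINERS file into $KLMCP_DATA_DIR "
        "or set $KLMCP_MAINTAINERS_FILE"
    ),
}


def missing_setup(capabilities: dict[str, bool]) -> list[str]:
    """Return 'name: command' for each opt-in capability that is off or absent."""
    return [
        f"{name}: {command}"
        for name, command in SETUP_COMMANDS.items()
        if not capabilities.get(name, False)
    ]


if __name__ == "__main__":
    # Example flags for a deployment that skipped BM25 and embeddings.
    example = {
        "over_db": True,
        "bm25": False,
        "path_vocab": True,
        "embedding": False,
        "maintainers": True,
        "git_sidecar": True,
    }
    for line in missing_setup(example):
        print(line)
```

A check like this is only a convenience: the server itself already names the exact command in its setup_required error.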

Install from source

Contributing? Building a custom binary?

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | \
    sh -s -- -y --default-toolchain stable
git clone https://github.com/mjbommar/kernel-lore-mcp.git
cd kernel-lore-mcp
uv sync
uv run maturin develop --release
cargo build --release --bin kernel-lore-sync --bin kernel-lore-ingest --bin kernel-lore-doctor
./target/release/kernel-lore-sync --data-dir "$KLMCP_DATA_DIR" --with-over
./target/release/kernel-lore-doctor --data-dir "$KLMCP_DATA_DIR"

Going bigger

Want fuller coverage? Drop --include flags to mirror all ~390 lists (~100+ GB first run).

Want a production-grade systemd deployment (a single klmcp-sync.timer replacing the pre-v0.2.0 grokmirror + ingest pair)? See docs/ops/runbook.md, §1 onwards.

Status — v0.3.0 (2026-04-21)

Current release: the hosted-readiness line that would previously have been cut as 0.2.3 now lands in v0.3.0. The main additions on top of 0.2.2 are: sync self-healing for poisoned shard repos, hosted-mode regex gating, generation-bound lore_corpus_stats caching, automatic path-vocab rebuild during sync, explicit local/hosted deployment profiles, structured slow-path profiling logs, and a repeatable HTTP/MCP adversarial-load harness plus public-launch checklist.

Shipped:

  • Ingest pipeline — gix + mail-parser + metadata / over.db / trigram / BM25 / embedding tiers. Incremental; dangling-OID safe; single-writer flock.
  • kernel-lore-sync — one Rust binary that internalized the legacy grokmirror + separate-ingest two-process chain. HTTPS manifest fetch, gix smart-HTTP clone-or-fetch (rayon-fanned across shards), ingest, tid rebuild, generation bump — all under one writer lock so there's no trigger/debounce race.
  • Full MCP surface: 24 tools (search, primitives, sampling-backed summarize/classify/explain, authoritative merged / picked_up verdicts via git-sidecar, lore_corpus_stats for coverage transparency, lore_author_footprint for address-mention search), 5 RFC-6570 resource templates, 2 static resources (blind-spots://coverage, stats://coverage), 5 slash-command prompts, populated KWIC snippets, freshness marker + capability booleans on every response.
  • HMAC-signed pagination cursors live on lore_search, lore_patch_search, lore_regex, lore_activity, lore_author_footprint. Query-scoped, tamper-detected.
  • stdio + Streamable HTTP transports; no SSE.
  • /status + /metrics (Prometheus) with freshness_ok + per-tier capabilities flags so clients distinguish "no results" from "feature not provisioned."
  • systemd units for hosted deploy; 5-min klmcp-sync.timer cadence (docs/ops/update-frequency.md).
  • Live-tested against real claude --print and codex exec every commit via scripts/agentic_smoke.sh.
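The "query-scoped, tamper-detected" property of the pagination cursors can be illustrated with a short sketch. This is not the project's cursor format: the payload layout, the `encode_cursor` / `decode_cursor` names, and the hard-coded secret are all assumptions made for illustration. It only shows the general technique of binding an offset to a query hash and signing both with HMAC-SHA256.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret; a real deployment would load this from configuration.
SECRET = b"example-secret"


def _query_digest(query: str) -> str:
    """Hash the query so the cursor is bound to it without embedding it."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def encode_cursor(query: str, offset: int, secret: bytes = SECRET) -> str:
    """Sign (query digest, offset) and return an opaque URL-safe token."""
    payload = json.dumps(
        {"q": _query_digest(query), "offset": offset}, sort_keys=True
    ).encode("utf-8")
    tag = hmac.new(secret, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(tag + payload).decode("ascii")


def decode_cursor(cursor: str, query: str, secret: bytes = SECRET) -> int:
    """Verify the signature and the query binding, then return the offset."""
    try:
        raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    except (ValueError, UnicodeEncodeError) as exc:
        raise ValueError("malformed cursor") from exc
    tag, payload = raw[:32], raw[32:]
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cursor signature mismatch")
    data = json.loads(payload)
    if data["q"] != _query_digest(query):
        raise ValueError("cursor was issued for a different query")
    return data["offset"]


if __name__ == "__main__":
    token = encode_cursor("smbacl overflow", 50)
    print(decode_cursor(token, "smbacl overflow"))
```

Because the signature covers the query digest, a cursor copied from one search cannot be replayed against another, and any edit to the token fails verification before the offset is read.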

Next: see docs/plans/2026-04-20-v0.3.0-plan.md — tag close-out, kernel-lore-sync --bootstrap, auto-built path vocab, CI perf gate, lore_maintainer_graph, thread-state classifier upgrade.

Deferred past v0.3: trained kernel-specific retrieval model (docs/research/training-retriever.md), snapshot-bundle reciprocity, Patchwork state integration, CVE-chain tool (all planned; see docs/plans/2026-04-14-best-in-class-kernel-mcp.md).

Why

Linux kernel development lives on ~390 public mailing lists. lei and b4 work well for humans with terminals, but LLM-backed developer tools have no equivalent: they can't answer "who touched fs/smb/server/smbacl.c in the last 90 days, grouped by series, with trailers" or "has this XDR overflow pattern been reported before" without being fed curated context by hand.

This project closes that gap. One MCP server over the full corpus, so an agent working on kernel code has the same research surface a senior maintainer has. And because it's all mirrored + indexed once, every agent query is zero HTTP load on lore.kernel.org.

Architecture in one paragraph

Four-tier index plus an embedding tier, purpose-built per query class: columnar metadata (Arrow/Parquet) for analytical scans; SQLite over.db (public-inbox pattern) for sub-millisecond metadata point lookups and predicate scans; trigram (fst + roaring) for patch/diff content with DFA-only regex confirmation; BM25 (tantivy) for prose; semantic (HNSW via instant-distance) for "more like this." Rust core via PyO3 0.28 does the heavy lifting; Python + FastMCP 3.2 serves MCP over stdio + Streamable HTTP. Ingestion is incremental from public-inbox git shards pulled via kernel-lore-sync (gix smart-HTTP + lore manifest-diff), replacing the pre-v0.2.0 grokmirror dependency. The zstd-compressed raw store is the source of truth; all four tiers rebuild from it.
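The trigram tier's "filter, then confirm" idea can be sketched in a few lines. This toy version is an illustration only: it swaps the fst + roaring structures for a plain dict of sets, and it uses Python's backtracking `re` module rather than a DFA-only engine. The `TrigramIndex` class and its method names are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index: trigram -> set of document ids."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self._docs[doc_id] = text
        for gram in trigrams(text):
            self._postings[gram].add(doc_id)

    def search(self, literal: str, pattern: str) -> list[int]:
        """Narrow by a required literal, then confirm with the regex.

        literal must be a substring (3+ chars) that every match contains;
        choosing it is the caller's job in this sketch.
        """
        grams = trigrams(literal)
        if not grams:
            raise ValueError("literal must be at least 3 characters")
        # Step 1: cheap candidate set from posting-list intersection.
        candidates = set.intersection(
            *(self._postings.get(g, set()) for g in grams)
        )
        # Step 2: run the regex only on the surviving candidates.
        regex = re.compile(pattern)
        return sorted(d for d in candidates if regex.search(self._docs[d]))


if __name__ == "__main__":
    idx = TrigramIndex()
    idx.add(1, "+\tkfree(acl->entries);")
    idx.add(2, "+\tkfree(buf);")
    idx.add(3, "+\tptr = kmalloc(size, GFP_KERNEL);")
    print(idx.search("kfree(", r"kfree\(\w+->\w+\)"))
```

The point of the two steps is that the regex only runs on documents that already contain every trigram of the literal, so most of the corpus is never scanned.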

North star: a trained kernel retriever

The Parquet metadata tier captures the training signal for free — subject/body pairs, series version chains, Fixes: → target SHA, reply graphs via in_reply_to / references, trailer co-occurrence. A future phase trains a <200 MB int8-quantized CPU-inferable retriever on that self-supervised signal. Recipe: docs/research/training-retriever.md.
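As a rough illustration of what "self-supervised signal" could look like in practice, the sketch below derives (parent subject, reply body) pairs from the reply graph and pulls target SHAs out of Fixes: trailers. The record layout is an assumption: only `in_reply_to` is named in the text, while `message_id`, `subject`, and `body` are hypothetical field names. The actual recipe is in docs/research/training-retriever.md.

```python
import re

# Matches a "Fixes: <sha>" trailer at the start of a line (7 to 40 hex chars).
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.MULTILINE)


def reply_pairs(messages: list[dict]) -> list[tuple[str, str]]:
    """Pair each reply's body with the subject of the message it answers.

    Assumes each record has message_id, subject, body, and in_reply_to
    (None for thread roots); these field names are illustrative.
    """
    by_id = {m["message_id"]: m for m in messages}
    pairs = []
    for m in messages:
        parent = by_id.get(m.get("in_reply_to"))
        if parent is not None:
            pairs.append((parent["subject"], m["body"]))
    return pairs


def fixes_targets(body: str) -> list[str]:
    """Return the commit SHAs referenced by Fixes: trailers in a message body."""
    return FIXES_RE.findall(body)


if __name__ == "__main__":
    thread = [
        {"message_id": "a", "subject": "[PATCH] ksmbd: fix ACL bounds check",
         "body": "Fixes: 1a2b3c4d5e6f (\"ksmbd: add ACL parsing\")\n",
         "in_reply_to": None},
        {"message_id": "b", "subject": "Re: [PATCH] ksmbd: fix ACL bounds check",
         "body": "Looks good to me.", "in_reply_to": "a"},
    ]
    print(reply_pairs(thread))
    print(fixes_targets(thread[0]["body"]))
```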

License

MIT. See LICENSE.

Data from lore.kernel.org is re-hosted under the same terms as lore itself (public archive). Attribution preserved in every response. Redaction policy: LEGAL.md.
