Local Python docs MCP server, accelerated with Rust
pydocs-mcp
Local, version-aware code & docs search for your AI coding agent — over the exact library versions installed on your machine.
Your AI assistant thinks you're on requests 2.28. You actually have 2.31. It
calls a kwarg that was renamed two versions ago, your test fails, and you lose
twenty minutes. The fix isn't a smarter prompt — it's giving the AI docs that
match your lockfile, not the average of every StackOverflow answer it ever
read.
pydocs-mcp indexes your project plus every installed dependency, right on your machine, in seconds. Your agent connects over MCP and gets answers grounded in your code — fully offline.
What you get
- Matched to your install. Searches the exact versions sitting in your site-packages, so your agent stops inventing APIs from some older release.
- Private & offline. Everything runs locally — no API keys, no uploads, no rate limits, no per-query fees.
- Three ways to find code. Keyword, meaning, and LLM reasoning (see How it works) — on their own or fused into one ranked answer.
- Knows how your code connects. Ask "what calls this?", "what does it call?", or "what does this class inherit?" — across your project and every dependency.
- Lean, not bloated. Minimal dependencies — no PyTorch, no FAISS. A small local ONNX embedder plus the Rust TurboQuant vector store (turbovec), which packs embeddings ~16× smaller than float32 (a 1536-dim vector drops from 6,144 to 384 bytes; a 10M-doc corpus fits in 4 GB instead of 31 GB) and benchmarks faster than FAISS FastScan. The on-disk index stays tiny and search stays quick.
- Cheap to keep current. Edit a doc and only the changed chunks are re-embedded — partial re-ingestion, not a full rebuild — while unchanged packages are skipped in under 100 ms. A Rust core does the heavy lifting.
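The per-vector figures in the list above follow from simple arithmetic. The sketch below reproduces them under one assumption that the text does not state outright: that the quantized form uses 2 bits per dimension (which is what 384 bytes for 1536 dimensions implies) and that any per-vector metadata is ignored. `vector_bytes` is a hypothetical helper, not part of turbovec.

```python
def vector_bytes(dim: int, bits_per_dim: float) -> float:
    """Bytes needed for one vector, ignoring any per-vector metadata."""
    return dim * bits_per_dim / 8


dim = 1536
float32_bytes = vector_bytes(dim, 32)    # 32 bits per float32 component
quantized_bytes = vector_bytes(dim, 2)   # assumed: 2 bits per dimension
ratio = float32_bytes / quantized_bytes

# Footprint of 10 million quantized vectors of the same shape, in GB.
n_docs = 10_000_000
corpus_gb = n_docs * quantized_bytes / 1e9

print(float32_bytes, quantized_bytes, ratio, corpus_gb)
```

At these settings the quantized corpus comes to roughly 3.8 GB, in line with the 4 GB figure. The same arithmetic puts a 1536-dim float32 corpus at about 61 GB, so the 31 GB comparison presumably refers to a smaller embedding dimension.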
How it works
Three steps, all on your machine (see the diagram above):
- Index — pydocs-mcp scans your project and installed deps into a local SQLite database (code chunks, metadata, and a graph of how everything references everything else) plus a compact TurboQuant .tq vector file for meaning-based search. Re-running is cheap: unchanged packages are skipped, and when a file does change, only its changed chunks are re-embedded.
- Search — each query can use three complementary modes and fuse them into one ranked list:
  - Keyword — instant, exact matches for names, error strings, and signatures.
  - Meaning — dense embeddings find the right code even when your words differ from the docs', via a small model that runs locally.
  - Reasoning — for broad or structural questions, an LLM walks your code's map (titles + summaries, no embeddings) to pick the best spots.
- Answer — results flow back to your agent through two simple tools: search (find by relevance) and lookup (jump to a known name, or trace its callers, callees, and inheritance).
The only call that ever leaves your machine is the optional reasoning mode — and only if you turn it on with your own key.
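One way to picture the "only changed chunks are re-embedded" behavior from the Index step is a content-hash comparison. This is a minimal sketch under that assumption; the source does not say how pydocs-mcp detects changes, and `chunk_digest`, `chunks_to_embed`, and the chunk ids are hypothetical.

```python
import hashlib


def chunk_digest(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_embed(stored: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return ids of chunks whose text is new or changed since the last run.

    stored maps chunk id -> digest saved at the previous index;
    current maps chunk id -> chunk text seen now.
    """
    return [
        chunk_id
        for chunk_id, text in current.items()
        if stored.get(chunk_id) != chunk_digest(text)
    ]


previous = {
    "readme#0": chunk_digest("Install with pip."),
    "readme#1": chunk_digest("Run the server."),
}
now = {
    "readme#0": "Install with pip.",           # unchanged: skipped
    "readme#1": "Run the server over stdio.",  # edited: re-embedded
    "readme#2": "New section.",                # new: embedded
}
stale = chunks_to_embed(previous, now)
print(stale)
```

Only the edited and the new chunk reach the embedder; the untouched one costs a hash and a dictionary lookup.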
Quick start
pip install -e . # pure Python, works everywhere
# …or with the Rust core for speed:
pip install maturin && maturin develop --release
Linux needs OpenBLAS for the vector store (macOS and Windows already ship it):
sudo apt-get install -y libopenblas-pthread-dev
Then index your project and start the server:
pydocs-mcp serve . # index project + deps, serve over MCP (stdio)
pydocs-mcp serve . --gpu # …same, with CUDA-accelerated embeddings
pydocs-mcp search "batch inference" # the same search, from the CLI
pydocs-mcp lookup requests.auth.HTTPBasicAuth --show inherits
Embeddings run on CPU by default. Add --gpu to serve / index (or the
benchmark runner) to move all embedder inference — FastEmbed, the
sentence_transformers provider, and PyLate — onto CUDA. It's a latency knob only: no YAML change, no
re-index, identical results. Needs the matching GPU runtime — see
INSTALL.md.
Live re-indexing (optional)
If you edit code while you want the index to stay fresh, install the
watch extras and pick one of two modes — both debounce edits to
.py, .md, and .ipynb files into a single reindex.
pip install 'pydocs-mcp[watch]'
pydocs-mcp serve . --watch # MCP server + watcher (for AI clients)
pydocs-mcp watch . # watcher only (no MCP server; index stays fresh for CLI `search` / `lookup`)
Both modes share the same YAML tunables: debounce, file extensions, and
ignored paths live under serve.watch.* in your pydocs-mcp.yaml (see
DOCUMENTATION.md).
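Debouncing here means that a burst of saves collapses into one reindex. The sketch below shows the idea as a pure function over timestamped events; it is an illustration only, not the watcher's actual code, and `coalesce`, `WATCHED_SUFFIXES`, and the one-second window are hypothetical stand-ins for the YAML tunables.

```python
from pathlib import PurePath

# Assumed default extensions, taken from the description above.
WATCHED_SUFFIXES = {".py", ".md", ".ipynb"}


def coalesce(events: list[tuple[float, str]], window: float) -> list[set[str]]:
    """Group (timestamp, path) events into reindex batches.

    A batch closes once `window` seconds pass with no further watched edit,
    so several quick saves trigger one reindex instead of many.
    """
    batches: list[set[str]] = []
    last_seen = None
    for stamp, path in sorted(events):
        if PurePath(path).suffix not in WATCHED_SUFFIXES:
            continue  # ignore files the watcher does not track
        if last_seen is None or stamp - last_seen > window:
            batches.append(set())  # quiet period elapsed: start a new batch
        batches[-1].add(path)
        last_seen = stamp
    return batches


events = [(0.0, "a.py"), (0.2, "a.py"), (0.3, "notes.md"), (0.4, "build.log"), (5.0, "b.py")]
batches = coalesce(events, window=1.0)
print(batches)
```

The three quick edits land in one batch, the log file is ignored, and the later edit to `b.py` starts a second batch.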
Point Claude Code, Cursor, or Continue.dev at it over stdio — copy-paste client
configs are in DOCUMENTATION.md, and
install troubleshooting (including the libopenblas fallback) is in
INSTALL.md.
How it compares
pydocs-mcp, Context7, and Neuledge Context all feed docs to an AI agent over MCP, but optimize for different things. They aren't mutually exclusive — an agent can mount all three and route by intent.
| | pydocs-mcp | Context7 | Neuledge Context |
|---|---|---|---|
| Deployment | Local stdio MCP server | Hosted MCP (mcp.context7.com) | Local stdio MCP server |
| Doc source | Your installed Python deps + your own project, indexed in place | Curated community docs hosted by Upstash | Community registry (~100+ libraries), pulled then queried locally |
| Version match | Exactly what's in your site-packages — automatic | Library + version chosen in the prompt | Latest from the registry |
| Languages | Python | Multi-language | Multi-language (~100+ libraries) |
| Retrieval | Keyword (BM25) + dense embeddings + LLM tree reasoning, fused via RRF or weighted scores | Not publicly documented | BM25 over SQLite FTS5 |
| Code-structure queries | Reference graph — lookup(show=callers\|callees\|inherits) | None (doc retrieval only) | None (doc retrieval only) |
| Indexes your code | Yes — under the __project__ package | No | No |
| Privacy | Fully offline with the default embedder — zero network calls | Queries hit Upstash; OAuth + API key | Local once packages are downloaded |
| Dependencies | Lean — no PyTorch, no FAISS (Rust TurboQuant store + small ONNX embedder) | Hosted service (nothing to install) | Local service |
| Cost | $0 — OSS (MIT); no keys, limits, or fees | Free tier (rate-limited) + paid plans | $0 — OSS (Apache-2.0) |
In short: choose pydocs-mcp for offline, version-matched Python retrieval where you also navigate code structure; Context7 for hosted, multi-language docs; Neuledge for a local-first multi-language registry.
Benchmarked, not hand-waved
pydocs-mcp ships a real benchmark harness that scores retrieval quality on public benchmarks (RepoQA, DS-1000) and head-to-head against Context7 and Neuledge — with confidence intervals and plots. See benchmarks/README.md.
Retrieval methods & R&D
Each method below is a named step under
python/pydocs_mcp/retrieval/steps/,
addressable from YAML. The default chunk_search.yaml composes BM25 +
single-vector dense fused via RRF; everything else is opt-in via a
preset swap (--config), with no behavioral change for default installs.
Keyword — BM25 over SQLite FTS5
Full-text search with porter stemming and the unicode61 tokenizer. Free, instant, and the baseline that every other method composes with through the fusion steps below.
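A minimal sketch of what such an index looks like with Python's built-in `sqlite3` module, assuming the interpreter's SQLite was compiled with FTS5 (most builds are). The table name, columns, and sample rows are hypothetical; only the tokenizer string comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Porter stemming layered over the unicode61 tokenizer.
conn.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5(name, body, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO chunks(name, body) VALUES (?, ?)",
    [
        ("Session.request", "Sends a prepared request and returns a response."),
        ("HTTPBasicAuth", "Attaches HTTP basic authentication to the request."),
        ("Timeout", "Raised when the request timed out."),
    ],
)


def keyword_search(query: str, limit: int = 5) -> list[str]:
    """Return chunk names ordered best-first by FTS5's built-in BM25 score."""
    rows = conn.execute(
        # bm25() is lower-is-better, so ascending order puts the best hit first.
        "SELECT name FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return [name for (name,) in rows]


hits = keyword_search("returning")
print(hits)
```

Because of stemming, the query term "returning" matches the stored word "returns"; a plain tokenizer would miss it.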
Single-vector dense — FastEmbed + TurboQuant
- Embedder. FastEmbed with BAAI/bge-small-en-v1.5 by default — runs on CPU via ONNX, no PyTorch, no torch download. OpenAI text-embedding-3-small is the optional alternative for users with an API key. Pass --gpu to run the on-device embedders (FastEmbed / sentence_transformers) on CUDA instead — same vectors, lower latency.
- Bigger on-device model — the sentence_transformers provider. For stronger dense recall without an API key, switch to Qwen/Qwen3-Embedding-0.6B served via sentence-transformers (torch). It is GPU-reliable — torch frees CUDA memory between sequential index-builds — and the weights download at runtime on first use. Install the extra (pip install 'pydocs-mcp[sentence-transformers]', ~1-5 GB with torch), then set it in your YAML:

      embedding:
        provider: sentence_transformers
        model_name: Qwen/Qwen3-Embedding-0.6B
        dim: 1024
        # Optional. Token cap (attention is O(seq^2) — the OOM guard). Omit to
        # use the embedder's own default (2048).
        max_seq_length: 2048
        # Optional. L2-normalize output (default true).
        normalize: true
        # Optional. Named asymmetric query prompt; omit to use the model's own.
        query_prompt_name: query

  The default remains bge-small; the sentence_transformers provider is opt-in.
- Vector store. TurboQuant (turbovec) — Online Vector Quantization with near-optimal distortion. ~16× smaller than float32 (a 1536-dim vector drops from 6,144 to 384 bytes; a 10 M-doc corpus fits in 4 GB instead of 31 GB) and faster than FAISS FastScan at the same recall. Persists as a .tq sidecar next to the SQLite DB.
Late-interaction (multi-vector / MaxSim) — opt-in
The flagship R&D backend. One vector per token instead of one pooled vector per chunk; queries score via ColBERT's MaxSim — for each query token, take the maximum cosine to any document token, then sum. Higher recall on long, structurally distant queries (often the hard cases for single-vector retrievers).
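The MaxSim rule described above fits in a few lines. This is a plain-Python sketch for illustration, with toy 2-dimensional token vectors; the real engine works on compressed vectors and is far faster.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def maxsim(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """Sum, over query tokens, of the best cosine against any document token."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a good match for both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # matches only the first query token
score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

`doc_a` scores higher because every query token finds a close document token, which is why late interaction can reward a chunk that covers all parts of a long query even when no single pooled vector would.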
- Method. ColBERT late interaction (Khattab & Zaharia, SIGIR 2020).
- Engine. PLAID (Santhanam et al., CIKM 2022) via fast-plaid — a Rust-backed IVF + residual-decompression engine. Persists as a per-project directory sidecar at ~/.pydocs-mcp/{slug}.plaid/.
- Embedder. PyLate (arXiv:2508.03555) with the default model lightonai/LateOn-Code — late-interaction trained on code.
- Lighter-weight model — lightonai/LateOn-Code-edge. For a smaller per-token footprint, point the same PyLate path at lightonai/LateOn-Code-edge (48-dim token vectors instead of LateOn-Code's 128) in your YAML:

      late_interaction:
        enabled: true
        provider: pylate
        model_name: lightonai/LateOn-Code-edge
        embedding_dim: 48
        document_length: 2048
        query_length: 256

  The default stays LateOn-Code; LateOn-Code-edge is opt-in.
- SQLite + fast-plaid coupling. A chunk_multi_vector_ids mapping table bridges SQLite's chunk_id to fast-plaid's plaid_doc_id. The shipped FilterAdapter Protocol pushes metadata filters down to SQLite, then the result chunk-id list is passed as subset= to fast-plaid's MaxSim search — so MaxSim is always bounded to the SQLite-eligible candidates and the two engines stay in their own id spaces.
- Enable. pip install 'pydocs-mcp[late-interaction]', set late_interaction.enabled: true in your YAML, then point --config at the shipped chunk_search_late_interaction.yaml preset.
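The filter-then-subset coupling described in the list above can be sketched as follows. The `chunk_multi_vector_ids` table and its two id columns are named in the text; everything else (the `chunks` table, its `package` column, and `search_subset` as a stand-in for the engine call) is hypothetical and does not reflect fast-plaid's actual API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, package TEXT);
    CREATE TABLE chunk_multi_vector_ids (chunk_id INTEGER, plaid_doc_id INTEGER);
    INSERT INTO chunks VALUES (10, 'requests'), (11, 'requests'), (12, 'numpy');
    INSERT INTO chunk_multi_vector_ids VALUES (10, 0), (11, 1), (12, 2);
    """
)


def eligible_plaid_ids(package: str) -> list[int]:
    """Apply the metadata filter in SQLite, then translate to the engine's ids."""
    rows = conn.execute(
        "SELECT m.plaid_doc_id FROM chunks AS c "
        "JOIN chunk_multi_vector_ids AS m ON m.chunk_id = c.chunk_id "
        "WHERE c.package = ? ORDER BY m.plaid_doc_id",
        (package,),
    )
    return [doc_id for (doc_id,) in rows]


def search_subset(scores: dict[int, float], subset: list[int], k: int) -> list[int]:
    """Stand-in for the engine: rank only the ids listed in subset."""
    return sorted(subset, key=lambda doc_id: scores[doc_id], reverse=True)[:k]


subset = eligible_plaid_ids("requests")
top = search_subset({0: 0.4, 1: 0.9, 2: 0.99}, subset, k=2)
print(subset, top)
```

Document 2 has the highest raw score but never enters the ranking, because the SQLite filter excluded it before scoring started.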
Hybrid fusion
- Reciprocal Rank Fusion (RRF) — Cormack, Clarke & Buettcher, SIGIR 2009. Rank-only 1 / (k + rank) with k=60 default; the workhorse for combining BM25 + dense, or BM25 + late-interaction.
- Weighted Score Interpolation (WSI) — score-space α · score_a + (1 − α) · score_b with min-max normalization, for cases where the score distributions are well-calibrated and rank isn't enough. α is tunable from YAML.
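Both fusion rules are short enough to sketch directly from their formulas. The code below is an illustration rather than the project's implementation. Treating a document missing from one list as scoring 0 in WSI, and mapping a constant score list to 0, are assumptions made here to keep the example total; scores are taken as higher-is-better.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank), ranks start at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant list maps to 0 (an assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}


def wsi(a: dict[str, float], b: dict[str, float], alpha: float) -> dict[str, float]:
    """alpha * a + (1 - alpha) * b after min-max scaling of each input."""
    na, nb = min_max(a), min_max(b)
    return {
        doc: alpha * na.get(doc, 0.0) + (1 - alpha) * nb.get(doc, 0.0)
        for doc in set(na) | set(nb)
    }


bm25_hits = ["auth", "session", "timeout"]
dense_hits = ["session", "adapter", "auth"]
fused = rrf([bm25_hits, dense_hits])
blended = wsi({"x": 10.0, "y": 0.0}, {"x": 0.2, "y": 0.8}, alpha=0.75)
print(fused, blended)
```

RRF needs only ranks, so it works even when BM25 and cosine scores live on unrelated scales. WSI keeps score magnitudes, which helps only if both inputs are comparable after normalization.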
LLM tree reasoning — opt-in
A vectorless mode for broad, structural questions ("walk me through the request lifecycle"). Instead of embedding text, an LLM walks the code map — module / class titles plus short summaries — and picks the best spots itself. Inspired by PageIndex (VectifyAI)'s reasoning-over-tree-of-contents approach.
Three shipped presets under
python/pydocs_mcp/pipelines/:
tree_only.yaml, chunk_search_with_tree_reasoning_parallel.yaml
(run alongside chunk search, fuse via WSI), and
chunk_search_with_tree_reasoning_after.yaml (use chunk search as
the candidate pool, let the LLM re-rank). Provider / model /
temperature / max_tokens are tuned under the llm: section of YAML;
any OpenAI-compatible endpoint works.
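The "code map" an LLM walks can be pictured as a list of qualified titles with one-line summaries. The sketch below builds such a list with the standard `ast` module, taking the first docstring line as the summary. This is an assumption for illustration: the source does not say how pydocs-mcp produces its summaries, `outline` is a hypothetical helper, and the LLM call itself is omitted.

```python
import ast


def outline(source: str, module: str) -> list[tuple[str, str]]:
    """List (qualified title, first docstring line) for classes and functions."""
    entries: list[tuple[str, str]] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                title = f"{prefix}.{child.name}"
                doc = ast.get_docstring(child) or ""
                summary = doc.splitlines()[0] if doc else ""
                entries.append((title, summary))
                visit(child, title)  # descend into methods and nested definitions

    visit(ast.parse(source), module)
    return entries


sample = '''
class Session:
    """Holds connection state across requests."""

    def send(self, request):
        """Send a prepared request."""
        return request

def get(url):
    return url
'''
tree_map = outline(sample, "client")
for title, summary in tree_map:
    print(title, "-", summary)
```

A map like this is small enough to fit in a prompt, which is what lets the reasoning mode choose locations without any embeddings.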
Code reference graph
Beyond embeddings, pydocs-mcp captures a graph of how code
references code during indexing: CALLS, IMPORTS, INHERITS,
and optional MENTIONS (backtick-quoted dotted names in markdown).
The same surface answers an AI's "what calls this?" / "what does
this extend?" questions through the lookup(show=…) MCP tool:
pydocs-mcp lookup requests.auth.HTTPBasicAuth --show inherits
pydocs-mcp lookup my_module.Parser.parse --show callers
Capture is on by default and tunable under reference_graph: in YAML
(toggle, kinds-to-emit, output bounds).
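To make the edge kinds concrete, here is a minimal sketch of extracting IMPORTS, INHERITS, and CALLS edges from one module with the standard `ast` module (Python 3.9+ for `ast.unparse`). It is not the project's indexer: `extract_edges` and `callers` are hypothetical, only top-level definitions are scanned, and target names are left exactly as written in the source rather than resolved to fully qualified names.

```python
import ast


def extract_edges(source: str, module: str) -> list[tuple[str, str, str]]:
    """Return (kind, source_name, target_name) edges found in one module."""
    edges: list[tuple[str, str, str]] = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            edges += [("IMPORTS", module, alias.name) for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append(("IMPORTS", module, node.module))
        elif isinstance(node, ast.ClassDef):
            owner = f"{module}.{node.name}"
            edges += [("INHERITS", owner, ast.unparse(base)) for base in node.bases]
        elif isinstance(node, ast.FunctionDef):
            owner = f"{module}.{node.name}"
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    edges.append(("CALLS", owner, ast.unparse(inner.func)))
    return edges


def callers(edges: list[tuple[str, str, str]], target: str) -> list[str]:
    """Answer "what calls this?" from the edge list."""
    return sorted({src for kind, src, dst in edges if kind == "CALLS" and dst == target})


sample = """
import json
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    pass

def load(path):
    return json.loads(open(path).read())
"""
edges = extract_edges(sample, "my_module")
print(callers(edges, "json.loads"))
```

Once edges are stored as rows, a callers, callees, or inherits query becomes a filter on the edge kind plus one endpoint.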
Learn more
- DOCUMENTATION.md — how it works in depth: retrieval pipeline, reference graph, cache, configuration, database schema, and the full CLI reference.
- EXTENSIONS.md — extend it: new vector-store backends, pipeline steps, and fusion strategies.
- benchmarks/README.md — the evaluation harness.
- INSTALL.md — installation & troubleshooting.
- CLAUDE.md — architecture & contributor guide.
Sources & references
Benchmarks
- RepoQA — Evaluating Long Context Code Understanding · arXiv:2406.06025 (2024)
- DS-1000 — A Natural and Reliable Benchmark for Data Science Code Generation · arXiv:2211.11501 (2023)
- CodeRAG-Bench — Can Retrieval Augment Code Generation? · arXiv:2406.14497 (2024)
Vectors & retrieval
- TurboQuant — Online Vector Quantization with Near-optimal Distortion Rate · arXiv:2504.19874 (Google Research, 2025); implemented by turbovec
- FAISS — the similarity-search library used as the speed/storage baseline above
- FastEmbed with BAAI/bge-small-en-v1.5 — the default on-device embedder for the single-vector dense mode
- PyLate with lightonai/LateOn-Code — the default model for the opt-in late-interaction (multi-vector / MaxSim) mode · PyLate: Flexible Training and Retrieval for Late Interaction Models · arXiv:2508.03555 (LightOn, 2025)
- ColBERT — Efficient and Effective Passage Search via Contextualized Late Interaction over BERT · arXiv:2004.12832 (Khattab & Zaharia, SIGIR 2020) — the late-interaction architecture
- PLAID — An Efficient Engine for Late Interaction Retrieval · arXiv:2205.09707 (Santhanam et al., CIKM 2022) — implemented by fast-plaid, the engine pydocs-mcp uses for MaxSim scoring
- Reciprocal Rank Fusion — Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods · Cormack, Clarke & Buettcher, SIGIR 2009 — the rank-fusion baseline (k=60)
- PageIndex — inspiration for the LLM tree-reasoning mode
Protocol & comparable tools
- Model Context Protocol — the MCP standard
- Context7 · Neuledge Context
License: MIT.