
Local Python docs MCP server, accelerated with Rust


pydocs-mcp


Local, version-aware code & docs search for your AI coding agent — over the exact library versions installed on your machine.

pydocs-mcp architecture overview: your project source and installed Python libraries are indexed into a SQLite database (chunks, metadata, reference graph) plus a TurboQuant .tq vector file; an AI coding assistant's query runs through keyword (BM25) and vector search fused together — with a tree-navigating mode over the code map — then a result ranker returns version-aware answers, all locally with no API keys or network upload.

Your AI assistant thinks you're on requests 2.28. You actually have 2.31. It calls a kwarg that was renamed two versions ago, your test fails, and you lose twenty minutes. The fix isn't a smarter prompt — it's giving the AI docs that match your lockfile, not the average of every StackOverflow answer it ever read.

pydocs-mcp indexes your project plus every installed dependency, right on your machine, in seconds. Your agent connects over MCP and gets answers grounded in your code — fully offline.

What you get

  • Matched to your install. Searches the exact versions sitting in your site-packages, so your agent stops inventing APIs from some older release.
  • Private & offline. Everything runs locally — no API keys, no uploads, no rate limits, no per-query fees.
  • Three ways to find code. Keyword, meaning, and LLM reasoning (see How it works) — on their own or fused into one ranked answer.
  • Knows how your code connects. Ask "what calls this?", "what does it call?", or "what does this class inherit?" — across your project and every dependency.
  • Lean, not bloated. Minimal dependencies: no PyTorch, no FAISS. A small local ONNX embedder plus the Rust TurboQuant vector store (turbovec), which packs embeddings ~16× smaller than float32 (a 1536-dim vector drops from 6,144 to 384 bytes, so a 10M-doc corpus needs ~4 GB instead of ~61 GB) and benchmarks faster than FAISS FastScan. The on-disk index stays tiny and search stays quick.
  • Cheap to keep current. Edit a doc and only the changed chunks are re-embedded — partial re-ingestion, not a full rebuild — while unchanged packages are skipped in under 100 ms. A Rust core does the heavy lifting.

How it works

Three steps, all on your machine (see the diagram above):

  1. Index — pydocs-mcp scans your project and installed deps into a local SQLite database (code chunks, metadata, and a graph of how everything references everything else) plus a compact TurboQuant .tq vector file for meaning-based search. Re-running is cheap: unchanged packages are skipped, and when a file does change, only its changed chunks are re-embedded.
  2. Search — each query can use three complementary modes and fuse them into one ranked list:
    • Keyword — instant, exact matches for names, error strings, and signatures.
    • Meaning — dense embeddings find the right code even when your words differ from the docs', via a small model that runs locally.
    • Reasoning — for broad or structural questions, an LLM walks your code's map (titles + summaries, no embeddings) to pick the best spots.
  3. Answer — results flow back to your agent through two simple tools: search (find by relevance) and lookup (jump to a known name, or trace its callers, callees, and inheritance).
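The "only its changed chunks are re-embedded" behavior from step 1 can be pictured as a content-hash diff. The following is a minimal sketch assuming each chunk has a stable id (a qualified name here) and a SHA-256 fingerprint saved from the previous run; the id scheme and function names are hypothetical, not the project's schema.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Content fingerprint of one chunk; identical text always yields the same digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembed(stored: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Diff one file's chunks against the last index.

    stored:  chunk_id -> digest recorded at the previous index run
    current: chunk_id -> chunk text parsed from the file now
    Returns (chunk_ids to (re-)embed, chunk_ids whose vectors should be dropped).
    """
    to_embed = sorted(cid for cid, text in current.items() if stored.get(cid) != chunk_hash(text))
    to_drop = sorted(cid for cid in stored if cid not in current)
    return to_embed, to_drop


stored = {"mod.f": chunk_hash("def f(): ..."), "mod.g": chunk_hash("def g(): ...")}
current = {"mod.f": "def f(): ...", "mod.g": "def g(x): ...", "mod.h": "def h(): ..."}
print(plan_reembed(stored, current))  # (['mod.g', 'mod.h'], [])
```

Unchanged chunks keep their existing vectors, so the embedder only sees new or edited text.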

The only call that ever leaves your machine is the optional reasoning mode — and only if you turn it on with your own key.

Quick start

pip install -e .                  # from a source checkout; pure Python, works everywhere
# …or build the Rust core for speed (needs a Rust toolchain):
pip install maturin && maturin develop --release

Linux needs OpenBLAS for the vector store (macOS and Windows already ship it):

sudo apt-get install -y libopenblas-pthread-dev

Then index your project and start the server:

pydocs-mcp serve .                            # index project + deps, serve over MCP (stdio)
pydocs-mcp serve . --gpu                      # …same, with CUDA-accelerated embeddings
pydocs-mcp search "batch inference"           # the same search, from the CLI
pydocs-mcp lookup requests.auth.HTTPBasicAuth --show inherits

Embeddings run on CPU by default. Add --gpu to serve / index (or the benchmark runner) to move all embedder inference — FastEmbed, the sentence_transformers provider, and PyLate — onto CUDA. It's a latency knob only: no YAML change, no re-index, identical results. Needs the matching GPU runtime — see INSTALL.md.

Live re-indexing (optional)

To keep the index fresh while you edit code, install the watch extras and pick one of two modes. Both debounce edits to .py, .md, and .ipynb files into a single reindex.

pip install 'pydocs-mcp[watch]'
pydocs-mcp serve . --watch   # MCP server + watcher (for AI clients)
pydocs-mcp watch .            # watcher only (no MCP server; index stays fresh for CLI `search` / `lookup`)

Both modes share the same YAML tunables: debounce, file extensions, and ignored paths live under serve.watch.* in your pydocs-mcp.yaml (see DOCUMENTATION.md).

Point Claude Code, Cursor, or Continue.dev at it over stdio — copy-paste client configs are in DOCUMENTATION.md, and install troubleshooting (including the libopenblas fallback) is in INSTALL.md.

How it compares

pydocs-mcp, Context7, and Neuledge Context all feed docs to an AI agent over MCP, but optimize for different things. They aren't mutually exclusive — an agent can mount all three and route by intent.

|  | pydocs-mcp | Context7 | Neuledge Context |
|---|---|---|---|
| Deployment | Local stdio MCP server | Hosted MCP (mcp.context7.com) | Local stdio MCP server |
| Doc source | Your installed Python deps + your own project, indexed in place | Curated community docs hosted by Upstash | Community registry (~100+ libraries), pulled then queried locally |
| Version match | Exactly what's in your site-packages, automatic | Library + version chosen in the prompt | Latest from the registry |
| Languages | Python | Multi-language | Multi-language (~100+ libraries) |
| Retrieval | Keyword (BM25) + dense embeddings + LLM tree reasoning, fused via RRF or weighted scores | Not publicly documented | BM25 over SQLite FTS5 |
| Code-structure queries | Reference graph: lookup(show=callers / callees / inherits) | None (doc retrieval only) | None (doc retrieval only) |
| Indexes your code | Yes, under the `__project__` package | No | No |
| Privacy | Fully offline with the default embedder; zero network calls | Queries hit Upstash; OAuth + API key | Local once packages are downloaded |
| Dependencies | Lean: no PyTorch, no FAISS (Rust TurboQuant store + small ONNX embedder) | Hosted service (nothing to install) | Local service |
| Cost | $0, OSS (MIT); no keys, limits, or fees | Free tier (rate-limited) + paid plans | $0, OSS (Apache-2.0) |

In short: choose pydocs-mcp for offline, version-matched Python retrieval where you also navigate code structure; Context7 for hosted, multi-language docs; Neuledge for a local-first multi-language registry.

Benchmarked, not hand-waved

pydocs-mcp ships a real benchmark harness that scores retrieval quality on public benchmarks (RepoQA, DS-1000) and head-to-head against Context7 and Neuledge — with confidence intervals and plots. See benchmarks/README.md.

Retrieval methods & R&D

Each method below is a named step under python/pydocs_mcp/retrieval/steps/, addressable from YAML. The default chunk_search.yaml composes BM25 + single-vector dense fused via RRF; everything else is opt-in via a preset swap (--config), with no behavioral change for default installs.

Keyword — BM25 over SQLite FTS5

Full-text search with porter stemming and the unicode61 tokenizer. Free, instant, and the baseline that every other method composes with through the fusion steps below.
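To see what that tokenizer setting does, here is a self-contained sqlite3 sketch using FTS5's 'porter unicode61' tokenizer and its built-in bm25() ranking function. The table name, columns, and chunk texts are made up for illustration and do not reflect pydocs-mcp's schema; it also assumes your Python's SQLite was compiled with FTS5, as most builds are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# porter wraps unicode61: split on Unicode word boundaries, case-fold, then stem.
conn.execute(
    "CREATE VIRTUAL TABLE chunks_fts USING fts5(qualname, body, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO chunks_fts (qualname, body) VALUES (?, ?)",
    [
        ("requests.Session.request", "Constructs a Request, prepares it and sends it."),
        ("requests.auth.HTTPBasicAuth", "Attaches HTTP Basic Authentication to the given Request."),
        ("json.loads", "Deserialize a str instance containing a JSON document."),
    ],
)
# Bare words in an FTS5 query are ANDed; bm25() is lower-is-better, so sort ascending.
rows = conn.execute(
    "SELECT qualname, bm25(chunks_fts) FROM chunks_fts "
    "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts)",
    ("sending requests",),
).fetchall()
print([name for name, _ in rows])  # ['requests.Session.request']
```

Because porter stems both the documents and the query, "sending requests" matches a chunk that only says "sends" and "Request".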

Single-vector dense — FastEmbed + TurboQuant

  • Embedder. FastEmbed with BAAI/bge-small-en-v1.5 by default — runs on CPU via ONNX, no PyTorch, no torch download. OpenAI text-embedding-3-small is the optional alternative for users with an API key. Pass --gpu to run the on-device embedders (FastEmbed / sentence_transformers) on CUDA instead — same vectors, lower latency.

  • Bigger on-device model: the sentence_transformers provider. For stronger dense recall without an API key, switch to Qwen/Qwen3-Embedding-0.6B served via sentence-transformers (torch). It runs reliably on GPU, since torch frees CUDA memory between sequential index builds, and the weights download on first use. Install the extra (pip install 'pydocs-mcp[sentence-transformers]', roughly 1–5 GB including torch), then set it in your YAML:

    embedding:
      provider: sentence_transformers
      model_name: Qwen/Qwen3-Embedding-0.6B
      dim: 1024
      # Optional. Token cap (attention is O(seq^2) — the OOM guard). Omit to
      # use the embedder's own default (2048).
      max_seq_length: 2048
      # Optional. L2-normalize output (default true).
      normalize: true
      # Optional. Named asymmetric query prompt; omit to use the model's own.
      query_prompt_name: query
    

    The default remains bge-small; the sentence_transformers provider is opt-in.

  • Vector store. TurboQuant (turbovec): online vector quantization with a near-optimal distortion rate. ~16× smaller than float32 (a 1536-dim vector drops from 6,144 to 384 bytes, so a 10M-doc corpus needs ~4 GB instead of ~61 GB) and faster than FAISS FastScan at the same recall. Persists as a .tq sidecar next to the SQLite DB.
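The compression figures for 1536-dim vectors are plain arithmetic, checked below. The 2-bits-per-dimension rate is inferred from the quoted byte counts (384 bytes for 1536 dims), not taken from turbovec's documentation, and any per-vector metadata the store keeps is ignored.

```python
def float32_bytes(dim: int) -> int:
    """Uncompressed size: 4 bytes per float32 component."""
    return dim * 4


def packed_bytes(dim: int, bits_per_dim: int) -> int:
    """Quantized size at a fixed bit budget per dimension (per-vector overhead ignored)."""
    return dim * bits_per_dim // 8


dim, n_docs = 1536, 10_000_000
full, packed = float32_bytes(dim), packed_bytes(dim, bits_per_dim=2)
print(full, packed, full // packed)  # 6144 384 16
# Decimal gigabytes (1 GB = 10**9 bytes).
print(f"{n_docs * full / 1e9:.1f} GB -> {n_docs * packed / 1e9:.1f} GB")  # 61.4 GB -> 3.8 GB
```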

Late-interaction (multi-vector / MaxSim) — opt-in

The flagship R&D backend. One vector per token instead of one pooled vector per chunk; queries score via ColBERT's MaxSim — for each query token, take the maximum cosine to any document token, then sum. Higher recall on long, structurally distant queries (often the hard cases for single-vector retrievers).
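A minimal pure-Python sketch of the MaxSim score described above. It illustrates the scoring rule only, not PLAID's indexed, compressed search; the function names are illustrative.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def maxsim(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """For each query token, take its best cosine against any doc token, then sum."""
    docs = [normalize(t) for t in doc_tokens]
    total = 0.0
    for q in map(normalize, query_tokens):
        total += max(sum(a * b for a, b in zip(q, d)) for d in docs)
    return total


query = [[1.0, 0.0], [0.0, 1.0]]
print(maxsim(query, [[1.0, 0.0], [0.0, 1.0]]))  # 2.0: both query tokens find an exact match
print(maxsim(query, [[1.0, 0.0], [2.0, 0.0]]))  # 1.0: the second query token matches nothing
```

Each query token keeps its own best match instead of being averaged into one pooled vector, so a partially matching document is scored token by token.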

  • Method. ColBERT late interaction (Khattab & Zaharia, SIGIR 2020).

  • Engine. PLAID (Santhanam et al., CIKM 2022) via fast-plaid — a Rust-backed IVF + residual-decompression engine. Persists as a per-project directory sidecar at ~/.pydocs-mcp/{slug}.plaid/.

  • Embedder. PyLate (arXiv:2508.03555) with the default model lightonai/LateOn-Code — late-interaction trained on code.

  • Lighter-weight model — lightonai/LateOn-Code-edge. For a smaller per-token footprint, point the same PyLate path at lightonai/LateOn-Code-edge (48-dim token vectors instead of LateOn-Code's 128) in your YAML:

    late_interaction:
      enabled: true
      provider: pylate
      model_name: lightonai/LateOn-Code-edge
      embedding_dim: 48
      document_length: 2048
      query_length: 256
    

    The default stays LateOn-Code; LateOn-Code-edge is opt-in.

  • SQLite + fast-plaid coupling. A chunk_multi_vector_ids mapping table bridges SQLite's chunk_id to fast-plaid's plaid_doc_id. The shipped FilterAdapter Protocol pushes metadata filters down to SQLite, then the result chunk-id list is passed as subset= to fast-plaid's MaxSim search — so MaxSim is always bounded to the SQLite-eligible candidates and the two engines stay in their own id spaces.

  • Enable. pip install 'pydocs-mcp[late-interaction]', set late_interaction.enabled: true in your YAML, then point --config at the shipped chunk_search_late_interaction.yaml preset.
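The filter-then-subset coupling can be sketched with sqlite3 and a toy scorer. Only chunk_multi_vector_ids, chunk_id, and plaid_doc_id come from the description above; the chunks table, its package column, and the function names are hypothetical, and the brute-force loop merely stands in for fast-plaid's subset= search (no fast-plaid API is called here).

```python
import sqlite3

Vector = list[float]


def maxsim(query: list[Vector], doc: list[Vector]) -> float:
    # Token vectors are assumed to be L2-normalized already, so a dot product is a cosine.
    return sum(max(sum(a * b for a, b in zip(q, d)) for d in doc) for q in query)


def eligible_docs(conn: sqlite3.Connection, package: str) -> dict[int, int]:
    """Push the metadata filter down to SQLite; return plaid_doc_id -> chunk_id."""
    rows = conn.execute(
        """
        SELECT m.plaid_doc_id, c.chunk_id
        FROM chunks AS c
        JOIN chunk_multi_vector_ids AS m ON m.chunk_id = c.chunk_id
        WHERE c.package = ?
        """,
        (package,),
    ).fetchall()
    return dict(rows)


def filtered_maxsim_search(conn, package, query, plaid_vectors, k=5):
    """Score only SQLite-eligible documents, then translate back to chunk ids."""
    plaid_to_chunk = eligible_docs(conn, package)
    # Stand-in for handing list(plaid_to_chunk) to the engine as its candidate subset.
    scored = sorted(
        ((maxsim(query, plaid_vectors[pid]), plaid_to_chunk[pid]) for pid in plaid_to_chunk),
        reverse=True,
    )
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, package TEXT);
    CREATE TABLE chunk_multi_vector_ids (chunk_id INTEGER, plaid_doc_id INTEGER);
    INSERT INTO chunks VALUES (1, 'requests'), (2, 'requests'), (3, 'numpy');
    INSERT INTO chunk_multi_vector_ids VALUES (1, 10), (2, 11), (3, 12);
    """
)
plaid_vectors = {10: [[0.0, 1.0]], 11: [[1.0, 0.0]], 12: [[1.0, 0.0]]}
print(filtered_maxsim_search(conn, "requests", [[1.0, 0.0]], plaid_vectors))
# [(2, 1.0), (1, 0.0)]: chunk 3 would also score 1.0 but is filtered out before scoring
```

Each engine stays in its own id space; the mapping table is the only place the two meet.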

Hybrid fusion

  • Reciprocal Rank Fusion (RRF), from Cormack, Clarke & Buettcher (SIGIR 2009). Rank-only: each ranked list contributes 1 / (k + rank), with k = 60 by default; the workhorse for combining BM25 + dense, or BM25 + late-interaction.
  • Weighted Score Interpolation (WSI) — score-space α · score_a + (1 − α) · score_b with min-max normalization, for cases where the score distributions are well-calibrated and rank isn't enough. α is tunable from YAML.
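Both fusion rules fit in a few lines. This is a hedged sketch of the textbook formulas, not the project's fusion steps; the function names are hypothetical, ranks are 1-based, and this WSI treats a document missing from one list as scoring 0 after normalization, which is one assumption among several reasonable ones.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank), with ranks starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a list where every score is equal maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 1.0 for doc, s in scores.items()}


def wsi(a: dict[str, float], b: dict[str, float], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted Score Interpolation: alpha * norm(a) + (1 - alpha) * norm(b)."""
    na, nb = min_max(a), min_max(b)
    docs = dict.fromkeys([*na, *nb])  # union of ids, in a deterministic order
    fused = {d: alpha * na.get(d, 0.0) + (1 - alpha) * nb.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25_ranked = ["a", "b", "c"]
dense_ranked = ["b", "c", "a"]
print([doc for doc, _ in rrf([bm25_ranked, dense_ranked])])  # ['b', 'a', 'c']

bm25_scores = {"a": 12.0, "b": 8.0, "c": 2.0}
dense_scores = {"a": 0.2, "b": 0.9, "c": 0.5}
print([doc for doc, _ in wsi(bm25_scores, dense_scores, alpha=0.5)])  # ['b', 'a', 'c']
print([doc for doc, _ in wsi(bm25_scores, dense_scores, alpha=0.9)])  # ['a', 'b', 'c']
```

RRF ignores score magnitudes, so it stays robust when BM25 and cosine scores live on unrelated scales; WSI keeps the magnitudes, which only helps when they are meaningful, and lets α tilt the blend toward one retriever.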

LLM tree reasoning — opt-in

A vectorless mode for broad, structural questions ("walk me through the request lifecycle"). Instead of embedding text, an LLM walks the code map (module / class titles plus short summaries) and picks the best spots itself. Inspired by the reasoning-over-a-table-of-contents approach of PageIndex (VectifyAI).

Three shipped presets under python/pydocs_mcp/pipelines/: tree_only.yaml, chunk_search_with_tree_reasoning_parallel.yaml (run alongside chunk search, fuse via WSI), and chunk_search_with_tree_reasoning_after.yaml (use chunk search as the candidate pool, let the LLM re-rank). Provider / model / temperature / max_tokens are tuned under the llm: section of YAML; any OpenAI-compatible endpoint works.
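The walk itself can be pictured as a greedy descent over a tree of titles and summaries. In this simplified, hypothetical sketch a keyword-overlap function stands in for the LLM call and only one branch is followed per level; the Node type and function names are illustrative and do not describe the shipped presets' behavior.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


# Given the question and the candidate children, return the index of the child to open.
Chooser = Callable[[str, list[Node]], int]


def walk(root: Node, question: str, choose: Chooser) -> list[str]:
    """Descend from the root, letting `choose` pick one child per level, until a leaf."""
    path, node = [root.title], root
    while node.children:
        node = node.children[choose(question, node.children)]
        path.append(node.title)
    return path


def keyword_chooser(question: str, nodes: list[Node]) -> int:
    """Toy stand-in for the LLM: pick the child whose summary shares the most words."""
    words = set(question.lower().split())
    overlaps = [len(words & set(n.summary.lower().split())) for n in nodes]
    return overlaps.index(max(overlaps))


code_map = Node("requests", "HTTP library", [
    Node("requests.sessions", "session objects send every request and keep cookies", [
        Node("Session.request", "builds prepares and sends a request"),
        Node("Session.close", "closes all adapters"),
    ]),
    Node("requests.auth", "authentication handlers attach credentials"),
])
print(walk(code_map, "how does a session send a request", keyword_chooser))
# ['requests', 'requests.sessions', 'Session.request']
```

Swapping keyword_chooser for a function that prompts a model with the candidate titles and summaries is what turns this into reasoning over the tree rather than over embeddings.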

Code reference graph

Beyond embeddings, pydocs-mcp captures a graph of how code references code during indexing: CALLS, IMPORTS, INHERITS, and optional MENTIONS (backtick-quoted dotted names in markdown). The same surface answers an AI's "what calls this?" / "what does this extend?" questions through the lookup(show=…) MCP tool:

pydocs-mcp lookup requests.auth.HTTPBasicAuth --show inherits
pydocs-mcp lookup my_module.Parser.parse --show callers

Capture is on by default and tunable under reference_graph: in YAML (toggle, kinds-to-emit, output bounds).
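For Python sources, edges like these can be recovered with the standard library's ast module. The sketch below emits (kind, source, target) triples for IMPORTS, INHERITS, and CALLS, plus a regex for MENTIONS. It is an illustration only (targets are left as written, with no name resolution), and the class and function names are hypothetical.

```python
import ast
import re

# Backtick-quoted dotted names such as `requests.Session.request`.
MENTION = re.compile(r"`([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)`")


class EdgeCollector(ast.NodeVisitor):
    """Collect (kind, source, target) edges while tracking the enclosing qualified name."""

    def __init__(self, module: str) -> None:
        self.module = module
        self.scope = [module]
        self.edges: set[tuple[str, str, str]] = set()

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            self.edges.add(("IMPORTS", self.module, alias.name))

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node.module:  # skip bare relative imports such as "from . import x"
            for alias in node.names:
                self.edges.add(("IMPORTS", self.module, f"{node.module}.{alias.name}"))

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        owner = ".".join([*self.scope, node.name])
        for base in node.bases:
            self.edges.add(("INHERITS", owner, ast.unparse(base)))
        self.scope.append(node.name)
        self.generic_visit(node)
        self.scope.pop()

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.scope.append(node.name)
        self.generic_visit(node)
        self.scope.pop()

    visit_AsyncFunctionDef = visit_FunctionDef  # async defs open a scope the same way

    def visit_Call(self, node: ast.Call) -> None:
        self.edges.add(("CALLS", ".".join(self.scope), ast.unparse(node.func)))
        self.generic_visit(node)  # also record nested calls, e.g. inside arguments


def extract_edges(module: str, source: str) -> set[tuple[str, str, str]]:
    collector = EdgeCollector(module)
    collector.visit(ast.parse(source))
    return collector.edges


def extract_mentions(doc: str, markdown: str) -> set[tuple[str, str, str]]:
    return {("MENTIONS", doc, name) for name in MENTION.findall(markdown)}


SOURCE = '''
import json
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    def __call__(self, r):
        r.headers["Authorization"] = self.token()
        return r

def load(path):
    return json.loads(open(path).read())
'''
for edge in sorted(extract_edges("my_module", SOURCE)):
    print(edge)
```

Answering "what calls X?" is then a reverse lookup on the target column of these triples.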

Learn more

  • DOCUMENTATION.md — how it works in depth: retrieval pipeline, reference graph, cache, configuration, database schema, and the full CLI reference.
  • EXTENSIONS.md — extend it: new vector-store backends, pipeline steps, and fusion strategies.
  • benchmarks/README.md — the evaluation harness.
  • INSTALL.md — installation & troubleshooting.
  • CLAUDE.md — architecture & contributor guide.

Sources & references

Benchmarks

  • RepoQA — Evaluating Long Context Code Understanding · arXiv:2406.06025 (2024)
  • DS-1000 — A Natural and Reliable Benchmark for Data Science Code Generation · arXiv:2211.11501 (2023)
  • CodeRAG-Bench — Can Retrieval Augment Code Generation? · arXiv:2406.14497 (2024)

Vectors & retrieval

  • TurboQuant — Online Vector Quantization with Near-optimal Distortion Rate · arXiv:2504.19874 (Google Research, 2025); implemented by turbovec
  • FAISS — the similarity-search library used as the speed/storage baseline above
  • FastEmbed with BAAI/bge-small-en-v1.5 — the default on-device embedder for the single-vector dense mode
  • PyLate with lightonai/LateOn-Code — the default model for the opt-in late-interaction (multi-vector / MaxSim) mode · PyLate: Flexible Training and Retrieval for Late Interaction Models · arXiv:2508.03555 (LightOn, 2025)
  • ColBERT — Efficient and Effective Passage Search via Contextualized Late Interaction over BERT · arXiv:2004.12832 (Khattab & Zaharia, SIGIR 2020) — the late-interaction architecture
  • PLAID — An Efficient Engine for Late Interaction Retrieval · arXiv:2205.09707 (Santhanam et al., CIKM 2022) — implemented by fast-plaid, the engine pydocs-mcp uses for MaxSim scoring
  • Reciprocal Rank Fusion — Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods · Cormack, Clarke & Buettcher, SIGIR 2009 — the rank-fusion baseline (k=60)
  • PageIndex — inspiration for the LLM tree-reasoning mode


License: MIT.



Download files


Source Distribution

pydocs_mcp-0.3.0.tar.gz (7.3 MB)

Uploaded Source

Built Distributions


pydocs_mcp-0.3.0-cp311-abi3-win_amd64.whl (1.2 MB)

Uploaded: CPython 3.11+, Windows x86-64

pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)

Uploaded: CPython 3.11+, manylinux: glibc 2.17+ x86-64

pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.4 MB)

Uploaded: CPython 3.11+, manylinux: glibc 2.17+ ARM64

pydocs_mcp-0.3.0-cp311-abi3-macosx_11_0_arm64.whl (1.3 MB)

Uploaded: CPython 3.11+, macOS 11.0+ ARM64

File details

Details for the file pydocs_mcp-0.3.0.tar.gz.

File metadata

  • Download URL: pydocs_mcp-0.3.0.tar.gz
  • Size: 7.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.19 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for pydocs_mcp-0.3.0.tar.gz
Algorithm Hash digest
SHA256 6b7e54757829118762b9be8dec1ab5b10151344c4635e4aaa471a3234320dd7b
MD5 885ae9952575cc33d129a1bf037226b5
BLAKE2b-256 011812388d4e97c618559aab7fe02869083978c55297222cab7377a9c7f1c1de


File details

Details for the file pydocs_mcp-0.3.0-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: pydocs_mcp-0.3.0-cp311-abi3-win_amd64.whl
  • Size: 1.2 MB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.19 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for pydocs_mcp-0.3.0-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 8eeb541fdf765494c0839238d7640c551ecdcee0c22c629a06b9d5f8623f2a4f
MD5 a761ece2df5af63237e152b2a0bd965f
BLAKE2b-256 7573e4acd45ac2e87b59cb5d3d50ff8a351fe8290c97a9f8d9fd25db134ec3b6


File details

Details for the file pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

  • Download URL: pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • Size: 1.4 MB
  • Tags: CPython 3.11+, manylinux: glibc 2.17+ x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.19 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 62f57c3356da50c42711fd322d69f96677f813ccf9da55f7347d69337f70506b
MD5 4502dfe817f6a583966d5205651b8324
BLAKE2b-256 d99ddb863526a392a5965a0df997cef2d385c9fdf8b02af93a49e90204b1c27f


File details

Details for the file pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

  • Download URL: pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  • Size: 1.4 MB
  • Tags: CPython 3.11+, manylinux: glibc 2.17+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.19 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for pydocs_mcp-0.3.0-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 045d3820bb061c44464df566e57149f3832e71ff36ad11cc16a0c6f301cd5df1
MD5 9b082b6a4fb4b5903f89518701fcf7f1
BLAKE2b-256 4a0f3bf502ecb25ce59acf217d0f985132e327b5448cb056292c5cbd35fd1ba5


File details

Details for the file pydocs_mcp-0.3.0-cp311-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: pydocs_mcp-0.3.0-cp311-abi3-macosx_11_0_arm64.whl
  • Size: 1.3 MB
  • Tags: CPython 3.11+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.19 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for pydocs_mcp-0.3.0-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8bcc301a60768e4edec3f2383ce1bb5b32b48b4dccad4645aeb548b762ac36c0
MD5 e48f80aa940e1b1413a50eceb069e1a4
BLAKE2b-256 1ca89c650bbcc1574a62ac1ebc5c22bb15e02cc3f4e30166c419fda806b5101e

