mdrag

Give any local Markdown folder a semantic-search MCP server. Runs entirely offline.

Turn ~/Desktop/sales/, ~/Desktop/notes/, or any directory full of Markdown files into a searchable knowledge base that Claude Code, Cursor, Cline, and other MCP clients can query with natural-language questions.

Features

Storage & indexing

🗂 Multi-vault — one MCP server manages many doc folders, each a separate "vault"
📦 Self-contained — each vault's vector DB lives inside the folder (.mdrag/), move it anywhere
⚡ Incremental indexing — only re-embed files whose mtime changed
👀 Auto-reindex on save — mdrag serve watches every registered vault with watchdog, 1.5s debounce; new/edited/deleted/moved .md files are picked up with no manual reindex, no cron
🙈 .mdragignore — gitignore-style file at the vault root excludes drafts, archives, or whole directories from the index
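The debounce behaviour described above can be illustrated with a small, clock-injected sketch. This is not mdrag's actual watcher code: the `Debouncer` class and its method names are hypothetical, and only the 1.5s delay comes from the feature list. File events are collected and released as one batch once edits have been quiet for the full delay.

```python
from __future__ import annotations


class Debouncer:
    """Collect changed paths; release them once no event arrived for `delay` seconds."""

    def __init__(self, delay: float = 1.5) -> None:
        self.delay = delay
        self._pending: set[str] = set()
        self._last_event: float | None = None

    def notify(self, path: str, now: float) -> None:
        # Every new event restarts the quiet period.
        self._pending.add(path)
        self._last_event = now

    def flush_if_quiet(self, now: float) -> list[str]:
        # Return the batch only if the quiet period has fully elapsed.
        if self._last_event is None or now - self._last_event < self.delay:
            return []
        batch = sorted(self._pending)
        self._pending.clear()
        self._last_event = None
        return batch
```

Passing the clock in as `now` keeps the logic testable without sleeping; a real watcher would presumably call `notify` from its file-event callback and poll `flush_if_quiet` with `time.monotonic()`.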

Retrieval quality

✂️ Chunk-level retrieval — long docs are split by headings (sliding-window fallback at 600 chars / 80 overlap) so mid-doc content stays findable; each doc also gets an "overview" chunk for broad queries
🔀 Hybrid search — dense vector retrieval fused with BM25 keyword matching via best-rank fusion, so specific terms and semantic intent both get through
🎯 Rare-term boost — queries containing digit strings (e.g. "38 种字段", meaning "38 field types") switch to a BM25-priority fusion so exact-match lookups aren't buried by vector results
🌐 Cross-lingual query expansion — comparison-style queries (containing "区别" (difference), "对比" (comparison), "compare", or "vs") get auto-expanded with bilingual synonyms before embedding, improving recall on mixed-language corpora
🧠 Any embedding model — default is multilingual paraphrase-multilingual-MiniLM-L12-v2 (handles Chinese + English + 50 more); swap in any sentence-transformers model
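To make the chunking idea concrete, here is a minimal sketch of heading-based splitting with a sliding-window fallback. It uses the 600-character window and 80-character overlap quoted above, but the function names are hypothetical and the details (ATX headings only, no awareness of code fences, no "overview" chunk) are simplifying assumptions, not mdrag's actual chunker.

```python
import re


def sliding_window(text: str, size: int = 600, overlap: int = 80) -> list[str]:
    """Cut text into windows of `size` chars; consecutive windows share `overlap` chars."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_markdown(text: str, size: int = 600, overlap: int = 80) -> list[str]:
    """Split before each ATX heading; window any section longer than `size`."""
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks = []
    for section in sections:
        if len(section) <= size:
            chunks.append(section)
        else:
            chunks.extend(sliding_window(section, size, overlap))
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.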

Stability

🔐 File-locking — concurrent CLI + watcher reindexes on the same vault are serialized via filelock, preventing LanceDB corruption
📋 Schema versioning — meta.json in each .mdrag/ dir tracks schema version and model; mismatches are caught early with an actionable error
🩺 mdrag doctor — one command to check everything: Python, registry, per-vault health, model cache, disk usage, PATH; paste the output into bug reports
📡 Watcher health in MCP — list_vaults shows a ⚠️ if a vault's auto-reindex is failing (consecutive errors + message), instead of silently serving stale data
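The schema-versioning check can be pictured roughly as follows. This is a sketch under stated assumptions: the README only says that meta.json tracks the schema version and model, so the key names `schema_version` and `model`, the `VaultMetaError` class, and the version number are all hypothetical.

```python
import json
from pathlib import Path

SCHEMA_VERSION = 1  # hypothetical value, for illustration only


class VaultMetaError(RuntimeError):
    """Raised when a vault's index metadata does not match what the caller expects."""


def check_meta(vault_dir: Path, expected_model: str,
               schema_version: int = SCHEMA_VERSION) -> None:
    meta_path = vault_dir / ".mdrag" / "meta.json"
    hint = "run: mdrag vault reindex NAME --full"
    if not meta_path.exists():
        raise VaultMetaError(f"{meta_path} is missing; {hint}")
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    # Assumed key names; the real file layout may differ.
    if meta.get("schema_version") != schema_version:
        raise VaultMetaError(
            f"index schema {meta.get('schema_version')} != {schema_version}; {hint}")
    if meta.get("model") != expected_model:
        raise VaultMetaError(
            f"index built with {meta.get('model')!r}, not {expected_model!r}; {hint}")
```

Failing before any query runs, with the fix spelled out in the message, is what makes the error actionable rather than a confusing dimension mismatch later on.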

Interface

🔒 Fully local — no API keys, no cloud; embeddings run on your machine
🛠 MCP tools — list_vaults, search, get_doc, list_tags exposed to Claude Code / Cursor / Cline over stdio
💡 Match explainability — each search result includes match_reason ("vector+bm25", "bm25 (rare-term)", "bm25 only", "vector only") so AI clients can explain or re-rank
📏 Quality eval harness — mdrag eval compares any set of indexes on a YAML query suite; Recall@K, MRR, per-query ranking diff
🏷 Frontmatter-aware — title, tags, summary from YAML frontmatter are indexed and searchable
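The match_reason labels above suggest how the two result lists are combined. The sketch below is one plausible reading of "best-rank fusion", not mdrag's confirmed algorithm: each document is ranked by the better of its two ranks, and queries containing digits put the BM25 rank first. The function names and tie-breaking rules are assumptions.

```python
import re

INF = float("inf")


def has_rare_term(query: str) -> bool:
    """Assumed heuristic: any digit in the query counts as a rare term."""
    return re.search(r"\d", query) is not None


def fuse(vector_ids: list[str], bm25_ids: list[str], query: str) -> list[tuple[str, str]]:
    """Return (doc_id, match_reason) pairs, best first. Inputs are ranked, unique id lists."""
    v_rank = {doc: i for i, doc in enumerate(vector_ids)}
    b_rank = {doc: i for i, doc in enumerate(bm25_ids)}
    rare = has_rare_term(query)

    def sort_key(doc: str):
        v, b = v_rank.get(doc, INF), b_rank.get(doc, INF)
        if rare:
            return (b, v, doc)           # BM25 rank first, vector rank breaks ties
        return (min(v, b), v + b, doc)   # best rank wins, sum of ranks breaks ties

    def reason(doc: str) -> str:
        in_v, in_b = doc in v_rank, doc in b_rank
        if rare and in_b:
            return "bm25 (rare-term)"
        if in_v and in_b:
            return "vector+bm25"
        return "bm25 only" if in_b else "vector only"

    docs = sorted(set(v_rank) | set(b_rank), key=sort_key)
    return [(doc, reason(doc)) for doc in docs]
```

Unlike score averaging, a rank-based fusion needs no calibration between cosine similarities and BM25 scores, which live on unrelated scales.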

Installation

pip install mdrag

Requires Python ≥ 3.10.

Quickstart (3 steps)

Let's say Bob has a folder ~/Desktop/sales/ full of meeting notes, proposals, and competitor research in Markdown.

1. Register the MCP server (once, globally)

claude mcp add mdrag --scope user -- mdrag serve

This tells Claude Code "there's an MCP server called mdrag — launch it with mdrag serve when needed". You'll only do this once per machine.

2. Register your doc folder as a vault

mdrag vault add sales ~/Desktop/sales

The first time you run this, the default embedding model (~120MB) downloads once, then all .md files under ~/Desktop/sales/ are indexed. A .mdrag/ subfolder is created inside sales/ to hold the vector database.

3. Use it from Claude Code

Open Claude Code in any project. Ask:

"Use the mdrag MCP to search my sales vault for the Q4 pipeline review"

Claude will call mcp__mdrag__search(vault="sales", query="Q4 pipeline review") and return the top matching documents.

Adding another folder

No new MCP config needed — just register another vault:

mdrag vault add marketing ~/Desktop/marketing
mdrag vault add notes ~/Documents/notes

All vaults are visible through the same MCP server. Claude calls:

mcp__mdrag__list_vaults()                          → see all vaults
mcp__mdrag__search(vault="marketing", query="...")
mcp__mdrag__search(vault="notes", query="...")

CLI reference

mdrag serve                          Start the MCP stdio server
mdrag vault add NAME PATH            Register a directory and index it
mdrag vault list                     Show all vaults
mdrag vault info NAME                Show vault details
mdrag vault reindex NAME [--full]    Re-index (incremental or full)
mdrag vault remove NAME [--purge]    Unregister (and optionally delete .mdrag/)
mdrag eval QUERIES INDEX_SPECS...    Compare retrieval quality across indexes

Common options:

--model MODEL_NAME on vault add — pick a different embedding model
--no-index on vault add — skip initial indexing (useful when you want to register a vault now and index it later)
--full on vault reindex — rebuild from scratch (required after changing the model)
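For readers unfamiliar with the metrics that mdrag eval reports (Recall@K and MRR, per the feature list), the standard definitions can be written down in a few lines. This is a generic sketch of those textbook metrics, not the eval harness itself; the function names and the input shape are assumptions.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / position of the first relevant result (1-based), or 0.0 if none is found."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)
```

Recall@K rewards finding all expected documents somewhere in the top K, while MRR rewards putting the first correct one as high as possible, so the two can disagree when comparing indexes.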

MCP tools exposed

When mdrag serve is running, these tools are available to the AI client:

| Tool | Purpose |
|---|---|
| `list_vaults()` | List all registered vaults with their stats |
| `search(vault, query, top_k=5, tags="")` | Semantic search within a vault; returns the best-matching chunk per doc with `heading_path` and `chunk_text` |
| `get_doc(vault, path)` | Read the full content of a document |
| `list_tags(vault)` | List all frontmatter tags in a vault with counts |
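The "best-matching chunk per doc" behaviour of search can be sketched as a small post-processing step over chunk-level hits. The `ChunkHit` type, the score convention (higher is better), and the reading of `tags` as a comma-separated string where any one tag suffices are assumptions for illustration; only the parameter names and the `heading_path` / `chunk_text` fields come from the table above.

```python
from dataclasses import dataclass


@dataclass
class ChunkHit:
    path: str
    heading_path: str
    chunk_text: str
    score: float                 # assumed: higher means a better match
    tags: tuple[str, ...] = ()


def best_chunk_per_doc(hits: list[ChunkHit], top_k: int = 5, tags: str = "") -> list[ChunkHit]:
    """Keep the highest-scoring chunk of each document, optionally filtered by tag."""
    wanted = {t.strip() for t in tags.split(",") if t.strip()}
    best: dict[str, ChunkHit] = {}
    for hit in hits:
        if wanted and not wanted & set(hit.tags):
            continue  # document carries none of the requested tags
        current = best.get(hit.path)
        if current is None or hit.score > current.score:
            best[hit.path] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:top_k]
```

Collapsing to one chunk per document keeps a long file with many similar sections from crowding every other document out of the top results.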

Frontmatter (optional)

If your Markdown files have YAML frontmatter, mdrag will use it:

---
title: Q4 Pipeline Review
tags: [sales, forecast, 2026-q4]
summary: Overview of deals in play for Q4 2026.
---

# Q4 Pipeline Review
...

title — used as the result title (falls back to filename)
tags — searchable via the tags parameter of search
summary — shown in search results

No frontmatter? It still works — mdrag auto-generates a preview from the file body.
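The fallback rules above (title falls back to the filename, summary falls back to a preview of the body) can be sketched with the standard library alone. This is deliberately a toy parser that only understands `key: value` lines and inline `[a, b]` lists; a real implementation would more likely use a proper YAML parser, and the function names and the 160-character preview length are assumptions.

```python
from pathlib import PurePath


def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Return (metadata, body). Metadata is empty when there is no frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # opening fence without a closing one: treat as plain body
    meta: dict = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return meta, "\n".join(lines[end + 1:])


def describe(path: str, text: str, preview_chars: int = 160) -> dict:
    """Apply the fallbacks: filename for the title, body preview for the summary."""
    meta, body = parse_frontmatter(text)
    return {
        "title": meta.get("title") or PurePath(path).stem,
        "tags": meta.get("tags") or [],
        "summary": meta.get("summary") or " ".join(body.split())[:preview_chars],
    }
```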

Embedding models

| Language | Recommended model | Notes |
|---|---|---|
| Multilingual (default) | `paraphrase-multilingual-MiniLM-L12-v2` | ~120MB, handles Chinese + English + 50 more |
| Chinese-only | `BAAI/bge-small-zh-v1.5` | ~100MB, higher recall on pure Chinese |
| English-only | `BAAI/bge-small-en-v1.5` | ~100MB, higher recall on pure English |
| Higher accuracy | `BAAI/bge-base-zh-v1.5` or `BAAI/bge-base-en-v1.5` | ~400MB, noticeably slower |

Change the model when registering a vault:

mdrag vault add notes ~/Documents/notes --model BAAI/bge-small-en-v1.5

After changing the model on an existing vault (edit ~/.mdrag/vaults.yaml), run a full rebuild:

mdrag vault reindex notes --full

How it works

 ┌────────────────────┐        ┌──────────────────────┐
 │ ~/Desktop/sales/   │        │ ~/.mdrag/            │
 │   meeting-01.md    │        │   vaults.yaml        │  ← registry
 │   proposal.md      │        └──────────────────────┘
 │   .mdrag/          │ ← LanceDB vector store (per-vault)
 │     docs.lance/    │
 └──────────┬─────────┘
            │
            │ mdrag serve
            ▼
 ┌──────────────────────────┐
 │   FastMCP stdio server   │
 │   tools:                 │
 │     search / get_doc /   │
 │     list_vaults /        │
 │     list_tags            │
 └──────────┬───────────────┘
            │ MCP protocol (stdio / JSON-RPC)
            ▼
     Claude Code / Cursor / Cline

Vault registry is at ~/.mdrag/vaults.yaml
Each vault's vector database lives inside the vault directory at .mdrag/ — self-contained, portable
Embeddings use sentence-transformers, stored in LanceDB
MCP server is built on FastMCP

FAQ

How do I update the index after editing files?

You don't have to. While mdrag serve is running (that is, while Claude Code or Cursor is connected), it watches every registered vault and reindexes automatically on save. A 1.5s debounce batches rapid edits.

If mdrag serve isn't running, run an incremental reindex manually:

mdrag vault reindex sales

Only files with changed mtime are re-embedded.
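The mtime comparison behind incremental indexing can be sketched as below. The function name and the shape of `indexed_mtimes` (a mapping from relative path to the mtime recorded at the last index run) are assumptions; how mdrag actually stores this state is not described here.

```python
from pathlib import Path


def files_to_reindex(vault: Path, indexed_mtimes: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (changed_or_new, deleted) relative paths by comparing modification times."""
    current = {}
    for path in vault.rglob("*.md"):
        rel = path.relative_to(vault)
        if ".mdrag" in rel.parts:
            continue  # never index the vault's own database folder
        current[rel.as_posix()] = path.stat().st_mtime
    # New files have no recorded mtime, so they compare unequal and are picked up too.
    changed = sorted(rel for rel, mtime in current.items() if indexed_mtimes.get(rel) != mtime)
    deleted = sorted(set(indexed_mtimes) - set(current))
    return changed, deleted
```

Comparing for inequality rather than "newer than" also catches files whose mtime moved backwards, for example after restoring from a backup.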

How do I exclude files from the index?

Put a .mdragignore file at the root of your vault, using gitignore syntax:

# Example: drafts, archives, big log exports
drafts/
archive/**
**/sales-log-*.md

The rules take effect on the next index run; the auto-watcher also picks up changes to .mdragignore.
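To show how the three example patterns would apply, here is a deliberately simplified matcher built on the standard library. It is not a full gitignore implementation (no negation with `!`, no anchoring rules, and `*` is allowed to cross `/`), and it is not mdrag's own code; the function name is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Check a vault-relative POSIX path against a simplified subset of gitignore rules."""
    parts = PurePosixPath(rel_path).parts
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments
        if pattern.endswith("/"):
            # Directory rule: ignore anything below a directory with that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
            continue
        if fnmatchcase(rel_path, pattern):
            return True
        # A leading "**/" also matches at the vault root.
        if pattern.startswith("**/") and fnmatchcase(rel_path, pattern[3:]):
            return True
    return False
```

With the example file above, drafts/idea.md, archive/2023/old.md, and logs/sales-log-01.md are all excluded, while notes/meeting.md is still indexed.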

Does it support PDF, DOCX, PPTX, XLSX, etc.?

Not directly — mdrag only indexes .md. This is by design: conversion is a messy, format-specific problem, and keeping the core focused on Markdown keeps the index predictable. The recommended workflow is to convert once, commit the .md output, and let mdrag watch it:

# One-off
pandoc meeting.docx -o docs/meeting.md
pandoc slides.pptx  -o docs/slides.md --extract-media=docs/_media

# Bulk conversion with Docling (best quality for PDF/PPTX)
pip install docling
docling raw/*.pdf --to md --output docs/

# CSV → MD table (prints the header, a --- separator row, then the data rows)
python -c "import csv,sys; rows=list(csv.reader(open(sys.argv[1], newline=''))); print('\n'.join('|'+'|'.join(r)+'|' for r in rows[:1]+[['---']*len(rows[0])]+rows[1:]))" data.csv > docs/data.md

Important: strip inline base64 images before indexing. A data:image/...;base64,... payload can inflate a .md file to multi-MB and break chunking. With pandoc use --extract-media=<dir> or post-process with sed -E 's/!\[[^]]*\]\(data:image[^)]*\)//g'.
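For pipelines that are already in Python, the same clean-up as the sed command above can be done with `re`. This sketch mirrors that regular expression; the function name is hypothetical.

```python
import re

# Markdown image whose target is an inline data:image URI, e.g. ![alt](data:image/png;base64,...)
DATA_IMAGE = re.compile(r"!\[[^\]]*\]\(data:image[^)]*\)")


def strip_inline_images(markdown: str) -> str:
    """Remove inline base64 images; images that point at files are left alone."""
    return DATA_IMAGE.sub("", markdown)
```

Only the image syntax is removed; surrounding text and whitespace stay as they were, so a paragraph that contained nothing but an image becomes an empty line.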

Model download is slow / fails

If you're in China, set a Hugging Face mirror:

export HF_ENDPOINT=https://hf-mirror.com
mdrag vault add sales ~/Desktop/sales

Where is the vector data stored?

Vault registry: ~/.mdrag/vaults.yaml
Each vault's vectors: <vault_path>/.mdrag/docs.lance/

Can I share a vault across machines?

Yes. The .mdrag/ folder is self-contained: sync the whole vault directory (via Dropbox, rsync, git-lfs, whatever), then run mdrag vault add <name> <path> on the other machine. No re-indexing is needed as long as the embedding model matches.

Integrations

Claude Code

claude mcp add mdrag --scope user -- mdrag serve

Or manually, in a project-level .mcp.json (for user scope, Claude Code keeps the same mcpServers block in ~/.claude.json):

{
  "mcpServers": {
    "mdrag": {
      "command": "mdrag",
      "args": ["serve"]
    }
  }
}

Cursor / Cline / other MCP clients

Add the same stdio command to your client's MCP configuration. The command is mdrag serve — it communicates over stdio following the MCP protocol.

Development

git clone https://github.com/andyleimc-source/mdrag
cd mdrag
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/pytest

Try the example vault shipped in the repo:

mdrag vault add demo ./examples/sample-vault
mdrag vault list

License

MIT — do whatever you want with it.

Release history

0.3.2 (Apr 29, 2026)
0.3.1 (Apr 16, 2026)
0.3.0 (Apr 16, 2026), this version
0.1.0 (Apr 14, 2026)
