mdrag
Give any local Markdown folder a semantic-search MCP server. Runs entirely offline.
Turn ~/Desktop/sales/, ~/Desktop/notes/, or any directory full of Markdown files into a searchable knowledge base that Claude Code, Cursor, Cline, and other MCP clients can query with natural-language questions.
- Multi-vault: one MCP server manages many doc folders, each registered as a separate "vault"
- Fully local: no API keys, no cloud; embeddings run on your machine
- Incremental indexing: only files that changed are re-embedded
- Any embedding model: the default is the Chinese-optimized bge-small-zh-v1.5; English and multilingual models work too
- Self-contained: each vault's vector DB lives inside the folder (.mdrag/), so you can move it anywhere
Installation
pip install mdrag
Requires Python ≥ 3.10.
Quickstart (3 steps)
Let's say Bob has a folder ~/Desktop/sales/ full of meeting notes, proposals, and competitor research in Markdown.
1. Register the MCP server (once, globally)
claude mcp add mdrag --scope user -- mdrag serve
This tells Claude Code that there is an MCP server called mdrag, to be launched with mdrag serve when needed. Because it is registered at user scope, you only do this once per machine.
2. Register your doc folder as a vault
mdrag vault add sales ~/Desktop/sales
The first time you run this, a ~100MB embedding model downloads (once), then all .md files under ~/Desktop/sales/ get indexed. A .mdrag/ subfolder is created inside sales/ to hold the vector database.
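Step 2 indexes "all .md files under ~/Desktop/sales/". As a rough sketch of what that file set could look like, the hypothetical helper below walks the folder recursively and skips the vault's own .mdrag/ directory. Whether mdrag applies further exclusions (hidden folders, symlinks, other extensions) is not documented here, so treat this as an illustration, not mdrag's code.

```python
from pathlib import Path


def find_markdown_files(vault_dir: str) -> list[Path]:
    """Return every .md file under vault_dir as a sorted list of relative paths.

    Hypothetical helper: anything inside the vault's own .mdrag/ directory
    (where the vector database lives) is skipped.
    """
    root = Path(vault_dir).expanduser()
    found = []
    for path in root.rglob("*.md"):
        rel = path.relative_to(root)
        if ".mdrag" in rel.parts:  # never index the index itself
            continue
        if path.is_file():
            found.append(rel)
    return sorted(found)
```

For Bob's folder, find_markdown_files("~/Desktop/sales") would return relative paths such as meeting-01.md and proposal.md.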
3. Use it from Claude Code
Open Claude Code in any project. Ask:
"Use the mdrag MCP to search my sales vault for the Q4 pipeline review"
Claude will call mcp__mdrag__search(vault="sales", query="Q4 pipeline review") and return the top matching documents.
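Under the hood, that call is an MCP tools/call request. Claude Code shows the tool as mcp__mdrag__search, where the mcp__<server>__ prefix is the client's own namespacing; the tool name the server sees is search. A minimal sketch of the JSON-RPC 2.0 message a client would send after the initialize handshake (the request id is arbitrary, and the helper name is hypothetical):

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Roughly what the client sends for the Quickstart question.
print(make_tool_call(1, "search", {"vault": "sales", "query": "Q4 pipeline review"}))
```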
Adding another folder
No new MCP config is needed. Just register another vault:
mdrag vault add marketing ~/Desktop/marketing
mdrag vault add notes ~/Documents/notes
All vaults are visible through the same MCP server. Claude calls:
mcp__mdrag__list_vaults()   → see all vaults
mcp__mdrag__search(vault="marketing", query="...")
mcp__mdrag__search(vault="notes", query="...")
CLI reference
mdrag serve                          Start the MCP stdio server
mdrag vault add NAME PATH            Register a directory and index it
mdrag vault list                     Show all vaults
mdrag vault info NAME                Show vault details
mdrag vault reindex NAME [--full]    Re-index (incremental or full)
mdrag vault remove NAME [--purge]    Unregister (and optionally delete .mdrag/)
Common options:
- --model MODEL_NAME (on vault add): pick a different embedding model
- --no-index (on vault add): skip initial indexing (useful when you want to register a folder now and index it later)
- --full (on vault reindex): rebuild from scratch (required after changing the model)
MCP tools exposed
When mdrag serve is running, these tools are available to the AI client:
| Tool | Purpose |
|---|---|
| list_vaults() | List all registered vaults with their stats |
| search(vault, query, top_k=5, tags="") | Semantic search within a vault, with an optional tag filter |
| get_doc(vault, path) | Read the full content of a document |
| list_tags(vault) | List all frontmatter tags in a vault with counts |
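Conceptually, search(vault, query, top_k, tags) ranks a vault's documents by embedding similarity to the query and can restrict results to tagged documents. The sketch below shows that ranking with plain cosine similarity over toy vectors. It is not mdrag's implementation (which embeds with sentence-transformers and queries LanceDB), and treating tags as a comma-separated string where a document must carry every listed tag is an assumption.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    path: str
    vector: list[float]
    tags: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    docs: list[Doc], query_vec: list[float], top_k: int = 5, tags: str = ""
) -> list[tuple[str, float]]:
    """Return up to top_k (path, score) pairs, best match first."""
    # Assumption: comma-separated tags; a doc must carry all of them.
    wanted = {t.strip() for t in tags.split(",") if t.strip()}
    candidates = [d for d in docs if wanted <= d.tags]
    scored = [(d.path, cosine(d.vector, query_vec)) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

An empty tags string keeps every document, since the empty set is a subset of any tag set.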
Frontmatter (optional)
If your Markdown files have YAML frontmatter, mdrag will use it:
---
title: Q4 Pipeline Review
tags: [sales, forecast, 2026-q4]
summary: Overview of deals in play for Q4 2026.
---
# Q4 Pipeline Review
...
- title: used as the result title (falls back to the filename)
- tags: searchable via the tags parameter of search
- summary: shown in search results
No frontmatter? It still works: mdrag auto-generates a preview from the file body.
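A minimal sketch of the fallback behavior described above: take title, tags, and summary from frontmatter when present, otherwise use the filename as the title and the start of the body as the preview. The parser only understands the simple key: value and [a, b] forms used in the example (real YAML needs a proper parser such as PyYAML), and the function name and 200-character preview length are assumptions, not mdrag's actual values.

```python
from pathlib import Path


def parse_doc(path: str, text: str, preview_chars: int = 200) -> dict:
    """Split optional '---' frontmatter from the body and apply fallbacks."""
    meta: dict = {}
    body = text
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        # Look for the closing '---'; without one, treat everything as body.
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body = "\n".join(lines[i + 1:])
                for entry in lines[1:i]:
                    key, sep, value = entry.partition(":")
                    if not sep:
                        continue
                    value = value.strip()
                    if value.startswith("[") and value.endswith("]"):
                        items = value[1:-1].split(",")
                        meta[key.strip()] = [v.strip() for v in items if v.strip()]
                    else:
                        meta[key.strip()] = value
                break
    return {
        "title": meta.get("title") or Path(path).stem,  # filename fallback
        "tags": meta.get("tags", []),
        "summary": meta.get("summary") or body.strip()[:preview_chars],
    }
```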
Embedding models
| Language | Recommended model | Notes |
|---|---|---|
| Chinese | BAAI/bge-small-zh-v1.5 (default) | ~100MB, CPU-friendly |
| English | BAAI/bge-small-en-v1.5 | Same family, English |
| Multilingual | paraphrase-multilingual-MiniLM-L12-v2 | For mixed-language vaults |
| Higher accuracy | BAAI/bge-base-zh-v1.5 or BAAI/bge-base-en-v1.5 | ~400MB, noticeably slower |
Change the model when registering a vault:
mdrag vault add notes ~/Documents/notes --model BAAI/bge-small-en-v1.5
After changing the model on an existing vault (edit ~/.mdrag/vaults.yaml), run a full rebuild:
mdrag vault reindex notes --full
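Why the full rebuild matters: vectors produced by different embedding models live in unrelated spaces (and often have different dimensions), so a query embedded with the new model cannot be meaningfully compared against documents embedded with the old one. The hypothetical guard below assumes the index records which model built it; mdrag's actual metadata format and behavior are not documented here.

```python
from typing import Optional


def reindex_mode(
    stored_model: Optional[str], configured_model: str, full: bool = False
) -> str:
    """Pick "full" or "incremental" for a reindex run (hypothetical helper).

    stored_model is the model name recorded when the index was last built,
    or None if the vault has never been indexed.
    """
    if stored_model is None or full:
        return "full"
    if stored_model != configured_model:
        # Old and new vectors come from different embedding spaces, so mixing
        # them would make similarity scores meaningless.
        raise ValueError(
            f"index was built with {stored_model!r} but the vault now uses "
            f"{configured_model!r}; run a full reindex (--full)"
        )
    return "incremental"
```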
How it works
┌────────────────────┐        ┌──────────────────────┐
│ ~/Desktop/sales/   │        │ ~/.mdrag/            │
│   meeting-01.md    │        │   vaults.yaml        │ ← registry
│   proposal.md      │        └──────────────────────┘
│   .mdrag/          │ ← LanceDB vector store (per-vault)
│     docs.lance/    │
└─────────┬──────────┘
          │
          │ mdrag serve
          ▼
┌──────────────────────────┐
│ FastMCP stdio server     │
│ tools:                   │
│   search / get_doc /     │
│   list_vaults /          │
│   list_tags              │
└─────────┬────────────────┘
          │ MCP protocol (stdio / JSON-RPC)
          ▼
   Claude Code / Cursor / Cline
- The vault registry is at ~/.mdrag/vaults.yaml
- Each vault's vector database lives inside the vault directory at .mdrag/ (self-contained, portable)
- Embeddings are computed with sentence-transformers and stored in LanceDB
- The MCP server is built on FastMCP
FAQ
How do I update the index after editing files?
mdrag vault reindex sales
It's incremental: only files whose mtime has changed are re-embedded.
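The incremental pass can be pictured as a diff between the mtimes recorded at the last index run and the files on disk now. A minimal sketch; the snapshot dict and both function names are hypothetical, and how mdrag stores mtimes or handles deleted files internally isn't specified here.

```python
from pathlib import Path


def snapshot(vault_dir: str) -> dict[str, float]:
    """Map each .md file (relative path) to its current modification time."""
    root = Path(vault_dir)
    return {
        str(p.relative_to(root)): p.stat().st_mtime
        for p in root.rglob("*.md")
        if ".mdrag" not in p.relative_to(root).parts  # skip the vector store
    }


def plan_reindex(
    previous: dict[str, float], current: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Return (to_embed, to_delete): new or modified files, and vanished files."""
    to_embed = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    to_delete = sorted(set(previous) - set(current))
    return to_embed, to_delete
```

In this sketch only the paths in to_embed would be sent to the embedding model; unchanged files cost a stat() call and nothing more.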
Can I automate re-indexing?
Yes. For example, this crontab entry (Linux/macOS) re-indexes the sales vault every hour:
0 * * * * /path/to/mdrag vault reindex sales
Or use launchd on macOS / Task Scheduler on Windows.
Does it support PDF, DOCX, etc.?
Not yet. Convert to Markdown first (e.g. with pandoc) and point mdrag at the result.
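For a folder of Word files, a small script can drive pandoc over every document. This sketch assumes pandoc is installed and on your PATH; it writes GitHub-flavored Markdown (-t gfm) next to each source file, and both function names are hypothetical. Note that pandoc reads DOCX but cannot read PDF, so PDFs need a separate text-extraction tool.

```python
import subprocess
from pathlib import Path


def pandoc_commands(src_dir: str, pattern: str = "*.docx") -> list[list[str]]:
    """Build one pandoc command per matching file, writing a sibling .md file."""
    commands = []
    for src in sorted(Path(src_dir).rglob(pattern)):
        dst = src.with_suffix(".md")
        commands.append(["pandoc", str(src), "-t", "gfm", "-o", str(dst)])
    return commands


def convert_all(src_dir: str, pattern: str = "*.docx") -> None:
    """Run the conversions; check=True stops on the first pandoc failure."""
    for cmd in pandoc_commands(src_dir, pattern):
        subprocess.run(cmd, check=True)
```

After convert_all("~/Desktop/sales") (with the path expanded first), a plain mdrag vault reindex sales picks up the new .md files.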
Model download is slow / fails
If you're in China, set a HuggingFace mirror:
export HF_ENDPOINT=https://hf-mirror.com
mdrag vault add sales ~/Desktop/sales
Where is the vector data stored?
- Vault registry: ~/.mdrag/vaults.yaml
- Each vault's vectors: <vault_path>/.mdrag/docs.lance/
Can I share a vault across machines?
Yes. The .mdrag/ folder is self-contained, so sync the whole vault directory (via Dropbox, rsync, git-lfs, whatever) and run mdrag vault add <name> <path> on the other machine. No re-indexing is needed as long as the embedding model matches.
Integrations
Claude Code
claude mcp add mdrag --scope user -- mdrag serve
Or manually in ~/.mcp.json:
{
"mcpServers": {
"mdrag": {
"command": "mdrag",
"args": ["serve"]
}
}
}
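If you'd rather script the manual step, the sketch below merges the same mdrag entry into a JSON config file while keeping any other servers already listed there. The helper name is hypothetical; point it at whichever config file your client actually reads.

```python
import json
from pathlib import Path


def add_mdrag_server(config_path: str) -> dict:
    """Insert (or overwrite) the mdrag stdio server entry in an MCP JSON config."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["mdrag"] = {"command": "mdrag", "args": ["serve"]}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```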
Cursor / Cline / other MCP clients
Add the same stdio command to your client's MCP configuration. The command is mdrag serve; it communicates over stdio using the MCP protocol.
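With MCP's stdio transport, the client launches mdrag serve as a subprocess and exchanges JSON-RPC messages over its stdin/stdout, one message per line (messages must not contain embedded newlines). A minimal sketch of that framing using only the standard library; it encodes and decodes lines but does not implement the MCP handshake, and the function names are hypothetical.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    """Write one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the line stays intact.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")


def read_messages(stream):
    """Yield each non-empty line from the stream as a parsed JSON-RPC message."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


buf = io.StringIO()
write_message(buf, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
buf.seek(0)
print(list(read_messages(buf)))
```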
Development
git clone https://github.com/andyleimc-source/mdrag
cd mdrag
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/pytest
Try the example vault shipped in the repo (activate the virtualenv first with source .venv/bin/activate so that mdrag is on your PATH):
mdrag vault add demo ./examples/sample-vault
mdrag vault list
License
MIT: do whatever you want with it.