mdrag

Give any local Markdown folder a semantic-search MCP server. Runs entirely offline.

Turn ~/Desktop/sales/, ~/Desktop/notes/, or any directory full of Markdown files into a searchable knowledge base that Claude Code, Cursor, Cline, and other MCP clients can query with natural-language questions.

  • 🗂 Multi-vault: one MCP server manages many doc folders, each a separate "vault"
  • 🔒 Fully local: no API keys, no cloud; embeddings run on your machine
  • ⚡ Incremental indexing: only re-embed files that changed
  • 🧠 Any embedding model: default is Chinese-optimized bge-small-zh-v1.5; English / multilingual models work too
  • 📦 Self-contained: each vault's vector DB lives inside the folder (.mdrag/), so you can move it anywhere

Installation

pip install mdrag

Requires Python ≥ 3.10.


Quickstart (3 steps)

Let's say Bob has a folder ~/Desktop/sales/ full of meeting notes, proposals, and competitor research in Markdown.

1. Register the MCP server (once, globally)

claude mcp add mdrag --scope user -- mdrag serve

This tells Claude Code "there's an MCP server called mdrag; launch it with mdrag serve when needed". You only need to do this once per machine.

2. Register your doc folder as a vault

mdrag vault add sales ~/Desktop/sales

The first time you run this, mdrag downloads a ~100 MB embedding model (a one-time download), then indexes all .md files under ~/Desktop/sales/. It creates a .mdrag/ subfolder inside sales/ to hold the vector database.

3. Use it from Claude Code

Open Claude Code in any project. Ask:

"Use the mdrag MCP to search my sales vault for the Q4 pipeline review"

Claude will call mcp__mdrag__search(vault="sales", query="Q4 pipeline review") and return the top matching documents.


Adding another folder

No new MCP config is needed. Just register another vault:

mdrag vault add marketing ~/Desktop/marketing
mdrag vault add notes ~/Documents/notes

All vaults are visible through the same MCP server. Claude calls:

mcp__mdrag__list_vaults()                          → see all vaults
mcp__mdrag__search(vault="marketing", query="...")
mcp__mdrag__search(vault="notes", query="...")

CLI reference

mdrag serve                          Start the MCP stdio server
mdrag vault add NAME PATH            Register a directory and index it
mdrag vault list                     Show all vaults
mdrag vault info NAME                Show vault details
mdrag vault reindex NAME [--full]    Re-index (incremental or full)
mdrag vault remove NAME [--purge]    Unregister (and optionally delete .mdrag/)

Common options:

  • --model MODEL_NAME on vault add: pick a different embedding model
  • --no-index on vault add: skip initial indexing (useful when you want to register a vault now and index it later)
  • --full on vault reindex: rebuild from scratch (required after changing the model)

MCP tools exposed

When mdrag serve is running, these tools are available to the AI client:

Tool                                      Purpose
list_vaults()                             List all registered vaults with their stats
search(vault, query, top_k=5, tags="")    Semantic search within a vault, with an optional tag filter
get_doc(vault, path)                      Read the full content of a document
list_tags(vault)                          List all frontmatter tags in a vault with counts
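
On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below shows roughly what such a message could look like for search, using the argument names from the table above. The helper name build_tool_call and the request id are illustrative assumptions; in practice the MCP client builds this message for you.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,  # illustrative id; clients choose their own
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names follow the search(vault, query, top_k, tags) signature above.
request = build_tool_call(
    1,
    "search",
    {"vault": "sales", "query": "Q4 pipeline review", "top_k": 5},
)
print(request)
```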

Frontmatter (optional)

If your Markdown files have YAML frontmatter, mdrag will use it:

---
title: Q4 Pipeline Review
tags: [sales, forecast, 2026-q4]
summary: Overview of deals in play for Q4 2026.
---

# Q4 Pipeline Review
...

  • title: used as the result title (falls back to filename)
  • tags: searchable via the tags parameter of search
  • summary: shown in search results

No frontmatter? It still works: mdrag auto-generates a preview from the file body.
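
To make the fallback behaviour concrete, here is a minimal sketch of how frontmatter handling of this kind could work. It is not mdrag's actual code: the function names split_frontmatter and describe are hypothetical, and the parser only understands flat key: value pairs and inline [a, b] lists, where a real implementation would more likely use a YAML library. Showing the preview in place of a missing summary is also an assumption.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a Markdown document into (frontmatter fields, body).

    Simplified on purpose: flat `key: value` pairs and inline lists only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text  # unterminated block: treat everything as body
    fields: dict = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            fields[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.strip()] = value
    return fields, "\n".join(lines[end + 1:])


def describe(path: str, text: str, preview_chars: int = 80) -> dict:
    """Build result metadata, falling back to the filename and a body preview."""
    fields, body = split_frontmatter(text)
    preview = " ".join(body.split())[:preview_chars]
    return {
        "title": fields.get("title") or Path(path).stem,
        "tags": fields.get("tags", []),
        "summary": fields.get("summary") or preview,
    }


with_meta = describe(
    "q4.md",
    "---\ntitle: Q4 Pipeline Review\ntags: [sales, forecast]\n---\n# Q4 Pipeline Review\n",
)
without_meta = describe("meeting-01.md", "# Kickoff\n\nNotes from the kickoff call.\n")
print(with_meta)
print(without_meta)
```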


Embedding models

Language          Recommended model                        Notes
Chinese           BAAI/bge-small-zh-v1.5 (default)         ~100 MB, CPU-friendly
English           BAAI/bge-small-en-v1.5                   Same family, English
Multilingual      paraphrase-multilingual-MiniLM-L12-v2    For mixed-language vaults
Higher accuracy   BAAI/bge-base-zh-v1.5 or -en             ~400 MB, noticeably slower

Change the model when registering a vault:

mdrag vault add notes ~/Documents/notes --model BAAI/bge-small-en-v1.5

After changing the model on an existing vault (by editing ~/.mdrag/vaults.yaml), run a full rebuild, because vectors produced by different models are not comparable:

mdrag vault reindex notes --full

How it works

 ┌────────────────────┐        ┌──────────────────────┐
 │ ~/Desktop/sales/   │        │ ~/.mdrag/            │
 │   meeting-01.md    │        │   vaults.yaml        │  ← registry
 │   proposal.md      │        └──────────────────────┘
 │   .mdrag/          │ ← LanceDB vector store (per-vault)
 │     docs.lance/    │
 └──────────┬─────────┘
            │
            │ mdrag serve
            ▼
 ┌──────────────────────────┐
 │   FastMCP stdio server   │
 │   tools:                 │
 │     search / get_doc /   │
 │     list_vaults /        │
 │     list_tags            │
 └──────────┬───────────────┘
            │ MCP protocol (stdio / JSON-RPC)
            ▼
     Claude Code / Cursor / Cline

  • Vault registry is at ~/.mdrag/vaults.yaml
  • Each vault's vector database lives inside the vault directory at .mdrag/, so it is self-contained and portable
  • Embeddings use sentence-transformers, stored in LanceDB
  • MCP server is built on FastMCP
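
Conceptually, semantic search embeds the query and every document as vectors and then returns the documents whose vectors are closest to the query. The sketch below illustrates that idea with hand-written three-dimensional vectors and cosine similarity in plain Python. It is only an illustration under stated assumptions: in mdrag the vectors come from a sentence-transformers model and the nearest-neighbour lookup is delegated to LanceDB, and the distance metric mdrag actually uses is not stated here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Rank documents by similarity to the query and keep the best k."""
    scored = [(path, cosine(query_vec, vec)) for path, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
docs = {
    "meeting-01.md": [0.9, 0.1, 0.0],
    "proposal.md": [0.1, 0.9, 0.1],
    "competitors.md": [0.0, 0.2, 0.9],
}
results = top_k([0.8, 0.2, 0.0], docs, k=2)
print(results)
```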

FAQ

How do I update the index after editing files?

mdrag vault reindex sales

It's incremental: only files with a changed mtime are re-embedded.
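
A minimal sketch of mtime-based change detection follows, to show the general idea behind an incremental pass. The function find_changed and the in-memory dictionary of recorded mtimes are assumptions for illustration; how mdrag actually stores and compares modification times is not described here.

```python
import os
import tempfile
from pathlib import Path


def find_changed(vault: Path, indexed_mtimes: dict[str, float]) -> list[str]:
    """Return relative paths of .md files that are new or whose mtime differs."""
    changed = []
    for md_file in sorted(vault.rglob("*.md")):
        rel = md_file.relative_to(vault).as_posix()
        if indexed_mtimes.get(rel) != md_file.stat().st_mtime:
            changed.append(rel)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "a.md").write_text("# A\n")
    (vault / "b.md").write_text("# B\n")
    # Pretend both files were indexed at their current mtimes.
    state = {p.name: p.stat().st_mtime for p in vault.glob("*.md")}
    unchanged = find_changed(vault, state)
    # Simulate an edit by bumping b.md's mtime by ten seconds.
    b_stat = (vault / "b.md").stat()
    os.utime(vault / "b.md", (b_stat.st_atime, b_stat.st_mtime + 10))
    changed = find_changed(vault, state)

print(unchanged, changed)
```

Only the files returned by such a check would need to be re-embedded; everything else keeps its existing vectors.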

Can I automate re-indexing?

Yes. Add to cron (Linux/macOS):

0 * * * * /path/to/mdrag vault reindex sales

Or use launchd on macOS / Task Scheduler on Windows.
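
If you would rather have one script that any scheduler can call, a small Python wrapper around the documented CLI is one option. This is a sketch: the vault names are examples, the helper names reindex_command and reindex_all are hypothetical, and it assumes mdrag is on PATH and signals failure through a non-zero exit code.

```python
import shutil
import subprocess


def reindex_command(vault: str, full: bool = False, executable: str = "mdrag") -> list[str]:
    """Build the argument list for `mdrag vault reindex NAME [--full]`."""
    cmd = [executable, "vault", "reindex", vault]
    if full:
        cmd.append("--full")
    return cmd


def reindex_all(vaults: list[str]) -> dict[str, int]:
    """Run an incremental reindex per vault and collect the exit codes."""
    return {name: subprocess.run(reindex_command(name)).returncode for name in vaults}


if shutil.which("mdrag"):
    print(reindex_all(["sales", "notes"]))
else:
    # Without mdrag installed, just show the command that would be run.
    print("mdrag not found on PATH; would run:", reindex_command("sales"))
```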

Does it support PDF, DOCX, etc.?

Not yet. Convert to Markdown first (e.g. with pandoc) and point mdrag at the result.

Model download is slow / fails

If you're in China, set a HuggingFace mirror:

export HF_ENDPOINT=https://hf-mirror.com
mdrag vault add sales ~/Desktop/sales

Where is the vector data stored?

  • Vault registry: ~/.mdrag/vaults.yaml
  • Each vault's vectors: <vault_path>/.mdrag/docs.lance/

Can I share a vault across machines?

Yes. The .mdrag/ folder is self-contained. Sync the whole vault directory (via Dropbox, rsync, git-lfs, or similar) and run mdrag vault add <name> <path> on the other machine. No re-indexing is needed as long as the embedding model matches.


Integrations

Claude Code

claude mcp add mdrag --scope user -- mdrag serve

Or configure it manually. For a project-scoped setup, put this in a .mcp.json file at the project root:

{
  "mcpServers": {
    "mdrag": {
      "command": "mdrag",
      "args": ["serve"]
    }
  }
}

Cursor / Cline / other MCP clients

Add the same stdio command to your client's MCP configuration. The command is mdrag serve; it communicates over stdio following the MCP protocol.
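
Many clients store their servers under an mcpServers key like the JSON shown for Claude Code, so adding mdrag usually means merging one entry into an existing config. The sketch below does that merge on an in-memory dictionary. The function name add_mdrag_server and the other-server entry are made up for illustration, and the config file location and exact schema vary by client, so check your client's documentation before writing anything to disk.

```python
import json


def add_mdrag_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the mdrag stdio server added."""
    servers = dict(config.get("mcpServers", {}))
    servers["mdrag"] = {"command": "mdrag", "args": ["serve"]}
    return {**config, "mcpServers": servers}


# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
merged = add_mdrag_server(existing)
print(json.dumps(merged, indent=2))
```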


Development

git clone https://github.com/andyleimc-source/mdrag
cd mdrag
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/pytest

Try the example vault shipped in the repo:

mdrag vault add demo ./examples/sample-vault
mdrag vault list

License

MIT. Do whatever you want with it.
