# Markdown-FastRAG-MCP
A semantic search engine for markdown documents. An MCP server with non-blocking background indexing, multi-provider embeddings (Gemini, OpenAI, Vertex AI, Voyage), and Milvus / Zilliz Cloud vector storage — designed for multi-agent concurrent access.
This project is a fork of Zackriya-Solutions/MCP-Markdown-RAG, heavily extended for production multi-agent use. Original project is licensed under Apache 2.0.
Ask "what are the tradeoffs of microservices?" and find your notes about service boundaries, distributed systems, and API design — even if none of them mention "microservices."
```mermaid
graph LR
    A["Claude Code"] --> M["Milvus Standalone<br/>(Docker)"]
    B["Codex"] --> M
    C["Copilot"] --> M
    D["Antigravity"] --> M
    M --> V["Shared Document Index"]
```
## Quick Start

```shell
pip install markdown-fastrag-mcp
```

Add to your MCP host config:
```json
{
  "mcpServers": {
    "markdown-rag": {
      "command": "uvx",
      "args": ["markdown-fastrag-mcp"],
      "env": {
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "MILVUS_ADDRESS": "http://localhost:19530"
      }
    }
  }
}
```
Tip: Omit `MILVUS_ADDRESS` for local-only use (defaults to SQLite-based Milvus Lite).
## Features

- Semantic matching — finds conceptually related content, not just keyword hits
- Multi-provider embeddings — Gemini, OpenAI, Vertex AI, Voyage, or local models
- Async background indexing — non-blocking `index_documents` returns instantly with a `job_id`; poll with `get_index_status`
- Event-loop-safe threading — all sync I/O runs in worker threads via `asyncio.to_thread`
- Smart incremental indexing — mtime/size fast-path skips unchanged files without reading them
- 3-way delta scan — classifies files as new/modified/deleted in one walk; new files skip the Milvus delete
- Smart chunk merging — small chunks below `MIN_CHUNK_TOKENS` are merged with siblings; parent header context is injected
- Empty chunk filtering — frontmatter-only and structural-only chunks (headers/separators with no prose) are dropped at indexing and filtered at search time
- Short chunk drop — final chunks below `MIN_FINAL_TOKENS` (default 150) are dropped with per-chunk stderr logging
- Reconciliation sweep — after each index run, queries all Milvus paths and deletes orphan vectors whose source files no longer exist on disk
- Search dedup — per-file result limiting prevents a single document from dominating results
- Scoped search & pruning — `scope_path` filters results to subdirectories; pruning never wipes unrelated data
- Batch embedding & insert — concurrent batches with 429 retry; chunked Milvus inserts stay under the gRPC 64 MB limit
- Shell reindex CLI — `reindex.py` for large-scale indexing with real-time progress logs
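The mtime/size fast-path and 3-way delta scan above can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: the `TrackedFile` record and function shape are assumptions standing in for the real tracking store.

```python
import os
from dataclasses import dataclass

# Hypothetical tracking record: what was stored for a file at last index time.
@dataclass
class TrackedFile:
    mtime: float
    size: int

def delta_scan(root: str, tracked: dict[str, TrackedFile]):
    """Classify markdown files as new / modified / unchanged in one walk;
    deletions are inferred from tracked paths never seen on disk."""
    new, modified, unchanged = [], [], []
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            seen.add(path)
            st = os.stat(path)
            rec = tracked.get(path)
            if rec is None:
                new.append(path)        # never indexed: no Milvus delete needed
            elif (st.st_mtime, st.st_size) == (rec.mtime, rec.size):
                unchanged.append(path)  # fast-path: file content is never read
            else:
                modified.append(path)   # re-chunk, re-embed, replace vectors
    deleted = [p for p in tracked if p not in seen]  # prune their vectors
    return new, modified, unchanged, deleted
```

The point of the fast-path is the `elif`: an unchanged file is classified from a single `os.stat` call, with no hashing and no file read.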
## 📚 Documentation
| Document | Description |
|---|---|
| Embedding Providers | All 6 providers: setup, auth, tuning, rate limiting |
| Milvus / Zilliz Setup | Lite vs Standalone vs Zilliz Cloud, Docker Compose, troubleshooting |
| Indexing Architecture | Non-blocking flow, to_thread, 3-way delta, reconciliation sweep |
| Optimization | Chunk merging, header injection, batch insert, search dedup |
## Tools

| Tool | Description |
|---|---|
| `index_documents` | Start a background index job; returns `job_id` instantly |
| `get_index_status` | Poll job status (running / succeeded / failed) |
| `search_documents` | Semantic search with relevance scores and file paths |
| `clear_index` | Reset vector database and tracking state |
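The `index_documents` / `get_index_status` pair follows a standard background-job pattern: start the job, hand back an id immediately, and let callers poll. A minimal sketch of that pattern — only the tool names come from the table above; the job registry and internals here are assumptions:

```python
import asyncio
import uuid

# Hypothetical in-memory job registry.
JOBS: dict[str, dict] = {}

def _index_sync(path: str) -> int:
    # Stand-in for the real blocking work (walk, chunk, embed, insert).
    return 42

async def index_documents(path: str) -> dict:
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "running"}

    async def run():
        try:
            # Sync I/O runs in a worker thread so the event loop stays free.
            chunks = await asyncio.to_thread(_index_sync, path)
            JOBS[job_id] = {"status": "succeeded", "chunks": chunks}
        except Exception as e:
            JOBS[job_id] = {"status": "failed", "error": str(e)}

    asyncio.create_task(run())
    return {"job_id": job_id}  # returns instantly, before indexing finishes

async def get_index_status(job_id: str) -> dict:
    return JOBS.get(job_id, {"status": "unknown"})
```

An agent therefore never blocks on a large vault: it fires `index_documents`, continues working, and checks `get_index_status` until the status leaves `running`.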
## How It Works

```mermaid
flowchart LR
    A["📁 Markdown Files"] -->|"walk + filter"| B["🔍 Delta Scan<br/>mtime/size"]
    B -->|changed| C["✂️ Chunk + Merge"]
    B -->|unchanged| SKIP["⏭️ Skip"]
    B -->|deleted| PRUNE["🗑️ Prune"]
    C --> D["🧠 Embed"]
    D -->|"batch insert"| E["💾 Milvus"]
    F["🔎 Query"] --> D
    D -->|"k×5"| G["📊 Dedup + Top-K"]
    style A fill:#2d3748,color:#e2e8f0
    style D fill:#553c9a,color:#e9d8fd
    style E fill:#2a4365,color:#bee3f8
    style G fill:#22543d,color:#c6f6d5
    style PRUNE fill:#742a2a,color:#fed7d7
```
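The "k×5 → Dedup + Top-K" step in the diagram is an over-fetch-then-filter pattern: pull more candidates than needed from Milvus, cap hits per source file, then keep the top k. A sketch under the assumption that each hit is a dict with `path` and `score` keys (the real hit shape is an implementation detail):

```python
from collections import defaultdict

def dedup_results(hits: list[dict], k: int, max_per_file: int = 1) -> list[dict]:
    """Per-file result limiting (cf. DEDUP_MAX_PER_FILE): keep at most
    `max_per_file` hits per document, then take the top k by score.
    max_per_file = 0 disables the limit."""
    per_file = defaultdict(int)
    out = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if max_per_file and per_file[hit["path"]] >= max_per_file:
            continue  # this document already used up its quota
        per_file[hit["path"]] += 1
        out.append(hit)
        if len(out) == k:
            break
    return out
```

Over-fetching k×5 matters because deduplication discards candidates: without the headroom, a query dominated by one long document could return fewer than k results.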
## Configuration

### Core

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_PROVIDER` | `local` | `gemini`, `openai`, `openai-compatible`, `vertex`, `voyage` |
| `EMBEDDING_DIM` | `768` | Vector dimension |
| `MILVUS_ADDRESS` | `.db/milvus_markdown.db` | Milvus address or local file path |
| `MARKDOWN_WORKSPACE` | — | Lock workspace root |
### Indexing

| Variable | Default | Description |
|---|---|---|
| `MARKDOWN_CHUNK_SIZE` | `2048` | Token chunk size |
| `MARKDOWN_CHUNK_OVERLAP` | `100` | Token overlap between chunks |
| `MIN_CHUNK_TOKENS` | `300` | Small-chunk merge threshold |
| `MIN_FINAL_TOKENS` | `150` | Drop final chunks below this token count |
| `DEDUP_MAX_PER_FILE` | `1` | Max results per file (`0` = off) |
| `EMBEDDING_BATCH_SIZE` | `250` | Texts per API call |
| `EMBEDDING_CONCURRENT_BATCHES` | `4` | Parallel batches |
| `EMBEDDING_BATCH_DELAY_MS` | `0` | Delay (ms) between batch waves |
| `MILVUS_INSERT_BATCH` | `5000` | Rows per Milvus insert (gRPC 64 MB limit) |
Tip: Defaults work well for most vaults. Adjust `MIN_CHUNK_TOKENS` / `MIN_FINAL_TOKENS` if short notes are being dropped unexpectedly; changes require a force reindex (`reindex.py --force`).

See Embedding Providers for full auth and tuning options.
## Performance
| Metric | Result |
|---|---|
| Unchanged files — hash computations | 0 (mtime/size fast-path) |
| Changed file — embed + insert | ~3 seconds |
| No changes — full scan | instant |
| Full reindex (1300 files, 23K chunks) | ~7–8 minutes |
## License
Apache 2.0 — see LICENSE for full text.
This project is a fork of MCP-Markdown-RAG by Zackriya Solutions. Original project is licensed under Apache 2.0; this fork maintains the same license.
Key additions over upstream:
- Multi-provider embeddings (Gemini, Vertex AI, OpenAI, Voyage)
- Milvus vector store replacing Qdrant
- Non-blocking background indexing with `asyncio.to_thread`
- 3-way delta scan (new/modified/deleted)
- Smart chunk merging with parent header injection
- Empty chunk filtering (frontmatter-only / structural-only drop)
- Short chunk drop (final chunks below 150 tokens with per-chunk logging)
- Reconciliation sweep (Milvus↔disk ghost vector cleanup)
- Scoped search & pruning, batch embedding, shell CLI
- VS Code Copilot MCP compatibility (dummy params for zero-required-arg tools)
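The reconciliation sweep listed above can be thought of as a set difference between what Milvus believes exists and what is actually on disk. A hedged sketch, where `delete_vectors` stands in for the real Milvus delete-by-path call:

```python
import os

def reconcile(indexed_paths: set[str], delete_vectors) -> list[str]:
    """After an index run, delete vectors whose source files no
    longer exist on disk (ghost vector cleanup)."""
    orphans = [p for p in sorted(indexed_paths) if not os.path.exists(p)]
    for path in orphans:
        delete_vectors(path)  # e.g. a Milvus delete filtered on this path
    return orphans
```

The sweep is a safety net on top of the delta scan: it also catches files removed while the server was not running, which the tracking state alone would miss.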