
Markdown-FastRAG-MCP


A semantic search engine for markdown documents. An MCP server with non-blocking background indexing, multi-provider embeddings (Gemini, OpenAI, Vertex AI, Voyage), and Milvus / Zilliz Cloud vector storage — designed for multi-agent concurrent access.

This project is a fork of Zackriya-Solutions/MCP-Markdown-RAG, heavily extended for production multi-agent use. The original project is licensed under Apache 2.0.

Ask "what are the tradeoffs of microservices?" and find your notes about service boundaries, distributed systems, and API design — even if none of them mention "microservices."

```mermaid
graph LR
    A["Claude Code"] --> M["Milvus Standalone<br/>(Docker)"]
    B["Codex"] --> M
    C["Copilot"] --> M
    D["Antigravity"] --> M
    M --> V["Shared Document Index"]
```

Quick Start

```bash
pip install markdown-fastrag-mcp
```

Add to your MCP host config:

```json
{
  "mcpServers": {
    "markdown-rag": {
      "command": "uvx",
      "args": ["markdown-fastrag-mcp"],
      "env": {
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "MILVUS_ADDRESS": "http://localhost:19530"
      }
    }
  }
}
```

Tip: Omit `MILVUS_ADDRESS` for local-only use (defaults to SQLite-based Milvus Lite).

Features

  • Semantic matching — finds conceptually related content, not just keyword hits
  • Multi-provider embeddings — Gemini, OpenAI, Vertex AI, Voyage, or local models
  • Async background indexing — non-blocking `index_documents` returns instantly with a `job_id`; poll with `get_index_status`
  • Event-loop-safe threading — all sync I/O runs in worker threads via `asyncio.to_thread`
  • Smart incremental indexing — mtime/size fast path skips unchanged files without reading them
  • 3-way delta scan — classifies files as new/modified/deleted in one walk; new files skip the Milvus delete
  • Smart chunk merging — small chunks below `MIN_CHUNK_TOKENS` are merged with siblings; parent header context is injected
  • Empty chunk filtering — frontmatter-only and structural-only chunks (headers/separators with no prose) are dropped at indexing time and filtered at search time
  • Short chunk drop — final chunks below `MIN_FINAL_TOKENS` (default 150) are dropped, with per-chunk stderr logging
  • Reconciliation sweep — after each index run, queries all Milvus paths and deletes orphan vectors whose source files no longer exist on disk
  • Search dedup — per-file result limiting prevents a single document from dominating results
  • Scoped search & pruning — `scope_path` filters results to subdirectories; pruning never wipes unrelated data
  • Batch embedding & insert — concurrent batches with 429 retry; chunked Milvus inserts stay under the gRPC 64 MB limit
  • Shell reindex CLI — `reindex.py` for large-scale indexing with real-time progress logs
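The incremental-indexing features above can be illustrated with a minimal sketch. This is not the server's actual code: the function name `delta_scan` and the `seen` map of `path -> (mtime, size)` signatures are assumptions made for illustration, but the single-walk three-way classification and the mtime/size fast path work like this:

```python
import os

def delta_scan(root: str, seen: dict[str, tuple[float, int]]):
    """Classify markdown files as new / modified / deleted in one walk.

    `seen` maps path -> (mtime, size) recorded by the previous run.
    Unchanged files take the mtime/size fast path: they are never
    opened or hashed.
    """
    new, modified = [], []
    on_disk = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            sig = (st.st_mtime, st.st_size)
            on_disk.add(path)
            if path not in seen:
                new.append(path)       # never indexed: no Milvus delete needed
            elif seen[path] != sig:
                modified.append(path)  # re-chunk, re-embed, delete old vectors
            # else: unchanged -> skipped entirely
    deleted = [p for p in seen if p not in on_disk]
    return new, modified, deleted
```

Note how files absent from `seen` go straight to `new`, which is what lets the indexer skip the Milvus delete for them.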

📚 Documentation

| Document | Description |
|---|---|
| Embedding Providers | All 6 providers: setup, auth, tuning, rate limiting |
| Milvus / Zilliz Setup | Lite vs Standalone vs Zilliz Cloud, Docker Compose, troubleshooting |
| Indexing Architecture | Non-blocking flow, `to_thread`, 3-way delta, reconciliation sweep |
| Optimization | Chunk merging, header injection, batch insert, search dedup |

Tools

| Tool | Description |
|---|---|
| `index_documents` | Start a background index job; returns `job_id` instantly |
| `get_index_status` | Poll job status (running / succeeded / failed) |
| `search_documents` | Semantic search with relevance scores and file paths |
| `clear_index` | Reset the vector database and tracking state |
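The non-blocking `index_documents` / `get_index_status` pair can be sketched like this. It is a simplified illustration, not the server's implementation: `_index_sync` stands in for the real blocking scan/chunk/embed/insert pipeline, and the in-memory `_jobs` registry is an assumption.

```python
import asyncio
import itertools

_jobs: dict[str, dict] = {}
_ids = itertools.count(1)

def _index_sync(path: str) -> int:
    # Placeholder for the blocking pipeline; returns chunks written.
    return 42

async def index_documents(path: str) -> dict:
    """Start indexing in a worker thread; return immediately with a job_id."""
    job_id = f"job-{next(_ids)}"
    _jobs[job_id] = {"status": "running"}

    async def _run():
        try:
            # asyncio.to_thread keeps the event loop free while sync I/O runs.
            chunks = await asyncio.to_thread(_index_sync, path)
            _jobs[job_id] = {"status": "succeeded", "chunks": chunks}
        except Exception as e:
            _jobs[job_id] = {"status": "failed", "error": str(e)}

    _jobs[job_id]["task"] = asyncio.create_task(_run())
    return {"job_id": job_id}

async def get_index_status(job_id: str) -> dict:
    return _jobs.get(job_id, {"status": "unknown"})
```

A caller polls `get_index_status` with the returned `job_id` until the status leaves `running`.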

How It Works

```mermaid
flowchart LR
    A["📁 Markdown Files"] -->|"walk + filter"| B["🔍 Delta Scan<br/>mtime/size"]
    B -->|changed| C["✂️ Chunk + Merge"]
    B -->|unchanged| SKIP["⏭️ Skip"]
    B -->|deleted| PRUNE["🗑️ Prune"]
    C --> D["🧠 Embed"]
    D -->|"batch insert"| E["💾 Milvus"]

    F["🔎 Query"] --> D
    D -->|"k×5"| G["📊 Dedup + Top-K"]

    style A fill:#2d3748,color:#e2e8f0
    style D fill:#553c9a,color:#e9d8fd
    style E fill:#2a4365,color:#bee3f8
    style G fill:#22543d,color:#c6f6d5
    style PRUNE fill:#742a2a,color:#fed7d7
```
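The `k×5` → "Dedup + Top-K" step of the query path can be sketched as follows. This is illustrative only: the function name `dedup_top_k` and the `(score, path, text)` hit shape are assumptions, with `max_per_file` mirroring the `DEDUP_MAX_PER_FILE` setting.

```python
def dedup_top_k(hits, k: int, max_per_file: int = 1):
    """Given oversampled hits (roughly k x 5, sorted by score descending),
    cap results per source file so one document cannot dominate,
    then truncate to the top k."""
    per_file: dict[str, int] = {}
    out = []
    for score, path, text in hits:
        if max_per_file and per_file.get(path, 0) >= max_per_file:
            continue  # this file already hit its quota
        per_file[path] = per_file.get(path, 0) + 1
        out.append((score, path, text))
        if len(out) == k:
            break
    return out
```

With `max_per_file=0` the cap is disabled and the first `k` hits pass through unchanged.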

Configuration

Core

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_PROVIDER` | `local` | `gemini`, `openai`, `openai-compatible`, `vertex`, `voyage` |
| `EMBEDDING_DIM` | `768` | Vector dimension |
| `MILVUS_ADDRESS` | `.db/milvus_markdown.db` | Milvus address or local file path |
| `MARKDOWN_WORKSPACE` | (unset) | Lock the workspace root |

Indexing

| Variable | Default | Description |
|---|---|---|
| `MARKDOWN_CHUNK_SIZE` | `2048` | Token chunk size |
| `MARKDOWN_CHUNK_OVERLAP` | `100` | Token overlap between chunks |
| `MIN_CHUNK_TOKENS` | `300` | Small-chunk merge threshold |
| `MIN_FINAL_TOKENS` | `150` | Drop final chunks below this token count |
| `DEDUP_MAX_PER_FILE` | `1` | Max results per file (`0` = off) |
| `EMBEDDING_BATCH_SIZE` | `250` | Texts per API call |
| `EMBEDDING_CONCURRENT_BATCHES` | `4` | Parallel batches |
| `EMBEDDING_BATCH_DELAY_MS` | `0` | Delay (ms) between batch waves |
| `MILVUS_INSERT_BATCH` | `5000` | Rows per Milvus insert (gRPC 64 MB limit) |

Tip: Defaults work well for most vaults. Adjust `MIN_CHUNK_TOKENS` / `MIN_FINAL_TOKENS` if short notes are being dropped unexpectedly. Changes require a force reindex (`reindex.py --force`).
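To make the two thresholds concrete, here is a rough sketch of a merge-then-drop pass. It is illustrative only: the real indexer uses a proper tokenizer and injects parent header context, both of which this whitespace word-count approximation omits.

```python
def merge_chunks(chunks: list[str], min_tokens: int = 300, min_final: int = 150):
    """Greedily merge undersized chunks into their previous sibling
    (MIN_CHUNK_TOKENS), then drop any final chunk still below the hard
    floor (MIN_FINAL_TOKENS). Token counts approximated by word count."""
    tokens = lambda s: len(s.split())
    merged: list[str] = []
    for chunk in chunks:
        if merged and tokens(merged[-1]) < min_tokens:
            merged[-1] = merged[-1] + "\n\n" + chunk  # absorb into previous sibling
        else:
            merged.append(chunk)
    return [c for c in merged if tokens(c) >= min_final]
```

A lone short note falls through both stages and is dropped, which is why lowering `MIN_FINAL_TOKENS` is the fix when short notes vanish from the index.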

See Embedding Providers for full auth and tuning options.
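The batching settings above interact roughly as in this sketch, which embeds texts batch by batch and retries on rate limiting with exponential backoff. It is illustrative: the real server runs batches concurrently, and `embed_batch` / `RateLimitError` are stand-ins for a provider client call and its HTTP 429 error.

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider HTTP 429 response."""

def embed_all(texts, embed_batch, batch_size=250, max_retries=3, backoff=1.0):
    """Embed texts in fixed-size batches (EMBEDDING_BATCH_SIZE); retry a
    batch with exponential backoff when the provider rate-limits it."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break
            except RateLimitError:
                if attempt == max_retries:
                    raise  # out of retries: surface the 429
                time.sleep(backoff * 2 ** attempt)
    return vectors
```

The same fixed-size batching idea applies on the write side, where rows are split into `MILVUS_INSERT_BATCH`-sized inserts to stay under the gRPC message limit.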

Performance

| Metric | Result |
|---|---|
| Unchanged files — hash computations | 0 (mtime/size fast path) |
| Changed file — embed + insert | ~3 seconds |
| No changes — full scan | instant |
| Full reindex (1300 files, 23K chunks) | ~7–8 minutes |

License

Apache 2.0 — see LICENSE for full text.

This project is a fork of MCP-Markdown-RAG by Zackriya Solutions. Original project is licensed under Apache 2.0; this fork maintains the same license.

Key additions over upstream:

  • Multi-provider embeddings (Gemini, Vertex AI, OpenAI, Voyage)
  • Milvus vector store replacing Qdrant
  • Non-blocking background indexing with `asyncio.to_thread`
  • 3-way delta scan (new/modified/deleted)
  • Smart chunk merging with parent header injection
  • Empty chunk filtering (frontmatter-only / structural-only drop)
  • Short chunk drop (final chunks below 150 tokens, with per-chunk logging)
  • Reconciliation sweep (Milvus↔disk orphan vector cleanup)
  • Scoped search & pruning, batch embedding, shell CLI
  • VS Code Copilot MCP compatibility (dummy params for zero-required-arg tools)
