RAG MCP Server — watches folders, chunks documents, serves semantic search via MCP

document-rag-mcp

A high-performance Model Context Protocol (MCP) server for local document search and extraction. It recursively scans and watches configured directories for .txt, .md, and .pdf files, indexes their content, and exposes search and retrieval over that content as tools for LLMs.

📖 Full Documentation: https://janlo.github.io/document-rag-mcp/

Key Features

  • Hybrid Search (Semantic + BM25): Blends dense semantic vector search (ChromaDB) with sparse keyword search (SQLite FTS5) using Reciprocal Rank Fusion (RRF), so that chunks ranked highly by either method surface near the top of the combined results.
  • Section-Grain Chunking: Text from all pages is unified and chunked as a single stream, then mapped back to its primary page and section via character offsets, preventing artificial boundaries at page borders.
  • TOC-Aware Extractor: Extracts PDF headings using the document's own Table of Contents (TOC), falling back to typography-aware layout detection if the TOC is missing.
  • Incremental Indexing: Uses content hashing (SHA-256) at both the file and chunk levels. Unchanged files are skipped entirely, and for modified files only the chunks that actually changed are re-embedded.
  • Auto-Pruning: Automatically detects when files are deleted from disk and prunes them from the index.
  • Multimodal OCR: Detects scanned or text-less PDF pages and routes them through an optional vision-capable LLM.
  • MCP Native: Exposes tools for hybrid search, collection statistics, metadata analysis, and full document text/binary content retrieval.
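
The Reciprocal Rank Fusion step named under Hybrid Search can be sketched as follows. This is a minimal illustration of the general technique, not the project's actual code: the function name, the sample chunk IDs, and the constant k = 60 (a commonly used default for RRF) are assumptions. Each chunk receives 1 / (k + rank) from every ranked list it appears in, and the sums decide the final order.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    `rankings` is an iterable of lists, each ordered from best to worst.
    The name and the default k are illustrative assumptions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A higher position (smaller rank) contributes a larger score.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector search and a keyword search.
semantic = ["c3", "c1", "c7"]
keyword = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear in both lists (c1 and c3 here) outrank chunks found by only one method, which is the practical benefit of fusing on ranks rather than on raw scores that are not comparable across the two engines.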

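The two-level hashing described under Incremental Indexing can be sketched like this. It is a hedged illustration only: the function names, the in-memory hash sets, and the sample chunks are assumptions, and the real project stores its hashes in its own index. The idea is to compare the file hash first and, only if it differs, compare chunk hashes to find what needs re-embedding and what is stale.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def plan_reindex(file_bytes, chunks, stored_file_hash, stored_chunk_hashes):
    """Decide what to re-embed for one file (hypothetical helper).

    Returns (new_file_hash, chunks_to_embed, stale_chunk_hashes).
    """
    file_hash = sha256_hex(file_bytes)
    if file_hash == stored_file_hash:
        # File-level match: nothing changed, skip the file completely.
        return file_hash, [], set()
    # Chunk-level comparison: hash every current chunk.
    current = {sha256_hex(c.encode("utf-8")): c for c in chunks}
    to_embed = [c for h, c in current.items() if h not in stored_chunk_hashes]
    stale = set(stored_chunk_hashes) - set(current)
    return file_hash, to_embed, stale


# Hypothetical previous state: the file held the chunks "alpha" and "beta".
stored_chunk_hashes = {sha256_hex(c.encode("utf-8")) for c in ["alpha", "beta"]}
stored_file_hash = sha256_hex(b"alpha beta")

new_hash, to_embed, stale = plan_reindex(
    b"alpha gamma", ["alpha", "gamma"], stored_file_hash, stored_chunk_hashes
)
print(to_embed)  # ['gamma']
```

Only "gamma" is re-embedded, while the hash of the removed chunk "beta" is reported as stale so it can be dropped from the index.
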
Quick Start

1. Installation

Ensure you have uv installed, then synchronize the environment:

git clone https://github.com/janlo/document-rag-mcp.git
cd document-rag-mcp
uv sync --group dev

2. Configuration

Copy the example configuration:

cp config.example.yaml config.yaml

Then edit config.yaml to specify the folders you want to watch.

3. CLI Commands

  • Ingest: uv run document-rag-mcp ingest
  • Search: uv run document-rag-mcp search "your query"
  • Start MCP Server: uv run document-rag-mcp serve
