A local RAG MCP solution with ChromaDB, built on LangChain and Docling.
Project description
RAG MCP: Document Processing Server
A Retrieval-Augmented Generation (RAG) server built on the Model Context Protocol (MCP) for intelligent document processing and question answering.
Overview
RAG MCP is a tool that lets you index documents in a variety of formats and run semantic searches against them. It uses embedding models and a local vector database to make your documents searchable through natural language queries.
Features
- Document Indexing: Support for various document formats (PDF, DOCX, XLSX, PPTX, Markdown, AsciiDoc, HTML, XHTML, CSV)
- Semantic Search: Query your documents using natural language
- Flexible Embedding Models: Choose between HuggingFace BGE (default) or Ollama embeddings
- High Performance: Optimized for various hardware configurations, with automatic device selection (CUDA, MPS, CPU) for HuggingFace embeddings
- Persistent Storage: Vector embeddings are stored locally for reuse
Requirements
- Python 3.11+
- Environment with access to your documents
- (Optional) Ollama installed and running if using Ollama embeddings.
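If you plan to use Ollama embeddings, make sure the Ollama daemon is running and an embedding model has been pulled, for example:

ollama serve
ollama pull bge-m3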
Installation
1. Install UV
First, you need to install UV, a Python package installer and resolver:
On macOS/Linux:
curl -sSf https://astral.sh/uv/install.sh | sh
On Windows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
2. Run RAG MCP
Once UV is installed, you can run RAG MCP directly using:
uvx rag-mcp
This will start the MCP server and make it available for document processing.
Environment Variables:
You can configure RAG MCP using environment variables:
- PERSIST_DIRECTORY (Required): Path to the directory where the vector database will be stored (e.g., /path/to/your/persist/directory). A chromadb subfolder will be created here.
- USE_OLLAMA_EMBEDDING (Optional): Set to True to use Ollama embeddings instead of the default HuggingFace BGE embeddings. Requires Ollama to be running. Defaults to False.
- OLLAMA_EMBEDDING_MODEL (Optional): Specifies the Ollama model name to use for embeddings (e.g., nomic-embed-text). Only used if USE_OLLAMA_EMBEDDING=True. Defaults to bge-m3:latest.
Example:
PERSIST_DIRECTORY=/data/rag_db USE_OLLAMA_EMBEDDING=True OLLAMA_EMBEDDING_MODEL=nomic-embed-text uvx rag-mcp
IDE Integration
VS Code Integration
To integrate with Visual Studio Code, create an mcp.json file (e.g., .vscode/mcp.json in your workspace) with the following content:
{
  "servers": {
    "rag-mcp-server": {
      "type": "stdio",
      "command": "uvx",
      "args": ["rag-mcp"],
      "env": {
        "PERSIST_DIRECTORY": "/path/to/your/persist/directory",
        // Optional: Uncomment and set to true to use Ollama
        // "USE_OLLAMA_EMBEDDING": "True",
        // Optional: Specify Ollama model if using Ollama
        // "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}
Replace /path/to/your/persist/directory with the desired path. Adjust Ollama settings as needed.
Cursor Integration
To integrate with Cursor, go to Cursor Settings > MCP and paste this configuration:
{
  "mcpServers": {
    "rag-mcp": {
      "command": "uvx",
      "args": ["rag-mcp"],
      "env": {
        "PERSIST_DIRECTORY": "/path/to/your/persist/directory",
        // Optional: Uncomment and set to true to use Ollama
        // "USE_OLLAMA_EMBEDDING": "True",
        // Optional: Specify Ollama model if using Ollama
        // "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}
Replace /path/to/your/persist/directory with the desired path. Adjust Ollama settings as needed.
Usage
Supported Embedding Models
The system supports the following embedding models:
- HuggingFace BGE Embeddings (default): High-quality embeddings that work offline. Optimized for the available hardware (CUDA, MPS, CPU). Uses the BAAI/bge-m3 model.
- Ollama Embeddings (optional): Uses embeddings generated by a running Ollama instance. Enable by setting USE_OLLAMA_EMBEDDING=True. Default model: bge-m3:latest. Specify a different model using OLLAMA_EMBEDDING_MODEL.
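As a rough sketch of how this selection could look in LangChain (an illustration, not the project's actual source; the langchain_community, langchain_ollama, and torch packages are assumptions based on the description above):

import os
import torch
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_ollama import OllamaEmbeddings

if os.getenv("USE_OLLAMA_EMBEDDING", "False").lower() == "true":
    # Ollama must be running; the model name falls back to bge-m3:latest
    embeddings = OllamaEmbeddings(
        model=os.getenv("OLLAMA_EMBEDDING_MODEL", "bge-m3:latest")
    )
else:
    # Pick the best available device for local BGE embeddings
    device = (
        "cuda" if torch.cuda.is_available()
        else "mps" if torch.backends.mps.is_available()
        else "cpu"
    )
    embeddings = HuggingFaceBgeEmbeddings(
        model_name="BAAI/bge-m3", model_kwargs={"device": device}
    )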
How It Works
- Document Loading: DoclingLoader parses the supported document formats
- Text Splitting: Documents are split into manageable chunks using RecursiveCharacterTextSplitter
- Embedding Generation: Text chunks are converted to vector embeddings using either HuggingFace BGE or Ollama
- Storage: Embeddings are stored in a Chroma vector database
- Retrieval: When queried, the system finds semantically similar content to answer questions (see the sketch below)
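A minimal end-to-end sketch of this pipeline (again an illustration, not the package's code; the file path, chunk sizes, and the langchain-docling and langchain-chroma integration packages are assumptions):

from langchain_docling import DoclingLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_chroma import Chroma

# 1. Parse the source document with Docling
docs = DoclingLoader(file_path="manual.pdf").load()

# 2. Split into overlapping chunks for embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3-4. Embed the chunks and persist them in a local Chroma database
store = Chroma.from_documents(
    chunks,
    HuggingFaceBgeEmbeddings(model_name="BAAI/bge-m3"),
    persist_directory="/data/rag_db/chromadb",
)

# 5. Retrieve the chunks most similar to a natural language query
results = store.similarity_search("How do I configure the device?", k=4)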
Acknowledgments
- This project uses LangChain for orchestration and Docling for intelligent document parsing
- Vector storage provided by Chroma
- Embedding models from HuggingFace and potentially Ollama
- Built on the Model Context Protocol (MCP)
Project details
File details
Details for the file rag_mcp-0.1.4.tar.gz.
File metadata
- Download URL: rag_mcp-0.1.4.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9cd96e7023b9333ff1fbf2136e3b222bdfe45eb42e1c1d46714526c3830811f8 |
| MD5 | 4a1ee4e641a7fc765deb04cee523c207 |
| BLAKE2b-256 | dbb1b5090eb72343e51ed8dd656af880be7ddd26c16dfd5000df5f3d912b0076 |
File details
Details for the file rag_mcp-0.1.4-py3-none-any.whl.
File metadata
- Download URL: rag_mcp-0.1.4-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95035a456fbfb4e36d38fb103a4ca3ebd6fd4c5c8de9c3d4e05cb04a06146149 |
| MD5 | fc218691ebf5dbba2b6726aa6e3f219f |
| BLAKE2b-256 | 58e12548269854714e2e8cef6e689693300e5a0942388c2ccc51b5e263e4ba0e |