MCP-RAG
A retrieval-augmented generation (RAG) system built on the Model Context Protocol (MCP). It handles large files (up to 200 MB) with adaptive chunking strategies, supports multiple document formats, and includes reliability features such as batch processing and retry logic.
Features
Multi-Format Document Support
- PDF: Intelligent page-by-page processing with table detection
- DOCX: Paragraph and table extraction with formatting preservation
- Excel: Sheet-aware processing with column context (.xlsx/.xls)
- CSV: Smart row batching with header preservation
- PPTX: PowerPoint presentation (.pptx) support
- Images: JPEG, PNG, WebP, GIF, and other image formats, with OCR text extraction
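The CSV bullet's "smart row batching with header preservation" can be illustrated with a minimal sketch. It assumes each chunk holds a fixed number of data rows and repeats the header row, so a chunk retrieved on its own still says what each column means. The names `batch_csv_rows` and `rows_per_chunk` are hypothetical, not the project's API.

```python
import csv
import io
from typing import Iterator


def batch_csv_rows(text: str, rows_per_chunk: int = 50) -> Iterator[str]:
    """Yield CSV chunks of at most `rows_per_chunk` data rows, each
    prefixed with the original header row (hypothetical helper)."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to chunk
        return
    batch: list[list[str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_chunk:
            yield _render(header, batch)
            batch = []
    if batch:  # flush the final partial batch
        yield _render(header, batch)


def _render(header: list[str], rows: list[list[str]]) -> str:
    # Re-serialize header + rows so quoting and escaping stay valid CSV
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

For example, a CSV with five data rows and `rows_per_chunk=2` yields three chunks, each starting with the header line.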
Large File Processing
- Adaptive chunking: Different strategies based on file size
- Memory management: Streaming processing for 50MB+ files
- Progress tracking: Real-time progress indicators
- Timeout handling: Graceful handling of long-running operations
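A minimal sketch of size-adaptive chunking, assuming character-based chunks with overlap. Only the 50 MB streaming threshold comes from this README; the smaller size tiers, chunk sizes, and overlaps are illustrative guesses, and `chunk_params` and `iter_chunks` are hypothetical names. Files at or above the threshold are read incrementally, so only a sliding window of text is held in memory, and both paths produce the same chunks.

```python
import os
from typing import Iterator

STREAMING_THRESHOLD = 50 * 1024 * 1024  # 50 MB, the README's streaming cutoff


def chunk_params(size_bytes: int) -> tuple[int, int]:
    """Return (chunk_size, overlap) in characters; tiers are illustrative."""
    if size_bytes < 1024 * 1024:           # under 1 MB: small, precise chunks
        return 1000, 200
    if size_bytes < STREAMING_THRESHOLD:   # 1-50 MB: medium chunks
        return 2000, 300
    return 4000, 400                       # 50 MB and up: larger chunks


def iter_chunks(path: str, streaming_threshold: int = STREAMING_THRESHOLD) -> Iterator[str]:
    """Yield overlapping text chunks; stream files at or above the threshold."""
    size = os.path.getsize(path)
    chunk_size, overlap = chunk_params(size)
    step = chunk_size - overlap
    with open(path, encoding="utf-8", errors="replace") as f:
        if size < streaming_threshold:
            text = f.read()  # small file: load it whole
            if not text:
                return
            for start in range(0, max(len(text) - overlap, 1), step):
                yield text[start:start + chunk_size]
            return
        # Streaming path: read fixed-size blocks, keep only a sliding window
        buffer, emitted = "", False
        while block := f.read(64 * 1024):
            buffer += block
            while len(buffer) >= chunk_size:
                yield buffer[:chunk_size]
                buffer = buffer[step:]  # keep `overlap` chars of context
                emitted = True
        # Flush the tail only if it holds text not yet emitted
        if (emitted and len(buffer) > overlap) or (not emitted and buffer):
            yield buffer
```

Passing a low `streaming_threshold` forces the streaming path on a small file, which is a convenient way to check that both paths agree.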
Advanced RAG Capabilities
- Semantic search: Vector similarity with confidence scores
- Cross-document queries: Search across multiple documents simultaneously
- Source attribution: Citations with similarity scores
- Hybrid retrieval: Combine semantic and keyword search
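Hybrid retrieval can be sketched as a weighted blend of a semantic score (cosine similarity between embedding vectors) and a keyword score (here, the fraction of query terms found in the chunk). This is a minimal sketch under stated assumptions: the 0.7/0.3 weighting, the simple term-overlap keyword score (a production system might use BM25 instead), and the `Chunk` and `hybrid_search` names are hypothetical. Keeping `source` on each result is what enables citations with scores, and ranking one pool of chunks drawn from several files covers cross-document queries.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str             # originating file, kept for citations
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear in the chunk."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query: str, query_vec: list[float], chunks: list[Chunk],
                  alpha: float = 0.7, top_k: int = 3) -> list[tuple[float, str, str]]:
    """Rank chunks from any number of documents by
    alpha * semantic + (1 - alpha) * keyword; return (score, source, text)."""
    scored = [
        (alpha * cosine(query_vec, c.embedding) + (1 - alpha) * keyword_score(query, c.text),
         c.source, c.text)
        for c in chunks
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```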
Model Context Protocol (MCP) Integration
- Universal tool interface: Standardized AI-to-tool communication
- Auto-discovery: LangChain agents automatically find and use tools
- Secure communication: Built-in permission controls
- Extensible architecture: Easy to add new document processors
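One common way to make "easy to add new document processors" concrete is a registry keyed by file extension. The sketch below is hypothetical (`register`, `PROCESSORS`, and `process_file` are not names taken from the project); it shows how a new format plugs in with a single decorator, without touching the dispatch code.

```python
from pathlib import Path
from typing import Callable

# Maps a lowercase file extension to a function that turns raw bytes into text
Processor = Callable[[bytes], str]
PROCESSORS: dict[str, Processor] = {}


def register(*extensions: str) -> Callable[[Processor], Processor]:
    """Decorator that registers a processor for one or more extensions."""
    def decorator(func: Processor) -> Processor:
        for ext in extensions:
            PROCESSORS[ext.lower()] = func
        return func
    return decorator


@register(".txt", ".md")
def process_text(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


@register(".csv")
def process_csv(data: bytes) -> str:
    # Placeholder: a real processor would batch rows and keep the header
    return data.decode("utf-8", errors="replace")


def process_file(path: str, data: bytes) -> str:
    """Dispatch to the processor registered for the file's extension."""
    ext = Path(path).suffix.lower()
    try:
        processor = PROCESSORS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}") from None
    return processor(data)
```

Under this design, supporting another format means writing one function decorated with, for example, `@register(".pptx")`.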
Enterprise Ready
- Custom LLM endpoints: Support for any OpenAI-compatible API
- Vector database options: ChromaDB (local) + Milvus (production)
- Batch processing: Handles API rate limits and batch size constraints
- Error recovery: Retry logic and graceful degradation
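The batch-processing and error-recovery bullets can be illustrated together: split the texts into batches no larger than the API accepts, and retry a failed batch with exponential backoff. This is a minimal sketch; `embed_batch` stands in for whatever embedding call the project actually makes, and the batch size, retry count, delays, and the set of retryable exceptions are illustrative assumptions.

```python
import time
from typing import Callable, Sequence

# Hypothetical signature: takes a batch of texts, returns one vector per text
EmbedFn = Callable[[Sequence[str]], list[list[float]]]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: EmbedFn,
    batch_size: int = 100,
    max_retries: int = 3,
    base_delay: float = 1.0,
    retryable: tuple[type[Exception], ...] = (ConnectionError, TimeoutError),
) -> list[list[float]]:
    """Embed texts in API-sized batches, retrying failures with exponential backoff."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break  # batch succeeded, move on to the next one
            except retryable:
                if attempt == max_retries:
                    raise  # retries exhausted: surface the error
                # Back off base_delay * 1, 2, 4, ... seconds (e.g. after a rate limit)
                time.sleep(base_delay * 2 ** attempt)
    return vectors
```

Only exceptions listed in `retryable` trigger a retry; anything else (for example, a malformed request) fails immediately rather than being retried pointlessly.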
Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Streamlit    │    │    LangChain     │    │   MCP Server    │
│    Frontend     │◄──►│      Agent       │◄──►│     (Tools)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       ▼                       │
┌───────▼────────┐     ┌─────────────────┐      ┌───────▼──────┐
│    Document    │     │ Vector Database │      │   LLM API    │
│   Processors   │     │   (ChromaDB)    │      │   Endpoint   │
└────────────────┘     └─────────────────┘      └──────────────┘
```
Quick Start
Prerequisites
- Python 3.11+
- OpenAI API key or compatible LLM endpoint
- 8GB+ RAM (for large file processing)
Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/rag-large-file-processor.git
cd rag-large-file-processor

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env file
cat > .env << EOF
OPENAI_API_KEY=your_openai_api_key_here
BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o
VECTOR_DB_TYPE=chromadb
EOF

# Run the application
streamlit run streamlit_app.py
```
Download files
File details
Details for the file iflow_mcp_anuragb7_mcp_rag-0.1.0.tar.gz (source distribution).
File metadata
- Download URL: iflow_mcp_anuragb7_mcp_rag-0.1.0.tar.gz
- Upload date:
- Size: 31.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 (uv publish on Debian GNU/Linux 13 "trixie")
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bdaada7c04993216397bd65750d17f777a6850ca8a7d0498d1b15741a26b38ec |
| MD5 | a28d7c125faf881f635406d11aecc607 |
| BLAKE2b-256 | aaa8a11f299b6335b39e4861445b1f108cda2463a372ac1aad750c9cbc79015c |
File details
Details for the file iflow_mcp_anuragb7_mcp_rag-0.1.0-py3-none-any.whl (built distribution).
File metadata
- Download URL: iflow_mcp_anuragb7_mcp_rag-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 (uv publish on Debian GNU/Linux 13 "trixie")
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb8c468d84c8b657fa72bdb9da076baf045fe1c7775b053dada9afb8468da603 |
| MD5 | 51a49a4bd42645995df8a7e1be317b3f |
| BLAKE2b-256 | f54e8e32fa115457417962000808947df397d35435529ffcf40f2c165b6f93f7 |