RAG-powered document indexing and search for MCP (Model Context Protocol)
Ragdex
Transform Your Documents & Emails into an AI-Powered Knowledge Base
Ragdex is a powerful Model Context Protocol (MCP) server that transforms your personal document library and email archives into an AI-queryable knowledge base. Built for Claude Desktop and compatible with any MCP client.
Features • Quick Start • Documentation • Examples • Support
Features
Core Capabilities
- Universal Document Support
- Email Intelligence (v0.2.0+)
- Advanced Search & RAG
- Beautiful Web Dashboard
MCP Tools Available
| Tool | Description |
|---|---|
| search | Semantic search with optional filters |
| compare_perspectives | Compare viewpoints across documents |
| library_stats | Get comprehensive statistics |
| summarize_book | Generate AI summaries |
| extract_quotes | Find relevant quotes on topics |
| question_answer | Direct Q&A from your library |
| list_books | Browse by pattern/author/directory |
| recent_books | Find recently indexed content |
| refresh_cache | Update search cache |
| ...and 8 more | |
Smart Email Filtering
Privacy First: Email indexing is DISABLED by default. Your emails are NOT accessed unless you explicitly enable this feature. Learn more →
When enabled, Ragdex intelligently filters out noise from your email archives:
- Auto-skips: Marketing, promotions, shopping receipts, newsletters
- Excludes: Spam, junk, trash folders
- Focuses on: Personal communications, important discussions
- Configurable: Whitelist important senders, set date ranges
- Local processing: All email data stays on your computer
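The filtering rules above can be sketched as a small predicate. This is a minimal illustration, not Ragdex's actual implementation: the `Email` and `FilterConfig` classes, the keyword list, and the rule order are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical values for illustration; not Ragdex's actual rules.
EXCLUDED_FOLDERS = {"spam", "junk", "trash"}
NOISE_KEYWORDS = ("unsubscribe", "newsletter", "promotion", "receipt")


@dataclass
class Email:
    sender: str
    folder: str
    subject: str
    date: datetime


@dataclass
class FilterConfig:
    whitelist: set = field(default_factory=set)  # senders that are always kept
    max_age_days: int = 365


def should_index(email: Email, config: FilterConfig, now: datetime) -> bool:
    """Return True if the email should be indexed under these toy rules."""
    # Excluded folders are skipped, even for whitelisted senders.
    if email.folder.lower() in EXCLUDED_FOLDERS:
        return False
    # Messages older than the configured window are skipped.
    if now - email.date > timedelta(days=config.max_age_days):
        return False
    # Whitelisted senders bypass the noise heuristic.
    if email.sender.lower() in config.whitelist:
        return True
    # Crude keyword check standing in for marketing/newsletter detection.
    subject = email.subject.lower()
    return not any(word in subject for word in NOISE_KEYWORDS)
```

A real filter would likely inspect headers and sender domains as well; the point here is only the order of checks: folder, age, whitelist, then noise heuristics.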
Quick Start
New to Ragdex? See QUICKSTART.md for detailed step-by-step instructions, including prerequisites, troubleshooting, and first query examples.
Prerequisites
Quick Checklist (see Complete Prerequisites Guide →):
- System: macOS 10.15+ or Linux (Ubuntu 20.04+, Debian 11+, Fedora 35+)
- Python: 3.10-3.13 (3.11+ recommended) - How to check →
- macOS Tools: Xcode Command Line Tools - How to install →
- macOS Tools: Homebrew - How to install →
- Package Manager: uv (recommended) - How to install →
- Claude Desktop: Free or paid tier - Download →
- Permissions (macOS): Terminal Full Disk Access - Critical setup →
- Resources: 8GB RAM min (16GB recommended), 5GB disk space
- Admin Access: Required for installation - Details →
Optional Tools (format-specific):
- Calibre (MOBI/AZW ebooks), LibreOffice (.doc files), ocrmypdf + Tesseract (scanned PDFs), Ghostscript (corrupted PDFs)
- See all optional dependencies →
Run Prerequisites Check Script: Verification script →
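If the verification script is not at hand, a rough equivalent of the checklist can be sketched in Python. This is not the project's script: the function names are hypothetical, and the executable names for ocrmypdf and Tesseract are assumptions about how those tools appear on PATH.

```python
import shutil
import sys

# Range taken from the checklist above (3.10-3.13).
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 13)

# Executable names are assumptions about how each optional tool appears on PATH.
OPTIONAL_TOOLS = {
    "Calibre": "ebook-convert",
    "LibreOffice": "soffice",
    "ocrmypdf": "ocrmypdf",
    "Tesseract": "tesseract",
    "Ghostscript": "gs",
}


def python_supported(version=None) -> bool:
    """Check whether a (major, minor) version falls in the supported range."""
    major_minor = tuple(version or sys.version_info[:2])[:2]
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def find_optional_tools() -> dict:
    """Map each optional tool to its resolved path, or None if not found."""
    return {name: shutil.which(exe) for name, exe in OPTIONAL_TOOLS.items()}


if __name__ == "__main__":
    print("Python OK:", python_supported())
    print("uv found:", shutil.which("uv") is not None)
    for name, path in find_optional_tools().items():
        print(f"{name}: {path or 'not found'}")
```

The sketch does not check Full Disk Access, RAM, or disk space; those still need to be verified manually.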
Installation (2-5 minutes)
Use uv for the best experience: 10-100x faster than pip, better dependency resolution, avoids common errors
# Using uv (STRONGLY RECOMMENDED)
# Option 1: Use default Python
uv venv ~/ragdex_env
uv pip install --python ~/ragdex_env/bin/python ragdex
# Option 2: Specify Python version (3.10-3.13 supported)
uv venv --python 3.13 ~/ragdex_env
uv pip install --python ~/ragdex_env/bin/python ragdex
# Option 3: Use specific Python executable
uv venv --python python3.13 ~/ragdex_env
uv pip install --python ~/ragdex_env/bin/python ragdex
# Alternative: pip (slower, requires activation)
python3 -m venv ~/ragdex_env
source ~/ragdex_env/bin/activate
pip install ragdex
Supported Python versions: 3.10, 3.11, 3.12, 3.13
Note: The first run downloads ~2GB of AI models (5-10 min).
Don't have uv? Install it: curl -LsSf https://astral.sh/uv/install.sh | sh (then close/reopen Terminal)
Optional: Legacy .doc File Support
Modern .docx files work out of the box. For legacy .doc files (pre-2007 Word format):
# Install optional dependencies
uv pip install --python ~/ragdex_env/bin/python 'ragdex[doc-support]'
# Also install LibreOffice (system dependency)
# macOS:
brew install --cask libreoffice
# Ubuntu/Debian:
sudo apt-get install libreoffice
# Fedora:
sudo dnf install libreoffice
Note: If Ragdex encounters a .doc file without this optional setup, it reports an error that includes installation instructions.
Setup Services (2-3 minutes)
# Download and run interactive setup
curl -O https://raw.githubusercontent.com/hpoliset/ragdex/main/setup_services.sh
chmod +x setup_services.sh
./setup_services.sh
The setup will:
- Ask where your documents are located
- Optionally install Calibre for enhanced ebook support (MOBI/AZW)
- Set up background indexing services
- Configure the web dashboard (localhost:8888)
- Display Claude Desktop JSON configuration
Configure Claude Desktop
- Copy the JSON configuration displayed by the installer
- Open Claude's config: ~/Library/Application Support/Claude/claude_desktop_config.json
- Paste the configuration (merge if you have other MCP servers)
- Restart Claude Desktop (Cmd+Q, then reopen)
→ Detailed configuration guide with examples
Verify Installation
# Check version
~/ragdex_env/bin/ragdex --version
# Check services running
launchctl list | grep ragdex
# View web dashboard
open http://localhost:8888
# Test in Claude Desktop
# Ask: "Can you check my library stats?"
→ Complete verification steps
Troubleshooting
Having issues? Common problems and solutions:
- Wrong Python version? Install Python 3.11 or 3.13
- Claude doesn't see Ragdex? Check your config
- No documents indexed? Verify paths and permissions
→ Full troubleshooting guide
You're done! Start querying your documents with Claude.
Documentation
System Requirements
- Python 3.10-3.13 (3.11+ recommended for best performance)
- macOS (primary, fully tested) or Linux (untested; community feedback welcome)
- 8GB RAM minimum (16GB recommended)
  - Embedding model uses ~4GB
  - Document processing can spike to 6-8GB for large PDFs
- Storage:
  - ~500MB for Ragdex installation
  - ~2GB for embedding models (auto-downloaded on first run)
  - ~1MB per 100-page PDF for vector database storage
- Claude Desktop (required for MCP integration)
- Optional system dependencies (install via Homebrew or apt):
  - Calibre: ebook-convert for MOBI/AZW/AZW3 ebook processing
  - LibreOffice: soffice for legacy .doc file conversion
  - ocrmypdf + Tesseract: OCR for scanned PDFs (auto-detected when < 20% text)
  - Ghostscript: gs for cleaning corrupted/malformed PDFs that fail standard extraction
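The "< 20% text" trigger for OCR is not defined further here. One plausible reading, sketched below purely as an assumption, is that a PDF is sent to OCR when fewer than 20% of its pages yield extractable text. The function names and the page-level interpretation are hypothetical.

```python
def text_page_ratio(page_texts) -> float:
    """Fraction of pages that contain any non-whitespace extracted text."""
    pages = list(page_texts)
    if not pages:
        return 0.0
    with_text = sum(1 for text in pages if text and text.strip())
    return with_text / len(pages)


def needs_ocr(page_texts, threshold: float = 0.2) -> bool:
    """Assumed rule: run OCR when the text-bearing page ratio is below the threshold."""
    return text_page_ratio(page_texts) < threshold
```

Here `page_texts` would come from whatever PDF extractor is in use, one string per page. Ragdex may measure text coverage differently, for example by character count.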
Configuration Options
Environment Variables
# Core paths
export PERSONAL_LIBRARY_DOC_PATH="/path/to/documents"
export PERSONAL_LIBRARY_DB_PATH="/path/to/database"
export PERSONAL_LIBRARY_LOGS_PATH="/path/to/logs"
# Email settings (v0.2.0+)
export PERSONAL_LIBRARY_INDEX_EMAILS=true
export PERSONAL_LIBRARY_EMAIL_SOURCES=apple_mail,outlook_local
export PERSONAL_LIBRARY_EMAIL_MAX_AGE_DAYS=365
export PERSONAL_LIBRARY_EMAIL_EXCLUDED_FOLDERS=Spam,Junk,Trash
# MCP Performance (v0.3.0+)
export MCP_WARMUP_ON_START=true # Pre-initialize on server start (recommended)
export MCP_INIT_TIMEOUT=30 # Seconds to wait for initialization
export MCP_TOOL_TIMEOUT=15 # Seconds to wait before timing out tool calls
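For scripts that need to read the same settings, the variables above can be parsed into a typed object. This is a sketch under stated assumptions: the `Settings` class and `load_settings` helper are hypothetical, and the fallback values simply mirror the examples above rather than confirmed Ragdex defaults.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as "true" or "1"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _as_list(value: str) -> list:
    """Split a comma-separated value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass
class Settings:
    doc_path: str
    index_emails: bool
    email_sources: list
    email_max_age_days: int
    excluded_folders: list
    init_timeout: int
    tool_timeout: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    # Fallbacks mirror the example values above; they are assumptions,
    # not confirmed Ragdex defaults (email indexing is documented as off).
    return Settings(
        doc_path=env.get("PERSONAL_LIBRARY_DOC_PATH", ""),
        index_emails=_as_bool(env.get("PERSONAL_LIBRARY_INDEX_EMAILS", "false")),
        email_sources=_as_list(env.get("PERSONAL_LIBRARY_EMAIL_SOURCES", "")),
        email_max_age_days=int(env.get("PERSONAL_LIBRARY_EMAIL_MAX_AGE_DAYS", "365")),
        excluded_folders=_as_list(
            env.get("PERSONAL_LIBRARY_EMAIL_EXCLUDED_FOLDERS", "Spam,Junk,Trash")
        ),
        init_timeout=int(env.get("MCP_INIT_TIMEOUT", "30")),
        tool_timeout=int(env.get("MCP_TOOL_TIMEOUT", "15")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to test without touching the real environment.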
Claude Desktop Configuration Example
Complete Configuration Example
If this is your first MCP server, your claude_desktop_config.json should look like:
{
  "mcpServers": {
    "ragdex": {
      "command": "/Users/yourname/ragdex_env/bin/ragdex-mcp",
      "env": {
        "PYTHONUNBUFFERED": "1",
        "CHROMA_TELEMETRY": "false",
        "PERSONAL_LIBRARY_DOC_PATH": "/Users/yourname/Documents",
        "PERSONAL_LIBRARY_DB_PATH": "/Users/yourname/.ragdex/chroma_db",
        "PERSONAL_LIBRARY_LOGS_PATH": "/Users/yourname/.ragdex/logs",
        "MCP_WARMUP_ON_START": "true",
        "MCP_INIT_TIMEOUT": "30",
        "MCP_TOOL_TIMEOUT": "15"
      }
    }
  }
}
If you already have other MCP servers, add ragdex to the existing structure:
{
  "mcpServers": {
    "existing-server": { ... },
    "ragdex": {
      "command": "/Users/yourname/ragdex_env/bin/ragdex-mcp",
      "env": { ... }
    }
  }
}
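Merging by hand is error-prone when other servers are already configured. The sketch below shows one way to add the ragdex entry while leaving existing entries untouched. The `add_ragdex_server` helper is hypothetical, and the demonstration runs against a throwaway file rather than the real Claude config.

```python
import json
import tempfile
from pathlib import Path


def add_ragdex_server(config_path: Path, command: str, env: dict) -> dict:
    """Insert or update the "ragdex" entry, leaving other servers untouched."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers["ragdex"] = {"command": command, "env": env}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demonstration against a temporary file with one pre-existing server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text(
        json.dumps({"mcpServers": {"existing-server": {"command": "other"}}}),
        encoding="utf-8",
    )
    merged = add_ragdex_server(
        path,
        command="/Users/yourname/ragdex_env/bin/ragdex-mcp",
        env={"PERSONAL_LIBRARY_DOC_PATH": "/Users/yourname/Documents"},
    )
    reloaded = json.loads(path.read_text(encoding="utf-8"))
```

Back up the real config file before pointing a script like this at it, and restart Claude Desktop afterwards so the change is picked up.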
Advanced Installation
Install with Optional Dependencies
# Legacy .doc file support (using uv - recommended)
uv pip install --python ~/ragdex_env/bin/python 'ragdex[doc-support]'
# Daemon mode for ragdex-index --daemon
uv pip install --python ~/ragdex_env/bin/python 'ragdex[services]'
# All extras
uv pip install --python ~/ragdex_env/bin/python 'ragdex[doc-support,services]'
Install from Source
git clone https://github.com/hpoliset/ragdex
cd ragdex
# Using uv (recommended); create a virtual environment first so uv has an install target
uv venv
uv pip install -e .
# With extras
uv pip install -e ".[doc-support,services]"
# Alternative: standard pip
# pip install -e ".[doc-support,services]"
Available CLI Commands
# Main commands
ragdex-mcp # Start MCP server
ragdex-index # Start background indexer
ragdex-index --retry # Clear failed list and re-attempt failed documents
ragdex-web # Launch web dashboard
# Management commands
ragdex --help # Show all commands
ragdex ensure-dirs # Create directories
ragdex config # View configuration
ragdex index-status # Check indexing status
ragdex find-unindexed # Find new documents
ragdex manage-failed # Handle failed documents
Upgrading Ragdex
Upgrading from PyPI
# Stop all services first (use bootout for macOS 11+)
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null || \
launchctl unload ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null
# Kill any running processes
pkill -f ragdex 2>/dev/null || true
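# Using uv (recommended, faster); target the environment created during installation
uv pip install --python ~/ragdex_env/bin/python --upgrade ragdex
# Or with extras
uv pip install --python ~/ragdex_env/bin/python --upgrade 'ragdex[doc-support,services]'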
# Alternative: standard pip
# pip install --upgrade ragdex
# Restart services (use bootstrap for macOS 11+)
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null || \
launchctl load ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null
# Restart Claude Desktop to reload MCP server
Upgrading from Source
# Stop services (use bootout for macOS 11+)
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null || \
launchctl unload ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null
# Kill any running processes
pkill -f ragdex 2>/dev/null || true
# Pull latest changes
cd ragdex
git pull origin main
# Upgrade dependencies (using uv for speed)
uv pip install --upgrade -e .
# Or with standard pip
# pip install --upgrade -e .
# Restart services
launchctl load ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null
Service Management During Upgrades
Complete Service Restart Process
1. Stop All Services
# Stop background indexer
launchctl unload ~/Library/LaunchAgents/com.ragdex.index-monitor.plist
# Stop web dashboard
launchctl unload ~/Library/LaunchAgents/com.ragdex.webmonitor.plist
# Or use the uninstall script (doesn't delete configs)
./scripts/uninstall_service.sh
./scripts/uninstall_webmonitor_service.sh
2. Perform Upgrade
# Upgrade via uv (recommended) or pip
uv pip install --python ~/ragdex_env/bin/python --upgrade ragdex
# Or: pip install --upgrade ragdex
3. Clear Cache & Locks (Optional)
# Clear any stale locks (adjust the path if PERSONAL_LIBRARY_DB_PATH points elsewhere)
rm -f ~/.ragdex/chroma_db/*.lock
# Clear failed documents list if needed
ragdex clear-failed
# Refresh the search cache
ragdex refresh-cache
4. Restart Services
# Reinstall services (updates paths if needed)
./scripts/install_service.sh
./scripts/install_webmonitor_service.sh
# Or manually load
launchctl load ~/Library/LaunchAgents/com.ragdex.index-monitor.plist
launchctl load ~/Library/LaunchAgents/com.ragdex.webmonitor.plist
# Verify services are running
launchctl list | grep ragdex
5. Restart Claude Desktop
- Important: Claude Desktop must be fully quit and restarted to reload the MCP server
- On macOS: Cmd+Q to quit, then reopen Claude Desktop
- The MCP server will automatically reinitialize with the upgraded version
Post-Upgrade Verification
# Check version
ragdex --version
# Verify services are running
ragdex index-status
# Check web dashboard
open http://localhost:8888
# Test MCP connection in Claude
# Ask Claude: "Can you check my library stats?"
Troubleshooting Upgrades
Common Upgrade Issues
Services not starting after upgrade?
# Check service logs
tail -f ~/Library/Logs/ragdex_*.log
# Reinstall services with fresh configs
./setup_services.sh
"Unload failed: 5: Input/output error" on macOS?
# Use bootout/bootstrap for macOS 11+
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null
# Alternative: Kill processes directly
pkill -f ragdex
Claude not recognizing new features?
- Fully quit Claude Desktop (Cmd+Q on macOS)
- Wait 5 seconds
- Reopen Claude Desktop
- The MCP server will reinitialize
Database compatibility issues?
# Backup existing database
cp -r ~/.ragdex/chroma_db ~/.ragdex/chroma_db.backup
# Clear and rebuild index (last resort)
rm -rf ~/.ragdex/chroma_db
ragdex-index --full-reindex
Permission errors after upgrade?
# Ensure directories have correct permissions
chmod -R 755 ~/.ragdex
ragdex ensure-dirs
Examples
Using with Claude Desktop
Once configured, you can ask Claude:
"Search my library for information about machine learning"
"Compare perspectives on climate change across my documents"
"Summarize the main themes in my recent emails"
"Find all documents mentioning Python programming"
"What meetings did I have last month?" (from emails)
Python API Usage (Advanced)
While Ragdex is primarily designed for Claude Desktop via MCP, you can also use it programmatically:
from personal_doc_library.core.shared_rag import RAGSystem
# Initialize the system
rag = RAGSystem()
# Search documents
results = rag.search_documents("artificial intelligence", max_results=5)
# Get document stats
stats = rag.get_library_statistics()
print(f"Documents indexed: {len(rag.book_index)}")
Note: The primary use case is through Claude Desktop. Direct API usage requires understanding the internal architecture.
Use Cases
Personal Knowledge Management
- Build a searchable archive of your books, papers, and notes
- Never lose track of important information
- Connect ideas across different sources
Professional Research
- Analyze technical documentation
- Compare different approaches from papers
- Extract key insights from reports
Email Intelligence (Optional)
- Search through years of communications
- Find important attachments
- Track project discussions
- Note: Must be enabled manually - disabled by default for privacy
Academic Study
- Research across textbooks and papers
- Extract quotes for citations
- Compare author perspectives
Architecture
graph TD
A[Document Sources<br/>PDF, Word, EPUB, MOBI] --> B[Ragdex Indexer<br/>Background Service]
B --> C[ChromaDB<br/>Vector Store<br/>768-dim embeddings]
C --> D[MCP Server<br/>17 Tools & Resources]
D --> E[Claude Desktop<br/>AI Assistant]
F[Email Archives<br/>Apple Mail, Outlook] --> B
G[Web Dashboard<br/>localhost:8888] --> C
B -.->|MD5 Hash<br/>Deduplication| H[Change Detection]
B -.->|OCR Support| I[Scanned Docs]
style A fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000
style E fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
style F fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000
style G fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#000
style B fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
style C fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#000
style D fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px,color:#000
style H fill:#fffde7,stroke:#f57f17,stroke-width:1px,color:#000
style I fill:#fffde7,stroke:#f57f17,stroke-width:1px,color:#000
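The diagram's MD5-based change detection can be illustrated with a short sketch: hash each file, compare against the hash stored at the last indexing pass, and re-index only on mismatch. The function names and the in-memory `index` dict are assumptions for the example, not Ragdex's internal data structures.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed(paths, known_hashes: dict) -> list:
    """Return paths whose content hash is new or differs from the stored one."""
    changed = []
    for path in paths:
        if known_hashes.get(str(path)) != md5_of_file(path):
            changed.append(path)
    return changed


# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.txt"
    doc.write_text("first draft", encoding="utf-8")
    index = {}
    first_pass = find_changed([doc], index)   # new file, so it is reported
    index[str(doc)] = md5_of_file(doc)        # record the hash after indexing
    second_pass = find_changed([doc], index)  # unchanged, so nothing to do
    doc.write_text("second draft", encoding="utf-8")
    third_pass = find_changed([doc], index)   # content changed, reported again
```

MD5 is adequate here because the goal is detecting edits, not resisting deliberate collisions.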
Components
- Indexer: Background service monitoring document changes with automatic retry
- Vector Store: ChromaDB with 768-dim embeddings (all-mpnet-base-v2)
- MCP Server: 17 tools, 5 prompts, 4 resources for document interaction
- Web Monitor: Real-time dashboard at localhost:8888 with search interface
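To make the vector store's role concrete, the sketch below ranks documents by cosine similarity between a query vector and document vectors. It uses toy 3-dimensional vectors in place of real 768-dimensional embeddings, and cosine similarity is an assumption: the distance metric Ragdex configures in ChromaDB is not stated here.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs: dict, top_k: int = 3) -> list:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors with made-up file names; real embeddings have 768 dimensions.
docs = {
    "ml_notes.pdf": [0.9, 0.1, 0.0],
    "cookbook.epub": [0.0, 0.2, 0.9],
    "stats_paper.pdf": [0.7, 0.6, 0.1],
}
results = rank([1.0, 0.2, 0.0], docs, top_k=2)
```

With these toy values, `ml_notes.pdf` ranks first and `stats_paper.pdf` second, because their vectors point in nearly the same direction as the query.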
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
# Clone and install in dev mode
git clone https://github.com/hpoliset/ragdex
cd ragdex
# Using uv (recommended); create a virtual environment first so uv has an install target
uv venv
uv pip install -e ".[dev]"
# Alternative: standard pip
# pip install -e ".[dev]"
# Run tests
pytest tests/
# Format code
black src/
Stats & Performance
- Indexing Speed:
  - ~10-20 documents/minute (varies by size and format)
  - Large PDFs (>100MB): 2-5 minutes each
  - OCR processing: 1-2 pages/minute
- Search Latency:
  - First search: 2-5 seconds (model loading)
  - Subsequent searches: 100-500ms
- Memory Usage:
  - Idle: ~500MB
  - Active indexing: 4-8GB
  - With embeddings loaded: 4-6GB constant
- Storage:
  - Vector DB: ~10MB per 1000 pages
  - Metadata index: ~1MB per 100 documents
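These figures translate into a quick back-of-the-envelope estimate for a given library. The helper below is a hypothetical convenience that applies the rates listed above; actual numbers will vary with document size, format, and OCR needs.

```python
def estimate_library(num_docs: int, total_pages: int) -> dict:
    """Rough estimates derived from the approximate figures listed above."""
    return {
        # ~10-20 documents/minute
        "indexing_minutes_min": num_docs / 20,
        "indexing_minutes_max": num_docs / 10,
        # ~10MB per 1000 pages
        "vector_db_mb": total_pages / 1000 * 10,
        # ~1MB per 100 documents
        "metadata_mb": num_docs / 100 * 1,
    }


estimate = estimate_library(num_docs=500, total_pages=150_000)
```

For 500 documents totalling 150,000 pages, this gives roughly 25-50 minutes of indexing, about 1.5GB of vector storage, and about 5MB of metadata.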
Troubleshooting
Common Issues
Services not starting?
# Check service status
launchctl list | grep ragdex
# View logs
tail -f ~/.ragdex/logs/ragdex_*.log
Documents not indexing?
# Check for failed documents
ragdex manage-failed
# Verify paths
ragdex config
Permission errors?
# Ensure directories exist
ragdex ensure-dirs
# Check permissions
ls -la ~/Documents/Library
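The same permission check can be done programmatically for each configured directory. This sketch uses only the standard library; `check_directory` is a hypothetical helper, and the demonstration uses a temporary directory rather than a real library path.

```python
import os
import tempfile
from pathlib import Path


def check_directory(path) -> dict:
    """Report whether a directory exists and is readable and writable."""
    p = Path(path).expanduser()
    return {
        "path": str(p),
        "exists": p.is_dir(),
        "readable": os.access(p, os.R_OK),
        "writable": os.access(p, os.W_OK),
    }


# Demonstration: one directory that exists and one that does not.
with tempfile.TemporaryDirectory() as tmp:
    present = check_directory(tmp)
    missing = check_directory(Path(tmp) / "does-not-exist")
```

On macOS, a directory can pass these checks for your shell and still be unreadable by a background service if Full Disk Access has not been granted to it.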
License
MIT License - see LICENSE for details.
Acknowledgments
Built with:
- LangChain - LLM framework
- ChromaDB - Vector database
- Sentence Transformers - Embeddings
- Model Context Protocol - MCP specification
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Wiki: Documentation Wiki