RAG-powered document indexing and search for MCP (Model Context Protocol)

🚀 Ragdex

Transform Your Documents & Emails into an AI-Powered Knowledge Base

Python 3.10+ • License: MIT • MCP Compatible

Ragdex is a powerful Model Context Protocol (MCP) server that transforms your personal document library and email archives into an AI-queryable knowledge base. Built for Claude Desktop and compatible with any MCP client.

Features • Quick Start • Documentation • Examples • Support


✨ Features

🎯 Core Capabilities

📚 Universal Document Support

  • PDFs with OCR for scanned documents
  • Office Files (Word, PowerPoint, Excel)
  • E-books (EPUB, MOBI, AZW, AZW3)
  • Plain Text and Markdown files
  • Automatic format detection

📧 Email Intelligence (v0.2.0+) 🔒

  • Apple Mail (EMLX) support
  • Outlook (OLM export) support
  • Smart filtering - Skip marketing & spam
  • Attachment processing
  • Thread reconstruction
  • Privacy: Disabled by default - Enable it →

🔍 Advanced Search & RAG

  • Semantic search with vector embeddings
  • Cross-document insights
  • Context-aware responses
  • 17+ specialized MCP tools
  • Real-time index updates
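
To make "semantic search with vector embeddings" concrete, here is a minimal sketch of ranking documents by cosine similarity. It is not Ragdex's own code (the Architecture section says the real store is ChromaDB with 768-dim embeddings); the toy 3-dimensional vectors, the file names, and the helpers `cosine_similarity` and `top_k` are assumptions made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k documents whose vectors are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical data, not real model output).
toy_index = {
    "ml_intro.pdf": [0.9, 0.1, 0.0],
    "cooking.epub": [0.0, 0.2, 0.9],
    "stats_notes.md": [0.7, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

In a real pipeline the query text would first be embedded with the same model used at indexing time, so that query and document vectors live in the same space.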

🎨 Beautiful Web Dashboard

  • Real-time monitoring at localhost:8888
  • Indexing progress tracking
  • Document & email statistics
  • Failed document management
  • Search interface with filters

🛠️ MCP Tools Available

Tool | Description
🔍 search | Semantic search with optional filters
📊 compare_perspectives | Compare viewpoints across documents
📈 library_stats | Get comprehensive statistics
📖 summarize_book | Generate AI summaries
💭 extract_quotes | Find relevant quotes on topics
❓ question_answer | Direct Q&A from your library
📚 list_books | Browse by pattern/author/directory
📅 recent_books | Find recently indexed content
🔄 refresh_cache | Update search cache
...and 8 more!

🎯 Smart Email Filtering

🔒 Privacy First: Email indexing is DISABLED by default. Your emails are NOT accessed unless you explicitly enable this feature. Learn more →

When enabled, Ragdex intelligently filters out noise from your email archives:

  • ❌ Auto-skips: Marketing, promotions, shopping receipts, newsletters
  • ❌ Excludes: Spam, junk, trash folders
  • ✅ Focuses on: Personal communications, important discussions
  • ⚙️ Configurable: Whitelist important senders, set date ranges
  • 🔐 Local processing: All email data stays on your computer
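
The filtering rules above can be sketched as a single predicate. This is a hedged illustration only, not Ragdex's actual filter: the `EmailMeta` and `FilterRules` types, the keyword list used to spot marketing mail, and the rule that a whitelisted sender bypasses the keyword check (but not the folder or age checks) are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set, Tuple


@dataclass
class EmailMeta:
    sender: str
    folder: str
    subject: str
    date: datetime


@dataclass
class FilterRules:
    # Folder names are compared case-insensitively.
    excluded_folders: Set[str] = field(default_factory=lambda: {"spam", "junk", "trash"})
    max_age_days: int = 365
    whitelist: Set[str] = field(default_factory=set)
    # Hypothetical keyword heuristic for marketing and newsletter mail.
    noise_markers: Tuple[str, ...] = ("unsubscribe", "% off", "newsletter", "your receipt")


def should_index(msg: EmailMeta, rules: FilterRules, now: Optional[datetime] = None) -> bool:
    """Return True if the message passes folder, age, and noise checks."""
    now = now or datetime.now()
    if msg.folder.lower() in rules.excluded_folders:
        return False  # spam/junk/trash are never indexed
    if now - msg.date > timedelta(days=rules.max_age_days):
        return False  # outside the configured date range
    if msg.sender.lower() in rules.whitelist:
        return True  # whitelisted senders skip the keyword heuristic
    subject = msg.subject.lower()
    return not any(marker in subject for marker in rules.noise_markers)
```

Passing `now` explicitly keeps the function deterministic, which makes the date-range rule easy to test.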

🚀 Quick Start

🆕 New to Ragdex? See QUICKSTART.md for detailed step-by-step instructions, including prerequisites, troubleshooting, and first query examples.

Prerequisites

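Quick Checklist (see Complete Prerequisites Guide →): the core requirements are listed under System Requirements below.

Optional Tools (format-specific): Calibre, LibreOffice, ocrmypdf with Tesseract, and Ghostscript; see the optional system dependencies under System Requirements.

Run Prerequisites Check Script: Verification script →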

Installation (2-5 minutes)

⚡ Use uv for the best experience: 10-100x faster than pip, with better dependency resolution and fewer common errors

# Using uv (⭐ STRONGLY RECOMMENDED)
# Option 1: Use default Python
uv venv ~/ragdex_env
uv pip install --python ~/ragdex_env/bin/python ragdex

# Option 2: Specify Python version (3.10-3.13 supported)
uv venv --python 3.13 ~/ragdex_env
uv pip install --python ~/ragdex_env/bin/python ragdex

# Option 3: Use specific Python executable
uv venv --python python3.13 ~/ragdex_env
uv pip install --python ~/ragdex_env/bin/python ragdex

# Alternative: pip (slower, requires activation)
python3 -m venv ~/ragdex_env
source ~/ragdex_env/bin/activate
pip install ragdex

Supported Python versions: 3.10, 3.11, 3.12, 3.13

Note: The first run downloads ~2GB of AI models (5-10 min).

Don't have uv? Install it: curl -LsSf https://astral.sh/uv/install.sh | sh (then close/reopen Terminal)

📄 Optional: Legacy .doc File Support

Modern .docx files work out of the box. For legacy .doc files (pre-2007 Word format):

# Install optional dependencies
uv pip install --python ~/ragdex_env/bin/python 'ragdex[doc-support]'

# Also install LibreOffice (system dependency)
# macOS:
brew install --cask libreoffice

# Ubuntu/Debian:
sudo apt-get install libreoffice

# Fedora:
sudo dnf install libreoffice

Note: If Ragdex encounters a .doc file without this optional setup, it reports an error message with installation instructions.

Setup Services (2-3 minutes)

# Download and run interactive setup
curl -O https://raw.githubusercontent.com/hpoliset/ragdex/main/setup_services.sh
chmod +x setup_services.sh
./setup_services.sh

The setup will:

  • Ask where your documents are located
  • Optionally install Calibre for enhanced ebook support (MOBI/AZW)
  • Set up background indexing services
  • Configure the web dashboard (localhost:8888)
  • Display Claude Desktop JSON configuration

Configure Claude Desktop

  1. Copy the JSON configuration displayed by the installer
  2. Open Claude's config: ~/Library/Application Support/Claude/claude_desktop_config.json
  3. Paste the configuration (merge if you have other MCP servers)
  4. Restart Claude Desktop (Cmd+Q, then reopen)

→ Detailed configuration guide with examples

Verify Installation

# Check version
~/ragdex_env/bin/ragdex --version

# Check services running
launchctl list | grep ragdex

# View web dashboard
open http://localhost:8888

# Test in Claude Desktop
# Ask: "Can you check my library stats?"
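
If you prefer to script the dashboard check rather than open a browser, a small port probe is one option. This is a sketch under the assumption that the dashboard listens on localhost:8888 as described above; `port_is_open` is a hypothetical helper, not part of the Ragdex CLI, and it only confirms that something accepts TCP connections on that port.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open()` with no arguments probes the default dashboard address.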

→ Complete verification steps

Troubleshooting

Having issues? See the Troubleshooting section below for common problems and solutions.

→ Full troubleshooting guide

You're done! 🎉 Start querying your documents with Claude.


📖 Documentation

System Requirements

  • Python 3.10-3.13 (3.11+ recommended for best performance)
  • macOS (primary, fully tested) or Linux (untested; community feedback welcome)
  • 8GB RAM minimum (16GB recommended)
    • Embedding model uses ~4GB
    • Document processing can spike to 6-8GB for large PDFs
  • Storage:
    • ~500MB for Ragdex installation
    • ~2GB for embedding models (auto-downloaded on first run)
    • ~1MB per 100-page PDF for vector database storage
  • Claude Desktop (required for MCP integration)
  • Optional system dependencies (install via Homebrew or apt):
    • Calibre: ebook-convert for MOBI/AZW/AZW3 ebook processing
    • LibreOffice: soffice for legacy .doc file conversion
    • ocrmypdf + Tesseract: OCR for scanned PDFs (auto-detected when < 20% text)
    • Ghostscript: gs for cleaning corrupted/malformed PDFs that fail standard extraction
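
The "< 20% text" trigger for OCR is not specified further here, so the sketch below shows one plausible reading: run OCR when fewer than 20% of a PDF's pages yield a meaningful amount of extractable text. The page-based interpretation, the 50-character cutoff, and the function names are all assumptions; Ragdex may measure this differently.

```python
from typing import List


def text_page_fraction(page_texts: List[str], min_chars: int = 50) -> float:
    """Fraction of pages whose extracted text has at least min_chars characters."""
    if not page_texts:
        return 0.0
    with_text = sum(1 for text in page_texts if len(text.strip()) >= min_chars)
    return with_text / len(page_texts)


def needs_ocr(page_texts: List[str], threshold: float = 0.20) -> bool:
    """Assumed rule: OCR is needed when the text-bearing fraction is below the threshold."""
    return text_page_fraction(page_texts) < threshold
```

Here `page_texts` stands for whatever a standard PDF text extractor returns per page; a scanned document typically produces empty strings for most pages.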

Configuration Options

Environment Variables

# Core paths
export PERSONAL_LIBRARY_DOC_PATH="/path/to/documents"
export PERSONAL_LIBRARY_DB_PATH="/path/to/database"
export PERSONAL_LIBRARY_LOGS_PATH="/path/to/logs"

# Email settings (v0.2.0+)
export PERSONAL_LIBRARY_INDEX_EMAILS=true
export PERSONAL_LIBRARY_EMAIL_SOURCES=apple_mail,outlook_local
export PERSONAL_LIBRARY_EMAIL_MAX_AGE_DAYS=365
export PERSONAL_LIBRARY_EMAIL_EXCLUDED_FOLDERS=Spam,Junk,Trash

# MCP Performance (v0.3.0+)
export MCP_WARMUP_ON_START=true       # Pre-initialize on server start (recommended)
export MCP_INIT_TIMEOUT=30            # Seconds to wait for initialization
export MCP_TOOL_TIMEOUT=15            # Seconds to wait before timing out tool calls
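
For readers wiring these variables into their own scripts, here is a minimal sketch of reading them into a typed settings object. The variable names come from the list above; the `Settings` class, the `load_settings` helper, and the fallback values (which simply mirror the example values, apart from email indexing being off by default as documented) are assumptions rather than Ragdex's real configuration loader.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _as_list(value: str) -> List[str]:
    """Split a comma-separated value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass
class Settings:
    doc_path: str
    index_emails: bool
    email_sources: List[str]
    email_max_age_days: int
    excluded_folders: List[str]
    init_timeout_s: int
    tool_timeout_s: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fallbacks are assumptions that mirror the example values shown above.
    return Settings(
        doc_path=env.get("PERSONAL_LIBRARY_DOC_PATH", ""),
        index_emails=_as_bool(env.get("PERSONAL_LIBRARY_INDEX_EMAILS", "false")),
        email_sources=_as_list(env.get("PERSONAL_LIBRARY_EMAIL_SOURCES", "")),
        email_max_age_days=int(env.get("PERSONAL_LIBRARY_EMAIL_MAX_AGE_DAYS", "365")),
        excluded_folders=_as_list(env.get("PERSONAL_LIBRARY_EMAIL_EXCLUDED_FOLDERS", "Spam,Junk,Trash")),
        init_timeout_s=int(env.get("MCP_INIT_TIMEOUT", "30")),
        tool_timeout_s=int(env.get("MCP_TOOL_TIMEOUT", "15")),
    )
```

Accepting any mapping instead of reading `os.environ` directly makes the loader easy to exercise with a plain dictionary.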

Claude Desktop Configuration Example

📝 Complete Configuration Example

If this is your first MCP server, your claude_desktop_config.json should look like:

{
  "mcpServers": {
    "ragdex": {
      "command": "/Users/yourname/ragdex_env/bin/ragdex-mcp",
      "env": {
        "PYTHONUNBUFFERED": "1",
        "CHROMA_TELEMETRY": "false",
        "PERSONAL_LIBRARY_DOC_PATH": "/Users/yourname/Documents",
        "PERSONAL_LIBRARY_DB_PATH": "/Users/yourname/.ragdex/chroma_db",
        "PERSONAL_LIBRARY_LOGS_PATH": "/Users/yourname/.ragdex/logs",
        "MCP_WARMUP_ON_START": "true",
        "MCP_INIT_TIMEOUT": "30",
        "MCP_TOOL_TIMEOUT": "15"
      }
    }
  }
}

If you already have other MCP servers, add ragdex to the existing structure:

{
  "mcpServers": {
    "existing-server": { ... },
    "ragdex": {
      "command": "/Users/yourname/ragdex_env/bin/ragdex-mcp",
      "env": { ... }
    }
  }
}
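
Merging by hand is easy to get wrong when the file already lists other servers. As a hedged alternative, the sketch below adds or updates only the "ragdex" entry and leaves everything else in place. The helper name `add_ragdex_server` is hypothetical, and the script assumes the config file is plain JSON with the `mcpServers` layout shown above; back up the file before running anything like this.

```python
import json
from pathlib import Path
from typing import Dict


def add_ragdex_server(config_path: Path, command: str, env: Dict[str, str]) -> dict:
    """Insert or update the "ragdex" entry, leaving other MCP servers untouched."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # Create the "mcpServers" object if this is the first MCP server.
    servers = config.setdefault("mcpServers", {})
    servers["ragdex"] = {"command": command, "env": env}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A typical call would pass the path to claude_desktop_config.json, the full path to ragdex-mcp inside your virtual environment, and the same `env` values shown in the complete example.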

Advanced Installation

📦 Install with Optional Dependencies

# Legacy .doc file support (using uv - recommended)
uv pip install --python ~/ragdex_env/bin/python 'ragdex[doc-support]'

# Daemon mode for ragdex-index --daemon
uv pip install --python ~/ragdex_env/bin/python 'ragdex[services]'

# All extras
uv pip install --python ~/ragdex_env/bin/python 'ragdex[doc-support,services]'

🔧 Install from Source

git clone https://github.com/hpoliset/ragdex
cd ragdex

# Using uv (recommended)
uv pip install -e .

# With extras
uv pip install -e ".[doc-support,services]"

# Alternative: standard pip
# pip install -e ".[doc-support,services]"

📋 Available CLI Commands

# Main commands
ragdex-mcp            # Start MCP server
ragdex-index          # Start background indexer
ragdex-index --retry  # Clear failed list and re-attempt failed documents
ragdex-web            # Launch web dashboard

# Management commands
ragdex --help                        # Show all commands
ragdex ensure-dirs                   # Create directories
ragdex config                        # View configuration
ragdex index-status                  # Check indexing status
ragdex find-unindexed                # Find new documents
ragdex manage-failed                 # Handle failed documents

🔄 Upgrading Ragdex

Upgrading from PyPI

# Stop all services first (use bootout for macOS 11+)
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null || \
launchctl unload ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null

# Kill any running processes
pkill -f ragdex 2>/dev/null || true

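# Using uv (recommended, faster); target the same environment used at install time
uv pip install --python ~/ragdex_env/bin/python --upgrade ragdex

# Or with extras
uv pip install --python ~/ragdex_env/bin/python --upgrade 'ragdex[doc-support,services]'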

# Alternative: standard pip
# pip install --upgrade ragdex

# Restart services (use bootstrap for macOS 11+)
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null || \
launchctl load ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null

# Restart Claude Desktop to reload MCP server

Upgrading from Source

# Stop services (use bootout for macOS 11+)
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null || \
launchctl unload ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null

# Kill any running processes
pkill -f ragdex 2>/dev/null || true

# Pull latest changes
cd ragdex
git pull origin main

# Upgrade dependencies (using uv for speed)
uv pip install --upgrade -e .

# Or with standard pip
# pip install --upgrade -e .

# Restart services
launchctl load ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null

Service Management During Upgrades

⚙️ Complete Service Restart Process

1. Stop All Services

# Stop background indexer
launchctl unload ~/Library/LaunchAgents/com.ragdex.index-monitor.plist

# Stop web dashboard
launchctl unload ~/Library/LaunchAgents/com.ragdex.webmonitor.plist

# Or use the uninstall script (doesn't delete configs)
./scripts/uninstall_service.sh
./scripts/uninstall_webmonitor_service.sh

2. Perform Upgrade

# Upgrade via uv (recommended) or pip
uv pip install --python ~/ragdex_env/bin/python --upgrade ragdex
# Or, with the environment activated: pip install --upgrade ragdex

3. Clear Cache & Locks (Optional)

# Clear any stale locks
rm -f ~/.ragdex/chroma_db/*.lock

# Clear failed documents list if needed
ragdex clear-failed

# Refresh the search cache
ragdex refresh-cache

4. Restart Services

# Reinstall services (updates paths if needed)
./scripts/install_service.sh
./scripts/install_webmonitor_service.sh

# Or manually load
launchctl load ~/Library/LaunchAgents/com.ragdex.index-monitor.plist
launchctl load ~/Library/LaunchAgents/com.ragdex.webmonitor.plist

# Verify services are running
launchctl list | grep ragdex

5. Restart Claude Desktop

  • Important: Claude Desktop must be fully quit and restarted to reload the MCP server
  • On macOS: Cmd+Q to quit, then reopen Claude Desktop
  • The MCP server will automatically reinitialize with the upgraded version

Post-Upgrade Verification

# Check version
ragdex --version

# Verify services are running
ragdex index-status

# Check web dashboard
open http://localhost:8888

# Test MCP connection in Claude
# Ask Claude: "Can you check my library stats?"

Troubleshooting Upgrades

🔧 Common Upgrade Issues

Services not starting after upgrade?

# Check service logs
tail -f ~/Library/Logs/ragdex_*.log

# Reinstall services with fresh configs
./setup_services.sh

"Unload failed: 5: Input/output error" on macOS?

# Use bootout/bootstrap for macOS 11+
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.ragdex.* 2>/dev/null

# Alternative: Kill processes directly
pkill -f ragdex

Claude not recognizing new features?

  • Fully quit Claude Desktop (Cmd+Q on macOS)
  • Wait 5 seconds
  • Reopen Claude Desktop
  • The MCP server will reinitialize

Database compatibility issues?

# Backup existing database
cp -r ~/.ragdex/chroma_db ~/.ragdex/chroma_db.backup

# Clear and rebuild index (last resort)
rm -rf ~/.ragdex/chroma_db
ragdex-index --full-reindex

Permission errors after upgrade?

# Ensure directories have correct permissions
chmod -R 755 ~/.ragdex
ragdex ensure-dirs

💡 Examples

Using with Claude Desktop

Once configured, you can ask Claude:

"Search my library for information about machine learning"
"Compare perspectives on climate change across my documents"
"Summarize the main themes in my recent emails"
"Find all documents mentioning Python programming"
"What meetings did I have last month?" (from emails)

Python API Usage (Advanced)

While Ragdex is primarily designed for Claude Desktop via MCP, you can also use it programmatically:

from personal_doc_library.core.shared_rag import RAGSystem

# Initialize the system
rag = RAGSystem()

# Search documents
results = rag.search_documents("artificial intelligence", max_results=5)

# Get document stats
stats = rag.get_library_statistics()
print(f"Documents indexed: {len(rag.book_index)}")

Note: The primary use case is through Claude Desktop. Direct API usage requires understanding the internal architecture.


🎯 Use Cases

📚 Personal Knowledge Management

  • Build a searchable archive of your books, papers, and notes
  • Never lose track of important information
  • Connect ideas across different sources

💼 Professional Research

  • Analyze technical documentation
  • Compare different approaches from papers
  • Extract key insights from reports

📧 Email Intelligence (Optional)

  • Search through years of communications
  • Find important attachments
  • Track project discussions
  • Note: Must be enabled manually - disabled by default for privacy

🎓 Academic Study

  • Research across textbooks and papers
  • Extract quotes for citations
  • Compare author perspectives

🏗️ Architecture

graph TD
    A[📚 Document Sources<br/>PDF, Word, EPUB, MOBI] --> B[⚙️ Ragdex Indexer<br/>Background Service]
    B --> C[🗄️ ChromaDB<br/>Vector Store<br/>768-dim embeddings]
    C --> D[🔌 MCP Server<br/>17 Tools & Resources]
    D --> E[🤖 Claude Desktop<br/>AI Assistant]

    F[📧 Email Archives<br/>Apple Mail, Outlook] --> B
    G[📊 Web Dashboard<br/>localhost:8888] --> C

    B -.->|MD5 Hash<br/>Deduplication| H[🔍 Change Detection]
    B -.->|OCR Support| I[📄 Scanned Docs]

    style A fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000
    style E fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
    style F fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000
    style G fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#000
    style B fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
    style C fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#000
    style D fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px,color:#000
    style H fill:#fffde7,stroke:#f57f17,stroke-width:1px,color:#000
    style I fill:#fffde7,stroke:#f57f17,stroke-width:1px,color:#000
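
The "MD5 Hash / Deduplication" edge in the diagram can be illustrated with a short sketch: hash each file, skip files whose hash is unchanged since the last run, and skip content already indexed under another path. This is an assumption about how such a step could work, not a copy of the Ragdex indexer; `md5_of_file`, `files_to_index`, and the in-memory `known_hashes` mapping are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never loaded into memory at once."""
    # MD5 is used here for change detection only, not for security.
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_index(paths: List[Path], known_hashes: Dict[str, str]) -> List[Path]:
    """Return new or changed files, updating known_hashes (path -> digest) in place."""
    pending: List[Path] = []
    indexed_digests = set(known_hashes.values())
    for path in paths:
        digest = md5_of_file(path)
        if known_hashes.get(str(path)) == digest:
            continue  # unchanged since the last run
        known_hashes[str(path)] = digest
        if digest in indexed_digests:
            continue  # identical content already indexed under another path
        indexed_digests.add(digest)
        pending.append(path)
    return pending
```

Persisting `known_hashes` between runs (for example as a JSON file) is what turns this into incremental indexing: only new or modified documents are re-embedded.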

📖 View Detailed Architecture Documentation →

Components

  • ⚙️ Indexer: Background service monitoring document changes with automatic retry
  • 🗄️ Vector Store: ChromaDB with 768-dim embeddings (all-mpnet-base-v2)
  • 🔌 MCP Server: 17 tools, 5 prompts, 4 resources for document interaction
  • 📊 Web Monitor: Real-time dashboard at localhost:8888 with search interface

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Clone and install in dev mode
git clone https://github.com/hpoliset/ragdex
cd ragdex

# Using uv (recommended)
uv pip install -e ".[dev]"

# Alternative: standard pip
# pip install -e ".[dev]"

# Run tests
pytest tests/

# Format code
black src/

📊 Stats & Performance

  • Indexing Speed:
    • ~10-20 documents/minute (varies by size and format)
    • Large PDFs (>100MB): 2-5 minutes each
    • OCR processing: 1-2 pages/minute
  • Search Latency:
    • First search: 2-5 seconds (model loading)
    • Subsequent searches: 100-500ms
  • Memory Usage:
    • Idle: ~500MB
    • Active indexing: 4-8GB
    • With embeddings loaded: 4-6GB constant
  • Storage:
    • Vector DB: ~10MB per 1000 pages
    • Metadata index: ~1MB per 100 documents
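
These figures lend themselves to quick back-of-the-envelope planning. The sketch below turns the numbers quoted above (~10MB of vector storage per 1000 pages, ~10-20 documents per minute) into two small estimators; the function names are hypothetical and the results are rough approximations, since real throughput varies by document size and format.

```python
from typing import Tuple


def estimate_vector_db_mb(total_pages: int, mb_per_1000_pages: float = 10.0) -> float:
    """Approximate vector DB size in MB from the total page count."""
    return total_pages / 1000 * mb_per_1000_pages


def estimate_indexing_minutes(documents: int, docs_per_minute: Tuple[int, int] = (10, 20)) -> Tuple[float, float]:
    """Return (best case, worst case) indexing time in minutes."""
    slow, fast = docs_per_minute
    return documents / fast, documents / slow


# Example: a library of 500 documents averaging 300 pages each.
size_mb = estimate_vector_db_mb(500 * 300)
best, worst = estimate_indexing_minutes(500)
```

For that example library the estimate works out to about 1.5GB of vector storage and roughly 25 to 50 minutes of indexing, before accounting for OCR or very large PDFs.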

🐛 Troubleshooting

📝 Common Issues

Services not starting?

# Check service status
launchctl list | grep ragdex

# View logs
tail -f ~/.ragdex/logs/ragdex_*.log

Documents not indexing?

# Check for failed documents
ragdex manage-failed

# Verify paths
ragdex config

Permission errors?

# Ensure directories exist
ragdex ensure-dirs

# Check permissions
ls -la ~/Documents/Library

📜 License

MIT License - see LICENSE for details.


🙏 Acknowledgments

Built with:


📞 Support


Made with ❤️ for the AI community

⭐ Star us on GitHub

Download files

Release 0.4.0 is published as two files:

  • Source distribution: ragdex-0.4.0.tar.gz (688.4 kB)
    • SHA256: de27e2d99c862e3ac2902fc38477b8586dc1aec95018db1ddb46acd0d5667290
    • MD5: 2fda2f40b4f274c29c01c87217dff946
    • BLAKE2b-256: f58f7c576b3207fa4ec743d3708601d38cfc8936fdb1c4fc8e2ff610012dae3c
  • Built distribution (wheel, Python 3): ragdex-0.4.0-py3-none-any.whl (120.4 kB)
    • SHA256: 41dd2120eb00e91c0682f8e7f393337c68664033284ba32788ebac0916da1c5d
    • MD5: 48bc39621a6b6db47d367aab850d6877
    • BLAKE2b-256: e50bd4d2db8d65be3691862ef57cd538582ac08d071d6b91ce70df06fb7d9491

Both files were uploaded via uv/0.10.5, without Trusted Publishing.
