Document Question-Answering System with MCP Integration

DocsRay

🌐 Live Demo (Base Model)

A universal document question-answering system that combines embedding models and multimodal LLMs with a coarse-to-fine search (RAG) approach. It integrates with Claude Desktop through MCP (Model Context Protocol) and provides directory management, visual content analysis, and a hybrid OCR system.

🚀 Quick Start

DocsRay now features automatic setup! Simply install and it will handle dependencies and download the lite model automatically.

# Install DocsRay
pip install docsray

That's it! On first run, DocsRay will automatically:

  • Install system dependencies
  • Download the lite model (~3GB)
  • Configure the environment

Manual Setup (if automatic setup fails)

If the automatic setup doesn't work properly, you can run the setup manually:

# 1. Install DocsRay
pip install docsray

# 2. Run manual setup
docsray setup

# 3. Download models (default: lite)
docsray download-models --model-type lite   # 4b model (~3GB)
# docsray download-models --model-type base  # 12b model (~8GB) 
# docsray download-models --model-type pro   # 27b model (~16GB)

Optional Components

# 1. Tesseract OCR (for enhanced OCR performance)
# Ubuntu/Debian: sudo apt-get install tesseract-ocr tesseract-ocr-kor
# macOS: brew install tesseract tesseract-lang

# 2. ffmpeg for Audio/Video processing (recommended)
# macOS: brew install ffmpeg
# Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
# Windows: Download from https://ffmpeg.org/download.html

# 3. CUDA support for faster processing
# CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.3.9 --upgrade --force-reinstall --no-cache-dir

# 4. Configure Claude Desktop integration
docsray configure-claude

Start Using DocsRay

docsray web                                 # Launch Web UI
docsray api                                 # Start API server

📋 Core Features

  • 🧠 Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
  • 👁️ Multimodal AI: Visual content analysis using Gemma-3 vision capabilities
  • 🔄 Hybrid OCR: Intelligent selection between AI-powered OCR and Pytesseract
  • ⚡ Adaptive Performance: Automatically optimizes based on system resources
  • 🎯 Flexible Model Selection: Choose between lite (4b), base (12b), and pro (27b) models
  • 🔌 MCP Integration: Seamless integration with Claude Desktop
  • 🌐 Multiple Interfaces: Web UI, API server, CLI, and MCP server
  • 📁 Universal Document Support: 30+ file formats with automatic conversion
  • 🌍 Multi-Language: Korean, English, and other languages supported
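
The coarse-to-fine search named in the feature list can be pictured as a two-stage ranking: first pick the most relevant sections, then rank only the chunks inside them. The sketch below is a toy illustration of that general idea, not DocsRay's actual implementation. The function names, the data layout, and the use of cosine similarity are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def coarse_to_fine(query_vec, sections, top_sections=1, top_chunks=2):
    """Hypothetical two-stage search.

    sections: list of dicts with 'vec' (section embedding) and
    'chunks' (list of (text, vec) pairs). Returns the best chunk texts.
    """
    # Coarse stage: keep only the sections most similar to the query.
    ranked = sorted(sections, key=lambda s: cosine(query_vec, s["vec"]), reverse=True)
    candidates = [c for s in ranked[:top_sections] for c in s["chunks"]]
    # Fine stage: rank the chunks that live inside those sections.
    candidates.sort(key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in candidates[:top_chunks]]

# Toy data: two sections with hand-written 2-d "embeddings".
sections = [
    {"vec": [1.0, 0.0], "chunks": [("intro", [0.9, 0.1]), ("methods", [0.6, 0.4])]},
    {"vec": [0.0, 1.0], "chunks": [("appendix", [0.1, 0.9])]},
]
print(coarse_to_fine([1.0, 0.1], sections))
```

Restricting the fine stage to a few sections keeps the number of chunk comparisons small, at the cost of missing chunks in sections the coarse stage ranked low.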

🎯 What's New in v1.8.0

Video and Audio Input Support

  • Video Processing: Extract and analyze content from video files
    • Automatic audio extraction from video formats
    • Frame extraction for visual content analysis
    • Support for MP4, AVI, MOV, and other common formats
  • Audio Processing: Direct transcription and analysis of audio files
    • Speech-to-text using faster-whisper
    • Support for MP3, WAV, M4A, and other audio formats
  • Multimedia Pipeline: Unified processing for all media types
  • Automatic Setup: DocsRay now automatically installs dependencies and downloads models on first run

📰 Recent Updates

v1.7.1

Auto-Restart and Timeout Features

  • Auto-Restart Support: Web, API, and MCP servers now support automatic restart on crashes
  • Optional Timeout: --timeout parameter only applies when explicitly specified
  • Optional Page Limits: --pages parameter only applies when explicitly specified
  • Request Timeout for API: The API server can auto-restart if processing a request exceeds the timeout
  • Unlimited Retries: --max-retries is optional; if not specified, servers will retry indefinitely
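
The retry behaviour described above (restart on crash, unlimited unless a cap is given) follows a common supervisor pattern. The sketch below shows that pattern in isolation. It is not DocsRay's code, and `run_with_restart`, `start_server`, and `delay` are hypothetical names chosen for the example.

```python
import time

def run_with_restart(start_server, max_retries=None, delay=0.0):
    """Call start_server() until it returns normally.

    max_retries=None mirrors the documented default of retrying
    indefinitely; an integer caps the number of restarts.
    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            start_server()
            return restarts
        except Exception:
            # Give up only when a cap was set and has been reached.
            if max_retries is not None and restarts >= max_retries:
                raise
            restarts += 1
            time.sleep(delay)
```

With `max_retries=5`, the sixth consecutive crash is re-raised to the caller; with the default `None`, the loop keeps restarting.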

v1.7.0: Breaking Change - Enhanced Embedding Method

  • Improved Embedding Synthesis: Changed from element-wise addition to concatenation
  • IMPORTANT: This change requires reindexing of existing documents
  • Better Accuracy: Concatenation preserves more information from both embedding models

📖 Usage Guide

Model Management

# Download specific model type
docsray download-models --model-type lite   # Fast, lower quality
docsray download-models --model-type base   # Balanced performance
docsray download-models --model-type pro    # Best quality, slower

# Force re-download existing models
docsray download-models --model-type base --force

# Check model status
docsray download-models --check

Document Processing

# Process any document type
docsray process document.pdf --model-type base
docsray process report.docx --timeout 300
docsray process spreadsheet.xlsx --no-visuals

# Ask questions about processed documents
docsray ask document.pdf "What are the key findings?"
docsray ask report.docx "Summarize the conclusions" --model-type pro

Web Interface

# Basic web interface
docsray web

# Advanced options
docsray web --model-type base --port 8080
docsray web --auto-restart                    # Auto-restart with unlimited retries
docsray web --auto-restart --max-retries 5    # Auto-restart with 5 retry limit
docsray web --timeout 300 --pages 10          # Process max 10 pages, 5min timeout

API Server

# Start API server
docsray api --port 8000

# With auto-restart and timeout
docsray api --auto-restart                     # Unlimited retries
docsray api --auto-restart --timeout 600       # 10min timeout per request

# API accepts document paths per request
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "document_path": "/path/to/document.pdf",
    "question": "What is the main topic?",
    "use_coarse_search": true
  }'

# Check cache info and clear if needed
curl http://localhost:8000/cache/info
curl -X POST http://localhost:8000/cache/clear
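
The same /ask request can be sent from Python using only the standard library. The endpoint, port, and JSON fields below are taken from the curl example. The helper names are hypothetical, and the assumption that the server replies with a JSON body is mine, since the response format is not documented here.

```python
import json
import urllib.request

def build_ask_request(document_path, question, use_coarse_search=True,
                      base_url="http://localhost:8000"):
    """Build the POST /ask request shown in the curl example."""
    payload = {
        "document_path": document_path,
        "question": question,
        "use_coarse_search": use_coarse_search,
    }
    return urllib.request.Request(
        f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(document_path, question, timeout=600, **kwargs):
    """Send the request and return the decoded body.

    Assumes a JSON response; the schema is not specified in this
    README, so the parsed object is returned unchanged.
    """
    request = build_ask_request(document_path, question, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Usage (requires a running server, e.g. `docsray api --port 8000`):
#   result = ask("/path/to/document.pdf", "What is the main topic?")
```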

Performance Testing

# Basic performance test
docsray perf-test document.pdf "What is this about?"

# Advanced testing
docsray perf-test document.pdf "Analyze key points" \
  --iterations 5 --port 8000 --host localhost

MCP Integration (Claude Desktop)

# Configure Claude Desktop
docsray configure-claude

# Start MCP server
docsray mcp --auto-restart

📁 Supported File Formats

Office Documents: Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt)
Text Formats: Plain Text (.txt), Markdown (.md), HTML (.html)
Images: JPEG, PNG, GIF, BMP, TIFF, WebP
Korean Documents: HWP (.hwp, .hwpx)
PDFs: Native PDF support with visual analysis
Audio: MP3, WAV, M4A, FLAC, OGG, WMA, AAC (requires ffmpeg)
Video: MP4, AVI, MOV, WMV, FLV, MKV, WebM, M4V, MPG, MPEG (requires ffmpeg)
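
Because audio and video inputs depend on ffmpeg, a small pre-flight check can catch a missing installation before a long processing run. The sketch below is a hypothetical helper, not part of DocsRay. It assumes each listed format corresponds to the lowercase file extension of the same name.

```python
import shutil
from pathlib import Path

# Extensions derived from the Audio and Video lists above (assumed mapping).
AUDIO = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"}
VIDEO = {".mp4", ".avi", ".mov", ".wmv", ".flv", ".mkv", ".webm", ".m4v", ".mpg", ".mpeg"}

def needs_ffmpeg(path):
    """True if the file extension is one of the audio/video formats."""
    return Path(path).suffix.lower() in AUDIO | VIDEO

def can_process(path):
    """True unless the file needs ffmpeg and ffmpeg is not on PATH."""
    return not needs_ffmpeg(path) or shutil.which("ffmpeg") is not None

for name in ["report.pdf", "talk.mp4"]:
    print(name, "needs ffmpeg:", needs_ffmpeg(name))
```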

🛠️ Advanced Configuration

Environment Variables

export DOCSRAY_MODEL_TYPE=base           # Set default model type
export DOCSRAY_DISABLE_VISUALS=1         # Disable visual analysis
export DOCSRAY_DEBUG=1                   # Enable debug mode
export DOCSRAY_HOME=/custom/path         # Custom data directory
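
When DocsRay is launched from a Python script rather than a shell, the same variables can be passed through the child process environment. The helper below is a hypothetical convenience wrapper. Only the four variable names come from the list above.

```python
import os

def docsray_env(model_type=None, disable_visuals=False, debug=False,
                home=None, base=None):
    """Return an environment dict with the DOCSRAY_* variables set.

    base defaults to a copy of the current process environment.
    """
    env = dict(os.environ if base is None else base)
    if model_type:
        env["DOCSRAY_MODEL_TYPE"] = model_type
    if disable_visuals:
        env["DOCSRAY_DISABLE_VISUALS"] = "1"
    if debug:
        env["DOCSRAY_DEBUG"] = "1"
    if home:
        env["DOCSRAY_HOME"] = home
    return env

# Usage (assumes the docsray CLI is installed):
#   import subprocess
#   subprocess.run(["docsray", "web"], env=docsray_env(model_type="base", debug=True))
```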

Python API

from docsray import PDFChatBot
from docsray.scripts import pdf_extractor, chunker, build_index, section_rep_builder

# Process document
extracted = pdf_extractor.extract_content("document.pdf", analyze_visuals=True)
chunks = chunker.process_extracted_file(extracted)
chunk_index = build_index.build_chunk_index(chunks)
sections = section_rep_builder.build_section_reps(extracted["sections"], chunk_index)

# Create chatbot and ask questions
chatbot = PDFChatBot(sections, chunk_index)
answer, references = chatbot.answer("What are the key points?")

🔧 System Requirements

Hardware Requirements

  • CPU Mode: Any system with 4GB+ RAM
  • GPU Acceleration: CUDA-compatible GPU or Apple Silicon (MPS)
  • Storage: 3-16GB depending on model type chosen

Performance Modes (Auto-detected)

System Memory   Mode           Models          Max Tokens
< 16GB          FAST           Q4 quantized    8K
16-32GB         STANDARD       Q8 quantized    16K
> 32GB          FULL_FEATURE   F16 precision   32K
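
The table above can be read as a simple lookup on available memory. The function below restates it in code for illustration only. It is not DocsRay's detection logic, `select_mode` is a hypothetical name, and treating exactly 16GB and 32GB as STANDARD is an assumption based on the "16-32GB" row.

```python
def select_mode(memory_gb):
    """Map system memory (in GB) to the performance mode from the table."""
    if memory_gb < 16:
        return {"mode": "FAST", "models": "Q4 quantized", "max_tokens": "8K"}
    if memory_gb <= 32:
        return {"mode": "STANDARD", "models": "Q8 quantized", "max_tokens": "16K"}
    return {"mode": "FULL_FEATURE", "models": "F16 precision", "max_tokens": "32K"}

for gb in (8, 24, 64):
    print(gb, "GB ->", select_mode(gb)["mode"])
```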

🐛 Troubleshooting

Common Issues

# Check system status
docsray download-models --check

# Re-download corrupted models
docsray download-models --force

# Debug mode for detailed logs
DOCSRAY_DEBUG=1 docsray web

Performance Issues

  • Use --model-type lite for faster processing
  • Pass --no-visuals to skip visual analysis for text-only documents
  • Increase --timeout for large documents
  • Use auto-restart for stability: --auto-restart

📊 Performance Benchmarks

Run your own benchmarks:

# Test API performance
docsray perf-test document.pdf "test question" --iterations 10

# Compare model types
docsray perf-test document.pdf "test question" --model-type lite
docsray perf-test document.pdf "test question" --model-type base

🤝 Contributing

We welcome contributions! Please check our GitHub repository for:

  • Bug reports and feature requests
  • Code contributions and pull requests
  • Documentation improvements

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
