Document Search MCP Server with extensible plugin architecture
Document Search MCP
A Model Context Protocol (MCP) server that provides intelligent document search across multiple sources, starting with Google Drive integration.
Overview
This MCP server lets AI assistants such as Claude Desktop search and retrieve documents from connected sources. It implements the official Model Context Protocol and provides an extensible architecture for adding new document connectors.
Features
- Multi-source document search - Search across Google Drive documents, sheets, and presentations
- OAuth 2.0 authentication - Secure authentication with environment-based credentials
- Full content retrieval - Get complete document content for analysis
- Extensible plugin system - Ready framework for custom enhancements
- Modular architecture - Clean separation of connectors, models, and search orchestration
Quick Start
Prerequisites
- Python 3.11+
- Google OAuth 2.0 credentials (for Google Drive integration)
Installation
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and setup
git clone <repository-url>
cd document-search-mcp
uv venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"
Configuration
- Set up Google OAuth credentials:
export GOOGLE_CLIENT_ID="your-client-id"
export GOOGLE_CLIENT_SECRET="your-client-secret"
- Configure Claude Desktop by adding to ~/.claude/mcp_servers.json:
{
  "mcpServers": {
    "document-search": {
      "command": "document-search-mcp"
    }
  }
}
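If the configuration file already lists other servers, the new entry has to be merged in rather than written over them. The following is a minimal sketch of that merge, assuming the file layout shown above; `add_mcp_server` and `update_config_file` are hypothetical helpers, not part of this package.

```python
import json
from pathlib import Path
from typing import Any


def add_mcp_server(config: dict[str, Any], name: str, command: str) -> dict[str, Any]:
    """Return a copy of config with one more entry under "mcpServers"."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path) -> None:
    """Merge the document-search entry into the JSON file at path."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    updated = add_mcp_server(existing, "document-search", "document-search-mcp")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(updated, indent=2) + "\n")


# Demonstrate the merge on an in-memory config with one pre-existing server
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_mcp_server(existing, "document-search", "document-search-mcp"), indent=2))
```

To apply it to the real file, call `update_config_file(Path.home() / ".claude" / "mcp_servers.json")`.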
Usage
# Start the MCP server
document-search-mcp
# Or with debug logging
document-search-mcp --log-level DEBUG
MCP Tools
The server provides these MCP tools:
- search_documents - Search across connected document sources
- get_document_content - Retrieve full content from documents
- list_sources - Show configured document sources and status
- setup_google_drive - OAuth setup and configuration wizard
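MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests. As a rough illustration of what a client sends for `search_documents`, the sketch below builds such a request. The `query` argument name is an assumption, since the tool's input schema is not documented here, and `build_tool_call` is a hypothetical helper.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is a hypothetical argument name used for illustration only
request = build_tool_call(1, "search_documents", {"query": "quarterly report"})
print(request)
```

In practice an MCP client such as Claude Desktop constructs these messages itself; the sketch only shows the shape of the exchange.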
Supported Document Sources
- Google Drive - Available: Google Docs, Sheets, and Slides with OAuth 2.0
- Confluence - Planned (connector interface ready)
- SharePoint - Planned
- Other sources - Framework ready for extension
Development
Running Tests
# Run all tests with coverage
pytest tests/ --cov=src --cov-report=html
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m "not slow" # Skip slow tests
Code Quality
# Type checking
mypy src/
# Linting and formatting
ruff check src/ # Lint check
ruff format src/ # Auto-format code
# Security scanning
bandit -r src/ # Security issues
safety check # Vulnerable dependencies
Adding New Document Connectors
Create a new connector by extending the base class:
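from collections.abc import AsyncIterator
from typing import Any

from src.connectors.base_connector import DocumentConnector
from src.models.document import Document
# DocumentMatch must also be imported; its module is not shown in this README.


class MySourceConnector(DocumentConnector):
    async def get_documents(self, options: dict[str, Any] | None = None) -> AsyncIterator[Document]:
        # Async generator: yield each Document retrieved from the source
        for document in ():  # placeholder: iterate over your source's documents
            yield document

    async def get_document(self, document_id: str) -> Document:
        # Implement single document retrieval
        raise NotImplementedError

    async def search_documents(self, query: str, options: dict[str, Any] | None = None) -> list[DocumentMatch]:
        # Implement search functionality
        raise NotImplementedError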
Architecture
Core Components
- MCP Server (src/server/mcp_server.py) - Main MCP protocol implementation
- Document Connectors (src/connectors/) - Modular interfaces for document sources
- Search Orchestrator (src/server/search_orchestrator.py) - Multi-source search coordination
- Plugin System (src/plugins/) - Extensible framework for enhancements
- Data Models (src/models/) - Document and search models with Pydantic validation
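Multi-source search coordination can be pictured as fanning one query out to every connector concurrently and collecting the results per source. The sketch below illustrates that idea with `asyncio.gather`; it is not the code in `search_orchestrator.py`, and `Searchable`, `StaticSource`, `FailingSource`, and `search_all` are hypothetical names.

```python
import asyncio
from typing import Protocol


class Searchable(Protocol):
    async def search_documents(self, query: str) -> list[str]: ...


class StaticSource:
    """Hypothetical source that searches a fixed list of titles."""

    def __init__(self, titles: list[str]) -> None:
        self._titles = titles

    async def search_documents(self, query: str) -> list[str]:
        return [t for t in self._titles if query.lower() in t.lower()]


class FailingSource:
    """Hypothetical source that is always unavailable."""

    async def search_documents(self, query: str) -> list[str]:
        raise ConnectionError("source unavailable")


async def search_all(sources: dict[str, Searchable], query: str) -> dict[str, list[str]]:
    """Query every source concurrently; a failing source yields an empty list."""
    results = await asyncio.gather(
        *(source.search_documents(query) for source in sources.values()),
        return_exceptions=True,
    )
    return {
        name: [] if isinstance(result, BaseException) else result
        for name, result in zip(sources, results)
    }


sources = {"drive": StaticSource(["Budget 2024", "Team budget notes"]), "wiki": FailingSource()}
print(asyncio.run(search_all(sources, "budget")))
# {'drive': ['Budget 2024', 'Team budget notes'], 'wiki': []}
```

Passing `return_exceptions=True` keeps one unavailable source from aborting the whole search, which is one reasonable design choice when sources are independent.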
Project Structure
src/
├── main.py                        # CLI entry point with Click interface
├── models/                        # Pydantic data models
├── connectors/                    # Document source connectors
│   ├── base_connector.py          # Abstract base class
│   └── google_drive_connector.py  # Google Drive implementation
├── server/                        # MCP server implementation
│   ├── mcp_server.py              # Main MCP protocol handling
│   └── search_orchestrator.py     # Multi-source coordination
└── plugins/                       # Plugin system framework
    └── base_plugin.py             # Plugin interfaces
tests/
└── test_basic.py                  # Basic functionality tests
config/
├── config.yaml                    # Default configuration
└── config.yaml.local              # Local development config
Configuration
The server uses environment-based configuration with automatic persistence:
- OAuth credentials: Set via environment variables (never hardcoded)
- Configuration file: Automatically saved to ~/.config/document-search-mcp/config.yaml
- Setup wizard: Use the setup_google_drive MCP tool for guided OAuth setup
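As an illustration of environment-based credentials, the sketch below reads the two variables named earlier and fails early when either is missing. It assumes nothing about how the server itself loads them; `load_google_credentials` is a hypothetical helper, and only the variable names and config path come from this README.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Path named in the section above
CONFIG_PATH = Path.home() / ".config" / "document-search-mcp" / "config.yaml"


def load_google_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (client_id, client_secret), raising if either variable is unset."""
    missing = [
        name for name in ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET") if not env.get(name)
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["GOOGLE_CLIENT_ID"], env["GOOGLE_CLIENT_SECRET"]


# Demonstrated with an explicit mapping so the example runs without real credentials
client_id, _ = load_google_credentials(
    {"GOOGLE_CLIENT_ID": "your-client-id", "GOOGLE_CLIENT_SECRET": "your-client-secret"}
)
print(client_id, CONFIG_PATH.name)  # your-client-id config.yaml
```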
Google Drive Setup Process
- Set environment variables with your Google OAuth credentials
- Use setup_google_drive MCP tool with step: "start"
- Visit provided OAuth URL to authorize access
- Complete setup with step: "complete" and redirect URL
- Configuration persists automatically for future use
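In the OAuth 2.0 authorization code flow, the provider redirects the browser back with a `code` query parameter, which is presumably why the final step asks for the redirect URL. How the server parses it is not shown here; the sketch below is one way to extract the code with the standard library, and `extract_authorization_code` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def extract_authorization_code(redirect_url: str) -> str:
    """Pull the OAuth 2.0 authorization code out of a redirect URL's query string."""
    params = parse_qs(urlparse(redirect_url).query)
    if "error" in params:
        raise ValueError(f"Authorization failed: {params['error'][0]}")
    if "code" not in params:
        raise ValueError("Redirect URL does not contain a 'code' parameter")
    return params["code"][0]


# Example redirect URL with a placeholder host and code
print(extract_authorization_code("http://localhost:8080/callback?code=abc123&state=xyz"))  # abc123
```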
CI/CD Pipeline
The project uses GitLab CI with a PyPI publishing pipeline:
Pipeline Stages
- validate - Code quality checks (ruff, mypy, bandit, safety)
- build - Python package building
- test - Package integrity testing and unit tests
- publish - PyPI publishing (manual/tag-triggered)
Running Validation Locally
# Complete validation suite (matches CI)
ruff check src/
ruff format --check src/
mypy src/
bandit -r src/
safety check
Package Management
This project uses uv for fast Python package management:
# Development environment setup
uv venv
uv pip install -e ".[dev]"
# Package building
python -m build
# Validate package
python -c "import tomllib; tomllib.load(open('pyproject.toml', 'rb'))"
Current Implementation Status
Completed
- Complete MCP server implementation with Google Drive integration
- OAuth 2.0 authentication with environment-based credentials
- Document search and content retrieval across Google Docs/Sheets/Slides
- Extensible plugin architecture and data models
- Comprehensive test framework with markers and coverage
- GitLab CI/CD pipeline for Python package publishing
- Type safety with strict mypy configuration (all type errors resolved)
- Code formatting and linting with ruff
In Progress/Planned
- Additional document connectors (Confluence, SharePoint, etc.)
- Semantic search with vector embeddings
- Plugin implementations for specific domains
- Enhanced metadata extraction and filtering
- Web-based configuration interface
Contributing
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the full validation suite:
ruff check src/ && ruff format --check src/ && mypy src/ && bandit -r src/
- Submit a pull request
License
[Add your license information here]
Support
For issues and feature requests, please use the project's issue tracker.