An MCP-enabled multi-format document reader supporting DOCX, PDF, TXT, and Excel files
MCP Document Reader
MCP (Model Context Protocol) Document Reader is an MCP tool that reads documents in multiple formats and returns their text, so AI agents can work with the actual contents of your files.
Features
- Multi-format Support: Reads four common document formats: Excel (XLSX/XLS), DOCX, PDF, and TXT
- MCP Compliant: Follows the MCP standard and can be used as a tool by AI assistants such as Trae IDE
- Easy Integration: Simple configuration for immediate use
- Reliable Performance: Successfully tested and running in Trae IDE
- File System Support: Reads documents directly from the file system
📚 Documentation
User Guide · API Reference · Contributing · Changelog · License
Architecture
graph TB
A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
B -->|Detect file type| C{File Type?}
C -->|.docx| D[DOCX Reader]
C -->|.pdf| E[PDF Reader]
C -->|.xlsx/.xls| F[Excel Reader]
C -->|.txt| G[Text Reader]
D -->|Extract text| H[Return Content]
E -->|Extract text| H
F -->|Extract text| H
G -->|Extract text| H
H -->|Text content| A
style A fill:#e1f5ff
style B fill:#fff4e1
style C fill:#f0f0f0
style D fill:#e8f5e9
style E fill:#e8f5e9
style F fill:#e8f5e9
style G fill:#e8f5e9
style H fill:#fff9c4
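The routing step in the diagram (detect the file type, then hand off to a format-specific reader) can be sketched roughly as follows. This is an illustration only, not the package's actual code: `READERS`, `select_reader`, `read_text`, and `_not_implemented` are hypothetical names, and only the plain-text reader is filled in.

```python
from pathlib import Path
from typing import Callable, Dict


def read_text(path: str) -> str:
    """Hypothetical plain-text reader: return the file's contents."""
    return Path(path).read_text(encoding="utf-8")


def _not_implemented(kind: str) -> Callable[[str], str]:
    """Build a placeholder reader for formats this sketch does not cover."""
    def reader(path: str) -> str:
        raise NotImplementedError(f"{kind} extraction is outside this sketch")
    return reader


# Extension -> reader mapping, mirroring the branches in the diagram.
READERS: Dict[str, Callable[[str], str]] = {
    ".txt": read_text,
    ".docx": _not_implemented("DOCX"),
    ".pdf": _not_implemented("PDF"),
    ".xlsx": _not_implemented("Excel"),
    ".xls": _not_implemented("Excel"),
}


def select_reader(filename: str) -> Callable[[str], str]:
    """Pick a reader from the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Keying the lookup on the lowercased suffix means `REPORT.PDF` and `report.pdf` take the same branch, and an unknown extension fails early with a clear error instead of reaching a reader.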
Supported Formats
| Format | Extensions | MIME Type | Features |
|---|---|---|---|
| Excel | .xlsx, .xls | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Sheet and cell data extraction |
| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Text and structure extraction |
| PDF | .pdf | application/pdf | Text extraction |
| Text | .txt | text/plain | Plain text reading |
Installation
Using pip (Recommended)
pip install mcp-documents-reader
From Source
git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .
MCP Tools
This server provides the following tool:
read_document
Read any supported document type with a unified interface.
Arguments:
filename (string, required): Document file path; absolute and relative paths are supported.
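For context, MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds the message a client would send for `read_document`. The helper name `build_tool_call` is hypothetical, and in practice your MCP client (Trae IDE, Claude Desktop, or an SDK) constructs this message for you.

```python
import json


def build_tool_call(filename: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for read_document."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_document",
            # The single documented argument of the tool.
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_tool_call("example.docx"))
```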
Configuration
Using in Trae IDE / Claude Desktop
Add the following to your MCP configuration file:
Option 1: Using PyPI (Recommended)
{
"mcpServers": {
"mcp-document-reader": {
"command": "uvx",
"args": [
"mcp-documents-reader"
]
}
}
}
Option 2: Using GitHub repository
{
"mcpServers": {
"mcp-document-reader": {
"command": "uvx",
"args": [
"--from",
"git+https://github.com/xt765/mcp_documents_reader",
"mcp_documents_reader"
]
}
}
}
Option 3: Using Gitee repository (Faster access in China)
{
"mcpServers": {
"mcp-document-reader": {
"command": "uvx",
"args": [
"--from",
"git+https://gitee.com/xt765/mcp_documents_reader",
"mcp_documents_reader"
]
}
}
}
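If you already have other servers configured, the entry needs to be merged into the existing `mcpServers` object rather than replacing the file. A minimal sketch of that merge for Option 1 is shown here. The function name `add_server` is hypothetical, and the location of the configuration file depends on your client, so the path is left as a parameter.

```python
import json
from pathlib import Path


def add_server(config_path: Path, name: str = "mcp-document-reader") -> dict:
    """Add the Option 1 (PyPI) entry to an MCP config file, keeping other servers."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    # Create "mcpServers" if missing, then add or overwrite this one entry.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "uvx", "args": ["mcp-documents-reader"]}

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Back up the configuration file before running a script like this against it, since the file is rewritten in place.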
Usage
As an MCP Tool
After configuration, AI assistants can directly call the following tool:
# Read a DOCX file
read_document(filename="example.docx")
# Read a PDF file
read_document(filename="example.pdf")
# Read an Excel file
read_document(filename="example.xlsx")
# Read a text file
read_document(filename="example.txt")
As a Python Library
from mcp_documents_reader import DocumentReaderFactory
# Using factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")
# Check if format is supported
if DocumentReaderFactory.is_supported("file.xlsx"):
    reader = DocumentReaderFactory.get_reader("file.xlsx")
    content = reader.read("/path/to/file.xlsx")
Tool Interface Details
read_document
Read any supported document type.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | ✅ | Document file path, supports absolute or relative paths |
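Because a relative path only has meaning relative to some working directory, it can be safer to resolve the path on the caller's side before passing it as `filename`. The sketch below shows one way to do that with `pathlib`. The helper `to_absolute` is hypothetical, and the assumption that the server resolves relative paths against its own working directory is not confirmed by this document.

```python
from pathlib import Path
from typing import Optional


def to_absolute(filename: str, base_dir: Optional[str] = None) -> str:
    """Return an absolute path for `filename`.

    Relative paths are resolved against `base_dir`, or against the
    current working directory when `base_dir` is not given.
    """
    path = Path(filename).expanduser()
    if not path.is_absolute():
        path = Path(base_dir or Path.cwd()) / path
    return str(path.resolve())
```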
Dependencies
Core Dependencies
- mcp >= 1.26.0 - MCP protocol implementation
- python-docx >= 1.2.0 - DOCX file reading
- pypdf >= 6.8.0 - PDF file reading (replaces PyPDF2)
- openpyxl >= 3.1.5 - Excel file reading
Development Dependencies
- pytest >= 8.0.0 - Testing framework
- pytest-asyncio >= 0.24.0 - Async testing support
- pytest-cov >= 6.0.0 - Coverage reporting
- basedpyright >= 0.28.0 - Type checking
- ruff >= 0.8.0 - Linting and formatting
License
MIT License
Contributing
Issues and Pull Requests are welcome!
Related Projects
- MCP Document Converter - MCP document converter supporting multiple format conversions
- Model Context Protocol - Official Model Context Protocol documentation