MCP server for reading PDF, Excel, and Word documents
MCP Doc Reader
A Model Context Protocol (MCP) server that enables AI assistants to read and extract content from PDF, Excel, and Word documents.
Features
- PDF Reading: Extract text content from PDF files using pdfminer.six
- Excel Reading: Read .xlsx and .xls files with formatted table output
- Word Reading: Extract text and tables from .docx files
- Cross-Platform: Works on Windows, Linux, and macOS
- Unicode Support: Full support for non-ASCII characters (Chinese, Japanese, etc.)
Installation
Using uvx (Recommended)
uvx mcp-doc-reader
Using pip
pip install mcp-doc-reader
From Source
git clone https://github.com/yourusername/mcp-doc-reader.git
cd mcp-doc-reader
pip install -e .
Configuration
Add the following to your MCP client configuration (e.g., Claude Desktop, Cursor):
Option 1: Using uvx (Recommended)
{
"mcpServers": {
"DocReader": {
"command": "uvx",
"args": ["mcp-doc-reader"]
}
}
}
Option 2: Using pip-installed command
{
"mcpServers": {
"DocReader": {
"command": "mcp-doc-reader"
}
}
}
Option 3: Windows with Unicode Support
For Windows systems with non-ASCII file paths (e.g., Chinese characters):
{
"mcpServers": {
"DocReader": {
"command": "cmd",
"args": [
"/c",
"chcp 65001 >nul && uvx mcp-doc-reader"
]
}
}
}
Option 4: Linux/macOS with Python module
{
"mcpServers": {
"DocReader": {
"command": "python",
"args": ["-m", "docreader"]
}
}
}
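The four options above differ only in their `command` and `args` fields. As a minimal sketch, the same entries could be generated from a script; the helper name `build_config` and the option labels (`"uvx"`, `"pip"`, `"windows"`, `"module"`) are illustrative assumptions and not part of the package.

```python
import copy
import json

# Launch commands copied from the four options above. The dictionary keys
# are illustrative labels only, not names defined by mcp-doc-reader.
LAUNCHERS = {
    "uvx": {"command": "uvx", "args": ["mcp-doc-reader"]},
    "pip": {"command": "mcp-doc-reader"},
    "windows": {
        "command": "cmd",
        "args": ["/c", "chcp 65001 >nul && uvx mcp-doc-reader"],
    },
    "module": {"command": "python", "args": ["-m", "docreader"]},
}


def build_config(option: str, server_name: str = "DocReader") -> dict:
    """Return an mcpServers configuration dict for one of the options above."""
    if option not in LAUNCHERS:
        raise ValueError(f"unknown option: {option!r}")
    # Deep-copy so callers can modify the result without touching LAUNCHERS.
    return {"mcpServers": {server_name: copy.deepcopy(LAUNCHERS[option])}}


if __name__ == "__main__":
    print(json.dumps(build_config("uvx"), indent=2))
```

Where the resulting JSON has to be saved depends on the MCP client, so the sketch only prints it.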
Available Tools
read_pdf
Read text content from a PDF file.
Parameters:
- file_path (string, required): Absolute path to the PDF file
Example:
{
"name": "read_pdf",
"arguments": {
"file_path": "/path/to/document.pdf"
}
}
read_excel
Read content from an Excel file (.xlsx or .xls).
Parameters:
- file_path (string, required): Absolute path to the Excel file
Example:
{
"name": "read_excel",
"arguments": {
"file_path": "/path/to/spreadsheet.xlsx"
}
}
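The exact layout of the formatted table output is not documented here. Purely as an illustration, the rows of a sheet could be rendered as a pipe-delimited text table along these lines; `format_table` is a hypothetical helper, not the package's API, and the real output may differ.

```python
def format_table(rows: list[list[object]]) -> str:
    """Render rows as a pipe-delimited table; the first row is the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Convert every cell to text, map empty cells (None) to "", and pad
    # short rows so that all rows have the same number of columns.
    cells = [
        ["" if value is None else str(value) for value in row]
        + [""] * (width - len(row))
        for row in rows
    ]
    lines = ["| " + " | ".join(row) + " |" for row in cells]
    # Separator between the header row and the data rows.
    lines.insert(1, "|" + "|".join(["---"] * width) + "|")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table([["Name", "Qty"], ["苹果", 3], ["Pear", None]]))
```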
read_word
Read text and tables from a Word file (.docx).
Parameters:
- file_path (string, required): Absolute path to the Word file
Example:
{
"name": "read_word",
"arguments": {
"file_path": "/path/to/document.docx"
}
}
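Each example above corresponds to the `params` object of a JSON-RPC 2.0 `tools/call` request, which is how MCP clients invoke a tool. Clients normally build this envelope for you; the sketch below only shows its shape, and `make_tool_call` is a hypothetical helper name.

```python
import json
from itertools import count

# Simple increasing request ids; any unique id per request would do.
_request_ids = count(1)


def make_tool_call(name: str, file_path: str) -> str:
    """Serialize a tools/call request for one of the three reader tools."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": {"file_path": file_path}},
    }
    # ensure_ascii=False keeps non-ASCII paths readable in the payload.
    return json.dumps(request, ensure_ascii=False)


if __name__ == "__main__":
    print(make_tool_call("read_pdf", "/path/to/document.pdf"))
```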
Usage Examples
Once configured, you can ask your AI assistant to:
- "Read the contents of /path/to/report.pdf"
- "Extract data from /path/to/data.xlsx"
- "What does the document /path/to/memo.docx contain?"
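For prompts like these, the assistant has to choose the tool that matches the file type. A minimal sketch of that choice, based only on the extensions listed under Available Tools (`pick_tool` and `TOOL_BY_SUFFIX` are illustrative names, not part of the package):

```python
from pathlib import PurePath

# Extension-to-tool mapping taken from the tool descriptions above.
TOOL_BY_SUFFIX = {
    ".pdf": "read_pdf",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".docx": "read_word",
}


def pick_tool(file_path: str) -> str:
    """Return the name of the tool that handles the given file's extension."""
    suffix = PurePath(file_path).suffix.lower()
    if suffix not in TOOL_BY_SUFFIX:
        raise ValueError(f"no reader available for {file_path!r}")
    return TOOL_BY_SUFFIX[suffix]
```

The lookup is case-insensitive, and unsupported extensions such as `.doc` raise `ValueError` rather than falling through to a default tool.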
Development
Setup Development Environment
git clone https://github.com/yourusername/mcp-doc-reader.git
cd mcp-doc-reader
pip install -e ".[dev]"
Run Tests
pytest
Build Package
pip install build
python -m build
Publish to PyPI
pip install twine
twine upload dist/*
Project Structure
mcp-doc-reader/
├── src/
│ └── docreader/
│ ├── __init__.py
│ ├── __main__.py
│ ├── server.py
│ └── readers/
│ ├── __init__.py
│ ├── pdf_reader.py
│ ├── excel_reader.py
│ └── word_reader.py
├── examples/
│ ├── mcp_config_pip.json
│ ├── mcp_config_uvx.json
│ ├── mcp_config_windows.json
│ └── mcp_config_linux.json
├── pyproject.toml
├── README.md
└── LICENSE
Troubleshooting
Windows: Unicode/Chinese filename issues
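If file paths containing non-ASCII characters fail on Windows, launch the server through cmd so that the console code page is switched to UTF-8 (chcp 65001) first. The configuration below uses the pip-installed command; Option 3 under Configuration is the uvx equivalent: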
{
"mcpServers": {
"DocReader": {
"command": "cmd",
"args": ["/c", "chcp 65001 >nul && mcp-doc-reader"]
}
}
}
.doc files not supported
The Word reader only supports .docx format. To read .doc files, please convert them to .docx first using Microsoft Word or LibreOffice.
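If many .doc files need converting, LibreOffice can do it in headless mode from a script. The sketch below assumes the LibreOffice binary is available on PATH as `soffice` (on some systems it is named `libreoffice` instead); the function names are illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc_path: str, soffice: str = "soffice") -> list[str]:
    """Build the LibreOffice command that converts one .doc file to .docx."""
    source = Path(doc_path)
    return [
        soffice,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(source.parent),  # write the .docx next to the source
        str(source),
    ]


def convert_doc_to_docx(doc_path: str) -> Path:
    """Run the conversion and return the expected path of the new .docx file."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    subprocess.run(build_convert_command(doc_path), check=True)
    return Path(doc_path).with_suffix(".docx")
```

The converted file can then be passed to `read_word` by its absolute path.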
License
MIT License - see LICENSE for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Download files
File details
Details for the file mcp_doc_reader-1.0.0.tar.gz.
File metadata
- Download URL: mcp_doc_reader-1.0.0.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93f1f4fd31a66a134be75841adab55f83c5fd5437471b8f3aa4807189b546832 |
| MD5 | cbb070ff10dd333a856cc40b66ed6e9c |
| BLAKE2b-256 | a9faee6b7eecde2e5d52309090269016f1ad79682231c597bca8dcead4d63062 |
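A downloaded archive can be checked against the SHA256 digest listed above before installing it. This is a minimal sketch using only the standard library; the function names are illustrative, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the published hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_published_digest("mcp_doc_reader-1.0.0.tar.gz", "93f1f4fd31a66a134be75841adab55f83c5fd5437471b8f3aa4807189b546832")` should return `True` for an unmodified copy of the source distribution.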
File details
Details for the file mcp_doc_reader-1.0.0-py3-none-any.whl.
File metadata
- Download URL: mcp_doc_reader-1.0.0-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f3333e10db42aaf6ed30bdfc634c4a58f3c4e19bd5710f92014eded5a4c8027 |
| MD5 | ead3ad8cf688d71e2baa4bb8307b73c5 |
| BLAKE2b-256 | 662e9176f25169f088432548fa642f653151096ba2ee01b49afeede2014a79ef |