
An MCP-enabled multi-format document reader supporting DOCX, PDF, TXT, and Excel files


MCP Document Reader

MCP (Model Context Protocol) Document Reader: an MCP tool that extracts text from documents in several formats, so AI agents can read the contents of your files.


Features

  • Multi-format Support: Reads four common document formats: Excel (XLSX/XLS), Word (DOCX), PDF, and plain text (TXT)
  • MCP Protocol: Implements the Model Context Protocol, so MCP clients such as Trae IDE can call it as a tool
  • Easy Integration: A short entry in your MCP client configuration is all that is needed
  • Tested in Practice: Successfully tested and running in Trae IDE
  • File System Support: Reads documents directly from the local file system

📚 Documentation

User Guide · API Reference · Contributing · Changelog · License


Architecture

graph TB
    A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
    B -->|Detect file type| C{File Type?}
    C -->|.docx| D[DOCX Reader]
    C -->|.pdf| E[PDF Reader]
    C -->|.xlsx/.xls| F[Excel Reader]
    C -->|.txt| G[Text Reader]
    D -->|Extract text| H[Return Content]
    E -->|Extract text| H
    F -->|Extract text| H
    G -->|Extract text| H
    H -->|Text content| A
    
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#f0f0f0
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#e8f5e9
    style G fill:#e8f5e9
    style H fill:#fff9c4
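The dispatch step in the diagram (detect the file type, then hand off to a format-specific reader) can be sketched in a few lines of Python. This is a minimal illustration, not the package's own code: the `READERS_BY_SUFFIX` table, the `detect_reader` function, and case-insensitive extension matching are all assumptions.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the reader branches in the diagram.
# The real package's internal names may differ.
READERS_BY_SUFFIX = {
    ".docx": "docx",
    ".pdf": "pdf",
    ".xlsx": "excel",
    ".xls": "excel",
    ".txt": "text",
}


def detect_reader(filename: str) -> str:
    """Return the reader branch for `filename`, chosen by its extension."""
    suffix = Path(filename).suffix.lower()  # assumption: ".PDF" and ".pdf" are treated alike
    try:
        return READERS_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}") from None


print(detect_reader("report.PDF"))    # pdf
print(detect_reader("data/q3.xlsx"))  # excel
```

Keying on the extension alone keeps the dispatch cheap; sniffing file contents would be more robust against misnamed files but is not what the diagram describes.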

Supported Formats

| Format | Extensions | MIME Type | Features |
|---|---|---|---|
| Excel | .xlsx, .xls | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (.xlsx), application/vnd.ms-excel (.xls) | Sheet and cell data extraction |
| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Text and structure extraction |
| PDF | .pdf | application/pdf | Text extraction |
| Text | .txt | text/plain | Plain text reading |

Installation

Using pip (Recommended)

pip install mcp-documents-reader

From Source

git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .

MCP Tools

This server provides the following tool:

read_document

Read any supported document type with a unified interface.

Arguments:

  • filename (string, required): Document file path, supports absolute or relative paths.
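MCP clients discover tools through a `tools/list` response in which each tool carries a `name`, a `description`, and a JSON Schema `inputSchema`. Assuming the server derives its schema from the single `filename` argument above, the advertised definition would look roughly like the dict below. `READ_DOCUMENT_TOOL` and the `check_arguments` helper are hypothetical illustrations; the exact schema the server emits may differ (SDKs often add fields such as `title`).

```python
from typing import Any, Dict

# Roughly how read_document could appear in a tools/list response. The keys
# name/description/inputSchema come from the MCP spec; the schema details are
# an assumption, not captured from the running server.
READ_DOCUMENT_TOOL: Dict[str, Any] = {
    "name": "read_document",
    "description": "Read any supported document type with a unified interface.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "filename": {
                "type": "string",
                "description": "Document file path, supports absolute or relative paths.",
            }
        },
        "required": ["filename"],
    },
}


def check_arguments(arguments: Dict[str, Any]) -> None:
    """Hypothetical pre-flight check: required keys present, string-typed values are strings.

    This is not a full JSON Schema validator.
    """
    schema = READ_DOCUMENT_TOOL["inputSchema"]
    for key in schema["required"]:
        if key not in arguments:
            raise ValueError(f"missing required argument: {key}")
    for key, prop in schema["properties"].items():
        if key in arguments and prop["type"] == "string" and not isinstance(arguments[key], str):
            raise TypeError(f"argument {key!r} must be a string")


check_arguments({"filename": "example.pdf"})  # passes silently
```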

Configuration

Using in Trae IDE / Claude Desktop

Add one of the following entries to your MCP configuration file:

Option 1: Using PyPI (Recommended)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "mcp-documents-reader"
      ]
    }
  }
}

Option 2: Using GitHub repository

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}

Option 3: Using Gitee repository (Faster access in China)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://gitee.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}
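The three options differ only in `args`. If you prefer to script the change instead of editing JSON by hand, a helper along these lines merges the entry while keeping any servers already configured. `add_document_reader` and `update_config_file` are hypothetical names, and the config file location depends on your client, so the path is passed in explicitly.

```python
import json
from pathlib import Path
from typing import Optional

SERVER_NAME = "mcp-document-reader"  # key used in the configuration examples above


def add_document_reader(config: dict, git_url: Optional[str] = None) -> dict:
    """Return a copy of `config` with the document reader server entry added.

    git_url=None -> Option 1 (PyPI package run through uvx)
    git_url=...  -> Option 2/3 (run from a GitHub or Gitee repository)
    """
    if git_url is None:
        args = ["mcp-documents-reader"]
    else:
        args = ["--from", f"git+{git_url}", "mcp_documents_reader"]
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    updated.setdefault("mcpServers", {})[SERVER_NAME] = {"command": "uvx", "args": args}
    return updated


def update_config_file(path: Path, git_url: Optional[str] = None) -> None:
    """Merge the entry into a JSON config file, creating the file if it does not exist."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_document_reader(config, git_url)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


cfg = add_document_reader({"mcpServers": {"other": {"command": "node"}}},
                          "https://gitee.com/xt765/mcp_documents_reader")
print(sorted(cfg["mcpServers"]))  # ['mcp-document-reader', 'other']
```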

Usage

As an MCP Tool

After configuration, AI assistants can directly call the following tool:

# Read a DOCX file
read_document(filename="example.docx")

# Read a PDF file
read_document(filename="example.pdf")

# Read an Excel file
read_document(filename="example.xlsx")

# Read a text file
read_document(filename="example.txt")
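The calls above are shorthand. On the wire, an MCP client sends a JSON-RPC 2.0 `tools/call` request naming the tool and its arguments, and the server answers with a result whose `content` list holds the extracted text. The sketch below builds such a request and pulls text out of a response; `tools_call_request` and `text_from_result` are hypothetical helpers, and in practice your MCP client handles this exchange for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def tools_call_request(filename: str) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for the read_document tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "read_document", "arguments": {"filename": filename}},
    }
    return json.dumps(message)


def text_from_result(response: str) -> str:
    """Join the text items of a tools/call result; raise if the tool reported an error."""
    result = json.loads(response)["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):
        raise RuntimeError("read_document failed: " + " ".join(texts))
    return "\n".join(texts)


print(tools_call_request("example.pdf"))
# {"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "read_document", "arguments": {"filename": "example.pdf"}}}
```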

As a Python Library

from mcp_documents_reader import DocumentReaderFactory

# Using factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")

# Check if format is supported
if DocumentReaderFactory.is_supported("file.xlsx"):
    reader = DocumentReaderFactory.get_reader("file.xlsx")
    content = reader.read("/path/to/file.xlsx")

Tool Interface Details

read_document

Read any supported document type.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | Document file path; absolute and relative paths are supported |

Environment Variables

| Variable | Description | Default |
|---|---|---|
| DOCUMENT_DIRECTORY | Directory where documents are stored | ./documents |
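This README does not say exactly how `DOCUMENT_DIRECTORY` interacts with the `filename` argument. One plausible reading, stated here purely as an assumption, is that absolute paths are used as-is and relative paths are resolved against `DOCUMENT_DIRECTORY`, falling back to `./documents`. The hypothetical `resolve_document_path` below sketches that behavior; check the server source before relying on it.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DOCUMENT_DIRECTORY = "./documents"  # default listed in the table above


def resolve_document_path(filename: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve `filename` under the ASSUMED rule described above (not confirmed by the README).

    Absolute paths are returned unchanged; relative paths are joined onto
    DOCUMENT_DIRECTORY, or onto ./documents when the variable is unset.
    """
    env = os.environ if env is None else env
    path = Path(filename)
    if path.is_absolute():
        return path
    base = Path(env.get("DOCUMENT_DIRECTORY", DEFAULT_DOCUMENT_DIRECTORY))
    return base / path


print(resolve_document_path("report.docx", {}))  # documents/report.docx (on POSIX)
```

A real server might also reject relative paths that escape the base directory (for example `../secret.txt`); this sketch does not.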

Dependencies

Core Dependencies

  • mcp >= 1.23.0 - MCP protocol implementation
  • python-docx >= 1.2.0 - DOCX file reading
  • pypdf >= 6.7.1 - PDF file reading (replaces PyPDF2)
  • openpyxl >= 3.1.5 - Excel file reading
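To show what each core dependency contributes, here is a minimal sketch of per-format text extraction using the public APIs of python-docx, pypdf, and openpyxl. It is not necessarily how mcp-documents-reader implements its readers: the function names and the output layout (one `# sheet` line per worksheet, tab-separated rows) are assumptions. openpyxl reads only the newer `.xlsx`/`.xlsm` formats, not legacy `.xls`, so `.xls` is left out here. Third-party imports are deferred into each function so the text reader works even when the other libraries are not installed.

```python
from pathlib import Path


def read_txt(path: str) -> str:
    # errors="replace" substitutes U+FFFD for invalid UTF-8 bytes instead of failing
    # (an assumed policy; the package may handle encodings differently)
    return Path(path).read_text(encoding="utf-8", errors="replace")


def read_docx(path: str) -> str:
    from docx import Document  # provided by python-docx

    # Body paragraphs only; tables, headers, and footers are skipped in this sketch
    return "\n".join(p.text for p in Document(path).paragraphs)


def read_pdf(path: str) -> str:
    from pypdf import PdfReader

    # Pages without a text layer (e.g. scanned images) yield little or no text
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


def read_xlsx(path: str) -> str:
    from openpyxl import load_workbook

    # data_only=True returns the cached results of formulas rather than the formula text
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        lines = []
        for ws in wb.worksheets:
            lines.append(f"# {ws.title}")
            for row in ws.iter_rows(values_only=True):
                lines.append("\t".join("" if v is None else str(v) for v in row))
        return "\n".join(lines)
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```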

Development Dependencies

  • pytest >= 8.0.0 - Testing framework
  • pytest-asyncio >= 0.24.0 - Async testing support
  • pytest-cov >= 6.0.0 - Coverage reporting
  • basedpyright >= 0.28.0 - Type checking
  • ruff >= 0.8.0 - Linting and formatting

License

MIT License

Contributing

Issues and Pull Requests are welcome!

Related Projects



Download files


Source Distribution

mcp_documents_reader-1.2.0.tar.gz (205.4 kB)

Uploaded Source

Built Distribution


mcp_documents_reader-1.2.0-py3-none-any.whl (223.8 kB)

Uploaded Python 3

File details

Details for the file mcp_documents_reader-1.2.0.tar.gz.

File metadata

  • Download URL: mcp_documents_reader-1.2.0.tar.gz
  • Upload date:
  • Size: 205.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcp_documents_reader-1.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd625b016258913332d6065bb0b58b80f3559b4c33d9eab86a7ecd9c5f5e9f9b |
| MD5 | 9d1bd406051b86ffc4efd764816e49dd |
| BLAKE2b-256 | 6accdd99bd103c9e02460565302200592ce499dd9c94d753c31f8402de7554fd |
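To check a downloaded archive against the published digest, hash the file and compare. A minimal sketch using the standard library's `hashlib` and the SHA256 value listed above for the source distribution; `sha256_of` and `verify` are illustrative names, not part of any published tool.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for mcp_documents_reader-1.2.0.tar.gz
EXPECTED_SHA256 = "bd625b016258913332d6065bb0b58b80f3559b4c33d9eab86a7ecd9c5f5e9f9b"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA256 matches `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()


# Usage: verify(Path("mcp_documents_reader-1.2.0.tar.gz"))
```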


Provenance

The following attestation bundles were made for mcp_documents_reader-1.2.0.tar.gz:

Publisher: release.yml on xt765/mcp_documents_reader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mcp_documents_reader-1.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mcp_documents_reader-1.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97a848c469a56260954a56a0ce8034949b247f39f128ec0b32bc4e584fe71c1b |
| MD5 | c7880c2ba0ec9eb79790a950042c9b01 |
| BLAKE2b-256 | 628154163ec4c3c9edde5e9456ac41a44adb2bf2645d181c6d4fc74f8f8b65cd |


Provenance

The following attestation bundles were made for mcp_documents_reader-1.2.0-py3-none-any.whl:

Publisher: release.yml on xt765/mcp_documents_reader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
