An MCP-enabled multi-format document reader supporting DOCX, PDF, TXT, and Excel files

MCP Document Reader

MCP (Model Context Protocol) Document Reader is an MCP tool that reads documents in multiple formats and returns their text, so AI agents can work with the contents of your files.


Features

  • Multi-format Support: Reads Excel (XLSX/XLS), DOCX, PDF, and TXT files
  • MCP Protocol: Follows the MCP standard, so it can be used as a tool by MCP clients such as Trae IDE
  • Easy Integration: Needs only a short entry in the MCP configuration file
  • Reliable Performance: Tested and running in Trae IDE
  • File System Support: Reads documents directly from the file system

📚 Documentation

User Guide · API Reference · Contributing · Changelog · License


Architecture

graph TB
    A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
    B -->|Detect file type| C{File Type?}
    C -->|.docx| D[DOCX Reader]
    C -->|.pdf| E[PDF Reader]
    C -->|.xlsx/.xls| F[Excel Reader]
    C -->|.txt| G[Text Reader]
    D -->|Extract text| H[Return Content]
    E -->|Extract text| H
    F -->|Extract text| H
    G -->|Extract text| H
    H -->|Text content| A
    
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#f0f0f0
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#e8f5e9
    style G fill:#e8f5e9
    style H fill:#fff9c4
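
The dispatch step in the diagram can be sketched in a few lines of Python. This is an illustration only, not the project's implementation: the `READERS` table, `read_text`, and the placeholder readers are hypothetical names, and only the plain-text branch actually extracts anything.

```python
from pathlib import Path


def read_text(path: str) -> str:
    """Plain-text branch: return the file contents as a string."""
    return Path(path).read_text(encoding="utf-8")


def _placeholder(kind: str):
    """Stand-in for a real extractor, which this sketch does not include."""
    def reader(path: str) -> str:
        raise NotImplementedError(f"{kind} extraction is omitted from this sketch")
    return reader


# Hypothetical dispatch table: one reader per extension from the diagram.
READERS = {
    ".docx": _placeholder("DOCX"),
    ".pdf": _placeholder("PDF"),
    ".xlsx": _placeholder("Excel"),
    ".xls": _placeholder("Excel"),
    ".txt": read_text,
}


def read_document(filename: str) -> str:
    """Pick a reader from the file extension (case-insensitive) and run it."""
    suffix = Path(filename).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
    return reader(filename)
```

Keeping the mapping in one table means a new format needs only one new entry, which mirrors the single decision node in the diagram.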

Supported Formats

| Format | Extensions | MIME Type | Features |
| Excel | .xlsx, .xls | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Sheet and cell data extraction |
| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Text and structure extraction |
| PDF | .pdf | application/pdf | Text extraction |
| Text | .txt | text/plain | Plain text reading |

Installation

Using pip (Recommended)

pip install mcp-documents-reader

From Source

git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .

MCP Tools

This server provides the following tool:

read_document

Read any supported document type with a unified interface.

Arguments:

  • filename (string, required): Document file path, supports absolute or relative paths.
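
For readers curious about what travels over the wire: MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `read_document`. The helper name `build_read_document_request` is hypothetical, and in practice an MCP client library assembles this message for you.

```python
import json


def build_read_document_request(filename: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for the read_document tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_document",
            # The only argument the tool documents is `filename`.
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(payload)


print(build_read_document_request("example.docx"))
```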

Configuration

Using in Trae IDE / Claude Desktop

Add the following to your MCP configuration file:

Option 1: Using PyPI (Recommended)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "mcp-documents-reader"
      ]
    }
  }
}

Option 2: Using GitHub repository

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}

Option 3: Using Gitee repository (Faster access in China)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://gitee.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}
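
If the configuration file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch using Option 1. The `add_server` helper is hypothetical, and the path of the configuration file depends on your client, so it is passed in rather than assumed.

```python
import json
from pathlib import Path


def add_server(config_path: str, name: str = "mcp-document-reader") -> dict:
    """Add the Option 1 (PyPI) entry to an MCP config file, keeping other servers."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty dict.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {
        "command": "uvx",
        "args": ["mcp-documents-reader"],
    }
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling `add_server` twice with the same name simply overwrites the same entry, so the operation is safe to repeat.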

Usage

As an MCP Tool

After configuration, AI assistants can directly call the following tool:

# Read a DOCX file
read_document(filename="example.docx")

# Read a PDF file
read_document(filename="example.pdf")

# Read an Excel file
read_document(filename="example.xlsx")

# Read a text file
read_document(filename="example.txt")

As a Python Library

from mcp_documents_reader import DocumentReaderFactory

# Using factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")

# Check if format is supported
if DocumentReaderFactory.is_supported("file.xlsx"):
    reader = DocumentReaderFactory.get_reader("file.xlsx")
    content = reader.read("/path/to/file.xlsx")
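
The two snippets above can be folded into one small helper. This is a sketch, not part of the package: `read_if_supported` is a hypothetical name, and it relies only on the `is_supported`, `get_reader`, and `read` calls shown above. The factory is passed in as an argument so the helper can be exercised without the package installed.

```python
from typing import Any, Optional


def read_if_supported(factory: Any, path: str) -> Optional[str]:
    """Return the document content, or None when the format is not supported."""
    if not factory.is_supported(path):
        return None
    reader = factory.get_reader(path)
    return reader.read(path)


# Usage with the real factory (requires mcp-documents-reader to be installed):
#   from mcp_documents_reader import DocumentReaderFactory
#   content = read_if_supported(DocumentReaderFactory, "/path/to/file.xlsx")
```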

Tool Interface Details

read_document

Read any supported document type.

Parameters:

| Parameter | Type | Required | Description |
| filename | string | Yes | Document file path, supports absolute or relative paths |

Dependencies

Core Dependencies

  • mcp >= 1.26.0 - MCP protocol implementation
  • python-docx >= 1.2.0 - DOCX file reading
  • pypdf >= 6.8.0 - PDF file reading (replaces PyPDF2)
  • openpyxl >= 3.1.5 - Excel file reading

Development Dependencies

  • pytest >= 8.0.0 - Testing framework
  • pytest-asyncio >= 0.24.0 - Async testing support
  • pytest-cov >= 6.0.0 - Coverage reporting
  • basedpyright >= 0.28.0 - Type checking
  • ruff >= 0.8.0 - Linting and formatting
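
To check an existing environment against the core minimums listed above, a rough sketch like the following may help. The helper names are hypothetical, and the comparison looks only at the numeric major, minor, and patch parts, so pre-release suffixes are ignored. The third-party `packaging` library compares versions more precisely if you need that.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimum versions copied from the core dependency list above.
MINIMUMS = {
    "mcp": "1.26.0",
    "python-docx": "1.2.0",
    "pypdf": "6.8.0",
    "openpyxl": "3.1.5",
}


def as_tuple(v: str) -> tuple:
    """Turn '6.8.0rc1' into (6, 8, 0); missing parts are padded with zeros."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_dependencies(minimums: dict = MINIMUMS) -> dict:
    """Report 'ok', 'missing', or 'too old (<version>)' for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = as_tuple(installed) >= as_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report
```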

License

MIT License

Contributing

Issues and Pull Requests are welcome!
