An MCP-enabled multi-format document reader supporting DOCX, PDF, TXT, and Excel files

MCP Document Reader

MCP (Model Context Protocol) Document Reader is an MCP tool that reads documents in multiple formats, so AI agents can work directly with the contents of your files.

Features

  • Multi-format Support: Reads four common document formats: Excel (XLSX/XLS), DOCX, PDF, and TXT
  • MCP Protocol: Complies with the MCP standard and can be used as a tool by AI assistants such as Trae IDE
  • Easy Integration: Requires only a short configuration entry
  • Reliable Performance: Tested and running in Trae IDE
  • File System Support: Reads documents directly from the file system


Architecture

graph TB
    A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
    B -->|Detect file type| C{File Type?}
    C -->|.docx| D[DOCX Reader]
    C -->|.pdf| E[PDF Reader]
    C -->|.xlsx/.xls| F[Excel Reader]
    C -->|.txt| G[Text Reader]
    D -->|Extract text| H[Return Content]
    E -->|Extract text| H
    F -->|Extract text| H
    G -->|Extract text| H
    H -->|Text content| A
    
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#f0f0f0
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#e8f5e9
    style G fill:#e8f5e9
    style H fill:#fff9c4
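
The dispatch step in the diagram can be sketched in a few lines of Python. This is only an illustration of extension-based routing, not the package's actual code: the names `READERS`, `dispatch`, and `read_text` are hypothetical, and only the plain-text branch is implemented so the example runs without third-party libraries.

```python
from pathlib import Path
from typing import Callable, Dict


def read_text(path: str) -> str:
    """Plain-text reader: the only branch implemented in this sketch."""
    return Path(path).read_text(encoding="utf-8")


def _placeholder(kind: str) -> Callable[[str], str]:
    """Stand-in for the DOCX, PDF, and Excel readers (hypothetical)."""
    def reader(path: str) -> str:
        raise NotImplementedError(f"{kind} reading is left out of this sketch")
    return reader


# Map each supported extension to a reader, mirroring the diagram's branches.
READERS: Dict[str, Callable[[str], str]] = {
    ".txt": read_text,
    ".docx": _placeholder("DOCX"),
    ".pdf": _placeholder("PDF"),
    ".xlsx": _placeholder("Excel"),
    ".xls": _placeholder("Excel"),
}


def dispatch(path: str) -> str:
    """Pick a reader by file extension (case-insensitive) and run it."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return READERS[suffix](path)
```

A lookup table keeps the routing in one place, so supporting another format would only mean adding one entry.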

Supported Formats

| Format | Extensions | MIME Type | Features |
| --- | --- | --- | --- |
| Excel | .xlsx, .xls | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Sheet and cell data extraction |
| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Text and structure extraction |
| PDF | .pdf | application/pdf | Text extraction |
| Text | .txt | text/plain | Plain text reading |

Installation

Using pip (Recommended)

pip install mcp-documents-reader

From Source

git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .

MCP Tools

This server provides the following tool:

read_document

Read any supported document type with a unified interface.

Arguments:

  • filename (string, required): Document file path, supports absolute or relative paths.
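
For readers curious about what travels over the wire, the sketch below builds the JSON-RPC message an MCP client would typically send to invoke this tool. The `tools/call` method and the `name`/`arguments` layout come from the MCP specification; the helper name `build_tool_call` is hypothetical, and a real client library normally constructs this message for you.

```python
import json


def build_tool_call(filename: str, request_id: int = 1) -> str:
    """Serialize a tools/call request for the read_document tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_document",
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(request)


print(build_tool_call("example.docx"))
```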

Configuration

Using with Trae IDE / Claude Desktop

Add the following to your MCP configuration file:

Option 1: Using PyPI (Recommended)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "mcp-documents-reader"
      ]
    }
  }
}

Option 2: Using GitHub repository

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}

Option 3: Using Gitee repository (Faster access in China)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://gitee.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}
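
If you would rather script the change than edit the file by hand, the following sketch merges the Option 1 entry into an existing configuration file while leaving other servers untouched. The helper name `add_server` is hypothetical, and the location of the configuration file depends on your client, so the path is passed in rather than assumed. The sketch also assumes the file, if it exists, already contains valid JSON.

```python
import json
from pathlib import Path


def add_server(config_path: str, name: str = "mcp-document-reader") -> dict:
    """Add (or overwrite) the document reader entry under "mcpServers"."""
    path = Path(config_path)
    # Start from the existing configuration when there is one.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as Option 1 above.
    servers[name] = {"command": "uvx", "args": ["mcp-documents-reader"]}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart the client after changing its configuration so the new server entry is picked up.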

Usage

As an MCP Tool

After configuration, AI assistants can directly call the following tool:

# Read a DOCX file
read_document(filename="example.docx")

# Read a PDF file
read_document(filename="example.pdf")

# Read an Excel file
read_document(filename="example.xlsx")

# Read a text file
read_document(filename="example.txt")

As a Python Library

from mcp_documents_reader import DocumentReaderFactory

# Using factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")

# Check if format is supported
if DocumentReaderFactory.is_supported("file.xlsx"):
    reader = DocumentReaderFactory.get_reader("file.xlsx")
    content = reader.read("/path/to/file.xlsx")
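
When reading several files at once, it can help to separate supported from unsupported paths up front. The sketch below relies only on the three calls shown above (`is_supported`, `get_reader`, and `read`); the helper `read_supported` is hypothetical, and the factory is passed in as an argument so the example does not depend on how the package is installed.

```python
from typing import Any, Dict, Iterable, List, Tuple


def read_supported(paths: Iterable[str], factory: Any) -> Tuple[Dict[str, Any], List[str]]:
    """Read every supported path; return (contents by path, skipped paths)."""
    contents: Dict[str, Any] = {}
    skipped: List[str] = []
    for path in paths:
        if not factory.is_supported(path):
            skipped.append(path)
            continue
        reader = factory.get_reader(path)
        contents[path] = reader.read(path)
    return contents, skipped
```

In practice you would pass `DocumentReaderFactory` as the `factory` argument, for example `read_supported(["a.pdf", "notes.md"], DocumentReaderFactory)`.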

Tool Interface Details

read_document

Read any supported document type.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | Document file path; supports absolute or relative paths |

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| DOCUMENT_DIRECTORY | Directory where documents are stored | ./documents |
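
The README does not spell out how `DOCUMENT_DIRECTORY` interacts with the `filename` argument. One plausible reading, shown here purely as an assumption, is that relative paths are resolved against that directory while absolute paths are used as given. The helper `resolve_document_path` is hypothetical and not part of the package.

```python
import os
from pathlib import Path


def resolve_document_path(filename: str) -> Path:
    """Resolve a filename against DOCUMENT_DIRECTORY (assumed behavior)."""
    # Fall back to the documented default when the variable is unset.
    base = Path(os.environ.get("DOCUMENT_DIRECTORY", "./documents"))
    candidate = Path(filename)
    # Absolute paths are returned unchanged; relative ones are joined to base.
    return candidate if candidate.is_absolute() else base / candidate
```

If the server behaves differently in your setup, passing absolute paths avoids the ambiguity.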

Dependencies

Core Dependencies

  • mcp >= 0.1.0 - MCP protocol implementation
  • python-docx >= 0.8.11 - DOCX file reading
  • PyPDF2 >= 3.0.1 - PDF file reading
  • openpyxl >= 3.0.10 - Excel file reading
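
To confirm that the core dependencies are importable in the current environment, a quick check along these lines may help. It is a sketch: the mapping from package names to import names (`python-docx` imports as `docx`) reflects how those libraries are normally imported, and the helper name `check_dependencies` is hypothetical. It does not verify version numbers.

```python
from importlib.util import find_spec

# Distribution name -> top-level module name used in import statements.
CORE_MODULES = {
    "mcp": "mcp",
    "python-docx": "docx",
    "PyPDF2": "PyPDF2",
    "openpyxl": "openpyxl",
}


def check_dependencies() -> dict:
    """Return {package name: True if its module can be found}."""
    return {pkg: find_spec(module) is not None for pkg, module in CORE_MODULES.items()}


for package, available in check_dependencies().items():
    print(f"{package}: {'ok' if available else 'missing'}")
```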

Development Dependencies

  • pytest >= 8.0.0 - Testing framework
  • pytest-asyncio >= 0.24.0 - Async testing support
  • pytest-cov >= 6.0.0 - Coverage reporting
  • basedpyright >= 0.28.0 - Type checking
  • ruff >= 0.8.0 - Linting and formatting

License

MIT License

Contributing

Issues and Pull Requests are welcome!
