A PDF file reading server based on FastMCP. Supports PDF text extraction, OCR recognition, and image extraction via the MCP protocol, with a built-in web debugger for easy testing.

📄 MCP PDF Server


🚀 Features

  • read_pdf_text
    Extracts embedded (non-scanned) text from a PDF, page by page.

  • read_by_ocr
    Uses OCR to recognize text from scanned or image-based PDFs.

  • read_pdf_to_file
    Converts PDF files to TXT files with optional OCR support.


📂 Project Structure

mcp-pdf-server/
├── pdf_server.py         # Main server entry point
└── README.md             # Project documentation

⚙️ Installation

Recommended Python version: 3.9+

pip install pymupdf mcp

Note: To use OCR features, you may need a MuPDF build with OCR support or external OCR libraries.
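
If the server relies on PyMuPDF's built-in OCR (an assumption; the source does not name the OCR backend), Tesseract language data has to be discoverable, typically through the TESSDATA_PREFIX environment variable. The sketch below checks for it before any OCR call is attempted. The helper name `find_traineddata` is hypothetical and not part of this project.

```python
import os
from pathlib import Path
from typing import Optional


def find_traineddata(language: str = "eng",
                     tessdata_dir: Optional[str] = None) -> Optional[Path]:
    """Return the path to <language>.traineddata, or None if it is missing.

    Hypothetical helper: looks in tessdata_dir if given, otherwise in the
    folder named by the TESSDATA_PREFIX environment variable.
    """
    folder = tessdata_dir or os.environ.get("TESSDATA_PREFIX")
    if not folder:
        return None
    candidate = Path(folder) / f"{language}.traineddata"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_traineddata("eng")
    print("OCR language data:", found if found else "not found")
```

A missing file here usually means OCR calls will fail for that language, so it is worth checking once at startup.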

🤖 Configuration

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "timeout": 60000,
      "args": [
        "mcp-pdf-reader"
      ]
    }
  }
}
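
The same client entry can be generated from Python, which may help when the timeout has to differ between machines. This is a minimal sketch: `build_client_config` is a hypothetical helper, and the source does not state the unit of `timeout`, so the value is passed through unchanged.

```python
import json


def build_client_config(timeout: int = 60000) -> dict:
    """Build the client configuration shown above with a custom timeout.

    Hypothetical helper; the keys mirror the JSON in this README.
    """
    return {
        "mcpServers": {
            "pdf-reader": {
                "command": "uvx",
                "timeout": timeout,
                "args": ["mcp-pdf-reader"],
            }
        }
    }


if __name__ == "__main__":
    # Print the configuration so it can be pasted into the client settings.
    print(json.dumps(build_client_config(), indent=2))
```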

🔦 Start the Server

Run the following command:

python pdf_server.py

You should see logs like:

INFO:mcp-pdf-server:Starting MCP PDF Server...
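
The log line above matches Python's default logging format (level, logger name, message). The sketch below reproduces it with the standard library only; it assumes the server uses a logger named "mcp-pdf-server", which the log line suggests but the source does not confirm. `make_logger` is a hypothetical helper.

```python
import io
import logging


def make_logger(stream) -> logging.Logger:
    """Return a logger that writes 'LEVEL:name:message' lines to stream."""
    logger = logging.getLogger("mcp-pdf-server")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    # BASIC_FORMAT is "%(levelname)s:%(name)s:%(message)s".
    handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


buffer = io.StringIO()
make_logger(buffer).info("Starting MCP PDF Server...")
print(buffer.getvalue(), end="")
```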

🛠️ API Tool List

Tool             | Description                          | Input Parameters                               | Returns
-----------------+--------------------------------------+------------------------------------------------+---------------------------------------
read_pdf_text    | Extracts normal text from PDF pages  | file_path, start_page, end_page                | List of page texts
read_by_ocr      | Recognizes text via OCR              | file_path, start_page, end_page, language, dpi | OCR extracted text
read_pdf_to_file | Converts PDF files to TXT files      | file_paths, use_ocr, language, dpi             | Dictionary of generated TXT file paths
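
To illustrate how start_page and end_page could map onto a list of page texts, here is a minimal sketch. It assumes page numbers are 1-based and inclusive (the usage example asks for "pages 1 to 5") and that an end_page past the last page is clamped; neither is confirmed by the source. `read_page_texts` is a hypothetical helper that works with any object offering `len()`, `load_page(index)`, and `get_text()`, which a PyMuPDF document does.

```python
from typing import List


def read_page_texts(doc, start_page: int, end_page: int) -> List[str]:
    """Return the text of pages start_page..end_page (1-based, inclusive).

    Assumption: end_page beyond the document is clamped to the last page.
    """
    if not 1 <= start_page <= end_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    last = min(end_page, len(doc))
    # Convert 1-based page numbers to the 0-based indices load_page expects.
    return [doc.load_page(index).get_text() for index in range(start_page - 1, last)]


# With PyMuPDF installed, usage would look roughly like:
#   import fitz  # PyMuPDF
#   with fitz.open("pdf_resources/example.pdf") as doc:
#       texts = read_page_texts(doc, 1, 5)
```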

📝 Example Usage

Extract text from pages 1 to 5:

mcp run read_pdf_text --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 5}'
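
An MCP client expresses the same call as a JSON-RPC `tools/call` request. The sketch below only builds that message; transport and session setup are left out, and `build_tool_call` is a hypothetical helper rather than part of this project.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "read_pdf_text",
    {"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 5},
)
print(json.dumps(request, indent=2))
```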

Perform OCR recognition on page 1:

mcp run read_by_ocr --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 1, "language": "eng"}'
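
For orientation, the language and dpi parameters line up with PyMuPDF's `Page.get_textpage_ocr`. The sketch below shows one way a per-page OCR loop could look; whether the server is implemented this way is an assumption, as are the 1-based page numbers, the dpi default of 300, and the newline-joined return value. `ocr_pages` is a hypothetical helper.

```python
def ocr_pages(doc, start_page: int, end_page: int,
              language: str = "eng", dpi: int = 300) -> str:
    """OCR pages start_page..end_page (1-based, inclusive) and join the text.

    doc is expected to behave like a PyMuPDF document.
    """
    texts = []
    for index in range(start_page - 1, min(end_page, len(doc))):
        page = doc.load_page(index)
        # full=True asks PyMuPDF to OCR the whole rendered page.
        textpage = page.get_textpage_ocr(language=language, dpi=dpi, full=True)
        texts.append(page.get_text(textpage=textpage))
    return "\n".join(texts)


# With PyMuPDF and Tesseract language data available:
#   import fitz  # PyMuPDF
#   with fitz.open("pdf_resources/example.pdf") as doc:
#       text = ocr_pages(doc, 1, 1, language="eng")
```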

Convert PDF files to TXT files:

mcp run read_pdf_to_file --args '{"file_paths": ["pdf_resources/example.pdf"], "use_ocr": "no"}'

Convert PDF files with OCR support:

mcp run read_pdf_to_file --args '{"file_paths": ["pdf_resources/example.pdf"], "use_ocr": "yes", "language": "eng", "dpi": 300}'
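
The two conversion calls differ only in the use_ocr string, and both return a dictionary of generated TXT paths. The sketch below shows one plausible reading of that contract: it assumes "yes"/"no" are the only accepted values and that each TXT file sits next to its PDF with the extension swapped. Both helpers are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def parse_use_ocr(value: str) -> bool:
    """Map the "yes"/"no" strings used in the examples to a boolean."""
    normalized = value.strip().lower()
    if normalized not in {"yes", "no"}:
        raise ValueError(f"use_ocr must be 'yes' or 'no', got {value!r}")
    return normalized == "yes"


def plan_txt_outputs(file_paths: List[str]) -> Dict[str, str]:
    """Map each PDF path to the TXT path it would be written to (assumed naming)."""
    return {path: str(Path(path).with_suffix(".txt")) for path in file_paths}


if __name__ == "__main__":
    print(parse_use_ocr("yes"), plan_txt_outputs(["pdf_resources/example.pdf"]))
```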

📢 Notes

  • Files must be placed inside the pdf_resources/ directory, or an absolute path must be provided.
  • OCR functionality requires appropriate OCR support in the environment.
  • When processing large files, raise the timeout value in the client configuration (60000 in the example above) and make sure enough memory is available.
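
The path rule in the first note can be sketched as follows. How the server actually resolves paths is not shown in the source, so this is only one reading: absolute paths pass through, paths that already start with pdf_resources/ are kept, and bare file names are looked up inside pdf_resources/. `resolve_pdf_path` is a hypothetical helper.

```python
from pathlib import Path


def resolve_pdf_path(file_path: str, base_dir: str = "pdf_resources") -> Path:
    """Resolve a tool argument to the path that would be opened (assumed rule)."""
    path = Path(file_path)
    if path.is_absolute():
        return path
    # Already relative to the resources directory, as in the usage examples.
    if path.parts and path.parts[0] == base_dir:
        return path
    return Path(base_dir) / path


if __name__ == "__main__":
    print(resolve_pdf_path("example.pdf"))
```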

📜 License

This project is licensed under the MIT License.
For commercial use, please credit the original source.

