📄 MCP PDF Server

A PDF file reading server based on FastMCP.

Supports PDF text extraction, OCR recognition, and image extraction via the MCP protocol, with a built-in web debugger for easy testing.

🚀 Features

read_pdf_text: Extracts normal text from a PDF, page by page.
read_by_ocr: Uses OCR to recognize text in scanned or image-based PDFs.
read_pdf_images: Extracts all images from a specified PDF page as Base64-encoded output.
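
Since read_pdf_images returns Base64-encoded data, a client usually has to decode it before use. The following is a minimal sketch, assuming the result is a plain list of Base64 strings; the exact return shape is not documented here, and `save_images` and its file naming are hypothetical helpers, not part of the server.

```python
import base64
from pathlib import Path


def save_images(encoded_images, out_dir, prefix="page_image"):
    """Decode Base64 strings and write each one to out_dir.

    Assumption: every item is a plain Base64 string. The real tool may wrap
    the data in a richer structure (for example with a format field).
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, encoded in enumerate(encoded_images, start=1):
        data = base64.b64decode(encoded)
        # The image format is unknown here, so a neutral extension is used.
        target = out_path / f"{prefix}_{index}.bin"
        target.write_bytes(data)
        written.append(target)
    return written
```

If the tool also reports the image format, the extension could be derived from it instead of the neutral `.bin` used above.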

📂 Project Structure

mcp-pdf-server/
├── pdf_server.py         # Main server entry point
└── README.md             # Project documentation

⚙️ Installation

Recommended Python version: 3.9+

pip install pymupdf mcp

Note: To use OCR features, you may need a MuPDF build with OCR support or external OCR libraries.
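
To illustrate what "OCR support" can mean in practice, here is a hedged sketch using PyMuPDF's Tesseract integration. It assumes Tesseract language data is installed and discoverable (for example through the `TESSDATA_PREFIX` environment variable). The function names are hypothetical, and this is not necessarily how the server implements read_by_ocr.

```python
import os
import shutil


def ocr_environment_hints():
    """Collect hints about whether Tesseract data is likely available."""
    return {
        "tessdata_prefix": os.environ.get("TESSDATA_PREFIX"),
        "tesseract_binary": shutil.which("tesseract"),
    }


def ocr_page_text(pdf_path, page_index=0, language="eng", dpi=300):
    """OCR a single page with PyMuPDF (needs Tesseract language data)."""
    import fitz  # PyMuPDF, imported lazily so the hint check works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_index]
        # full=True runs OCR on the whole page, not only on its image areas.
        textpage = page.get_textpage_ocr(language=language, dpi=dpi, full=True)
        return page.get_text("text", textpage=textpage)
```

If both hints come back as `None`, OCR calls will most likely fail until Tesseract language data is installed.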

🤖 Configuration

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "timeout": 60000,
      "args": [
        "mcp-pdf-reader"
      ]
    }
  }
}
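
The entry above has to be merged into the configuration file of whichever MCP client is in use. As a sketch, the snippet below adds it to an existing JSON file without touching other servers; the file location depends on the client and is left to the caller, and `add_pdf_reader` is a hypothetical helper.

```python
import json
from pathlib import Path

# Same values as the configuration shown above.
PDF_READER_ENTRY = {
    "command": "uvx",
    "timeout": 60000,
    "args": ["mcp-pdf-reader"],
}


def add_pdf_reader(config_path):
    """Insert or update the "pdf-reader" server in a client config file."""
    path = Path(config_path)
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Keep any servers that are already configured.
    config.setdefault("mcpServers", {})["pdf-reader"] = dict(PDF_READER_ENTRY)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```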

🔦 Start the Server

Run the following command:

python pdf_server.py

You should see logs like:

INFO:mcp-pdf-server:Starting MCP PDF Server...

🛠️ API Tool List

| Tool | Description | Input Parameters | Returns |
| --- | --- | --- | --- |
| `read_pdf_text` | Extracts normal text from PDF pages | `file_path`, `start_page`, `end_page` | List of page texts |
| `read_by_ocr` | Recognizes text via OCR | `file_path`, `start_page`, `end_page`, `language`, `dpi` | OCR extracted text |
| `read_pdf_images` | Extracts images from a PDF page | `file_path`, `page_number` | List of images (Base64 encoded) |
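
To make the `start_page` / `end_page` parameters concrete, here is a rough sketch of page-by-page text extraction with PyMuPDF. It assumes page numbers are 1-based and inclusive, which the examples in this README suggest but do not state, and that out-of-range end pages are clamped. Both function names are hypothetical and this is not the server's actual code.

```python
def page_indices(start_page, end_page, page_count):
    """Convert a 1-based inclusive page range to 0-based indices.

    Assumption: an end_page beyond the document is clamped, not rejected.
    """
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    last = min(end_page, page_count)
    return list(range(start_page - 1, last))


def extract_text_sketch(file_path, start_page, end_page):
    """Return one text string per requested page."""
    import fitz  # PyMuPDF, imported lazily so page_indices works without it

    with fitz.open(file_path) as doc:
        indices = page_indices(start_page, end_page, doc.page_count)
        return [doc[i].get_text() for i in indices]
```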

📝 Example Usage

Extract text from pages 1 to 5:

mcp run read_pdf_text --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 5}'

Perform OCR recognition on page 1:

mcp run read_by_ocr --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 1, "language": "eng"}'

Extract all images from page 3:

mcp run read_pdf_images --args '{"file_path": "pdf_resources/example.pdf", "page_number": 3}'
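
The same tools can also be called programmatically. The sketch below uses the stdio client from the MCP Python SDK and launches the server with `uvx mcp-pdf-reader`, as in the configuration section. The helper names are hypothetical, and the structure of the returned content depends on the server.

```python
import asyncio


def text_request(file_path, start_page, end_page):
    """Build the argument dict for the read_pdf_text tool."""
    return {"file_path": file_path, "start_page": start_page, "end_page": end_page}


async def call_read_pdf_text(arguments):
    """Start the server over stdio and call read_pdf_text once."""
    # Imported lazily so the request helper works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(command="uvx", args=["mcp-pdf-reader"])
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.call_tool("read_pdf_text", arguments=arguments)
            return result.content


def main():
    """Extract text from pages 1 to 5 of the example file."""
    arguments = text_request("pdf_resources/example.pdf", 1, 5)
    return asyncio.run(call_read_pdf_text(arguments))
```

Calling `main()` requires `uvx` and network access to fetch the package on first run.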

📢 Notes

Files must be placed inside the pdf_resources/ directory, or an absolute path must be provided.
OCR functionality requires appropriate OCR support in the environment.
When processing large files, adjust memory and timeout settings as needed.
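
One possible reading of the path rule in the notes above can be sketched as follows: absolute paths are accepted as given, and relative paths must resolve to a location inside `pdf_resources/`. The server's real validation is not documented here, so `resolve_pdf_path` and its `base_dir` parameter are purely illustrative.

```python
from pathlib import Path


def resolve_pdf_path(file_path, base_dir="."):
    """Resolve a PDF path according to the rule described in the notes.

    Assumption: relative paths are interpreted against base_dir and must
    end up inside base_dir/pdf_resources.
    """
    path = Path(file_path)
    if path.is_absolute():
        return path
    root = (Path(base_dir) / "pdf_resources").resolve()
    candidate = (Path(base_dir) / path).resolve()
    # Reject relative paths that escape the resource directory.
    if root not in candidate.parents:
        raise ValueError(f"{file_path!r} is not inside {root}")
    return candidate
```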

📜 License

This project is licensed under the MIT License.
For commercial use, please credit the original source.

Release history

0.1.7: Jul 31, 2025
0.1.6: Jul 31, 2025
0.1.5: Jul 22, 2025
0.1.4: Jul 16, 2025
0.1.3: Jul 16, 2025 (this version)
0.1.2: Jul 16, 2025
