A powerful MCP server for comprehensive PDF processing with OCR and diagram detection


PDF Reader MCP Server

A powerful Model Context Protocol (MCP) server that provides comprehensive PDF processing capabilities, including text extraction, OCR support, and network diagram detection. Designed to work seamlessly with Amazon Q Developer CLI and other MCP-compatible systems.

🚀 Features

- Text Extraction: Extract the embedded text content of PDF files
- PDF Analysis: Get document metadata (page count, document info, encryption status)
- Page-Specific Processing: Extract text from individual pages
- Multi-Language OCR: Recognize Thai and English text
- Smart Processing: Automatically choose between OCR and direct text extraction
- Markdown Conversion: Convert PDF content to clean markdown
- Document Analysis: Determine whether a PDF contains scanned images or searchable text
- Network Diagram Detection: Detect and extract network diagrams from PDF pages
- MCP Integration: Works with Amazon Q Developer CLI and other MCP clients

📦 Installation

Using uvx (Recommended)

The easiest way to use this MCP server is with uvx:

```
# Run directly without installing; your MCP client can connect over either the stdio or SSE transport
uvx pdf-reader-mcp

# Or install it as a persistent tool
uv tool install pdf-reader-mcp
```

Using pip

```
pip install pdf-reader-mcp
```

From Source

```
git clone https://github.com/zixma13/pdf-reader-mcp.git
cd pdf-reader-mcp
pip install -e .
```

📋 Prerequisites

Install Tesseract OCR (required for OCR functionality):

```
# macOS
brew install tesseract
brew install tesseract-lang  # additional language data, including Thai

# Ubuntu/Debian
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-tha  # Thai language data
```
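Before relying on OCR, it can help to confirm that Tesseract actually sees both language packs. The script below is a minimal stdlib-only sketch and is not part of pdf-reader-mcp; it assumes `tesseract` is on your `PATH` and that the server needs the standard Tesseract codes `eng` (English) and `tha` (Thai). It runs `tesseract --list-langs` and reports any missing codes.

```python
import shutil
import subprocess

# Assumption: the standard Tesseract language codes for English and Thai.
REQUIRED_LANGS = {"eng", "tha"}


def parse_list_langs(output: str) -> set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes.

    The first line is a header such as
    'List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (3):'
    and each following non-empty line is a single language code.
    """
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue
        langs.add(line)
    return langs


def missing_ocr_languages(required: set[str] = REQUIRED_LANGS) -> set[str]:
    """Return the required language codes the local Tesseract install lacks."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    proc = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Older Tesseract releases may print the list on stderr, so parse both streams.
    available = parse_list_langs(proc.stdout + "\n" + proc.stderr)
    return required - available


if __name__ == "__main__":
    missing = missing_ocr_languages()
    print("Missing OCR languages:", ", ".join(sorted(missing)) if missing else "none")
```

If `tha` is reported missing, install `tesseract-lang` (macOS) or `tesseract-ocr-tha` (Ubuntu/Debian) as shown above.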

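For a source install, activate the project's virtual environment first, then install the Python dependencies:
```
python -m venv .venv  # only if .venv does not exist yet
source .venv/bin/activate
pip install -r requirements.txt
```
Test the server (`mcp dev` opens it in the MCP Inspector):
```
mcp dev main.py
```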

Configuration for Amazon Q Developer CLI

Add the following to your ~/.aws/amazonq/mcp.json file:

If you cloned the Git repository:

```
{
  "mcpServers": {
    "pdf_reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/pdf_reader/pdf_reader",
        "run",
        "main.py"
      ]
    }
  }
}
```

If you use uvx:

```
{
  "mcpServers": {
    "pdf_reader": {
      "command": "uvx",
      "timeout": 60000,
      "args": [
        "pdf-reader-mcp"
      ]
    }
  }
}
```
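If `~/.aws/amazonq/mcp.json` already lists other servers, editing it by hand risks breaking the JSON. The script below is a minimal sketch (not shipped with this package) that merges the uvx entry shown above into the file while keeping any existing `mcpServers` entries; the function name `add_pdf_reader` is hypothetical.

```python
import copy
import json
from pathlib import Path

# Entry copied from the uvx configuration above.
PDF_READER_ENTRY = {
    "command": "uvx",
    "timeout": 60000,
    "args": ["pdf-reader-mcp"],
}


def add_pdf_reader(config_path: Path, name: str = "pdf_reader") -> dict:
    """Add (or replace) the pdf_reader server in an MCP config file.

    Other servers already in the file are preserved. The file and its
    parent directory are created if they do not exist yet.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    # Deep copy so later edits to the config never mutate the template.
    servers[name] = copy.deepcopy(PDF_READER_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    path = Path.home() / ".aws" / "amazonq" / "mcp.json"
    add_pdf_reader(path)
    print(f"Updated {path}")
```

Running it twice is safe: the second run simply overwrites the `pdf_reader` entry with the same values.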

Usage Examples

Once configured, you can use the PDF reader tools in Amazon Q:

Basic PDF Processing

To analyze a PDF and determine if it's a scanned image or searchable text:
```
pdf_reader___analyze_pdf("/path/to/document.pdf")
```
To intelligently extract content from a PDF (automatically choosing between OCR and text extraction):
```
pdf_reader___smart_extract_pdf("/path/to/document.pdf")
```
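The README does not document how `analyze_pdf` and the "smart" tools decide between OCR and direct extraction. One common heuristic, sketched here purely as an assumption and not as this server's actual logic, is to count how many pages carry a meaningful amount of embedded text; if too few do, the PDF is treated as scanned. The thresholds `min_chars` and `min_text_ratio` are illustrative values.

```python
from dataclasses import dataclass


@dataclass
class ExtractionPlan:
    scanned: bool      # True if the document looks like scanned images (use OCR)
    text_pages: int    # pages whose embedded text met the threshold
    total_pages: int


def plan_extraction(page_texts: list[str], min_chars: int = 50,
                    min_text_ratio: float = 0.5) -> ExtractionPlan:
    """Hypothetical heuristic: decide between direct extraction and OCR.

    A page counts as searchable if its embedded text, with all whitespace
    removed, has at least `min_chars` characters. If fewer than
    `min_text_ratio` of the pages are searchable, the PDF is treated as scanned.
    """
    if not page_texts:
        # No pages: nothing to OCR, so report it as not scanned.
        return ExtractionPlan(scanned=False, text_pages=0, total_pages=0)
    text_pages = sum(
        1 for text in page_texts if len("".join(text.split())) >= min_chars
    )
    ratio = text_pages / len(page_texts)
    return ExtractionPlan(
        scanned=ratio < min_text_ratio,
        text_pages=text_pages,
        total_pages=len(page_texts),
    )
```

To feed it real per-page text, one option (again an assumption; the server's own PDF library is not stated here) is pypdf: `[page.extract_text() or "" for page in PdfReader(path).pages]`.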

To intelligently convert a PDF to markdown:
```
pdf_reader___smart_pdf_to_markdown("/path/to/document.pdf")
```

To extract text from a PDF:
```
pdf_reader___read_pdf("/path/to/document.pdf")
```

To get metadata from a PDF:
```
pdf_reader___get_pdf_metadata("/path/to/document.pdf")
```

To extract text from a specific page (0-indexed):
```
pdf_reader___extract_pdf_page("/path/to/document.pdf", 0)
```

To extract text using OCR (supports Thai and English):
```
pdf_reader___ocr_pdf("/path/to/document.pdf")
```

To extract text from a specific page using OCR:
```
pdf_reader___ocr_pdf_page("/path/to/document.pdf", 0)
```

To convert a PDF to markdown:
```
pdf_reader___pdf_to_markdown("/path/to/document.pdf")
```

To convert a PDF to markdown using OCR:
```
pdf_reader___pdf_to_markdown("/path/to/document.pdf", use_ocr=True)
```
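The same tools can also be called outside Amazon Q, for example from a test script, using the official MCP Python SDK (`pip install mcp`). This is a sketch under two assumptions: the server-side tool names are the Amazon Q names above without the `pdf_reader___` prefix (e.g. `read_pdf`), and the parameter names are not documented here, so the script first prints each tool's input schema. The `file_path` key in the `__main__` block is a hypothetical guess; check the printed schema before relying on it.

```python
import asyncio
import sys
from typing import Any, Dict, Optional

# Same launch command as the uvx configuration for Amazon Q.
SERVER_COMMAND = "uvx"
SERVER_ARGS = ["pdf-reader-mcp"]


def strip_q_prefix(qualified: str, server: str = "pdf_reader") -> str:
    """Turn an Amazon Q tool name like 'pdf_reader___read_pdf' into 'read_pdf'."""
    prefix = f"{server}___"
    return qualified[len(prefix):] if qualified.startswith(prefix) else qualified


async def run(tool: Optional[str] = None,
              arguments: Optional[Dict[str, Any]] = None) -> None:
    # Imported here so the helper above works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command=SERVER_COMMAND, args=SERVER_ARGS)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listing = await session.list_tools()
            for t in listing.tools:
                # inputSchema is a JSON Schema; its properties are the real parameter names.
                print(t.name, sorted(t.inputSchema.get("properties", {})))
            if tool:
                result = await session.call_tool(strip_q_prefix(tool), arguments or {})
                for block in result.content:
                    if block.type == "text":
                        print(block.text)


if __name__ == "__main__":
    # Hypothetical usage: python client.py pdf_reader___read_pdf /path/to/document.pdf
    if len(sys.argv) == 3:
        asyncio.run(run(sys.argv[1], {"file_path": sys.argv[2]}))
    else:
        asyncio.run(run())
```

Running it with no arguments only lists the tools and their parameters, which is the safest first step.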

Release history

- 0.1.1 (this version): Jun 20, 2025
- 0.1.0: Jun 20, 2025

Download files

- Source distribution: pdf_reader_mcp-0.1.1.tar.gz (7.6 kB), uploaded Jun 20, 2025
- Built distribution: pdf_reader_mcp-0.1.1-py3-none-any.whl (7.8 kB, Python 3), uploaded Jun 20, 2025

File details

pdf_reader_mcp-0.1.1.tar.gz

| Field | Value |
|---|---|
| Upload date | Jun 20, 2025 |
| Size | 7.6 kB |
| Tags | Source |
| Uploaded using Trusted Publishing? | No |
| Uploaded via | twine/6.1.0 CPython/3.11.12 |

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e9ca1086f458c16c2015f3050d54d1a32f6f34fcbbf89992e6f69d57d6c850e9` |
| MD5 | `4d33d7d93b4fb98d67400e03a417b68f` |
| BLAKE2b-256 | `6e5400e5405992ed526971512bcd7babf2e7b268cc71c24b047efdd41a3b2c6e` |

pdf_reader_mcp-0.1.1-py3-none-any.whl

| Field | Value |
|---|---|
| Upload date | Jun 20, 2025 |
| Size | 7.8 kB |
| Tags | Python 3 |
| Uploaded using Trusted Publishing? | No |
| Uploaded via | twine/6.1.0 CPython/3.11.12 |

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7d43e6ab4218ac3608e2671c2faa3968671319ef2e3f4d9cdf958a3d782333f2` |
| MD5 | `6e106c8ed53e463bf1907fea881ede2c` |
| BLAKE2b-256 | `a2e3a0fb6e24f1cd21c9c5e5682313def33ba6f507c1d9158901fbc08c043414` |
