MCP-compatible PDF reading server with intelligent file search and extraction

MokuPDF - MCP-Compatible PDF Reading Server

MokuPDF is a lightweight server, compatible with the Model Context Protocol (MCP), that lets LLMs read and process PDF files with full text and image extraction. It exposes a JSON-RPC interface for PDF operations, which makes it easy to add PDF reading to AI applications.

🚀 Features

Full PDF Text Extraction: Extract all text content from PDF files
Image Extraction: Extract and encode embedded images as base64 PNG
Scanned PDF Support: Automatically detects and renders image-based/scanned PDFs
Smart File Search: Find PDFs by partial names or keywords across common directories
Optional OCR: Extract text from scanned pages with pytesseract (installed via the ocr extra)
Page-by-Page Reading: Read large PDFs in page ranges to keep memory use and response size manageable
Text Search: Search for text within PDFs with regex support
MCP Compatible: Fully compatible with the Model Context Protocol
CLI Support: Command-line interface with configurable options
Lightweight: Minimal dependencies, fast startup

📦 Installation

From Source

# Clone the repository
git clone https://github.com/yourusername/mokupdf.git
cd mokupdf

# Install the package
pip install .

# Or install in development mode
pip install -e .

Using pip

# Basic installation
pip install mokupdf

# With OCR support for scanned PDFs (the quotes stop shells such as zsh from expanding the brackets)
pip install "mokupdf[ocr]"

Note: OCR also requires the Tesseract engine to be installed on your system:

Windows: Download from GitHub releases
Mac: brew install tesseract
Linux (Debian/Ubuntu): sudo apt-get install tesseract-ocr
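OCR needs two pieces: the optional pytesseract Python package and the Tesseract binary itself. A minimal standard-library sketch that checks for both before you rely on OCR; it assumes the binary is named tesseract and is on PATH (a Windows installer may not add it to PATH), and it does not ask MokuPDF whether OCR is actually enabled. The helper name ocr_prerequisites is hypothetical.

```python
import importlib.util
import shutil


def ocr_prerequisites() -> dict:
    """Report whether the optional OCR pieces appear to be installed.

    Hypothetical helper: checks only for the pytesseract module and a
    `tesseract` executable on PATH.
    """
    return {
        # find_spec returns None when the module cannot be imported
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # which() returns the executable's path, or None if it is not on PATH
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in ocr_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either entry reports missing, scanned pages will most likely come back without extracted text.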

🎯 Quick Start

Running the Server

# Start with default settings (port 8000)
mokupdf

# Start with custom port
mokupdf --port 8080

# Enable verbose logging
mokupdf --verbose

# Set custom PDF directory
mokupdf --base-dir ./documents

Command Line Options

Option	Description	Default
`--port`	Port to listen on	8000
`--verbose`	Enable verbose logging	False
`--base-dir`	Base directory for PDF files	Current directory
`--max-file-size`	Maximum PDF file size in MB	100
`--version`	Show version information	-
`--help`	Show help message	-
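To launch MokuPDF from Python (for example from a test harness), the options above can be turned into an argument list. A minimal sketch; build_command is a hypothetical helper, and it assumes --max-file-size takes a whole number of megabytes.

```python
from typing import List, Optional


def build_command(
    port: int = 8000,
    verbose: bool = False,
    base_dir: Optional[str] = None,
    max_file_size: int = 100,
) -> List[str]:
    """Build an argv list for mokupdf from the documented CLI options.

    Hypothetical helper: a flag is added only when its value differs
    from the default listed in the options table.
    """
    cmd = ["mokupdf"]
    if port != 8000:
        cmd += ["--port", str(port)]
    if verbose:
        cmd.append("--verbose")
    if base_dir is not None:
        cmd += ["--base-dir", base_dir]
    if max_file_size != 100:
        # Assumed to be a whole number of megabytes
        cmd += ["--max-file-size", str(max_file_size)]
    return cmd


if __name__ == "__main__":
    # Equivalent to: mokupdf --port 8080 --verbose --base-dir ./documents
    print(build_command(port=8080, verbose=True, base_dir="./documents"))
```

Passing the resulting list to subprocess.Popen avoids shell quoting problems with directory names that contain spaces.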

🔧 MCP Configuration

Add MokuPDF to your MCP client's configuration file (its name and location depend on the client):

{
  "mcpServers": {
    "mokupdf": {
      "command": "python",
      "args": ["-m", "mokupdf", "--port", "8000"],
      "name": "MokuPDF",
      "description": "PDF reading server with text and image extraction",
      "env": {
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}
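If the client configuration lives in a JSON file, the entry can be merged in programmatically instead of by hand. A sketch, assuming the client reads the mcpServers layout shown above; the path you pass in depends on your client, and both helper names are hypothetical.

```python
import json
from pathlib import Path


def add_mokupdf_entry(config: dict, port: int = 8000) -> dict:
    """Return a copy of an MCP client config with a "mokupdf" server added.

    Other servers are preserved; an existing "mokupdf" entry is replaced.
    The input dict is not modified.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["mokupdf"] = {
        "command": "python",
        "args": ["-m", "mokupdf", "--port", str(port)],
        "name": "MokuPDF",
        "description": "PDF reading server with text and image extraction",
        "env": {"PYTHONUNBUFFERED": "1"},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, port: int = 8000) -> None:
    """Load the JSON config at `path` (if it exists), add the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_mokupdf_entry(config, port), indent=2), encoding="utf-8")
```

Note that "command": "python" must resolve to the interpreter where mokupdf is installed.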

📚 Available MCP Tools

1. open_pdf

Open a PDF file for processing.

{
  "tool": "open_pdf",
  "arguments": {
    "file_path": "document.pdf"
  }
}

2. read_pdf

Read PDF pages with text and images. Supports page ranges for efficient processing.

{
  "tool": "read_pdf",
  "arguments": {
    "file_path": "document.pdf",
    "start_page": 1,
    "end_page": 5,
    "max_pages": 10
  }
}
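For large documents, a client can walk the file in fixed-size windows of start_page and end_page. A sketch, assuming page numbers are 1-based and end_page is inclusive (as start_page 1 / end_page 5 in the example suggests) and that the total page count is already known; how max_pages interacts with an explicit range is not described here, so the sketch leaves it out. page_ranges is a hypothetical helper.

```python
from typing import Iterator, Tuple


def page_ranges(total_pages: int, pages_per_call: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield (start_page, end_page) pairs that cover pages 1..total_pages.

    Assumes 1-based, inclusive page numbers.
    """
    if pages_per_call < 1:
        raise ValueError("pages_per_call must be at least 1")
    for start in range(1, total_pages + 1, pages_per_call):
        # The last window is clipped to the final page
        yield start, min(start + pages_per_call - 1, total_pages)


if __name__ == "__main__":
    for start, end in page_ranges(12, 5):
        print({"file_path": "document.pdf", "start_page": start, "end_page": end})
```

For a 12-page file with 5 pages per call this produces the ranges 1-5, 6-10 and 11-12.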

Response includes:

Text content with [IMAGE: ...] placeholders
Base64-encoded images
Page information
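Images arrive base64-encoded, so a client that wants to save or forward them has to decode them first. The README does not name the response field that holds each image, so this sketch works on a bare base64 string and checks the standard 8-byte PNG signature before trusting the result; decode_png and save_png are hypothetical names.

```python
import base64
import binascii
from pathlib import Path

# Every PNG file starts with these 8 bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data: str) -> bytes:
    """Decode a base64 string and confirm the bytes look like a PNG image."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("image data is not valid base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw


def save_png(b64_data: str, path: Path) -> None:
    """Decode one image string from a read_pdf response and write it to disk."""
    path.write_bytes(decode_png(b64_data))
```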

3. search_text

Search for text within the current PDF.

{
  "tool": "search_text",
  "arguments": {
    "query": "introduction",
    "case_sensitive": false
  }
}

4. get_page_text

Extract text from a specific page.

{
  "tool": "get_page_text",
  "arguments": {
    "page_number": 1
  }
}

5. get_metadata

Get metadata from the current PDF.

{
  "tool": "get_metadata",
  "arguments": {}
}

6. close_pdf

Close the current PDF and free memory.

{
  "tool": "close_pdf",
  "arguments": {}
}
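The tool examples above use a shorthand of tool name plus arguments. On the wire, as in the Python Script Example, each one becomes a JSON-RPC 2.0 tools/call request whose params carry name and arguments. Because search_text, get_page_text, get_metadata and close_pdf act on the current PDF, a session starts with open_pdf and ends with close_pdf. A sketch that turns such a session into requests with increasing ids; to_requests is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def to_requests(
    calls: Iterable[Tuple[str, Dict[str, Any]]], first_id: int = 1
) -> List[Dict[str, Any]]:
    """Wrap (tool name, arguments) pairs in JSON-RPC 2.0 tools/call requests."""
    return [
        {
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
            "id": request_id,
        }
        for request_id, (tool, arguments) in enumerate(calls, start=first_id)
    ]


# A typical session: open a file, work on the current PDF, then close it
session = [
    ("open_pdf", {"file_path": "document.pdf"}),
    ("get_metadata", {}),
    ("search_text", {"query": "introduction", "case_sensitive": False}),
    ("get_page_text", {"page_number": 1}),
    ("close_pdf", {}),
]

if __name__ == "__main__":
    for request in to_requests(session):
        # One JSON object per line, matching the Python Script Example
        print(json.dumps(request))
```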

💻 Development

Project Structure

mokupdf/
├── mokupdf/
│   ├── __init__.py       # Package initialization
│   ├── server.py         # Main server implementation
│   └── __main__.py       # Module entry point
├── setup.py              # Package setup script
├── pyproject.toml        # Modern Python packaging
├── requirements.txt      # Direct dependencies
├── LICENSE               # MIT License
└── README.md             # This file

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=mokupdf

Code Quality

# Format code
black mokupdf/

# Lint code
flake8 mokupdf/

🔍 Example Usage

Python Script Example

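import json
import subprocess

# Start the MokuPDF server as a child process; requests go to its stdin
# and responses come back on its stdout, one JSON object per line
process = subprocess.Popen(
    ["mokupdf", "--port", "8000"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# JSON-RPC 2.0 request that calls the open_pdf tool
request = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "open_pdf",
        "arguments": {"file_path": "example.pdf"},
    },
    "id": 1,
}

try:
    # Send the request as a single line and flush so the server receives it now
    process.stdin.write(json.dumps(request) + "\n")
    process.stdin.flush()

    # Read one response line; an empty string means the server exited
    line = process.stdout.readline()
    if not line:
        raise RuntimeError("MokuPDF exited without responding")
    response = json.loads(line)

    # A JSON-RPC response carries either "result" or "error"
    if "error" in response:
        print(f"Request failed: {response['error']}")
    else:
        print(f"PDF opened: {response['result']}")
finally:
    # Close stdin and stop the server
    process.stdin.close()
    process.terminate()
    process.wait()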
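The example above goes straight to tools/call. The MCP specification has a client open each session with an initialize request, wait for the reply, and then send a notifications/initialized notification; this README does not say whether MokuPDF requires that step. A sketch of the handshake, assuming the newline-delimited stdio framing used above and protocol revision 2024-11-05 (an assumption; use whichever revision your server supports). The function names and clientInfo values are hypothetical.

```python
import json
from typing import IO, Any, Dict


def send(stream: IO[str], message: Dict[str, Any]) -> None:
    """Write one JSON-RPC message as a single line and flush it."""
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def initialize(stdin: IO[str], stdout: IO[str], request_id: int = 0) -> Dict[str, Any]:
    """Run the MCP initialize handshake over a pair of text streams.

    Sends `initialize`, reads the server's reply, then sends the
    `notifications/initialized` notification, which gets no reply.
    """
    send(stdin, {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumed protocol revision
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    })
    reply = json.loads(stdout.readline())
    if "error" in reply:
        raise RuntimeError(f"initialize failed: {reply['error']}")
    send(stdin, {"jsonrpc": "2.0", "method": "notifications/initialized"})
    return reply["result"]
```

In the script above this would be called as initialize(process.stdin, process.stdout) right after Popen; the default id 0 keeps it distinct from the open_pdf request's id 1.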

Integration with LLMs

MokuPDF is built to plug into LLM applications through MCP. The read_pdf tool returns content in a form suited to LLM consumption:

Text is extracted with page markers
Images are embedded as base64 PNG with placeholders in text
Large PDFs can be read page-by-page to avoid context limits

🛠️ Troubleshooting

Common Issues

Issue: ModuleNotFoundError: No module named 'mokupdf'

Solution: Install the package, either from PyPI with pip install mokupdf or from the repository root with pip install .

Issue: Port already in use

Solution: Use a different port with --port 8081

Issue: PDF file not found

Solution: Check the --base-dir setting and make sure relative file paths are relative to that directory

Issue: Large PDF causes timeout

Solution: Read the file in smaller page ranges using the start_page and end_page parameters of read_pdf
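Before calling open_pdf, a client can rule out the "file not found" and oversized-file cases locally. A sketch that resolves relative paths against the base directory and compares the file size with the --max-file-size limit (100 MB by default). Assumptions: a megabyte is 1024 * 1024 bytes, and the path is taken literally, so this check is stricter than the server's smart file search, which may also match partial names. preflight is a hypothetical helper.

```python
from pathlib import Path
from typing import Union

# Assumption: the size limit counts megabytes as 1024 * 1024 bytes
BYTES_PER_MB = 1024 * 1024


def preflight(
    file_path: str,
    base_dir: Union[str, Path] = ".",
    max_file_size_mb: float = 100,
) -> Path:
    """Resolve a PDF path and check that it exists and is under the size limit.

    Relative paths are joined to base_dir (mirroring --base-dir); the limit
    mirrors --max-file-size. Raises on the first problem found.
    """
    path = Path(file_path)
    if not path.is_absolute():
        path = Path(base_dir) / path
    path = path.resolve()
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    size_mb = path.stat().st_size / BYTES_PER_MB
    if size_mb > max_file_size_mb:
        raise ValueError(
            f"{path.name} is {size_mb:.1f} MB, above the {max_file_size_mb} MB limit"
        )
    return path
```

For example, preflight("report.pdf", base_dir="./documents") returns the absolute path if the file is usable and raises otherwise.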

Debug Mode

Enable verbose logging for detailed information:

mokupdf --verbose

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📞 Support

For issues, questions, or suggestions:

Open an issue on GitHub
Check the Installation Instructions for detailed setup help
Enable verbose mode (--verbose) for debugging

🙏 Acknowledgments

Built with PyMuPDF for PDF processing
Designed for Model Context Protocol compatibility
Inspired by the need for better PDF integration in AI applications

Made with ❤️ for the AI community

Release history

1.0.1: Sep 23, 2025

1.0.0: Sep 5, 2025 (this version)

Download files

Source Distribution

mokupdf-1.0.0.tar.gz (17.5 kB)

Uploaded Sep 5, 2025

Built Distribution

mokupdf-1.0.0-py3-none-any.whl (13.4 kB)

Uploaded Sep 5, 2025, Python 3

File details

Details for the file mokupdf-1.0.0.tar.gz.

File metadata

Download URL: mokupdf-1.0.0.tar.gz
Upload date: Sep 5, 2025
Size: 17.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for mokupdf-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`8ed45f23839d92049fe13c870f4d2d6b585ae02d0580afe2e516e96e7690ff0d`
MD5	`97ea7aeb8368f2fe1a3c416290f93e90`
BLAKE2b-256	`b1167d23c0fcebff158eaf52649eef9fa96d1cd6665c5f4d572ecb5b93552eb8`


File details

Details for the file mokupdf-1.0.0-py3-none-any.whl.

File metadata

Download URL: mokupdf-1.0.0-py3-none-any.whl
Upload date: Sep 5, 2025
Size: 13.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for mokupdf-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a12c6f528f9f91fbec56325a2355b9c19d2d2ea01fd1f7ac79794d1545a74a65`
MD5	`9ae0939373e163333144635963322767`
BLAKE2b-256	`01136f0d00c5696e176b9f1d9daa6b9dcd7baaca39aaeb3e877ee82e451673c0`

