
MCP server to extract contents from PDF files


PDF Extraction MCP Server (Claude Code Fork)

MCP server to extract contents from PDF files, with fixes for Claude Code CLI installation.

This fork fixes installing and running the server with Claude Code, the command-line (CLI) version of Claude.

What's Different in This Fork

  1. Added __main__.py: lets the package run as a module with python -m pdf_extraction
  2. Claude Code instructions: installation steps that work with the Claude Code CLI
  3. Tested installation: verified to work with the claude mcp add command
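Running with python -m works because Python executes a package's __main__.py when the package itself is named after -m. The sketch below demonstrates that mechanism on a throwaway package; the package name demo_pdf_extraction and its main() function are hypothetical stand-ins, not this project's actual code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical package contents; the real pdf_extraction package may expose
# a differently named entry-point function.
INIT_PY = '''\
def main():
    # Stand-in for starting the MCP server on stdio.
    print("server started")
'''

MAIN_PY = '''\
# Python runs this file for "python -m demo_pdf_extraction".
from . import main

main()
'''


def run_as_module() -> str:
    """Build a throwaway package with a __main__.py and run it with -m."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pdf_extraction"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(INIT_PY)
        (pkg / "__main__.py").write_text(MAIN_PY)
        # With -m, the working directory goes on sys.path, so the package
        # is importable from tmp.
        result = subprocess.run(
            [sys.executable, "-m", "demo_pdf_extraction"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_as_module())
```

Without a __main__.py, python -m on a package fails with an error saying the package cannot be directly executed, which is the gap this fork closes.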

Components

Tools

The server implements one tool:

  • extract-pdf-contents: Extract contents from a local PDF file
    • Takes pdf_path as a required string argument (local file path)
    • Takes pages as an optional string argument (comma-separated page numbers; negative numbers count from the end, so -1 is the last page)
    • Supports both PDF text extraction and OCR for scanned documents

Installation for Claude Code CLI

Prerequisites

  • Python 3.11 or higher
  • pip or conda
  • Claude Code CLI installed (claude command)
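A small sketch that reports whether these prerequisites look satisfied on the current machine. It only checks the running interpreter's version and whether pip, conda, and claude are on PATH; check_prerequisites is a hypothetical helper name, not part of this package.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report whether the README's prerequisites look satisfied."""
    return {
        # The README asks for Python 3.11 or higher.
        "python_3_11_plus": sys.version_info >= (3, 11),
        # A rough heuristic: any of these package managers on PATH.
        "pip_or_conda": any(shutil.which(tool) for tool in ("pip", "pip3", "conda")),
        # Claude Code is invoked as the "claude" command.
        "claude_cli": shutil.which("claude") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```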

Step 1: Clone and Install

# Clone this fork
git clone https://github.com/lh/mcp-pdf-extraction-server.git
cd mcp-pdf-extraction-server

# Install in development mode
pip install -e .

Step 2: Find the Installed Command

# Check where pdf-extraction was installed
which pdf-extraction
# Example output: /opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction

Step 3: Add to Claude Code

# Add the server using the full path from above
claude mcp add pdf-extraction /opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction

# Verify it was added
claude mcp list
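If the path contains spaces or other shell-sensitive characters, it must be quoted when typed into claude mcp add. A sketch that looks up the command as in Step 2 and prints a quoted version with shlex.join (POSIX shell quoting); build_add_command is a hypothetical helper that only mirrors the command form shown above.

```python
import shlex
import shutil
from pathlib import Path


def build_add_command(server_name: str, command_path: str) -> list[str]:
    """Assemble argv for: claude mcp add <server_name> <command_path>."""
    if not Path(command_path).is_absolute():
        # The steps above register the server by its full path.
        raise ValueError(f"expected an absolute path, got {command_path!r}")
    return ["claude", "mcp", "add", server_name, command_path]


if __name__ == "__main__":
    path = shutil.which("pdf-extraction")
    if path is None:
        print("pdf-extraction is not on PATH; see Step 2")
    else:
        # shlex.join quotes for POSIX shells, e.g. paths containing spaces.
        print(shlex.join(build_add_command("pdf-extraction", path)))
```

Keeping the command as a list also lets you pass it straight to subprocess.run without any shell quoting at all.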

Step 4: Use in Claude

# Start a new Claude session
claude

# In Claude, type:
/mcp

# You should see:
# MCP Server Status
# • pdf-extraction: connected

Usage Example

Once connected, you can ask Claude to extract PDF contents:

"Can you extract the content from the PDF at /path/to/document.pdf?"

"Extract pages 1-3 and the last page from /path/to/document.pdf"

Troubleshooting

Server Not Connecting

  1. Make sure you started a NEW Claude session after adding the server
  2. Verify the command path is correct: ls -la $(which pdf-extraction)
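  3. Run the command directly with pdf-extraction. It should appear to hang; that is expected, because the server is waiting for MCP messages on standard input. Press Ctrl+C to stop it.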
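Because the server talks MCP over standard input and output, a more telling check than watching it hang is to send it an initialize request and read the reply. A sketch under two assumptions: the server follows MCP's stdio transport (one JSON-RPC message per line) and it exits once its input is closed. probe_stdio_server and the protocolVersion and clientInfo values are this example's choices, not part of the project.

```python
import json
import subprocess
import sys


def probe_stdio_server(argv: list[str], timeout: float = 10.0) -> dict:
    """Send one MCP initialize request over stdio and return the first reply."""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "stdio-probe", "version": "0.1"},
        },
    }
    # run() writes the request, closes stdin, and waits for the process to
    # exit; on timeout it kills the process and raises TimeoutExpired.
    result = subprocess.run(
        argv,
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    for line in result.stdout.splitlines():
        if line.strip():
            return json.loads(line)
    raise RuntimeError(f"no reply on stdout; stderr was:\n{result.stderr}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example, if saved as probe.py: python probe.py pdf-extraction
    print(probe_stdio_server(sys.argv[1:]))
```

A reply containing a result object means the server started and speaks MCP; an import traceback in the RuntimeError's stderr points to the environment problems covered next.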

Module Not Found Errors

If you get Python import errors:

  1. Make sure you're using the same Python environment where you installed the package
  2. Try using the full Python path: claude mcp add pdf-extraction /path/to/python -m pdf_extraction
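Import errors usually mean the interpreter launching the server is not the one pip installed the package into. A diagnostic sketch to run with the interpreter you intend to pass to claude mcp add; module_status is a hypothetical helper, and pdf_extraction is the module name used in the python -m command above.

```python
import importlib.util
import sys


def module_status(module_name: str = "pdf_extraction") -> dict:
    """Report which interpreter is running and whether it can import a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        # For a package, origin is the path of its __init__.py.
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(module_status())
```

If found is False, install the package with that same interpreter (/path/to/python -m pip install -e .) or register a different interpreter path.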

Installation Issues

If pip install -e . fails:

  1. Make sure you have Python 3.11+: python --version
  2. Try creating a fresh virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -e .
    

For Claude Desktop Users

This fork is specifically for the Claude Code CLI. If you're using Claude Desktop (the GUI app), refer to the original repository for installation instructions.

Dependencies

  • mcp>=1.2.0
  • pypdf2>=3.0.1
  • pytesseract>=0.3.10 (for OCR support)
  • Pillow>=10.0.0
  • pydantic>=2.10.1,<3.0.0
  • pymupdf>=1.24.0
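To check whether an environment already meets these minimum versions, a stdlib-only sketch using importlib.metadata. The comparison is a simplification that looks only at leading numeric version components (packaging.version would be the more robust choice), the pydantic upper bound is not checked, and check_dependencies is a hypothetical helper.

```python
from collections.abc import Callable
from importlib import metadata

# Lower bounds copied from the Dependencies list above.
MINIMUMS = {
    "mcp": "1.2.0",
    "pypdf2": "3.0.1",
    "pytesseract": "0.3.10",
    "Pillow": "10.0.0",
    "pydantic": "2.10.1",
    "pymupdf": "1.24.0",
}


def numeric_parts(version: str) -> tuple[int, ...]:
    """'10.4.0' -> (10, 4, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_dependencies(
    minimums: dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> dict[str, str]:
    """Return a status per distribution: ok, too old, or missing."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = numeric_parts(installed) >= numeric_parts(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies(MINIMUMS).items():
        print(f"{name}: {status}")
```

Tuple comparison gives the right ordering for plain release numbers such as 0.3.10 versus 0.3.9, which naive string comparison would get wrong.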

Contributing

Contributions are welcome! The main change in this fork is the addition of __main__.py to make the package runnable as a module.

License

Same as the original repository.

Credits

Original server by @xraywu. Claude Code fixes by @lh.


Download files

Source Distribution

iflow_mcp_pdf_extraction-0.1.0.tar.gz (18.9 kB)

Built Distribution

iflow_mcp_pdf_extraction-0.1.0-py3-none-any.whl (5.7 kB)

File details

Hashes for iflow_mcp_pdf_extraction-0.1.0.tar.gz:

  Algorithm     Hash digest
  SHA256        35daa3c9b3893875e1d9713e5a4633fe05e9950eee61ed263d9753e7c756ff9c
  MD5           4a721914d6979f58a7f2fa19fe427439
  BLAKE2b-256   7ed641b91c7656bc969da63c34ad37e73c28c8dfbcfcf5fc85f7b832f1024b84

Hashes for iflow_mcp_pdf_extraction-0.1.0-py3-none-any.whl:

  Algorithm     Hash digest
  SHA256        e4ba6318597a5b226bed09c59d9e16a299e606d70dd90cdf2d2b6a8923bf98a9
  MD5           d7c5afc6692876ec5b68d988b1b9a123
  BLAKE2b-256   b07fbe7ff6af7bd35f6ca22d05c4672e02290c4bc8e6115f957c7dfc695a93c4
