PDF Navigator MCP

A Model Context Protocol (MCP) server for PDF reading, navigation, and text search, with cross-platform PDF viewer integration. PyMuPDF is installed only in the MCP server's environment, so client projects get PDF functionality through MCP without taking on PyMuPDF as a dependency.

Features

  • PDF text extraction - Read full PDFs or specific pages/ranges
  • PDF structure analysis - Extract table of contents and page summaries
  • Text search with location - Find text and jump to results
  • Direct PDF navigation - Open PDFs to specific pages
  • PDF form filling - Extract form fields to markdown, edit, and fill PDFs
  • Cross-platform PDF viewers - Supports Skim, Zathura, Evince, and more
  • MCP integration - Works with Claude Code and other MCP clients
  • No dependency issues - PyMuPDF isolated in MCP server environment

Installation

# Install with pipx (recommended)
pipx install git+https://github.com/matsengrp/pdf-navigator-mcp.git

# Or install in current environment
pip install git+https://github.com/matsengrp/pdf-navigator-mcp.git

Claude Code Integration

Add to your ~/.claude.json:

{
  "mcpServers": {
    "pdf-navigator": {
      "type": "stdio",
      "command": "pdf-navigator-mcp"
    }
  }
}
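
If ~/.claude.json already holds other settings, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, not part of the package: the helper name add_pdf_navigator is hypothetical, and it assumes the config file is plain JSON with an optional top-level "mcpServers" object.

```python
import json
from pathlib import Path


def add_pdf_navigator(config_path: Path) -> dict:
    """Merge the pdf-navigator entry into a Claude Code config file.

    Hypothetical helper: existing keys in the file are preserved.
    """
    # Start from the existing config, if any, so other settings survive.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["pdf-navigator"] = {"type": "stdio", "command": "pdf-navigator-mcp"}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling add_pdf_navigator(Path.home() / ".claude.json") would update the file named above; back it up first if it contains settings you care about.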

Usage

In Claude Code, you can ask, for example:

  • "Read the abstract from paper.pdf" → Extracts and shows text content
  • "What's the table of contents for paper.pdf?" → Shows PDF structure
  • "Read pages 5-10 of paper.pdf" → Extracts specific page range
  • "Search for 'parameter efficiency' in paper.pdf" → Finds text and locations
  • "Open paper.pdf to page 5" → Opens PDF viewer to specific page
  • "Extract form fields from application.pdf" → Creates markdown file with form fields
  • "Fill the PDF form with my data" → Fills PDF using edited markdown data

MCP Tools

Reading Tools

  • read_pdf_text(file_path, start_page, end_page) - Extract text from page range
  • read_pdf_page(file_path, page_number) - Extract text from single page
  • get_pdf_structure(file_path) - Get table of contents and page summaries
  • get_pdf_info(file_path) - Get document metadata

Navigation Tools

  • search_pdf_text(file_path, query) - Search text and return locations
  • open_pdf_page(file_path, page_number) - Open PDF viewer to specific page
  • search_and_open(file_path, query, result_index) - Search and open to result

Form Filling Tools

  • extract_form_to_markdown(file_path, output_md_path) - Extract form fields to markdown with multi-line detection
  • fill_form_from_markdown(pdf_path, markdown_path, output_pdf_path, distribute_text=True, max_chars_per_field=50, respect_line_breaks=True) - Fill PDF from markdown with intelligent text distribution
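
MCP clients invoke tools like these by sending JSON-RPC 2.0 requests with the method tools/call. The sketch below only builds such a request so the shape is visible; it does not talk to the server. The helper name tool_call_request is hypothetical, and the argument names are taken from the signatures listed above.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def tool_call_request(tool_name: str, **arguments) -> str:
    """Serialize an MCP `tools/call` request for one tool (hypothetical helper)."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(payload)


# Argument names follow the read_pdf_text signature listed above.
request = tool_call_request(
    "read_pdf_text", file_path="paper.pdf", start_page=5, end_page=10
)
print(request)
```

In practice an MCP client such as Claude Code builds and sends these requests for you; the sketch is only meant to show which names end up on the wire.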

PDF Form Filling Workflow

The PDF form filling feature uses a markdown-based workflow:

  1. Extract form fields - Analyze the PDF and create a markdown file with all detected fields
  2. Edit the markdown - Fill in values using any text editor
  3. Fill the PDF - Apply the markdown data back to create a filled PDF

Example Workflow

# Step 1: Extract form fields to markdown
# Creates a markdown file with placeholders for each field
extract_form_to_markdown("application.pdf", "application_form.md")

# Step 2: Edit application_form.md in your editor
# Fill in values after each arrow (→)

# Step 3: Fill the PDF with your data
fill_form_from_markdown("application.pdf", "application_form.md", "application_filled.pdf")

Markdown Format

The extracted markdown looks like:

# PDF Form: application.pdf
Type: Interactive Form
Generated: 2025-08-03

## Form Fields

### Page 1
- Full Name → John Smith
- Email → john@example.com
- Phone → 555-0123
- [ ] Subscribe to newsletter → true
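
To make the format concrete, here is a minimal sketch of how such a file could be read back into field names and values. It is an illustration based only on the sample above, not the server's actual parser; the function name parse_form_markdown and the treatment of unmarked lines as continuations of the previous value are assumptions.

```python
import re

# "- name → value", with an optional "[ ]" / "[x]" checkbox marker before the name.
FIELD_LINE = re.compile(r"^- (?:\[[ xX]\] )?(?P<name>.+?) → ?(?P<value>.*)$")


def parse_form_markdown(text: str) -> dict[str, str]:
    """Collect field lines; unmarked lines continue the previous value (assumption)."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            current = match["name"].strip()
            fields[current] = match["value"].strip()
        elif line.startswith("#") or not line.strip():
            current = None  # headings and blank lines end a multi-line value
        elif current is not None:
            fields[current] = (fields[current] + "\n" + line.strip()).strip()
    return fields


sample = """# PDF Form: application.pdf
Type: Interactive Form

### Page 1
- Full Name → John Smith
- Email → john@example.com
- [ ] Subscribe to newsletter → true
"""
print(parse_form_markdown(sample))
```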

Form Types Supported

  • Interactive Forms - PDFs with actual form fields (fillable PDFs)
  • Static Forms - PDFs with underlines/boxes (creates moveable text annotations)

Enhanced Multi-line Form Detection

When filling forms, PDF Navigator detects multi-line sections and distributes long text across their fields:

Features

  • Multi-line Section Detection - Automatically detects when multiple consecutive blank lines follow a section header (e.g., "I love..." followed by several underscores)
  • Smart Text Distribution - Distributes long text across multiple related fields using natural break points
  • Natural Break Points - Respects sentences, commas, conjunctions, and explicit line breaks
  • Configurable Parameters - Control text distribution behavior

Text Distribution Strategies

  1. Sentence splitting - "I love reading. Playing games is fun." → separate fields
  2. Comma/semicolon splitting - "Reading books, playing games, going to parks" → separate fields
  3. Conjunction splitting - "Reading and playing and going" → separate fields
  4. Word boundary splitting - Intelligent length-based splitting while preserving whole words

Configuration Options

  • distribute_text: bool - Enable/disable multi-line text distribution (default: True)
  • max_chars_per_field: int - Target character limit per field (default: 50)
  • respect_line_breaks: bool - Honor newlines in input text (default: True)

Example

Rather than placing "Reading books with my parents, doing puzzles and addition, going on trips, anything with my big sister" in a single small field, the text is distributed as:

  • Field 1: "Reading books with my parents"
  • Field 2: "doing puzzles and addition"
  • Field 3: "going on trips"
  • Field 4: "anything with my big sister"
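
The strategies and options above can be sketched as a cascade that tries coarse break points first and falls back to finer ones only for pieces that are still too long. This is a simplified illustration, not the server's implementation: the function names are hypothetical, the conjunction list is an assumption, and unlike the strategy examples above it leaves text alone when it already fits within max_chars_per_field.

```python
import re

# Break points from coarsest to finest, mirroring the strategies listed above.
SPLITTERS = [
    r"(?<=[.!?])\s+",     # sentence boundaries
    r"\s*[,;]\s*",        # commas and semicolons
    r"\s+(?:and|or)\s+",  # conjunctions (assumed list; the conjunction is dropped)
]


def pack_words(text: str, limit: int) -> list[str]:
    """Greedily pack whole words into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > limit:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def split_chunk(text: str, limit: int, level: int = 0) -> list[str]:
    """Split `text` at the current level only if it exceeds `limit`."""
    text = text.strip()
    if len(text) <= limit:
        return [text] if text else []
    if level == len(SPLITTERS):
        return pack_words(text, limit)  # last resort: word boundaries
    parts = re.split(SPLITTERS[level], text)
    return [piece for part in parts for piece in split_chunk(part, limit, level + 1)]


def split_into_fields(text: str, max_chars_per_field: int = 50,
                      respect_line_breaks: bool = True) -> list[str]:
    """Return one string per target field."""
    # Explicit newlines always split; otherwise they are treated as spaces.
    lines = text.splitlines() if respect_line_breaks else [" ".join(text.split())]
    return [piece for line in lines for piece in split_chunk(line, max_chars_per_field)]


answer = ("Reading books with my parents, doing puzzles and addition, "
          "going on trips, anything with my big sister")
for number, value in enumerate(split_into_fields(answer), start=1):
    print(f"Field {number}: {value}")
```

With the default limit of 50, the sample answer is too long as a whole, has no sentence breaks, and so is split at its commas; "doing puzzles and addition" already fits and is not split further at "and".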

Form Filling Best Practices

For the best results in multi-line fields, put each item on its own line in the markdown value:

- personal_interests_love_1 (I love...) → Reading books with my parents
Doing puzzles and addition
Going on trips
Anything with my big sister

The newlines mark where the text may be split, so it is distributed across multiple PDF fields instead of being squeezed into one. For guided workflows, use the extract_and_fill_form and format_multiline_form_data MCP prompts.

Supported PDF Readers

  • Skim (macOS) - skim:// URL scheme
  • Zathura (Linux) - --page argument
  • Evince (Linux) - --page-index argument
  • SumatraPDF (Windows) - -page argument
  • Adobe Acrobat (Cross-platform) - /A page=N argument
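
As an illustration of how the command-line readers above differ, the sketch below builds (but does not run) an argument list for each one. The executable names and the helper build_open_command are assumptions for illustration, the Acrobat form follows its Windows command-line syntax, and Skim is left out because it is driven through a URL scheme rather than a command-line argument.

```python
# Maps a reader name to a function producing its argument list.
# Executable names are assumptions; adjust them to match your installation.
READER_ARGS = {
    "zathura": lambda path, page: ["zathura", f"--page={page}", path],
    "evince": lambda path, page: ["evince", f"--page-index={page}", path],
    "sumatrapdf": lambda path, page: ["SumatraPDF.exe", "-page", str(page), path],
    "acrobat": lambda path, page: ["Acrobat.exe", "/A", f"page={page}", path],
}


def build_open_command(reader: str, path: str, page: int) -> list[str]:
    """Return the argument list that would open `path` at `page` (hypothetical helper)."""
    try:
        return READER_ARGS[reader](path, page)
    except KeyError:
        raise ValueError(f"unsupported reader: {reader!r}") from None


print(build_open_command("zathura", "paper.pdf", 5))
```

The resulting list can be handed to subprocess.Popen to launch the viewer.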

Configuration

Configure your PDF reader in ~/.pdf-navigator-config.json:

{
  "pdf_reader": "skim",
  "reader_path": "/Applications/Skim.app"
}
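
A quick way to check that the file is valid JSON and has the expected keys is to load it yourself. This is a minimal sketch; the function name load_reader_config and the fallback to an empty dict when the file is missing are assumptions, not documented behaviour of the server.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".pdf-navigator-config.json"


def load_reader_config(path: Path = CONFIG_PATH) -> dict:
    """Load the reader config; return an empty dict if the file does not exist."""
    if not path.exists():
        return {}
    config = json.loads(path.read_text())
    if not isinstance(config, dict):
        raise ValueError(f"{path} must contain a JSON object")
    return config


config = load_reader_config()
print(config.get("pdf_reader"), config.get("reader_path"))
```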

Development

git clone https://github.com/matsengrp/pdf-navigator-mcp.git
cd pdf-navigator-mcp
pip install -e ".[dev]"

License

MIT License
