
Document processing MCP server (PDF read, merge, split, compress, report generation)

Project description

DocMCP — Document Processing MCP Server


MCP server for working with PDF documents. It reads, merges, splits, and compresses PDFs, extracts pages and images, and generates PDFs from text, tables, or structured reports.

Features

| Tool | Description |
|---|---|
| `read` | Reads a PDF and returns its text, metadata, and page count |
| `info` | Returns metadata, file size, page count, and image count |
| `extract_images` | Extracts all images from a PDF into a directory |
| `to_markdown` | Converts the PDF's content to Markdown |
| `merge` | Merges multiple PDFs into a single file |
| `split` | Splits a PDF into individual pages |
| `extract_pages` | Extracts specific pages (e.g. 1,3,5-10) into a new PDF |
| `compress` | Compresses a PDF to reduce its file size |
| `generate_report` | Generates a structured PDF from JSON content |
| `generate_table` | Creates a PDF containing a styled table |
| `generate_text` | Converts plain text or Markdown to PDF |
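The `extract_pages` tool accepts page specifications such as `1,3,5-10`. DocMCP's own parser is not shown in this README, so the following is only a minimal sketch of one way to expand such a spec. It assumes 1-based, inclusive ranges and uses a hypothetical `parse_page_ranges` helper that returns the 0-based indices that pypdf and PyMuPDF use for pages.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a spec like "1,3,5-10" into sorted, de-duplicated 0-based page indices.

    Hypothetical helper: pages in the spec are 1-based and ranges are inclusive.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
        else:
            start = end = int(part)
        if start < 1 or end > page_count:
            raise ValueError(f"page out of range 1..{page_count}: {part!r}")
        # Convert the 1-based inclusive range to 0-based indices
        pages.update(range(start - 1, end))
    return sorted(pages)


print(parse_page_ranges("1,3,5-10", page_count=12))
# → [0, 2, 4, 5, 6, 7, 8, 9]
```

Returning a sorted set means overlapping specs such as `2-3,3` select each page once, and out-of-range pages fail early with a clear message instead of producing a partial PDF.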

Tech Stack

  • Python>=3.11
  • Framework: mcp (FastMCP) via stdio JSON-RPC
  • PDF reading: PyMuPDF (fitz)
  • PDF manipulation: pypdf
  • PDF generation: reportlab

Quick Start

# Install dependencies
pip install mcp pymupdf reportlab pypdf

# Set the working directory (optional; export it before starting the server)
export DOCMCP_WORKDIR=/home/user/documents

# Run the server (stdio transport; normally launched by an MCP client)
python server.py
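Both DOCMCP_WORKDIR and DOCMCP_MAX_PAGES (default 100) are environment variables, so they only take effect if they are set before the server process starts. A minimal sketch of reading them, using a hypothetical `load_config` helper; falling back to the current directory when DOCMCP_WORKDIR is unset is an assumption, not documented behavior.

```python
import os
from pathlib import Path


def load_config() -> tuple[Path, int]:
    """Read DocMCP-style settings from the environment (hypothetical helper)."""
    # Work directory: the fallback to the current directory is an assumption
    workdir = Path(os.environ.get("DOCMCP_WORKDIR", os.getcwd())).resolve()

    # Page limit from DOCMCP_MAX_PAGES, defaulting to 100
    raw = os.environ.get("DOCMCP_MAX_PAGES", "100")
    try:
        max_pages = int(raw)
    except ValueError:
        raise ValueError(f"DOCMCP_MAX_PAGES must be an integer, got {raw!r}") from None
    if max_pages < 1:
        raise ValueError("DOCMCP_MAX_PAGES must be at least 1")
    return workdir, max_pages
```

Validating the page limit at startup surfaces a typo such as `DOCMCP_MAX_PAGES=1OO` immediately rather than on the first large PDF.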

MCP Client Usage

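import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch server.py as a subprocess and talk to it over stdio
server = StdioServerParameters(command="python", args=["server.py"])

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # MCP handshake; required before calling tools
            result = await session.call_tool("read", {"path": "document.pdf"})
            result = await session.call_tool("merge", {"paths": "a.pdf,b.pdf", "output": "merged.pdf"})
            result = await session.call_tool("generate_report", {
                "title": "Report",
                "content": '{"section": ["item1", "item2"]}',
                "output": "report.pdf",
            })

asyncio.run(main())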
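In the `generate_report` call above, `content` is a JSON string rather than a dictionary. Building that string with `json.dumps` avoids hand-escaping quotes. This is a minimal sketch; the section-title-to-list-of-items shape mirrors the inline example, and any richer schema the tool may accept is not documented here.

```python
import json

# Hypothetical report content: section title -> list of items, the same
# shape as the inline example. Other structures are not documented.
sections = {
    "Summary": ["2 files merged", "Output written to merged.pdf"],
    "Next steps": ["Review totals", "Send to the team"],
}

arguments = {
    "title": "Report",
    # `content` is passed as a JSON *string*, so serialize the dict instead of
    # hand-escaping quotes; ensure_ascii=False keeps accented text readable
    "content": json.dumps(sections, ensure_ascii=False),
    "output": "report.pdf",
}
```

Inside the client session, this would be sent with `await session.call_tool("generate_report", arguments)`.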

Security

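Path traversal protection is based on DOCMCP_WORKDIR: every file operation is restricted to the work directory. The containment check compares resolved paths against the work directory with a trailing slash appended, so a sibling directory such as /home/user/documents-evil is not mistaken for a path inside /home/user/documents.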
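The trailing-slash containment check can be sketched with the standard library alone. `resolve_in_workdir` is a hypothetical helper, not DocMCP's actual `_resolve()`. It assumes relative paths are interpreted against the work directory and uses `os.path.realpath` so that `..` segments and symlinks are collapsed before the prefix comparison.

```python
import os
from pathlib import Path


def resolve_in_workdir(path: str, workdir: str) -> Path:
    """Resolve `path` against `workdir` and reject anything outside it (hypothetical)."""
    base = os.path.realpath(workdir)
    # Relative paths are joined to the work directory; an absolute `path`
    # replaces `base` in os.path.join and is then checked like any other
    target = os.path.realpath(os.path.join(base, path))
    # Compare against base + separator so "/home/user/documents-evil"
    # does not pass as being inside "/home/user/documents"
    prefix = base if base.endswith(os.sep) else base + os.sep
    if target != base and not target.startswith(prefix):
        raise PermissionError(f"path escapes work directory: {path}")
    return Path(target)
```

Without the appended separator, a plain `startswith(base)` would accept `../documents-evil/x.pdf`, because its resolved form begins with the same characters as the work directory.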

🔧 Recent Improvements

  • Path Traversal Hardened: _resolve() now normalizes relative paths against the work directory and uses a trailing-slash prefix check
  • to_markdown() Error Handling: output path validation errors are now caught and returned gracefully
  • generate_table() JSON Parsing: malformed rows JSON returns a friendly error instead of crashing
  • merge() Newline Support: accepts newline-separated paths in addition to comma-separated ones
  • MAX_PAGES Configurable: the DOCMCP_MAX_PAGES environment variable (default 100) controls the PDF page limit
  • import fitz Moved: the orphaned import at the end of generator.py was moved to the top of the file

Project Structure

docmcp/
├── server.py              # MCP server entry point (tools)
├── docmcp/
│   ├── reader.py          # PDF reading & extraction
│   ├── manipulator.py     # Merge, split, compress, extract pages
│   ├── generator.py       # PDF generation (reports, tables, text)
│   └── __init__.py
├── client.py              # Test client
└── pyproject.toml



Download files


Source Distribution

docmcp-1.0.0.tar.gz (9.6 kB)

Uploaded Source

Built Distribution


docmcp-1.0.0-py3-none-any.whl (22.4 kB)

Uploaded Python 3

File details

Details for the file docmcp-1.0.0.tar.gz.

File metadata

  • Download URL: docmcp-1.0.0.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for docmcp-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75d989dd2c543025dfca673b91bb8fbc153b48556856a756e12420e07771c5ae |
| MD5 | e7f8ef5e4d34661615b5b3d5e4172e14 |
| BLAKE2b-256 | 153ce418590259ce81c3681528caffcdc61960143adccf93bfb6748bc8d9d04a |


File details

Details for the file docmcp-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: docmcp-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 22.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for docmcp-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | f52cc91880b0ad987a8aec66c8457384ddbb8cce6ef8a1fefa5c5606efbee427 |
| MD5 | 04d84770e193c258d753920f477b0a21 |
| BLAKE2b-256 | 42e64619e298edd8e6b3832730087d5ab95ff64cd5cdb81ccd18d9ab20609720 |
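The published digests can be checked locally before installing. This is a minimal sketch that uses only `hashlib`; `sha256_of` and `matches` are illustrative helper names, not part of DocMCP.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 against a published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("docmcp-1.0.0-py3-none-any.whl", "f52cc91880b0ad987a8aec66c8457384ddbb8cce6ef8a1fefa5c5606efbee427")` should be true for an intact wheel. pip can enforce the same check in hash-checking mode by adding `--hash=sha256:<digest>` to a pinned requirement such as `docmcp==1.0.0`. In that mode, every dependency must also be pinned with a hash.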

