Document processing MCP server (PDF read, merge, split, compress, report generation)
DocMCP — Document Processing MCP Server
MCP server for working with PDF documents. It reads, merges, splits, and compresses PDFs, extracts pages and images, and generates PDFs from text, tables, or structured reports.
Features
| Tool | Description |
|---|---|
| read | Reads a PDF and returns its text, metadata, and page count |
| info | Returns metadata, file size, page count, and image count |
| extract_images | Extracts all images from a PDF into a directory |
| to_markdown | Converts the PDF content to Markdown |
| merge | Merges multiple PDFs into one |
| split | Splits a PDF into individual pages |
| extract_pages | Extracts specific pages (e.g. 1,3,5-10) into a new PDF |
| compress | Compresses a PDF to reduce its size |
| generate_report | Generates a structured PDF from JSON content |
| generate_table | Creates a PDF containing a styled table |
| generate_text | Converts plain text or Markdown to PDF |
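The `extract_pages` tool accepts a page specification such as `1,3,5-10`. How DocMCP parses that string is not documented here, so the following is only a minimal sketch of one way to expand such a spec. The function name `parse_page_spec` and the `page_count` parameter are hypothetical and not part of the DocMCP API.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1,3,5-10" into sorted, unique 1-based page numbers.

    Hypothetical helper for illustration; not taken from the DocMCP source.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(
                f"invalid page range {part!r} for a {page_count}-page document"
            )
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_spec("1,3,5-10", page_count=12))
# prints [1, 3, 5, 6, 7, 8, 9, 10]
```

Rejecting out-of-range pages up front keeps a bad spec from producing a partial output file.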
Tech Stack
- Python: >=3.11
- Framework: mcp (FastMCP) via stdio JSON-RPC
- PDF reading: PyMuPDF (fitz)
- PDF manipulation: pypdf
- PDF generation: reportlab
Quick Start
# Install dependencies
pip install mcp pymupdf reportlab pypdf
# Run the server (stdio transport)
python server.py
# Set the working directory (optional)
export DOCMCP_WORKDIR=/home/user/documents
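MCP Client Usage
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch server.py as a subprocess and talk to it over stdio
    server = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("read", {"path": "documento.pdf"})
            result = await session.call_tool("merge", {"paths": "a.pdf,b.pdf", "output": "merged.pdf"})
            result = await session.call_tool("generate_report", {
                "title": "Reporte",
                "content": '{"sección": ["item1", "item2"]}',
                "output": "reporte.pdf",
            })

asyncio.run(main())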
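In the `generate_report` call, `content` is passed as a JSON string rather than as a nested object. Writing that string by hand gets error-prone once the content contains quotes or non-ASCII text, so a client can serialize it with `json.dumps`. The sketch below assumes, based only on the single example above, that the content maps section headings to lists of items. The helper name `build_report_arguments` is hypothetical.

```python
import json


def build_report_arguments(
    title: str, sections: dict[str, list[str]], output: str
) -> dict[str, str]:
    """Build the arguments dict for a generate_report call.

    Hypothetical client-side helper; the section -> items shape is an
    assumption inferred from the README example.
    """
    return {
        "title": title,
        # ensure_ascii=False keeps characters such as "ó" readable in the payload
        "content": json.dumps(sections, ensure_ascii=False),
        "output": output,
    }


args = build_report_arguments(
    "Reporte", {"sección": ["item1", "item2"]}, "reporte.pdf"
)
print(args["content"])
# prints {"sección": ["item1", "item2"]}
```

The resulting dict can be passed directly as the second argument to `session.call_tool("generate_report", args)`.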
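Security
Path traversal protection is enforced through DOCMCP_WORKDIR. All file operations are restricted to the work directory. The containment check compares against the work directory path with a trailing slash, so a sibling such as /home/user/documents-evil does not match a work directory of /home/user/documents.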
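The trailing-slash prefix check can be sketched as follows. This is not the DocMCP `_resolve()` implementation, only a minimal illustration of the technique under the assumption that paths are normalized with `os.path.realpath`. The function name `resolve_in_workdir` is hypothetical.

```python
import os


def resolve_in_workdir(path: str, workdir: str) -> str:
    """Resolve path against workdir and reject anything outside it.

    Hypothetical sketch of a trailing-slash prefix check; not the DocMCP source.
    """
    root = os.path.realpath(workdir)
    # Relative paths are joined onto the work directory; absolute paths
    # replace it and are then checked like any other candidate.
    candidate = os.path.realpath(os.path.join(root, path))
    # Comparing against "root/" rather than "root" stops a sibling such as
    # "/home/user/documents-evil" from passing for "/home/user/documents".
    prefix = root if root.endswith(os.sep) else root + os.sep
    if candidate != root and not candidate.startswith(prefix):
        raise ValueError(f"path escapes work directory: {path!r}")
    return candidate


workdir = os.environ.get("DOCMCP_WORKDIR", os.getcwd())
print(resolve_in_workdir("reports/q1.pdf", workdir))
```

Resolving symlinks before the comparison means a link inside the work directory that points elsewhere is also rejected.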
🔧 Recent Improvements
- Path Traversal Hardened: _resolve() now normalizes relative paths against the workdir and uses a trailing-slash prefix check
- to_markdown() Error Handling: Output path validation errors are now caught and returned gracefully
- generate_table() JSON Parsing: Malformed rows JSON returns a friendly error instead of crashing
- merge() Newline Support: Accepts newline-separated paths in addition to comma-separated
- MAX_PAGES Configurable: DOCMCP_MAX_PAGES env var (default 100) controls the PDF page limit
- import fitz Moved: Orphaned import at the end of generator.py moved to the top of the file
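Three of the items above describe input handling that is easy to illustrate. The helpers below are hypothetical sketches, not the DocMCP code: `split_paths` accepts comma- or newline-separated paths, `max_pages` reads `DOCMCP_MAX_PAGES` with a default of 100, and `parse_rows` turns malformed JSON into an error message instead of an exception. Falling back to the default on an invalid or non-positive value, and the exact error wording, are assumptions.

```python
import json
import os
import re


def split_paths(paths: str) -> list[str]:
    """Split a comma- or newline-separated string into clean path entries."""
    return [p.strip() for p in re.split(r"[,\n]", paths) if p.strip()]


def max_pages(default: int = 100) -> int:
    """Read DOCMCP_MAX_PAGES, using the default when unset or invalid (assumed)."""
    raw = os.environ.get("DOCMCP_MAX_PAGES", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def parse_rows(rows_json: str) -> tuple[list, str]:
    """Return (rows, "") on success or ([], error message) on bad input."""
    try:
        rows = json.loads(rows_json)
    except json.JSONDecodeError as exc:
        return [], f"Error: rows is not valid JSON ({exc.msg})"
    if not isinstance(rows, list):
        return [], "Error: rows must be a JSON array"
    return rows, ""


print(split_paths("a.pdf, b.pdf\nc.pdf"))
# prints ['a.pdf', 'b.pdf', 'c.pdf']
```

Returning the error as a value lets a tool handler send it back to the client as a normal result rather than failing the call.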
Project Structure
docmcp/
├── server.py # MCP server entry point (tools)
├── docmcp/
│ ├── reader.py # PDF reading & extraction
│ ├── manipulator.py # Merge, split, compress, extract pages
│ ├── generator.py # PDF generation (reports, tables, text)
│ └── __init__.py
├── client.py # Test client
└── pyproject.toml