MCP server for format-preserving PDF text editing — find, replace, and batch-edit text in existing PDFs while preserving fonts, layout, and visual fidelity.
pdf-edit-mcp
MCP server for editing text in existing PDFs through content-stream surgery. Targets fidelity preservation (original font, exact position, in-place operators) and reports — honestly — when fidelity has to break.
v0.2.0 is a native Python (FastMCP) server. Earlier 0.1.x releases were a TypeScript MCP server that shelled out to a Python bridge.py; v0.2.0 imports the engine in-process: one runtime, no Node.js, distributed on PyPI. See Migrating from 0.1.x.
How it works
Most PDF editors use a redact-and-replace approach — they white out the original text and stamp new text on top, usually with a substitute font. The result looks different from the original.
pdf-edit-mcp takes a different approach. It modifies the original PDF content stream operators directly, preserving the exact font, size, color, and position of the text being edited — when the embedded font already contains the glyphs you need.
| | Traditional approach | pdf-edit-mcp |
|---|---|---|
| Method | Redact old text, stamp new text | Modify content stream operators in place |
| Font | Substituted (often Helvetica) | Original font when possible; metric-equivalent fallback (e.g. Carlito for Calibri) when not |
| Position | Re-calculated | Exact original coordinates |
| Quality feedback | None | FidelityReport on every edit (font_preserved, font_substituted, glyphs_missing, overflow_detected, typed degradations) |
Powered by pdf-edit-engine — a Python library for PDF content stream surgery with in-place font subset extension.
When fidelity is exact, and when it isn't
This matters more than the headline claim. Every edit's fidelity report tells you which tier fired:
- Tier 1, exact (font_preserved=true, font_substituted=null): the embedded font already had every glyph the replacement needs. Output is byte-identical at the operator layer.
- Tier 1.5, in-place injection (font_preserved=true): the glyph wasn't embedded but was in your system font with matching unitsPerEm. Original CIDs are preserved; only new glyphs are appended. Visually indistinguishable from Tier 1. Covers TrueType (glyf) and, as of engine v0.2.0, CID-keyed (Type0) CFF / Type1C fonts.
- Metric-equivalent fallback (font_preserved=false, font_substituted="Carlito-Regular" or similar): the original font isn't installed, so an open-source font with matching metrics is used for the new glyphs. Very close, spacing correct, not pixel-perfect.
What still refuses honestly (a typed font_extension_failed / clear error rather than silent corruption):
- CFF shapes the injector doesn't cover — simple-font (non-CID) CFF, CFF2, name-keyed CFF, multi-FD CID, composite donors.
- Type 3 (procedural) fonts.
- unitsPerEm mismatch between embedded and system font (rescaling out of scope).
- A replacement wider than the bbox with no room to reflow (overflow_detected=true + a warning).
- Multi-codepoint emoji / scripts your system fonts don't carry (glyphs_missing).
Run pdf_analyze_subset first if you need to know the tier up front.
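A caller can gate on these fields after each edit. The sketch below assumes the fidelity report arrives as a plain dict carrying the field names listed above; the function name classify_fidelity and the labels it returns are illustrative and not part of the package. Tier 1 and Tier 1.5 both report font_preserved=true, so the sketch does not try to tell them apart.

```python
from typing import Any


def classify_fidelity(report: dict[str, Any]) -> str:
    """Map a fidelity report (assumed dict shape) to a coarse label."""
    # Missing glyphs or overflow mean the edit is visibly degraded.
    if report.get("glyphs_missing"):
        return "degraded: glyphs missing"
    if report.get("overflow_detected"):
        return "degraded: overflow"
    # Original font kept: Tier 1 or Tier 1.5 (not distinguishable here).
    if report.get("font_preserved"):
        return "in-place"
    # A substitute such as "Carlito-Regular" supplied the new glyphs.
    if report.get("font_substituted"):
        return f"fallback: {report['font_substituted']}"
    return "unknown"
```

Checking the degraded cases first is a design choice: an edit that kept the original font but overflowed its box is still one a caller probably wants to review.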
Features
- 38 tools across 7 categories (reading, text editing, block ops, section ops, annotations, document manipulation, metadata & security) + 3 built-in MCP prompts that guide the editing workflow.
- Edit encrypted PDFs: pass password= to the read/edit tools to work on a password-protected PDF; the output is re-encrypted with the same password (engine A2.3).
- Shrink-to-fit: fit="shrink" on pdf_replace_block / pdf_batch_replace_block shrinks the font to fit a fixed-height region (engine E.8).
- Fidelity reporting on every edit: font_preserved, font_substituted, overflow_detected, reflow_applied, glyphs_missing, a warnings list, and a typed degradations array (30 engine degradation kinds, each {kind, detail, severity}) so callers can gate on quality.
- dry_run preview on pdf_replace_text / pdf_replace_single / pdf_batch_replace: get the fidelity report without writing the output.
- Per-page filtering on pdf_find_text / pdf_get_text / pdf_get_fonts.
- Batch operations: up to 500 find-and-replace edits per call, up to 50 block replacements per page, with output auto-verification on pdf_batch_replace.
- Section intelligence: detects structure by font hierarchy, swaps sections by fuzzy title match, and refuses ambiguous matches rather than silently picking the first.
- Atomic write: pdf_swap_sections writes to a temp file and renames only on full success; a failure leaves your output path untouched.
- Engine-version gate at startup: refuses to serve against pdf-edit-engine < 0.2.0, so missing fidelity fields can't masquerade as null.
- Path-safety boundary: every path is validated (absolute, .pdf, no .. traversal, no control chars, no Windows reserved/truncated basenames) before reaching the engine.
- Runs entirely local: no external APIs, no network calls, no API keys.
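To make the path-safety rules concrete, here is a minimal sketch of a validator that applies most of the checks named in the list. It is not the package's validation code: the function name is_safe_pdf_path and the reserved-name set are assumptions, and the "truncated basename" rule is left out because the text does not define it.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Windows device names that are unsafe as a file basename (assumed set).
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_safe_pdf_path(path: str) -> bool:
    """Return True if `path` passes the illustrative checks below."""
    # Reject control characters anywhere in the string.
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in path):
        return False
    # Must be absolute in either POSIX or Windows form.
    if not (PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()):
        return False
    # Split on both separators so ".." is caught in either style.
    parts = [p for p in re.split(r"[\\/]", path) if p]
    if not parts or ".." in parts:
        return False
    name = parts[-1]
    if not name.lower().endswith(".pdf"):
        return False
    stem = name[: -len(".pdf")]
    # An empty stem (".pdf") or a reserved device name is refused.
    if not stem or stem.split(".")[0].upper() in _RESERVED:
        return False
    return True
```

Using the pure path classes keeps the check independent of the operating system the code runs on, since they never touch the filesystem.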
Quick Start
Prerequisites
- Python 3.10+ (3.12 recommended).
- That's it: pdf-edit-engine installs automatically as a dependency. (uvx fetches everything on first run; no manual install.)
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "pdf-edit": {
      "command": "uvx",
      "args": ["pdf-edit-mcp"]
    }
  }
}
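If the config file already lists other servers, the entry has to be merged in, not pasted over them. The helper below is a hypothetical convenience, not something the package ships: it takes the current file contents as text and returns updated JSON with the pdf-edit entry from the snippet above added.

```python
import json


def add_pdf_edit_server(config_text: str) -> str:
    """Return config JSON with the pdf-edit entry added, keeping other servers."""
    # An empty or whitespace-only file is treated as an empty config.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command and args as the snippet above.
    servers["pdf-edit"] = {"command": "uvx", "args": ["pdf-edit-mcp"]}
    return json.dumps(config, indent=2)
```

Read the file, pass its text through this function, and write the result back; existing entries under mcpServers are left as they were.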
Claude Code
claude mcp add pdf-edit -- uvx pdf-edit-mcp
Other MCP clients (Cursor, Windsurf, etc.)
Run via uvx pdf-edit-mcp, or install it and use the console script:
pip install pdf-edit-mcp
pdf-edit-mcp # or: python -m pdf_edit_mcp
Tools
Reading & Analysis
| Tool | Description |
|---|---|
| pdf_inspect | Complete document overview: text, fonts, paragraphs, annotations in one call. Start here before editing. |
| pdf_get_text | Extract all text from a PDF |
| pdf_find_text | Find all occurrences of a string with page numbers and bounding box positions |
| pdf_get_fonts | List fonts with encoding type, glyph count, PostScript name, subset status |
| pdf_get_text_layout | Get every text block with exact position, font, and size |
| pdf_extract_bbox_text | Extract text from a bounding box region with gap-aware joining |
| pdf_detect_paragraphs | Detect paragraph boundaries with bounding boxes on a page |
| pdf_detect_sections | Analyze document structure: section tree with titles, bounding boxes, and text |
| pdf_analyze_subset | Check if an embedded font can render specific characters before editing |
Text Editing
| Tool | Description |
|---|---|
| pdf_replace_text | Replace all occurrences of a string (names, dates, typos, labels) |
| pdf_replace_single | Replace one specific occurrence by match index |
| pdf_batch_replace | Multiple find-and-replace edits in one atomic operation (up to 500 edits) |
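A job with more than 500 edits has to be split across several pdf_batch_replace calls. The sketch below shows one way to chunk the list on the client side. The edit shape ({"find": ..., "replace": ...}) and the name chunk_edits are assumptions for illustration; check the tool's input schema for the real field names.

```python
from typing import Iterator

MAX_EDITS_PER_CALL = 500  # per-call limit stated for pdf_batch_replace


def chunk_edits(
    edits: list[dict[str, str]], limit: int = MAX_EDITS_PER_CALL
) -> Iterator[list[dict[str, str]]]:
    """Yield consecutive slices of `edits`, each at most `limit` long."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(edits), limit):
        yield edits[start : start + limit]
```

Each call is atomic on its own, but a chunked job is not atomic as a whole: if the third batch fails, the first two have already been applied.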
Block Operations
| Tool | Description |
|---|---|
| pdf_replace_block | Replace all content within a bounding box with new text (fit="shrink" to shrink-to-fit) |
| pdf_batch_replace_block | Replace content in multiple bounding boxes atomically |
| pdf_insert_text_block | Insert text at a position |
| pdf_delete_block | Delete content in a bounding box, optionally close the gap |
Section Operations
| Tool | Description |
|---|---|
| pdf_swap_sections | Swap two sections by fuzzy title match; re-renders all siblings for uniform spacing |
| pdf_replace_section | Replace a section's entire content by fuzzy title match |
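The atomic-write behaviour described for pdf_swap_sections under Features follows a common pattern: write to a temp file, then rename over the target only on success. The sketch below illustrates that pattern with the standard library. It is not the server's code; atomic_write and the produce callback are hypothetical names.

```python
import os
import tempfile
from typing import Callable


def atomic_write(output_path: str, produce: Callable[[str], None]) -> None:
    """Run `produce(tmp_path)`, then move the result onto `output_path`."""
    directory = os.path.dirname(os.path.abspath(output_path))
    # Create the temp file beside the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    os.close(fd)
    try:
        produce(tmp_path)
        # Replace the target in a single step once the write has succeeded.
        os.replace(tmp_path, output_path)
    except BaseException:
        # Any failure removes the temp file and leaves output_path untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Keeping the temp file in the target's directory matters: a rename across filesystems falls back to copy-and-delete on many systems, which reintroduces the partial-write window.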
Annotations & Links
| Tool | Description |
|---|---|
| pdf_get_annotations | List all annotations with positions, types, and URLs |
| pdf_add_annotation | Add a link annotation at a position on a page |
| pdf_update_annotation | Update a link annotation's target URL |
| pdf_delete_annotation_v2 | Delete an annotation by page and index |
| pdf_move_annotation | Move an annotation to a new position |
| pdf_add_hyperlink | Add a clickable hyperlink to a page region |
| pdf_add_highlight | Add a highlight annotation with QuadPoints |
| pdf_flatten_annotations | Flatten all annotations into page content (non-editable) |
Document Manipulation
| Tool | Description |
|---|---|
| pdf_merge | Merge multiple PDFs into one document |
| pdf_split | Split a PDF into individual page files |
| pdf_reorder_pages | Reorder pages by 0-indexed page number array |
| pdf_rotate_pages | Rotate pages by 90, 180, or 270 degrees |
| pdf_delete_pages | Delete specific pages (0-indexed) |
| pdf_crop_pages | Crop all pages to a bounding box |
| pdf_add_watermark | Overlay a watermark PDF on all pages |
Metadata & Security
| Tool | Description |
|---|---|
| pdf_edit_metadata | Edit title, author, subject, creator, producer |
| pdf_add_bookmark | Add a navigation bookmark pointing to a page |
| pdf_encrypt | Encrypt with owner and user passwords |
| pdf_decrypt | Decrypt a password-protected PDF |
| pdf_fill_form | Fill form fields by name-value pairs |
Workflows
Three built-in MCP prompts guide the editing process: comprehensive-pdf-edit (structural changes — inspect → understand structure → pre-check → execute → verify), section-swap (swap two sections, re-rendering all siblings for uniform spacing), and quick-pdf-edit (simple typo/date/name changes with a fidelity check).
Architecture
AI Agent (Claude, GPT, etc.)
↓ MCP protocol (stdio)
pdf_edit_mcp — Python FastMCP server (this package)
↓ in-process import
pdf-edit-engine — Python library (pikepdf + fonttools + pdfminer)
- Single process: the engine is imported directly — no subprocess, no JSON-RPC bridge, no Node.js.
- Inputs are validated by Pydantic models (path safety, bounds, strict object shapes) before reaching the engine.
- Engine calls are serialized under a lock (the engine is not thread-safe) and PDFEditErrors are translated to clean tool errors with recovery hints.
- stdout is the MCP transport; all diagnostics go to stderr.
Layout: server.py (entry + version gate), app.py (FastMCP instance + lock), validation.py, serialize.py, _runtime.py, and tools_*.py / prompts.py (the tool + prompt surface).
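Serializing calls to a library that is not thread-safe usually comes down to one process-wide lock. The sketch below shows that idea in isolation; run_serialized and _ENGINE_LOCK are illustrative names, and the real app.py may organise this differently.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# One process-wide lock; every engine call goes through run_serialized().
_ENGINE_LOCK = threading.Lock()


def run_serialized(call: Callable[[], T]) -> T:
    """Run `call` while holding the lock so engine calls never overlap."""
    with _ENGINE_LOCK:
        return call()
```

A tool handler would wrap its engine call in a zero-argument callable, for example run_serialized(lambda: some_engine_function(path)), where some_engine_function stands in for whatever the engine exposes.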
Limitations
- Cross-page reflow: text expanding past a page boundary is not redistributed (overflow_detected=true + a warning).
- Some CFF shapes: CID-keyed (Type0) CFF/Type1C is supported; simple-font CFF, CFF2, name-keyed CFF, multi-FD CID, and composite donors refuse honestly (font_extension_failed).
- unitsPerEm mismatch between embedded and system font: out of scope; refuses rather than distort.
- Image editing / table semantics: text-only.
- Right-to-left / complex-script shaping: bidi reordering is not handled; CJK line-breaking is supported (engine E.7).
- Multi-codepoint emoji not in your system fonts: recorded as glyphs_missing.
Errors
Engine failures surface as MCP tool errors (isError) carrying a classified message and a recovery hint — for example:
- OperatorError → "TextMatch is stale — re-run pdf_find_text and retry."
- EncodingError → "…run pdf_analyze_subset to check coverage."
- ReflowError → "Replacement may be too wide — try shorter text or a different bbox."
- FontNotFoundError → "Run pdf_get_fonts, or install the required font / accept a fallback."
Raw pikepdf exceptions (e.g. on an encrypted PDF opened without a password) are never leaked — you get a clean "password-protected" message instead.
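The translation step can be pictured as a lookup from exception class to hint. The sketch below uses stand-in exception classes and an assumed payload shape (isError, kind, message, hint); only the two hint strings are taken from the examples above, with the elided text left as written.

```python
class PDFEditError(Exception):
    """Stand-in base class; the engine's real hierarchy may differ."""


class EncodingError(PDFEditError):
    pass


class FontNotFoundError(PDFEditError):
    pass


# Two of the hints quoted above, keyed by exception class name.
RECOVERY_HINTS = {
    "EncodingError": "…run pdf_analyze_subset to check coverage.",
    "FontNotFoundError": (
        "Run pdf_get_fonts, or install the required font / accept a fallback."
    ),
}


def to_tool_error(exc: Exception) -> dict[str, object]:
    """Build an error payload with a classified message and a recovery hint."""
    kind = type(exc).__name__
    # Unknown error kinds get a generic hint (wording is an assumption).
    hint = RECOVERY_HINTS.get(kind, "No specific hint; inspect the message.")
    return {"isError": True, "kind": kind, "message": str(exc), "hint": hint}
```

Keying on the class name keeps the table readable, at the cost of missing subclasses; a lookup that walks type(exc).__mro__ would handle those.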
Migrating from 0.1.x (npm)
The 0.1.x npm package @aryanbv/pdf-edit-mcp is deprecated. Replace the npm/npx launch config with the uvx config above. The tool names, inputs, and outputs are unchanged, so prompts and integrations keep working; you no longer need Node.js, and the PDF_EDIT_PYTHON env var is gone (the engine runs in-process).
Development
git clone https://github.com/AryanBV/pdf-edit-mcp.git
cd pdf-edit-mcp
python -m venv .venv && . .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
ruff check src/ tests/ # lint
mypy src/pdf_edit_mcp # type-check (strict)
pytest tests/ -q # tests (fixtures auto-generated via reportlab)
License