Python PDF toolkit — merge, split, rotate, compress, extract text, encrypt, decrypt, and 14 more operations. Powered by pypdf.

peasy-pdf

Python PDF toolkit with 21 operations for everyday document tasks. Merge multiple PDFs into one, split documents by page ranges, compress file size, rotate pages, extract text, encrypt with passwords, reorder, crop, resize, flatten forms, and manage PDF metadata -- all through a clean, consistent API. Every function accepts bytes, Path, or str and returns bytes, making it easy to chain operations or integrate into web services.

Built for PeasyPDF, a free online PDF toolkit with 25 browser-based tools for merging, splitting, compressing, converting, and securing PDF documents. The site processes files entirely client-side for privacy, while the Python package brings the same capabilities to scripts, pipelines, and AI assistants.

Try the interactive tools at peasypdf.com -- Merge PDF, Split PDF, Compress PDF, Rotate PDF, Encrypt PDF, and Extract Text.

Install

pip install peasy-pdf                # Core engine (pypdf)
pip install "peasy-pdf[cli]"         # + Command-line interface (typer, rich)
pip install "peasy-pdf[mcp]"         # + MCP server for AI assistants
pip install "peasy-pdf[api]"         # + HTTP client for peasypdf.com API
pip install "peasy-pdf[all]"         # Everything

Or run instantly without installing:

uvx --from "peasy-pdf[cli]" peasy-pdf info document.pdf

Quick Start

from peasy_pdf import merge, split, rotate, compress, info, extract_text

# Merge two PDF reports into a single document
merged = merge("report_q1.pdf", "report_q2.pdf")

# Split a PDF into chunks of 5 pages each
chunks = split("handbook.pdf", every=5)

# Rotate all pages 90 degrees clockwise
rotated = rotate("landscape.pdf", angle=90)

# Compress a PDF to reduce file size for email
compressed = compress("large-scan.pdf")

# Get PDF info — page count, title, encryption status
pdf_info = info("document.pdf")
print(f"Pages: {pdf_info.pages}, Title: {pdf_info.title}")

# Extract text from specific pages for indexing
text = extract_text("contract.pdf", pages="1-3")
print(text.full_text)
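
The API reference further down lists split() as returning list[bytes], so saving the chunks is left to the caller. A minimal sketch of one way to do that, using only the standard library; save_chunks and its prefix parameter are hypothetical helpers, not part of peasy-pdf, and the chunk data is a stand-in:

```python
import tempfile
from pathlib import Path
from typing import List


def save_chunks(chunks: List[bytes], directory: Path, prefix: str = "part_") -> List[Path]:
    """Write each chunk to <directory>/<prefix>NNN.pdf and return the paths.

    Hypothetical helper: peasy-pdf itself only hands back the bytes.
    """
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for number, chunk in enumerate(chunks, start=1):
        target = directory / f"{prefix}{number:03d}.pdf"
        target.write_bytes(chunk)
        written.append(target)
    return written


# Stand-in data; in real use this would be: chunks = split("handbook.pdf", every=5)
chunks = [b"%PDF-1.7 first chunk", b"%PDF-1.7 second chunk"]

with tempfile.TemporaryDirectory() as tmp:
    for path in save_chunks(chunks, Path(tmp) / "handbook"):
        print(path.name, path.stat().st_size, "bytes")
```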

What You Can Do

Page Manipulation

PDFs are structured as sequences of independent page objects, which makes page-level operations straightforward -- you can rearrange, duplicate, or remove pages without touching the content streams. peasy-pdf provides 10 page manipulation functions that cover the most common document assembly tasks, from combining multiple files into one to extracting specific pages for review.

| Function | Description | Key Parameters |
|---|---|---|
| merge() | Combine multiple PDFs into a single document | *sources (2+ PDF inputs) |
| split() | Split by page ranges or every N pages | ranges, every |
| rotate() | Rotate pages by 90, 180, or 270 degrees | angle, pages |
| reorder() | Rearrange pages in any sequence | order (e.g. "3,1,2") |
| reverse() | Reverse the entire page order | -- |
| delete_pages() | Remove specific pages from a document | pages |
| extract_pages() | Extract specific pages into a new PDF | pages |
| odd_even() | Filter odd or even pages (duplex printing) | mode ("odd" or "even") |
| duplicate_pages() | Duplicate pages for handouts or forms | pages, copies |
| insert_blank() | Insert blank pages at specific positions | after, count, width, height |

from peasy_pdf import merge, split, reorder, reverse, extract_pages, odd_even

# Merge a cover page with a report body
combined = merge("cover.pdf", "body.pdf")

# Split a 100-page book into 10-page chapters
chapters = split("book.pdf", every=10)

# Split by explicit ranges — pages 1-5 and pages 6-10 as separate files
parts = split("book.pdf", ranges="1-5,6-10")

# Reorder pages — put page 3 first, then 1, then 2
reordered = reorder("slides.pdf", order="3,1,2")

# Reverse a document for back-to-front printing
reversed_doc = reverse("handout.pdf")

# Extract only the executive summary (pages 2-4)
summary = extract_pages("annual_report.pdf", pages="2-4")

# Get odd pages (the front sides) for manual duplex printing
front_sides = odd_even("booklet.pdf", mode="odd")

Learn more: Merge PDF Tool · Split PDF Tool · Rotate PDF Tool
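
Because a PDF is a sequence of independent pages, most of the functions above reduce to list operations on that sequence. The sketch below illustrates the idea on a plain Python list standing in for the pages. The helper names are hypothetical and this is not how peasy-pdf is implemented internally; it only shows the index arithmetic behind every=N, order="3,1,2", and mode="odd".

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_every(pages: Sequence[T], every: int) -> List[List[T]]:
    """Group pages into consecutive chunks of at most `every` pages."""
    if every < 1:
        raise ValueError("every must be a positive integer")
    return [list(pages[i:i + every]) for i in range(0, len(pages), every)]


def reorder_pages(pages: Sequence[T], order: str) -> List[T]:
    """Pick pages by a comma-separated list of 1-indexed numbers, e.g. "3,1,2"."""
    result = []
    for token in order.split(","):
        number = int(token)
        if not 1 <= number <= len(pages):
            raise ValueError(f"page {number} is outside 1-{len(pages)}")
        result.append(pages[number - 1])
    return result


def odd_even_pages(pages: Sequence[T], mode: str) -> List[T]:
    """Keep odd-numbered or even-numbered pages (page 1 sits at index 0)."""
    if mode not in ("odd", "even"):
        raise ValueError('mode must be "odd" or "even"')
    start = 0 if mode == "odd" else 1
    return list(pages[start::2])


pages = ["p1", "p2", "p3", "p4", "p5"]
print(split_every(pages, 2))               # [['p1', 'p2'], ['p3', 'p4'], ['p5']]
print(reorder_pages(pages[:3], "3,1,2"))   # ['p3', 'p1', 'p2']
print(odd_even_pages(pages, "odd"))        # ['p1', 'p3', 'p5']
```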

Document Operations

Beyond page-level assembly, PDF documents often need structural transformations. Compression reduces file size by re-encoding content streams with Flate (zlib) compression -- particularly effective on PDFs generated by scanners or design tools that leave streams uncompressed. Resizing scales page content to standard paper sizes (A3, A4, A5, Letter, Legal, Tabloid) while preserving aspect ratio. Cropping trims margins by adjusting the MediaBox coordinates, measured in PDF points (72 points per inch). Flattening bakes interactive form fields (AcroForm) into the page content, producing a static document that renders identically everywhere.
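
To see why Flate helps on uncompressed streams, here is a small standard-library sketch. The stream content is made up for illustration, and the snippet uses Python's zlib directly rather than peasy-pdf or pypdf:

```python
import zlib

# A made-up, uncompressed content stream: the same text-drawing operators
# repeated many times, as a scanner or design tool might emit them.
raw_stream = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 200

# Flate is the zlib/deflate format; level 9 trades speed for size.
compressed_stream = zlib.compress(raw_stream, level=9)

# The compression is lossless: decompressing restores the exact bytes.
restored = zlib.decompress(compressed_stream)

print(len(raw_stream), "->", len(compressed_stream), "bytes")
print("lossless:", restored == raw_stream)
```

Streams that are already compressed, such as JPEG image data, gain little from this step, which is consistent with the note above that the benefit is largest for files whose streams were left uncompressed.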

| Function | Description | Key Parameters |
|---|---|---|
| compress() | Compress content streams with Flate encoding | -- |
| resize() | Scale pages to standard sizes (A4, Letter, etc.) | size, pages |
| crop() | Crop page margins by trimming edges | left, bottom, right, top, pages |
| flatten() | Flatten form fields into static content | -- |

Supported page sizes:

| Size | Dimensions (points) | Common Use |
|---|---|---|
| a3 | 841.89 x 1190.55 | Posters, large-format printing |
| a4 | 595.28 x 841.89 | International standard (210 x 297 mm) |
| a5 | 419.53 x 595.28 | Booklets, notebooks |
| letter | 612 x 792 | US standard (8.5 x 11 in) |
| legal | 612 x 1008 | US legal documents (8.5 x 14 in) |
| tabloid | 792 x 1224 | US tabloid / ledger (11 x 17 in) |

from peasy_pdf import compress, resize, crop, flatten

# Compress a scanned PDF — Flate-encodes uncompressed content streams
compressed = compress("scanned_invoice.pdf")

# Resize a Letter-size document to A4 for international distribution
resized = resize("us_report.pdf", size="a4")

# Resize only the first page to Letter
resized_first = resize("mixed.pdf", size="letter", pages="1")

# Crop 36 points (0.5 inch) from each edge to remove scan borders
cropped = crop("scan.pdf", left=36, right=36, top=36, bottom=36)

# Flatten a filled PDF form so fields become static text
flat = flatten("filled_form.pdf")

Learn more: Compress PDF Tool · Resize PDF Guide · PDF Glossary
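
The point values in the size table and the crop margins follow from the 72-points-per-inch rule. The sketch below works through that arithmetic with hypothetical helpers. The uniform fit-inside scale factor is one plausible reading of "preserving aspect ratio", not a statement of how resize() is implemented.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 pt = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def fit_scale(src_w: float, src_h: float, dst_w: float, dst_h: float) -> float:
    """Largest uniform scale at which the source page fits inside the target."""
    return min(dst_w / src_w, dst_h / src_h)


def crop_box(width, height, left=0, bottom=0, right=0, top=0):
    """Box (x0, y0, x1, y1) left after trimming margins from a page at origin 0,0."""
    return (left, bottom, width - right, height - top)


# A4 is 210 x 297 mm, which reproduces the table entry.
a4 = (round(mm_to_points(210), 2), round(mm_to_points(297), 2))
print(a4)                                   # (595.28, 841.89)

# Letter (612 x 792) scaled to fit inside A4: width is the limiting side.
print(round(fit_scale(612, 792, *a4), 4))   # 0.9727

# Trimming 36 pt (0.5 in) from every edge of a Letter page.
print(crop_box(612, 792, left=36, bottom=36, right=36, top=36))  # (36, 36, 576, 756)
```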

Text & Metadata

Every PDF can carry two kinds of non-visual information: text content embedded in page streams, and document-level metadata. Text extraction reads the text operators from each page's content stream and assembles them into readable strings -- useful for full-text search indexing, content analysis, or feeding documents into LLM pipelines. The extract_text() function returns per-page results plus a combined full-text string.

PDF metadata lives in the document information dictionary (the Info dictionary) defined in the PDF specification (ISO 32000). Six of its text fields -- Title, Author, Subject, Keywords, Creator, and Producer -- appear in file properties dialogs and are indexed by search engines and document management systems; the dictionary also defines CreationDate, ModDate, and Trapped, which are not among the PdfMetadata fields listed under Data Classes. The XMP (Extensible Metadata Platform) standard extends this with richer schemas like Dublin Core, but the Info dictionary remains the most widely used format. peasy-pdf provides get_metadata(), set_metadata(), and strip_metadata() for reading, updating, and removing these fields.

| Function | Description | Returns |
|---|---|---|
| extract_text() | Extract text with per-page breakdown | ExtractedText |
| info() | Get page count, encryption status, metadata, file size | PdfInfo |
| get_metadata() | Read all 6 standard metadata fields | PdfMetadata |
| set_metadata() | Update specific metadata fields (preserves others) | bytes |
| strip_metadata() | Remove all metadata for privacy | bytes |

from peasy_pdf import extract_text, info, get_metadata, set_metadata, strip_metadata

# Extract text from a contract — per-page results for clause analysis
text = extract_text("contract.pdf", pages="1-5")
for page in text.pages:
    print(f"Page {page.page}: {len(page.text)} chars")
print(text.full_text[:200])  # First 200 characters of combined text

# Get document info — page count, title, encryption status, file size
pdf_info = info("annual_report.pdf")
print(f"Pages: {pdf_info.pages}")
print(f"Encrypted: {pdf_info.encrypted}")
print(f"Size: {pdf_info.size_bytes:,} bytes")
print(f"Producer: {pdf_info.producer}")

# Read PDF metadata fields (Title, Author, Subject, Keywords, Creator, Producer)
meta = get_metadata("report.pdf")
print(f"Title: {meta.title}, Author: {meta.author}")

# Update document title and author for proper cataloging
updated = set_metadata("draft.pdf", title="Q4 Financial Report", author="Finance Team")

# Strip all metadata before sharing externally — removes PII from file properties
clean = strip_metadata("internal_memo.pdf")

Learn more: Extract Text Tool · PDF Metadata Guide · What is PDF Metadata?
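
The per-page breakdown is what makes simple search and indexing possible. The sketch below defines stand-in dataclasses that mirror the fields documented under Data Classes (page, text, pages, full_text) so that it runs without the package installed. The pages_mentioning helper and the newline used to build full_text are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageTextResult:
    """Stand-in with the documented fields: 1-indexed page number and its text."""
    page: int
    text: str


@dataclass
class ExtractedText:
    """Stand-in with the documented fields: per-page results plus combined text."""
    pages: List[PageTextResult]
    full_text: str


def pages_mentioning(result: ExtractedText, keyword: str) -> List[int]:
    """Return the 1-indexed numbers of pages whose text contains the keyword."""
    needle = keyword.casefold()
    return [p.page for p in result.pages if needle in p.text.casefold()]


page_results = [
    PageTextResult(page=1, text="This Agreement is made between the parties."),
    PageTextResult(page=2, text="Termination requires thirty days notice."),
    PageTextResult(page=3, text="Governing law and jurisdiction."),
]
# Joining with a newline is an assumption; the real separator is not documented here.
sample = ExtractedText(pages=page_results, full_text="\n".join(p.text for p in page_results))

print(pages_mentioning(sample, "termination"))   # [2]
```

With the real package, the object returned by extract_text() would take the place of sample.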

Security

PDF encryption uses the standard security handler defined in ISO 32000. The specification defines two passwords: the user password (required to open the document) and the owner password (grants full access including printing, copying, and editing). When you call encrypt(), pypdf applies 128-bit AES encryption by default. The decrypt() function removes encryption entirely, producing an unprotected PDF that anyone can open.

Permission flags in the PDF spec control what actions are allowed even after the document is opened: printing, content copying, form filling, annotation, and page extraction. While these flags are advisory (PDF viewers enforce them voluntarily), they are the standard mechanism for controlling document distribution in enterprise workflows.

| Function | Description | Key Parameters |
|---|---|---|
| encrypt() | Add password protection with AES encryption | user_password, owner_password |
| decrypt() | Remove password protection | password |

from peasy_pdf import encrypt, decrypt

# Encrypt a confidential report with a user password
protected = encrypt("financials.pdf", user_password="secret123")

# Encrypt with separate user and owner passwords
# User password to open, owner password for full access (print, copy, edit)
protected = encrypt(
    "board_minutes.pdf",
    user_password="view-only",
    owner_password="admin-access",
)

# Decrypt a password-protected PDF to remove restrictions
unlocked = decrypt("protected.pdf", password="secret123")

Learn more: Encrypt PDF Tool · Decrypt PDF Tool · PDF Security Guide
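
The permission flags mentioned above are stored as individual bits of a single integer in the encryption dictionary. The sketch below models the bit positions from the ISO 32000 user access permissions table with a standard-library IntFlag. The class and member names are illustrative only, and the parameter table above lists no permissions argument for encrypt(), so this explains the concept rather than a peasy-pdf feature.

```python
from enum import IntFlag


class Permission(IntFlag):
    """User access permission bits (bit N in the spec is 1 << (N - 1))."""
    PRINT = 1 << 2                       # bit 3: print the document
    MODIFY = 1 << 3                      # bit 4: modify contents
    COPY = 1 << 4                        # bit 5: copy or extract text and graphics
    ANNOTATE = 1 << 5                    # bit 6: add or modify annotations
    FILL_FORMS = 1 << 8                  # bit 9: fill in form fields
    EXTRACT_FOR_ACCESSIBILITY = 1 << 9   # bit 10: extract for accessibility
    ASSEMBLE = 1 << 10                   # bit 11: insert, rotate, delete pages
    PRINT_HIGH_QUALITY = 1 << 11         # bit 12: print at full resolution


def allows(flags: int, permission: Permission) -> bool:
    """True when every bit of `permission` is set in `flags`."""
    return (flags & permission) == permission


print_only = Permission.PRINT | Permission.PRINT_HIGH_QUALITY
print(int(print_only))                        # 2052
print(allows(print_only, Permission.PRINT))   # True
print(allows(print_only, Permission.COPY))    # False
```

A real permissions value also has reserved bits set and is stored as a signed 32-bit integer, which this sketch leaves out.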

Page Specs

All page-aware functions use a 1-indexed page spec string. This syntax lets you target individual pages, ranges, or combinations without converting to zero-based indices yourself.

| Spec | Meaning | Example |
|---|---|---|
| "1" | Single page | Page 1 only |
| "1,3,5" | Multiple pages | Pages 1, 3, and 5 |
| "2-5" | Page range | Pages 2, 3, 4, and 5 |
| "1,3-5,8" | Mixed | Pages 1, 3, 4, 5, and 8 |
| "all" | Every page | All pages (default) |

from peasy_pdf import rotate, extract_pages, delete_pages

# Rotate only page 1
rotated = rotate("doc.pdf", pages="1", angle=90)

# Extract pages 1, 3, and 5-7 into a new PDF
subset = extract_pages("doc.pdf", pages="1,3,5-7")

# Delete the last page (page 10 of a 10-page doc)
trimmed = delete_pages("doc.pdf", pages="10")

# All pages is the default for most functions
rotated_all = rotate("doc.pdf", angle=180)  # pages="all" implied
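
For readers curious how such a spec maps to zero-based indices, here is a minimal parser sketch that follows the table above. parse_page_spec is a hypothetical name, and the validation rules (rejecting out-of-range or reversed ranges) are assumptions; peasy-pdf's own parser may behave differently on edge cases.

```python
from typing import List


def parse_page_spec(spec: str, total_pages: int) -> List[int]:
    """Return zero-based indices for a 1-indexed spec such as "1,3-5,8"."""
    if spec.strip().lower() == "all":
        return list(range(total_pages))

    indices: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            # "3-5" is an inclusive range of page numbers.
            first, last = (int(part) for part in token.split("-", 1))
        else:
            first = last = int(token)
        if not 1 <= first <= last <= total_pages:
            raise ValueError(f"page spec {token!r} is outside 1-{total_pages}")
        # Shift from 1-indexed page numbers to 0-indexed positions.
        indices.extend(range(first - 1, last))
    return indices


print(parse_page_spec("1,3-5,8", total_pages=10))   # [0, 2, 3, 4, 7]
print(parse_page_spec("all", total_pages=3))        # [0, 1, 2]
```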

Input Flexibility

Every function accepts bytes, Path, or str (file path). This makes peasy-pdf work seamlessly with file systems, HTTP responses, databases, and in-memory buffers.

from pathlib import Path
from peasy_pdf import info

# String file path
result = info("document.pdf")

# pathlib.Path
result = info(Path("documents") / "report.pdf")

# Raw bytes, e.g. an HTTP response body or a database BLOB
pdf_bytes = Path("document.pdf").read_bytes()  # stand-in for response.content
result = info(pdf_bytes)

# Chain operations — output bytes feed directly into the next function
from peasy_pdf import compress, encrypt
compressed = compress("large.pdf")
protected = encrypt(compressed, user_password="secret")

All PDF-producing functions return bytes, so you can write results to disk, return them from a web endpoint, or pass them to another peasy-pdf function:

from peasy_pdf import merge, compress
from pathlib import Path

# Merge, compress, and save in one pipeline
result = compress(merge("part1.pdf", "part2.pdf"))
Path("final.pdf").write_bytes(result)
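
Accepting bytes, Path, or str usually comes down to one small normalization step at the top of each function. The sketch below shows one way to write that step. read_pdf_bytes is a hypothetical helper, not the package's actual internals.

```python
import tempfile
from pathlib import Path
from typing import Union

# Mirrors the PdfInput alias listed under Type Aliases.
PdfInput = Union[bytes, Path, str]


def read_pdf_bytes(source: PdfInput) -> bytes:
    """Normalize any supported input to raw bytes (hypothetical helper)."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)              # already in memory
    if isinstance(source, (str, Path)):
        return Path(source).read_bytes()  # treat as a file path
    raise TypeError(f"unsupported PDF input type: {type(source).__name__}")


# All three input forms yield the same bytes.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.pdf"
    sample_path.write_bytes(b"%PDF-1.7 demo")
    from_bytes = read_pdf_bytes(b"%PDF-1.7 demo")
    from_path = read_pdf_bytes(sample_path)
    from_str = read_pdf_bytes(str(sample_path))

print(from_bytes == from_path == from_str)   # True
```

Because every PDF-producing function returns bytes, the output of one call always matches the first branch of the next, which is what makes chaining work.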

Command-Line Interface

pip install "peasy-pdf[cli]"

Every operation is available as a CLI subcommand:

# Merge multiple PDFs
peasy-pdf merge file1.pdf file2.pdf -o merged.pdf

# Split every 5 pages
peasy-pdf split doc.pdf --every 5 -o split_

# Split by ranges
peasy-pdf split doc.pdf --ranges "1-3,4-6" -o chapter_

# Rotate all pages 90 degrees
peasy-pdf rotate doc.pdf --angle 90 -o rotated.pdf

# Compress to reduce file size
peasy-pdf compress doc.pdf -o compressed.pdf

# Get document info
peasy-pdf info doc.pdf

# Extract text from specific pages
peasy-pdf text doc.pdf --pages 1-3

# Encrypt with a password
peasy-pdf encrypt doc.pdf --password secret -o encrypted.pdf

# Decrypt a protected PDF
peasy-pdf decrypt encrypted.pdf --password secret -o decrypted.pdf

# Update metadata
peasy-pdf metadata doc.pdf --title "New Title" --author "Author" -o updated.pdf

MCP Server (Claude, Cursor, Windsurf)

peasy-pdf includes a Model Context Protocol server that exposes all 21 PDF operations to AI assistants.

pip install "peasy-pdf[mcp]"

Claude Desktop (claude_desktop_config.json):

{
    "mcpServers": {
        "peasy-pdf": {
            "command": "uvx",
            "args": ["--from", "peasy-pdf[mcp]", "python", "-m", "peasy_pdf.mcp_server"]
        }
    }
}

Cursor (.cursor/mcp.json):

{
    "mcpServers": {
        "peasy-pdf": {
            "command": "uvx",
            "args": ["--from", "peasy-pdf[mcp]", "python", "-m", "peasy_pdf.mcp_server"]
        }
    }
}

Windsurf (~/.windsurf/mcp.json):

{
    "mcpServers": {
        "peasy-pdf": {
            "command": "uvx",
            "args": ["--from", "peasy-pdf[mcp]", "python", "-m", "peasy_pdf.mcp_server"]
        }
    }
}

REST API Client

The API client connects to the PeasyPDF developer API for server-side processing and tool discovery.

from peasy_pdf.api import PeasyPdfAPI

# Initialize the API client
api = PeasyPdfAPI()

# List all available PDF tools
tools = api.list_tools()
for tool in tools:
    print(f"{tool['name']}: {tool['description']}")

# Search the PDF glossary
results = api.search("compress")

# Get tool details
tool = api.get_tool("merge-pdf")

Full API documentation at peasypdf.com/developers/. OpenAPI 3.1.0 spec: peasypdf.com/api/openapi.json.

API Reference

Core Functions

| Function | Parameters | Returns | Description |
|---|---|---|---|
| merge(*sources, password) | *sources: PdfInput | bytes | Merge 2+ PDFs into one document |
| split(source, ranges, every, password) | source: PdfInput | list[bytes] | Split by ranges or every N pages |
| rotate(source, angle, pages, password) | angle: int, pages: str | bytes | Rotate pages by 90/180/270 degrees |
| reorder(source, order, password) | order: str | bytes | Reorder pages (e.g. "3,1,2") |
| reverse(source, password) | -- | bytes | Reverse the page order |
| delete_pages(source, pages, password) | pages: str | bytes | Remove specific pages |
| extract_pages(source, pages, password) | pages: str | bytes | Extract specific pages |
| odd_even(source, mode, password) | mode: "odd" \| "even" | bytes | Filter odd or even pages |
| duplicate_pages(source, pages, copies, password) | pages: str, copies: int | bytes | Duplicate pages N times |
| insert_blank(source, after, count, width, height, password) | after: str, count: int | bytes | Insert blank pages at positions |
| compress(source, password) | -- | bytes | Compress content streams |
| resize(source, size, pages, password) | size: PageSize | bytes | Resize to A3/A4/A5/Letter/Legal/Tabloid |
| crop(source, left, bottom, right, top, pages, password) | margins in points | bytes | Crop page margins |
| flatten(source, password) | -- | bytes | Flatten form fields (AcroForm removal) |
| extract_text(source, pages, password) | pages: str | ExtractedText | Extract text with per-page breakdown |
| info(source, password) | -- | PdfInfo | Page count, metadata, encryption, size |
| get_metadata(source, password) | -- | PdfMetadata | Read 6 standard metadata fields |
| set_metadata(source, title, author, ..., password) | keyword args | bytes | Update metadata (preserves others) |
| strip_metadata(source, password) | -- | bytes | Remove all metadata |
| encrypt(source, user_password, owner_password, password) | passwords | bytes | Add AES password protection |
| decrypt(source, password) | password: str | bytes | Remove password protection |

Data Classes

| Class | Fields | Description |
|---|---|---|
| PdfInfo | pages, encrypted, title, author, subject, creator, producer, size_bytes | Document information |
| PdfMetadata | title, author, subject, keywords, creator, producer | Metadata fields (all str) |
| ExtractedText | pages: list[PageTextResult], full_text: str | Extracted text with breakdown |
| PageTextResult | page: int, text: str | Text from a single page (1-indexed) |

Type Aliases

| Type | Definition | Description |
|---|---|---|
| PdfInput | bytes \| Path \| str | Any PDF source |
| PageSize | Literal["a3", "a4", "a5", "letter", "legal", "tabloid"] | Standard paper sizes |
| OddEvenMode | Literal["odd", "even"] | Page filter mode |
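
Literal aliases like these are checked by static type checkers but not at runtime. If a size arrives as user input, for example from a CLI flag or a web form, a small guard can validate it first. The sketch below redefines the two aliases locally so that it runs on its own. check_page_size is a hypothetical helper, not part of the package.

```python
from typing import Literal, get_args

# Local copies of the aliases from the table above.
PageSize = Literal["a3", "a4", "a5", "letter", "legal", "tabloid"]
OddEvenMode = Literal["odd", "even"]


def check_page_size(value: str) -> str:
    """Return the normalized size name, or raise if it is not a known PageSize."""
    normalized = value.strip().lower()
    allowed = get_args(PageSize)   # tuple of the literal strings
    if normalized not in allowed:
        raise ValueError(f"unknown page size {value!r}; expected one of {allowed}")
    return normalized


print(check_page_size("A4"))    # a4
print(get_args(OddEvenMode))    # ('odd', 'even')
```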

Also Available

| Platform | Install | Link |
|---|---|---|
| npm | npm install peasy-pdf | npm |
| MCP | uvx --from "peasy-pdf[mcp]" python -m peasy_pdf.mcp_server | Config |

Peasy Developer Tools

Part of the Peasy open-source developer tools ecosystem.

| Package | PyPI | npm | Description |
|---|---|---|---|
| peasy-pdf | PyPI | npm | PDF merge, split, compress, 21 operations -- peasypdf.com |
| peasy-image | PyPI | npm | Image resize, crop, convert, compress, 20 operations -- peasyimage.com |
| peasy-css | PyPI | npm | CSS gradients, shadows, flexbox, grid generators -- peasycss.com |
| peasy-compress | PyPI | npm | ZIP, TAR, gzip, brotli archive operations -- peasytools.com |
| peasy-document | PyPI | npm | Markdown, HTML, CSV, JSON conversions -- peasytools.com |
| peasy-audio | PyPI | -- | Audio convert, trim, merge, normalize -- peasyaudio.com |
| peasy-video | PyPI | -- | Video trim, resize, GIF conversion -- peasyvideo.com |
| peasy-convert | PyPI | -- | Unified CLI for all Peasy tools -- peasytools.com |
| peasy-mcp | PyPI | -- | Unified MCP server for AI assistants -- peasytools.com |

License

MIT
