lazypdf

Simple PDF manipulation and conversion for Python. Read a PDF, transform it, export to another format. That's it.

No complex pipelines, no bloated abstractions — just a clean, fluent API to merge, split, compress, watermark, convert, and more.

Install

```bash
pip install lazypdf
```

Optional extras:

```bash
# Quoting the requirement keeps shells such as zsh from treating [..] as a glob pattern.
pip install "lazypdf[ocr]"       # OCR support (pytesseract + Pillow)
pip install "lazypdf[office]"    # DOCX/XLSX/PPTX export (python-docx, openpyxl, python-pptx)
pip install "lazypdf[tables]"    # Table extraction (pdfplumber)
pip install "lazypdf[html]"      # HTML to PDF via WeasyPrint engine
pip install "lazypdf[browser]"   # HTML to PDF via Playwright engine (Chromium)
pip install "lazypdf[repair]"    # PDF repair via pikepdf engine
pip install "lazypdf[msoffice]"  # MS Office COM automation on Windows (pywin32)
pip install "lazypdf[all]"       # Everything
```
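
To see which optional extras are usable in the current environment, one option is to probe for the underlying packages before calling the corresponding feature. The sketch below is not part of lazypdf: the `EXTRA_MODULES` mapping and the `available_extras` helper are hypothetical, and the import names are those of the third-party packages listed above (Pillow imports as `PIL`, python-docx as `docx`, python-pptx as `pptx`, pywin32 as `win32com`).

```python
from importlib.util import find_spec

# Hypothetical mapping from extra name to the top-level modules
# installed by the packages listed above.
EXTRA_MODULES = {
    "ocr": ["pytesseract", "PIL"],
    "office": ["docx", "openpyxl", "pptx"],
    "tables": ["pdfplumber"],
    "html": ["weasyprint"],
    "browser": ["playwright"],
    "repair": ["pikepdf"],
    "msoffice": ["win32com"],
}


def available_extras(is_installed=None):
    """Return {extra: bool}; an extra counts only if all of its modules are found."""
    if is_installed is None:
        is_installed = lambda name: find_spec(name) is not None
    return {
        extra: all(is_installed(mod) for mod in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"lazypdf[{extra}]: {'available' if ok else 'missing'}")
```

The `is_installed` parameter exists only so the lookup can be swapped out, for example in tests.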

Quick Start

```python
import lazypdf as lz

# Read -> Transform -> Export
lz.read("input.pdf").rotate(90).compress().to_pdf("output.pdf")

# Merge multiple PDFs
lz.merge("file1.pdf", "file2.pdf", "file3.pdf").to_pdf("merged.pdf")

# Convert images to PDF
lz.read_images("scan1.jpg", "scan2.jpg").to_pdf("scans.pdf")

# Read Office documents (requires MS Office or LibreOffice)
lz.read_docx("report.docx").add_watermark("DRAFT").to_pdf("draft.pdf")
lz.read_xlsx("data.xlsx").to_png("output/")
lz.read_pptx("slides.pptx").extract_pages([1, 3]).to_pdf("summary.pdf")

# Extract specific pages
lz.read("big.pdf").extract_pages([1, 3, 5]).to_pdf("selected.pdf")

# Add watermark and page numbers
(
    lz.read("report.pdf")
    .add_watermark("CONFIDENTIAL", opacity=0.2)
    .add_page_numbers(position="bottom-center")
    .to_pdf("final.pdf")
)

# Export to images
lz.read("slides.pdf").to_png("output_dir/", dpi=300)

# Extract text
text = lz.read("document.pdf").extract_text()

# Encrypt / decrypt
lz.read("doc.pdf").encrypt("password").to_pdf("protected.pdf")
lz.read("protected.pdf").decrypt("password").to_pdf("unlocked.pdf")

# Redact sensitive text (case-sensitive, exact match)
lz.read("doc.pdf").redact("SECRET-123").to_pdf("redacted.pdf")

# Split into individual pages
lz.read("doc.pdf").split("output_dir/", every=1)

# Chain anything
(
    lz.read("input.pdf")
    .merge("extra.pdf")
    .remove_pages([2, 4])
    .rotate(90, pages=[1])
    .crop(left=50, right=50)
    .add_watermark("DRAFT")
    .compress()
    .to_pdf("result.pdf")
)
```
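
The chaining style above follows a common fluent-interface pattern: each transform returns a document object, and only the export call produces output. The toy class below illustrates that pattern on a plain list of page labels. It is a sketch of the general idea, not lazypdf's implementation; `ToyDoc` and its methods are hypothetical stand-ins.

```python
class ToyDoc:
    """Toy stand-in for a document: unique page labels plus a rotation per page."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.rotation = {p: 0 for p in self.pages}

    def remove_pages(self, numbers):
        # Page numbers are 1-indexed, matching the convention used in this README.
        drop = set(numbers)
        self.pages = [p for i, p in enumerate(self.pages, start=1) if i not in drop]
        return self  # returning self is what makes the call chainable

    def rotate(self, degrees, pages=None):
        if degrees % 90 != 0:
            raise ValueError("rotation must be a multiple of 90")
        targets = range(1, len(self.pages) + 1) if pages is None else pages
        for n in targets:
            label = self.pages[n - 1]
            self.rotation[label] = (self.rotation[label] + degrees) % 360
        return self

    def to_list(self):
        # Terminal operation: returns data rather than the document, ending the chain.
        return [(p, self.rotation[p]) for p in self.pages]


result = ToyDoc(["a", "b", "c", "d"]).remove_pages([2, 4]).rotate(90, pages=[1]).to_list()
print(result)
```

Note that `rotate(90, pages=[1])` acts on the first page that remains after `remove_pages`, since each step sees the result of the previous one.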

API Reference

Entry Points

| Function | Description | Dependency |
| --- | --- | --- |
| `lz.read(path)` | Read a PDF file | pymupdf |
| `lz.read_pdf(path)` | Alias for `read()` | pymupdf |
| `lz.merge(*paths)` | Merge multiple PDFs | pymupdf |
| `lz.read_images(*paths, page_size=)` | Create PDF from images (`page_size` default: `"fit"`) | pymupdf |
| `lz.read_jpg(*paths, page_size=)` | Create PDF from JPEGs | pymupdf |
| `lz.read_png(*paths, page_size=)` | Create PDF from PNGs | pymupdf |
| `lz.read_html(path_or_url, engine=)` | Create PDF from HTML (`engine` default: `"pymupdf"`) | pymupdf |
| `lz.read_docx(path)` | Read Word document | MS Office / LibreOffice |
| `lz.read_xlsx(path)` | Read Excel spreadsheet | MS Office / LibreOffice |
| `lz.read_pptx(path)` | Read PowerPoint presentation | MS Office / LibreOffice |
| `lz.read_csv(path)` | Read CSV file | MS Office / LibreOffice |
| `lz.from_bytes(data)` | Create PDF from raw bytes | pymupdf |

Chainable Operations

| Method | Description |
| --- | --- |
| `.merge(*others)` | Append more PDFs (paths, objects, or lists) |
| `.rotate(degrees, pages=)` | Rotate pages (multiple of 90) |
| `.crop(left=, top=, right=, bottom=, pages=)` | Crop page margins (in points) |
| `.compress(img_quality=, compression_level=)` | Reduce file size (deflate compression, dedup objects) |
| `.add_watermark(text, ...)` | Add text watermark |
| `.add_image_watermark(path, ...)` | Add image watermark (with opacity) |
| `.add_page_numbers(...)` | Insert page numbers |
| `.resize(size, pages=)` | Resize pages to standard paper size (a4, letter, etc.) |
| `.flatten(dpi=, pages=)` | Rasterize pages (burns annotations/forms into flat image) |
| `.extract_pages(pages)` | Keep only specified pages |
| `.remove_pages(pages)` | Remove specified pages |
| `.reorder(order)` | Reorder/duplicate pages |
| `.reverse()` | Reverse page order |
| `.encrypt(password, algorithm=)` | Add password protection (`algorithm` default: AES-256-R5) |
| `.decrypt(password)` | Remove password protection |
| `.redact(text)` | Black out text permanently |
| `.repair(engine=)` | Fix corrupted PDFs (`engine` default: `"auto"`) |
| `.ocr(language=)` | Make scanned pages searchable |
| `.copy()` | Create independent copy |
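
Because `.crop()` takes its margins in points, values given in millimetres or inches need converting first. A PDF point is 1/72 inch, and an inch is 25.4 mm. The helpers below are a small sketch of that arithmetic; `mm_to_pt` and `inch_to_pt` are hypothetical names, not part of lazypdf.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inch_to_pt(inches: float) -> float:
    """Convert inches to PDF points."""
    return inches * POINTS_PER_INCH


def mm_to_pt(mm: float) -> float:
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# A 10 mm margin expressed in points, rounded for readability.
margin = round(mm_to_pt(10), 2)
print(margin)
```

The result could then be passed along the lines of `.crop(left=margin, right=margin)`.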

All page parameters are 1-indexed (first page = 1).
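
When page lists come from code that counts from zero (Python ranges, list indices), an off-by-one is easy to introduce. The helper below is a hypothetical sketch, not a lazypdf function: it converts 0-based indices to the 1-indexed page numbers described above and rejects values outside the document.

```python
def to_page_numbers(indices, page_count):
    """Convert 0-based indices to 1-indexed page numbers, validating the range."""
    numbers = []
    for i in indices:
        if not 0 <= i < page_count:
            raise IndexError(f"index {i} is outside a {page_count}-page document")
        numbers.append(i + 1)
    return numbers


# Every other page of a 6-page document, starting with the first.
pages = to_page_numbers(range(0, 6, 2), page_count=6)
print(pages)
```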

Export (Terminal Operations)

| Method | Returns |
| --- | --- |
| `.to_pdf(path)` | `str` (output path) |
| `.to_jpg(output_dir)` | `list[str]` (image paths) |
| `.to_png(output_dir)` | `list[str]` (image paths) |
| `.to_images(output_dir, fmt=)` | `list[str]` (image paths) |
| `.to_docx(path)` | `str` (output path) |
| `.to_xlsx(path)` | `str` (output path) |
| `.to_pdfa(path, level=, engine=)` | `str` (output path); `engine` default: `"pymupdf"` |
| `.to_bytes()` | `bytes` |
| `.split(output_dir, every=)` | `list[str]` (PDF paths) |
| `.split_at(output_dir, at=)` | `list[str]` (PDF paths) |
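
The table gives only the signatures of `.split()` and `.split_at()`. One plausible reading, consistent with `every=1` producing individual pages in the Quick Start, is that `every=n` yields consecutive runs of `n` pages and that `at=` lists the page numbers where a new file starts. The sketch below computes page groups under exactly those assumptions; `chunks_every` and `chunks_at` are hypothetical helpers, and the library's real behavior may differ.

```python
def chunks_every(page_count, every):
    """Group pages 1..page_count into consecutive runs of `every` pages."""
    if every < 1:
        raise ValueError("every must be >= 1")
    return [
        list(range(start, min(start + every, page_count + 1)))
        for start in range(1, page_count + 1, every)
    ]


def chunks_at(page_count, at):
    """Start a new group at each page number in `at` (assumed semantics)."""
    starts = sorted({1, *[p for p in at if 1 < p <= page_count]})
    ends = starts[1:] + [page_count + 1]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


print(chunks_every(5, 2))
print(chunks_at(7, [3, 6]))
```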

Extraction & Info

| Method / Property | Returns |
| --- | --- |
| `.extract_text(pages=, engine=, page_separator=)` | `str` |
| `.extract_tables(pages=, flavor=)` | `list[list[list[str]]]` |
| `.extract_images(output_dir, pages=)` | `list[str]` (image paths) |
| `.metadata` | `dict` |
| `.page_count` | `int` |
| `.page_sizes()` | `list[tuple[float, float]]` |
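
The `list[list[list[str]]]` return type of `.extract_tables()` reads most naturally as tables, then rows, then cells. Assuming that nesting (the source gives only the type), the sketch below renders each table as CSV text with the standard library. The sample data and the `tables_to_csv` helper are made up for illustration.

```python
import csv
import io


def tables_to_csv(tables):
    """Render each table (a list of rows of cell strings) as one CSV string."""
    rendered = []
    for rows in tables:
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows)
        rendered.append(buffer.getvalue())
    return rendered


# Made-up data in the assumed tables -> rows -> cells shape.
sample = [
    [["item", "qty"], ["bolt", "4"]],
    [["name"], ["a, b"]],
]
for text in tables_to_csv(sample):
    print(text)
```

Using `csv.writer` rather than joining on commas means cells that themselves contain commas are quoted correctly.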

Limitations

  • Office reads (read_docx, read_xlsx, read_pptx, read_csv) require either Microsoft Office (Windows, auto-detected) or LibreOffice (any OS, must be on PATH). No pure-Python solution exists for reliable Office-to-PDF conversion.
  • to_docx() extracts text only. Images, tables, and complex formatting are not preserved.
  • to_xlsx() exports only the tables found in the PDF. It requires the [tables] and [office] extras.
  • OCR (ocr()) requires Tesseract to be installed on the system in addition to the [ocr] pip extra.
  • read_html() defaults to the PyMuPDF Story engine (basic CSS). For better rendering, use engine="weasyprint" (requires GTK) or engine="playwright" (requires Chromium).
  • Redaction (redact()) is a case-sensitive, exact text match. Save the result with to_pdf() to persist it.
  • PDF/A (to_pdfa()) defaults to the PyMuPDF engine, which may not pass strict validators. Use engine="ghostscript" for full compliance (requires the Ghostscript binary).
  • Flatten (flatten()) rasterizes pages to images, so text becomes non-searchable. Default DPI is 72; use higher values for better quality.
  • Image watermark (add_image_watermark()) requires Pillow (included in the [ocr] extra).
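
Several of the limitations above come down to an external program being installed. A quick preflight check with `shutil.which` can report what is missing before a conversion fails. The sketch below is not part of lazypdf; the executable names are common defaults and are assumptions (LibreOffice is often exposed as `soffice`, and Ghostscript as `gs` on Unix-like systems or `gswin64c` on Windows).

```python
import shutil

# Assumed executable names; adjust for the local installation.
TOOLS = {
    "LibreOffice": ["soffice", "libreoffice"],
    "Tesseract": ["tesseract"],
    "Ghostscript": ["gs", "gswin64c"],
}


def find_tools(which=shutil.which):
    """Return {tool: path or None}, taking the first candidate name found on PATH."""
    found = {}
    for tool, candidates in TOOLS.items():
        found[tool] = next((p for p in map(which, candidates) if p), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

This only confirms that an executable is reachable on PATH; it does not check versions or whether Microsoft Office is available through its own auto-detection.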

License

BSD-3-Clause
