lazypdf

A pandas-like Python wrapper for PDF operations. Read, transform, export.

Install

pip install lazypdf

Optional extras:

# Quote the requirement so shells such as zsh do not treat the brackets as a glob pattern
pip install "lazypdf[ocr]"       # OCR support (pytesseract + Pillow)
pip install "lazypdf[office]"    # DOCX/XLSX/PPTX export (python-docx, openpyxl, python-pptx)
pip install "lazypdf[tables]"    # Table extraction (pdfplumber)
pip install "lazypdf[html]"      # HTML to PDF (WeasyPrint)
pip install "lazypdf[msoffice]"  # MS Office COM automation on Windows (pywin32)
pip install "lazypdf[all]"       # Everything

Quick Start

import lazypdf as lz

# Read -> Transform -> Export
lz.read("input.pdf").rotate(90).compress().to_pdf("output.pdf")

# Merge multiple PDFs
lz.merge("file1.pdf", "file2.pdf", "file3.pdf").to_pdf("merged.pdf")

# Convert images to PDF
lz.read_images("scan1.jpg", "scan2.jpg").to_pdf("scans.pdf")

# Read Office documents (requires MS Office or LibreOffice)
lz.read_docx("report.docx").add_watermark("DRAFT").to_pdf("draft.pdf")
lz.read_xlsx("data.xlsx").to_png("output/")
lz.read_pptx("slides.pptx").extract_pages([1, 3]).to_pdf("summary.pdf")

# Extract specific pages
lz.read("big.pdf").extract_pages([1, 3, 5]).to_pdf("selected.pdf")

# Add watermark and page numbers
(
    lz.read("report.pdf")
    .add_watermark("CONFIDENTIAL", opacity=0.2)
    .add_page_numbers(position="bottom-center")
    .to_pdf("final.pdf")
)

# Export to images
lz.read("slides.pdf").to_png("output_dir/", dpi=300)

# Extract text
text = lz.read("document.pdf").extract_text()

# Encrypt / decrypt
lz.read("doc.pdf").encrypt("password").to_pdf("protected.pdf")
lz.read("protected.pdf").decrypt("password").to_pdf("unlocked.pdf")

# Redact sensitive text (case-sensitive, exact match)
lz.read("doc.pdf").redact("SECRET-123").to_pdf("redacted.pdf")

# Split into individual pages
lz.read("doc.pdf").split("output_dir/", every=1)

# Chain anything
(
    lz.read("input.pdf")
    .merge("extra.pdf")
    .remove_pages([2, 4])
    .rotate(90, pages=[1])
    .crop(left=50, right=50)
    .add_watermark("DRAFT")
    .compress()
    .to_pdf("result.pdf")
)
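
The same chain can be applied to a whole folder. The following is a minimal sketch, assuming the methods behave as listed in this README. `watermark_folder` and its `read` parameter are hypothetical names introduced for illustration; passing the reader in as an argument lets the helper be exercised without lazypdf installed, and in real use it would presumably be `lz.read`.

```python
from pathlib import Path
from typing import Callable, List


def watermark_folder(src_dir: str, dst_dir: str, text: str, read: Callable) -> List[str]:
    """Watermark every PDF in src_dir and write the results to dst_dir.

    `read` is expected to behave like lz.read: it takes a path and returns
    an object exposing the chainable .add_watermark() and .compress()
    methods and the terminal .to_pdf() method described in this README.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    # Sort for a deterministic processing order.
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        target = out / pdf.name
        read(str(pdf)).add_watermark(text).compress().to_pdf(str(target))
        written.append(str(target))
    return written
```

A call might look like `watermark_folder("inbox/", "outbox/", "DRAFT", read=lz.read)`.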

API Reference

Entry Points

| Function | Description | Dependency |
|---|---|---|
| lz.read(path) | Read a PDF file | pymupdf |
| lz.read_pdf(path) | Alias for read() | pymupdf |
| lz.merge(*paths) | Merge multiple PDFs | pymupdf |
| lz.read_images(*paths) | Create PDF from images | pymupdf |
| lz.read_jpg(*paths) | Create PDF from JPEGs | pymupdf |
| lz.read_png(*paths) | Create PDF from PNGs | pymupdf |
| lz.read_html(path_or_url) | Create PDF from HTML | weasyprint |
| lz.read_docx(path) | Read Word document | MS Office / LibreOffice |
| lz.read_xlsx(path) | Read Excel spreadsheet | MS Office / LibreOffice |
| lz.read_pptx(path) | Read PowerPoint presentation | MS Office / LibreOffice |
| lz.read_csv(path) | Read CSV file | MS Office / LibreOffice |
| lz.from_bytes(data) | Create PDF from raw bytes | pymupdf |

Chainable Operations

| Method | Description |
|---|---|
| .merge(*others) | Append more PDFs (paths, objects, or lists) |
| .rotate(degrees, pages=) | Rotate pages (multiple of 90) |
| .crop(left=, top=, right=, bottom=, pages=) | Crop page margins (in points) |
| .compress() | Reduce file size (deflate compression, dedup objects) |
| .add_watermark(text, ...) | Add text watermark |
| .add_image_watermark(path, ...) | Add image watermark (with opacity) |
| .add_page_numbers(...) | Insert page numbers |
| .resize(size, pages=) | Resize pages to standard paper size (a4, letter, etc.) |
| .flatten(dpi=, pages=) | Rasterize pages (burns annotations/forms into flat image) |
| .extract_pages(pages) | Keep only specified pages |
| .remove_pages(pages) | Remove specified pages |
| .reorder(order) | Reorder/duplicate pages |
| .reverse() | Reverse page order |
| .encrypt(password) | Add password protection (AES-256) |
| .decrypt(password) | Remove password protection |
| .redact(text) | Black out text permanently |
| .repair() | Fix corrupted PDFs |
| .ocr(language=) | Make scanned pages searchable |
| .copy() | Create independent copy |

All page parameters are 1-indexed (first page = 1).
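
Because page arguments are 1-indexed while pymupdf itself counts pages from 0, it can help to validate a page list before building a chain. The following is a minimal sketch; `check_pages` is a hypothetical helper and not part of lazypdf, and how lazypdf itself reacts to out-of-range pages is not documented here.

```python
from typing import Iterable, List


def check_pages(pages: Iterable[int], page_count: int) -> List[int]:
    """Validate 1-indexed page numbers and return their 0-indexed equivalents."""
    zero_based = []
    for p in pages:
        if not 1 <= p <= page_count:
            raise ValueError(f"page {p} is outside 1..{page_count}")
        zero_based.append(p - 1)
    return zero_based


print(check_pages([1, 3, 5], page_count=5))  # [0, 2, 4]
```

With a real document, the check could run as `check_pages([1, 3, 5], doc.page_count)` before calling `doc.extract_pages([1, 3, 5])`.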

Export (Terminal Operations)

| Method | Returns |
|---|---|
| .to_pdf(path) | str (output path) |
| .to_jpg(output_dir) | list[str] (image paths) |
| .to_png(output_dir) | list[str] (image paths) |
| .to_images(output_dir, fmt=) | list[str] (image paths) |
| .to_docx(path) | str (output path) |
| .to_xlsx(path) | str (output path) |
| .to_pdfa(path, level=) | str (output path, requires Ghostscript) |
| .to_bytes() | bytes |
| .split(output_dir, every=) | list[str] (PDF paths) |
| .split_at(output_dir, at=) | list[str] (PDF paths) |

Extraction & Info

| Method / Property | Returns |
|---|---|
| .extract_text(pages=) | str |
| .extract_tables(pages=) | list[list[list[str]]] |
| .extract_images(output_dir, pages=) | list[str] (image paths) |
| .metadata | dict |
| .page_count | int |
| .page_sizes() | list[tuple[float, float]] |
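
Crop margins are given in points and image exports take a dpi argument, so a little unit arithmetic is often needed. The sketch below relies on the PDF convention of 72 points per inch. It assumes, without confirmation from this README, that page_sizes() returns (width, height) pairs in points. `mm_to_points` and `pixel_size` are hypothetical helpers, not lazypdf functions.

```python
from typing import Tuple

POINTS_PER_INCH = 72   # PDF user-space unit: 1 pt = 1/72 inch
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points, e.g. for crop margins."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def pixel_size(size_pt: Tuple[float, float], dpi: int) -> Tuple[int, int]:
    """Estimate the pixel dimensions of a page rendered at the given DPI."""
    width_pt, height_pt = size_pt
    return (
        round(width_pt / POINTS_PER_INCH * dpi),
        round(height_pt / POINTS_PER_INCH * dpi),
    )


# US Letter is 612 x 792 pt (8.5 x 11 in).
print(pixel_size((612, 792), dpi=300))  # (2550, 3300)
print(mm_to_points(25.4))               # 72.0
```

For example, a 10 mm margin could be passed as `crop(left=mm_to_points(10))`.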

Limitations

  • Office reads (read_docx, read_xlsx, read_pptx, read_csv) require either Microsoft Office (Windows, auto-detected) or LibreOffice (any OS, must be on PATH). No pure-Python solution exists for reliable Office-to-PDF conversion.
  • to_docx() extracts text only. Images, tables, and complex formatting are not preserved.
  • to_xlsx() only exports tables found in the PDF. Requires [tables] and [office] extras.
  • OCR (ocr()) requires Tesseract to be installed on the system in addition to the [ocr] pip extra.
  • read_html() requires WeasyPrint which has system-level dependencies (Pango, Cairo). See WeasyPrint docs.
  • Redaction (redact()) uses a case-sensitive, exact text match. Save the result with to_pdf() to persist the redaction.
  • PDF/A (to_pdfa()) requires Ghostscript installed on the system (gs on Linux/Mac, gswin64c on Windows).
  • Flatten (flatten()) rasterizes pages to images, so text becomes non-searchable. Use a higher DPI for better quality.
  • Image watermark (add_image_watermark()) requires Pillow (included in [ocr] extra).
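
Several of the limitations above depend on external programs being installed. A pre-flight check with the standard library's `shutil.which` can report what is missing before a long job starts. This is a minimal sketch: `find_external_tools` is a hypothetical helper, and the LibreOffice executable names (`soffice`, `libreoffice`) are assumptions about what is on PATH, not a statement of how lazypdf performs its own detection.

```python
import shutil
import sys
from typing import Dict, Optional


def find_external_tools() -> Dict[str, Optional[str]]:
    """Return the resolved path of each external tool, or None if not on PATH."""
    # Ghostscript is gswin64c on Windows and gs on Linux/Mac, per the list above.
    gs_name = "gswin64c" if sys.platform == "win32" else "gs"
    candidates = {
        "tesseract": ["tesseract"],
        "ghostscript": [gs_name],
        "libreoffice": ["soffice", "libreoffice"],  # assumed executable names
    }
    found = {}
    for tool, names in candidates.items():
        # Take the first candidate name that resolves to an executable.
        found[tool] = next((p for p in map(shutil.which, names) if p), None)
    return found


for tool, path in find_external_tools().items():
    print(f"{tool:12} {path or 'not found on PATH'}")
```

This check does not cover Microsoft Office on Windows, which the README says is auto-detected.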

License

BSD-3-Clause
