A pandas-like wrapper for PDF operations. Read, transform, export.

Maintainers

jmfeck

Project description

lazypdf

A pandas-like Python wrapper for PDF operations. Read, transform, export.

Install

pip install lazypdf

Optional extras:

pip install "lazypdf[ocr]"       # OCR support (pytesseract + Pillow)
pip install "lazypdf[office]"    # DOCX/XLSX/PPTX export (python-docx, openpyxl, python-pptx)
pip install "lazypdf[tables]"    # Table extraction (pdfplumber)
pip install "lazypdf[html]"      # HTML to PDF (WeasyPrint)
pip install "lazypdf[msoffice]"  # MS Office COM automation on Windows (pywin32)
pip install "lazypdf[all]"       # Everything (quotes keep shells such as zsh from globbing the brackets)
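
To see which optional features are usable in the current environment, one option is to probe for the modules each extra pulls in. The sketch below is not part of lazypdf. The helper name `available_extras` is hypothetical, and the extra-to-module mapping is an assumption based on the usual import names of the packages listed above (for example, Pillow imports as `PIL` and python-docx as `docx`).

```python
from importlib.util import find_spec

# Assumed import names for the packages behind each extra
# (inferred from the package names, not taken from lazypdf itself).
EXTRA_MODULES = {
    "ocr": ["pytesseract", "PIL"],
    "office": ["docx", "openpyxl", "pptx"],
    "tables": ["pdfplumber"],
    "html": ["weasyprint"],
    "msoffice": ["win32com"],
}


def available_extras() -> dict[str, bool]:
    """Report, per extra, whether all of its modules can be located."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:10s} {'installed' if ok else 'missing'}")
```

This only checks that the Python modules can be found; system-level requirements such as Tesseract or Ghostscript need a separate check.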

Quick Start

import lazypdf as lz

# Read -> Transform -> Export
lz.read("input.pdf").rotate(90).compress().to_pdf("output.pdf")

# Merge multiple PDFs
lz.merge("file1.pdf", "file2.pdf", "file3.pdf").to_pdf("merged.pdf")

# Convert images to PDF
lz.read_images("scan1.jpg", "scan2.jpg").to_pdf("scans.pdf")

# Read Office documents (requires MS Office or LibreOffice)
lz.read_docx("report.docx").add_watermark("DRAFT").to_pdf("draft.pdf")
lz.read_xlsx("data.xlsx").to_png("output/")
lz.read_pptx("slides.pptx").extract_pages([1, 3]).to_pdf("summary.pdf")

# Extract specific pages
lz.read("big.pdf").extract_pages([1, 3, 5]).to_pdf("selected.pdf")

# Add watermark and page numbers
(
    lz.read("report.pdf")
    .add_watermark("CONFIDENTIAL", opacity=0.2)
    .add_page_numbers(position="bottom-center")
    .to_pdf("final.pdf")
)

# Export to images
lz.read("slides.pdf").to_png("output_dir/", dpi=300)

# Extract text
text = lz.read("document.pdf").extract_text()

# Encrypt / decrypt
lz.read("doc.pdf").encrypt("password").to_pdf("protected.pdf")
lz.read("protected.pdf").decrypt("password").to_pdf("unlocked.pdf")

# Redact sensitive text (case-sensitive, exact match)
lz.read("doc.pdf").redact("SECRET-123").to_pdf("redacted.pdf")

# Split into individual pages
lz.read("doc.pdf").split("output_dir/", every=1)

# Chain anything
(
    lz.read("input.pdf")
    .merge("extra.pdf")
    .remove_pages([2, 4])
    .rotate(90, pages=[1])
    .crop(left=50, right=50)
    .add_watermark("DRAFT")
    .compress()
    .to_pdf("result.pdf")
)
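
The chaining style above works because each transform returns an object that the next call can act on. As a rough illustration only (this is not lazypdf's implementation, and the `Pipeline` class and its methods are hypothetical), a fluent interface can be sketched with a class that records steps and returns `self`:

```python
class Pipeline:
    """Toy fluent interface: each transform records a step and returns self."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.steps: list[str] = []

    def rotate(self, degrees: int) -> "Pipeline":
        self.steps.append(f"rotate({degrees})")
        return self  # returning self is what makes chaining possible

    def compress(self) -> "Pipeline":
        self.steps.append("compress()")
        return self

    def describe(self) -> str:
        """Terminal operation: returns a plain value instead of the pipeline."""
        return " -> ".join([f"read({self.source!r})", *self.steps])


print(Pipeline("input.pdf").rotate(90).compress().describe())
# prints: read('input.pdf') -> rotate(90) -> compress()
```

The same split shows up in the API reference: chainable operations keep the chain going, while export methods end it by returning a path, a list, or bytes.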

API Reference

Entry Points

Function	Description	Dependency
`lz.read(path)`	Read a PDF file	pymupdf
`lz.read_pdf(path)`	Alias for `read()`	pymupdf
`lz.merge(*paths)`	Merge multiple PDFs	pymupdf
`lz.read_images(*paths)`	Create PDF from images	pymupdf
`lz.read_jpg(*paths)`	Create PDF from JPEGs	pymupdf
`lz.read_png(*paths)`	Create PDF from PNGs	pymupdf
`lz.read_html(path_or_url)`	Create PDF from HTML	weasyprint
`lz.read_docx(path)`	Read Word document	MS Office / LibreOffice
`lz.read_xlsx(path)`	Read Excel spreadsheet	MS Office / LibreOffice
`lz.read_pptx(path)`	Read PowerPoint presentation	MS Office / LibreOffice
`lz.read_csv(path)`	Read CSV file	MS Office / LibreOffice
`lz.from_bytes(data)`	Create PDF from raw bytes	pymupdf

Chainable Operations

Method	Description
`.merge(*others)`	Append more PDFs (paths, objects, or lists)
`.rotate(degrees, pages=)`	Rotate pages (multiple of 90)
`.crop(left=, top=, right=, bottom=, pages=)`	Crop page margins (in points)
`.compress()`	Reduce file size (deflate compression, dedup objects)
`.add_watermark(text, ...)`	Add text watermark
`.add_image_watermark(path, ...)`	Add image watermark (with opacity)
`.add_page_numbers(...)`	Insert page numbers
`.resize(size, pages=)`	Resize pages to standard paper size (a4, letter, etc.)
`.flatten(dpi=, pages=)`	Rasterize pages (burns annotations/forms into flat image)
`.extract_pages(pages)`	Keep only specified pages
`.remove_pages(pages)`	Remove specified pages
`.reorder(order)`	Reorder/duplicate pages
`.reverse()`	Reverse page order
`.encrypt(password)`	Add password protection (AES-256)
`.decrypt(password)`	Remove password protection
`.redact(text)`	Black out text permanently
`.repair()`	Fix corrupted PDFs
`.ocr(language=)`	Make scanned pages searchable
`.copy()`	Create independent copy

All page parameters are 1-indexed (first page = 1).
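
Because PyMuPDF itself indexes pages from 0, code that mixes lazypdf calls with direct PyMuPDF calls may need to translate between the two conventions. A minimal sketch of that translation, using a hypothetical helper `to_zero_based` that is not part of lazypdf:

```python
def to_zero_based(pages: list[int], page_count: int) -> list[int]:
    """Convert 1-indexed page numbers to 0-based indices, rejecting out-of-range values."""
    indices = []
    for page in pages:
        if not 1 <= page <= page_count:
            raise ValueError(f"page {page} is outside 1..{page_count}")
        indices.append(page - 1)
    return indices


print(to_zero_based([1, 3, 5], page_count=5))  # prints [0, 2, 4]
```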

Export (Terminal Operations)

Method	Returns
`.to_pdf(path)`	`str` (output path)
`.to_jpg(output_dir)`	`list[str]` (image paths)
`.to_png(output_dir)`	`list[str]` (image paths)
`.to_images(output_dir, fmt=)`	`list[str]` (image paths)
`.to_docx(path)`	`str` (output path)
`.to_xlsx(path)`	`str` (output path)
`.to_pdfa(path, level=)`	`str` (output path, requires Ghostscript)
`.to_bytes()`	`bytes`
`.split(output_dir, every=)`	`list[str]` (PDF paths)
`.split_at(output_dir, at=)`	`list[str]` (PDF paths)
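
The Quick Start shows `split(..., every=1)` producing one file per page. Assuming `every=N` groups N consecutive pages into each output file (an assumption, since only `every=1` is documented here), the grouping can be sketched as follows. The helper `chunk_pages` is hypothetical and only computes page ranges; it does not touch any PDF.

```python
def chunk_pages(page_count: int, every: int) -> list[range]:
    """Group 1-indexed pages into consecutive chunks of at most `every` pages."""
    if every < 1:
        raise ValueError("every must be at least 1")
    return [
        range(start, min(start + every, page_count + 1))
        for start in range(1, page_count + 1, every)
    ]


for chunk in chunk_pages(page_count=5, every=2):
    print(list(chunk))
# prints [1, 2], then [3, 4], then [5]
```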

Extraction & Info

Method / Property	Returns
`.extract_text(pages=)`	`str`
`.extract_tables(pages=)`	`list[list[list[str]]]`
`.extract_images(output_dir, pages=)`	`list[str]` (image paths)
`.metadata`	`dict`
`.page_count`	`int`
`.page_sizes()`	`list[tuple[float, float]]`
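
The `list[list[list[str]]]` shape returned by `extract_tables()` reads as a list of tables, each a list of rows, each a list of cell strings. As a sketch of how such a result might be consumed, the example below writes each table to a CSV string using only the standard library. The sample data and the helper `tables_to_csv` are made up for illustration and stand in for a real extraction result.

```python
import csv
import io

# Stand-in data shaped like list[list[list[str]]]: a list of tables, each a list of rows.
tables = [
    [["name", "qty"], ["bolt", "4"]],
    [["region", "total"], ["north", "12"]],
]


def tables_to_csv(tables: list[list[list[str]]]) -> list[str]:
    """Serialize each table to its own CSV string."""
    outputs = []
    for rows in tables:
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows)
        outputs.append(buffer.getvalue())
    return outputs


print(tables_to_csv(tables)[0])
```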

Limitations

- Office reads (`read_docx`, `read_xlsx`, `read_pptx`, `read_csv`) require either Microsoft Office (Windows, auto-detected) or LibreOffice (any OS, must be on PATH). No pure-Python solution exists for reliable Office-to-PDF conversion.
- `to_docx()` extracts text only. Images, tables, and complex formatting are not preserved.
- `to_xlsx()` only exports tables found in the PDF. Requires the `[tables]` and `[office]` extras.
- OCR (`ocr()`) requires Tesseract to be installed on the system in addition to the `[ocr]` pip extra.
- `read_html()` requires WeasyPrint, which has system-level dependencies (Pango, Cairo). See the WeasyPrint docs.
- Redaction (`redact()`) is a case-sensitive exact text match. Save the result with `to_pdf()` to persist it.
- PDF/A (`to_pdfa()`) requires Ghostscript installed on the system (`gs` on Linux/Mac, `gswin64c` on Windows).
- Flatten (`flatten()`) rasterizes pages to images, so text becomes non-searchable. Use a higher DPI for better quality.
- Image watermark (`add_image_watermark()`) requires Pillow (included in the `[ocr]` extra).
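
Several of these limitations depend on external programs being installed. A quick way to check for them before running a pipeline is to look them up on PATH. The sketch below is not part of lazypdf: the helper names are hypothetical, the Ghostscript executable names come from the list above, and the LibreOffice (`soffice`, `libreoffice`) and Tesseract (`tesseract`) names are assumptions based on their usual command names.

```python
import shutil
from typing import Optional

# Candidate executable names per tool (see the assumptions stated above).
TOOLS = {
    "libreoffice": ["soffice", "libreoffice"],
    "tesseract": ["tesseract"],
    "ghostscript": ["gs", "gswin64c"],
}


def first_on_path(candidates: list[str]) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def find_tools() -> dict[str, Optional[str]]:
    """Map each tool to the executable found for it, or None if it is missing."""
    return {tool: first_on_path(names) for tool, names in TOOLS.items()}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:12s} {path or 'not found on PATH'}")
```

This does not cover Microsoft Office on Windows, which is not normally discovered through PATH.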

License

BSD-3-Clause

Release history

- 0.2.0 (Mar 31, 2026)
- 0.1.1 (Mar 29, 2026)
- 0.1.0 (Mar 29, 2026), this version

Download files

Source Distribution

lazypdf-0.1.0.tar.gz (25.8 kB)

Uploaded Mar 29, 2026 Source

Built Distribution

lazypdf-0.1.0-py3-none-any.whl (19.7 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file lazypdf-0.1.0.tar.gz.

File metadata

Download URL: lazypdf-0.1.0.tar.gz
Upload date: Mar 29, 2026
Size: 25.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lazypdf-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`be680679f4e339db0941d47403422dd6a881ae8f2f04e19f3c546d936fe339e7`
MD5	`71be93fd74571faa8a5fb488f7469a44`
BLAKE2b-256	`8d1eddc1c6189f4bdb5a507caf1118ff993e5df8ec93a7a36771d2a87cc4eef5`
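
A downloaded archive can be checked against the published SHA256 digest with the standard library. This is a minimal sketch: the function names are hypothetical, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for lazypdf-0.1.0.tar.gz.
EXPECTED_SHA256 = "be680679f4e339db0941d47403422dd6a881ae8f2f04e19f3c546d936fe339e7"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex digest, ignoring case."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash(Path("lazypdf-0.1.0.tar.gz"))` would return `True` only if the file in the current directory matches the digest listed above.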

Provenance

The following attestation bundles were made for lazypdf-0.1.0.tar.gz:

Publisher: publish.yml on jmfeck/lazypdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lazypdf-0.1.0.tar.gz
- Subject digest: be680679f4e339db0941d47403422dd6a881ae8f2f04e19f3c546d936fe339e7
- Sigstore transparency entry: 1194556537
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: jmfeck/lazypdf@f16f92607a342c39231e9bf3b2b47736446bea36
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/jmfeck
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f16f92607a342c39231e9bf3b2b47736446bea36
- Trigger Event: release

File details

Details for the file lazypdf-0.1.0-py3-none-any.whl.

File metadata

Download URL: lazypdf-0.1.0-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 19.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lazypdf-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d5e44e696da6333dc538130cf1b244abe0394e6304524a92a68384f1f3299379`
MD5	`cf16e0a2284de4e8b152eac9805206e7`
BLAKE2b-256	`d634a7620bf4a84a44ffe5847e45fc1067f2a9ba1c3cb47c7ec577a773e70f37`

Provenance

The following attestation bundles were made for lazypdf-0.1.0-py3-none-any.whl:

Publisher: publish.yml on jmfeck/lazypdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lazypdf-0.1.0-py3-none-any.whl
- Subject digest: d5e44e696da6333dc538130cf1b244abe0394e6304524a92a68384f1f3299379
- Sigstore transparency entry: 1194556553
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: jmfeck/lazypdf@f16f92607a342c39231e9bf3b2b47736446bea36
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/jmfeck
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f16f92607a342c39231e9bf3b2b47736446bea36
- Trigger Event: release
