Simple PDF manipulation and conversion for Python

Maintainers

jmfeck

lazypdf

Simple PDF manipulation and conversion for Python. Read a PDF, transform it, export to another format. That's it.

No complex pipelines, no bloated abstractions: just a clean, fluent API to merge, split, compress, watermark, convert, and more.

Install

pip install lazypdf

Optional extras:

pip install "lazypdf[ocr]"       # OCR support (pytesseract + Pillow)
pip install "lazypdf[office]"    # DOCX/XLSX/PPTX export (python-docx, openpyxl, python-pptx)
pip install "lazypdf[tables]"    # Table extraction (pdfplumber)
pip install "lazypdf[html]"      # HTML to PDF via WeasyPrint engine
pip install "lazypdf[browser]"   # HTML to PDF via Playwright engine (Chromium)
pip install "lazypdf[repair]"    # PDF repair via pikepdf engine
pip install "lazypdf[msoffice]"  # MS Office COM automation on Windows (pywin32)
pip install "lazypdf[all]"       # Everything (quotes keep shells such as zsh from expanding the brackets)
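
To see which extras are usable in the current environment, a small check along these lines may help. It is a minimal sketch, not part of lazypdf: the mapping from extra to import name is an assumption derived from the packages named in the install comments, and `missing_modules` and `missing_for_extra` are hypothetical helpers.

```python
from importlib.util import find_spec

# Import names of the packages each extra is documented to pull in.
# Assumption: this mapping mirrors the install comments; lazypdf may differ.
EXTRA_MODULES = {
    "ocr": ["pytesseract", "PIL"],
    "office": ["docx", "openpyxl", "pptx"],
    "tables": ["pdfplumber"],
    "html": ["weasyprint"],
    "browser": ["playwright"],
    "repair": ["pikepdf"],
    "msoffice": ["win32com"],
}


def missing_modules(module_names):
    """Return the top-level modules from module_names that cannot be found."""
    return [name for name in module_names if find_spec(name) is None]


def missing_for_extra(extra):
    """Return the modules an extra is assumed to need that are not installed."""
    return missing_modules(EXTRA_MODULES[extra])


if __name__ == "__main__":
    for extra in EXTRA_MODULES:
        missing = missing_for_extra(extra)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"[{extra}] {status}")
```

The check only confirms that a module can be located; it does not import it or verify system-level requirements such as Tesseract or Chromium.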

Quick Start

import lazypdf as lz

# Read -> Transform -> Export
lz.read("input.pdf").rotate(90).compress().to_pdf("output.pdf")

# Merge multiple PDFs
lz.merge("file1.pdf", "file2.pdf", "file3.pdf").to_pdf("merged.pdf")

# Convert images to PDF
lz.read_images("scan1.jpg", "scan2.jpg").to_pdf("scans.pdf")

# Read Office documents (requires MS Office or LibreOffice)
lz.read_docx("report.docx").add_watermark("DRAFT").to_pdf("draft.pdf")
lz.read_xlsx("data.xlsx").to_png("output/")
lz.read_pptx("slides.pptx").extract_pages([1, 3]).to_pdf("summary.pdf")

# Extract specific pages
lz.read("big.pdf").extract_pages([1, 3, 5]).to_pdf("selected.pdf")

# Add watermark and page numbers
(
    lz.read("report.pdf")
    .add_watermark("CONFIDENTIAL", opacity=0.2)
    .add_page_numbers(position="bottom-center")
    .to_pdf("final.pdf")
)

# Export to images
lz.read("slides.pdf").to_png("output_dir/", dpi=300)

# Extract text
text = lz.read("document.pdf").extract_text()

# Encrypt / decrypt
lz.read("doc.pdf").encrypt("password").to_pdf("protected.pdf")
lz.read("protected.pdf").decrypt("password").to_pdf("unlocked.pdf")

# Redact sensitive text (case-sensitive, exact match)
lz.read("doc.pdf").redact("SECRET-123").to_pdf("redacted.pdf")

# Split into individual pages
lz.read("doc.pdf").split("output_dir/", every=1)

# Chain anything
(
    lz.read("input.pdf")
    .merge("extra.pdf")
    .remove_pages([2, 4])
    .rotate(90, pages=[1])
    .crop(left=50, right=50)
    .add_watermark("DRAFT")
    .compress()
    .to_pdf("result.pdf")
)
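
The `dpi=300` argument in the image export above controls how many pixels each page produces. PDF page sizes are measured in points (1 point = 1/72 inch), so the expected raster size can be estimated as below. This is a rough sketch: `pixel_size` is a hypothetical helper, and the exact rounding used by the underlying renderer is an assumption.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def pixel_size(width_pt, height_pt, dpi):
    """Approximate raster size in pixels for a page measured in points."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(pixel_size(612, 792, dpi=72))   # (612, 792)
print(pixel_size(612, 792, dpi=300))  # (2550, 3300)
```

Pixel count grows with the square of the DPI, so 300 DPI output holds roughly 17 times as many pixels per page as 72 DPI.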

API Reference

Entry Points

Function	Description	Dependency
`lz.read(path)`	Read a PDF file	pymupdf
`lz.read_pdf(path)`	Alias for `read()`	pymupdf
`lz.merge(*paths)`	Merge multiple PDFs	pymupdf
`lz.read_images(*paths, page_size=)`	Create PDF from images (default: `"fit"`)	pymupdf
`lz.read_jpg(*paths, page_size=)`	Create PDF from JPEGs	pymupdf
`lz.read_png(*paths, page_size=)`	Create PDF from PNGs	pymupdf
`lz.read_html(path_or_url, engine=)`	Create PDF from HTML (default: `"pymupdf"`)	pymupdf
`lz.read_docx(path)`	Read Word document	MS Office / LibreOffice
`lz.read_xlsx(path)`	Read Excel spreadsheet	MS Office / LibreOffice
`lz.read_pptx(path)`	Read PowerPoint presentation	MS Office / LibreOffice
`lz.read_csv(path)`	Read CSV file	MS Office / LibreOffice
`lz.from_bytes(data)`	Create PDF from raw bytes	pymupdf
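
Before handing raw bytes to `lz.from_bytes(data)`, it can be useful to reject data that is obviously not a PDF. The sketch below relies only on the fact that PDF files start with the `%PDF-` marker; `looks_like_pdf` is a hypothetical helper, and whether lazypdf performs a similar check itself is not stated here.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if data starts with the PDF header marker.

    This is a cheap pre-check, not validation: a file can start with
    the marker and still be truncated or corrupted.
    """
    return data.startswith(PDF_MAGIC)


print(looks_like_pdf(b"%PDF-1.7\n..."))  # True
print(looks_like_pdf(b"PK\x03\x04"))     # False (ZIP container, e.g. a .docx)
```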

Chainable Operations

Method	Description
`.merge(*others)`	Append more PDFs (paths, objects, or lists)
`.rotate(degrees, pages=)`	Rotate pages (multiple of 90)
`.crop(left=, top=, right=, bottom=, pages=)`	Crop page margins (in points)
`.compress(img_quality=, compression_level=)`	Reduce file size (deflate compression, dedup objects)
`.add_watermark(text, ...)`	Add text watermark
`.add_image_watermark(path, ...)`	Add image watermark (with opacity)
`.add_page_numbers(...)`	Insert page numbers
`.resize(size, pages=)`	Resize pages to standard paper size (a4, letter, etc.)
`.flatten(dpi=, pages=)`	Rasterize pages (burns annotations/forms into flat image)
`.extract_pages(pages)`	Keep only specified pages
`.remove_pages(pages)`	Remove specified pages
`.reorder(order)`	Reorder/duplicate pages
`.reverse()`	Reverse page order
`.encrypt(password, algorithm=)`	Add password protection (default: AES-256-R5)
`.decrypt(password)`	Remove password protection
`.redact(text)`	Black out text permanently
`.repair(engine=)`	Fix corrupted PDFs (default: `"auto"`)
`.ocr(language=)`	Make scanned pages searchable
`.copy()`	Create independent copy

All page parameters are 1-indexed (first page = 1).
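
Code that mixes lazypdf with 0-indexed libraries needs to translate between the two conventions. A minimal sketch of that translation follows; `to_zero_based` is a hypothetical helper and does not describe how lazypdf validates page numbers internally.

```python
def to_zero_based(pages, page_count):
    """Convert 1-indexed page numbers to 0-indexed, rejecting out-of-range values."""
    indices = []
    for page in pages:
        if not 1 <= page <= page_count:
            raise ValueError(f"page {page} is outside 1..{page_count}")
        indices.append(page - 1)
    return indices


print(to_zero_based([1, 3, 5], page_count=5))  # [0, 2, 4]
```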

Export (Terminal Operations)

Method	Returns
`.to_pdf(path)`	`str` (output path)
`.to_jpg(output_dir)`	`list[str]` (image paths)
`.to_png(output_dir)`	`list[str]` (image paths)
`.to_images(output_dir, fmt=)`	`list[str]` (image paths)
`.to_docx(path)`	`str` (output path)
`.to_xlsx(path)`	`str` (output path)
`.to_pdfa(path, level=, engine=)`	`str` (output path; `engine` defaults to `"pymupdf"`)
`.to_bytes()`	`bytes`
`.split(output_dir, every=)`	`list[str]` (PDF paths)
`.split_at(output_dir, at=)`	`list[str]` (PDF paths)
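
The two split methods differ in how pages are grouped into output files. The sketch below shows one plausible reading, using plain lists of 1-indexed page numbers. It is an assumption, not lazypdf's implementation: `chunks_every` treats `every=` as a fixed chunk size, and `chunks_at` treats each value in `at=` as the first page of a new file. Check the library's behavior before relying on either interpretation.

```python
def chunks_every(page_count, every):
    """Group 1-indexed pages into consecutive chunks of `every` pages."""
    if every < 1:
        raise ValueError("every must be at least 1")
    pages = list(range(1, page_count + 1))
    return [pages[i:i + every] for i in range(0, page_count, every)]


def chunks_at(page_count, at):
    """Start a new chunk at each 1-indexed page in `at` (assumed meaning)."""
    for page in at:
        if not 1 <= page <= page_count:
            raise ValueError(f"split point {page} is outside 1..{page_count}")
    starts = sorted({1, *at})
    bounds = starts + [page_count + 1]
    return [list(range(lo, hi)) for lo, hi in zip(bounds, bounds[1:])]


print(chunks_every(5, every=2))  # [[1, 2], [3, 4], [5]]
print(chunks_at(5, at=[3, 5]))   # [[1, 2], [3, 4], [5]]
```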

Extraction & Info

Method / Property	Returns
`.extract_text(pages=, engine=, page_separator=)`	`str`
`.extract_tables(pages=, flavor=)`	`list[list[list[str]]]`
`.extract_images(output_dir, pages=)`	`list[str]` (image paths)
`.metadata`	`dict`
`.page_count`	`int`
`.page_sizes()`	`list[tuple[float, float]]`
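
The `list[list[list[str]]]` return type of `.extract_tables()` reads most naturally as a list of tables, each a list of rows, each row a list of cell strings. Under that assumption, the result could be written out as CSV with the standard library. `tables_to_csv` is a hypothetical helper, and the sample data is made up for illustration.

```python
import csv
import io


def tables_to_csv(tables):
    """Serialize each table (a list of rows of cell strings) to CSV text."""
    outputs = []
    for table in tables:
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerows(table)
        outputs.append(buffer.getvalue())
    return outputs


# Illustrative data shaped like the documented return type.
sample = [[["name", "qty"], ["apple, red", "3"]]]
print(tables_to_csv(sample)[0])
```

The `csv` module quotes cells that contain commas, so the second row is emitted as `"apple, red",3`.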

Limitations

- Office reads (`read_docx`, `read_xlsx`, `read_pptx`, `read_csv`) require either Microsoft Office (Windows, auto-detected) or LibreOffice (any OS, must be on PATH). No pure-Python solution exists for reliable Office-to-PDF conversion.
- `to_docx()` extracts text only. Images, tables, and complex formatting are not preserved.
- `to_xlsx()` only exports tables found in the PDF. Requires the `[tables]` and `[office]` extras.
- OCR (`ocr()`) requires Tesseract to be installed on the system in addition to the `[ocr]` pip extra.
- `read_html()` defaults to the PyMuPDF Story engine (basic CSS). For better rendering, use `engine="weasyprint"` (requires GTK) or `engine="playwright"` (requires Chromium).
- Redaction (`redact()`) is a case-sensitive, exact text match. Save the result with `to_pdf()` to persist it.
- PDF/A (`to_pdfa()`) defaults to the PyMuPDF engine, which may not pass strict validators. Use `engine="ghostscript"` for full compliance (requires the Ghostscript binary).
- Flatten (`flatten()`) rasterizes pages to images, so text becomes non-searchable. The default DPI is 72; use higher values for better quality.
- Image watermark (`add_image_watermark()`) requires Pillow (included in the `[ocr]` extra).
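
Several of the limitations above depend on external programs rather than Python packages. A quick way to see what is reachable on PATH is sketched below. The executable names are the ones these tools commonly install under; which names lazypdf itself looks for is an assumption, and `find_tool` is a hypothetical helper.

```python
import shutil

# Common executable names per tool (assumed; lazypdf's own lookup may differ).
EXTERNAL_TOOLS = {
    "LibreOffice": ["soffice", "libreoffice"],
    "Tesseract": ["tesseract"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],
}


def find_tool(candidates):
    """Return the path of the first candidate executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    for tool, names in EXTERNAL_TOOLS.items():
        print(f"{tool}: {find_tool(names) or 'not found on PATH'}")
```

Microsoft Office is not covered here because it is described as auto-detected on Windows rather than located through PATH.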

License

BSD-3-Clause

Release history

Version	Released
0.2.0 (this version)	Mar 31, 2026
0.1.1	Mar 29, 2026
0.1.0	Mar 29, 2026

Download files

Source Distribution

lazypdf-0.2.0.tar.gz (31.2 kB)

Uploaded Mar 31, 2026 Source

Built Distribution

lazypdf-0.2.0-py3-none-any.whl (23.2 kB)

Uploaded Mar 31, 2026 Python 3

File details

Details for the file lazypdf-0.2.0.tar.gz.

File metadata

Download URL: lazypdf-0.2.0.tar.gz
Upload date: Mar 31, 2026
Size: 31.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lazypdf-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`749997865d7ebc27fb67a812a9c97b59d364531df313f307d507350d5d177161`
MD5	`82865f2d864dc38a4d6e2d02019baca9`
BLAKE2b-256	`2580cfaaf85e6d69257ae059e7ebd236c2aefb592570a5fd6879511812d27785`
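
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: `sha256_of` and `matches` are hypothetical helper names, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was downloaded to the current directory:
# matches(
#     "lazypdf-0.2.0.tar.gz",
#     "749997865d7ebc27fb67a812a9c97b59d364531df313f307d507350d5d177161",
# )
```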

Provenance

The following attestation bundles were made for lazypdf-0.2.0.tar.gz:

Publisher: publish.yml on jmfeck/lazypdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lazypdf-0.2.0.tar.gz
- Subject digest: 749997865d7ebc27fb67a812a9c97b59d364531df313f307d507350d5d177161
- Sigstore transparency entry: 1203592082
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: jmfeck/lazypdf@1266d372cea7f27e1fda1ab120e59a0d5b8ae5ea
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/jmfeck
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1266d372cea7f27e1fda1ab120e59a0d5b8ae5ea
- Trigger Event: release

File details

Details for the file lazypdf-0.2.0-py3-none-any.whl.

File metadata

Download URL: lazypdf-0.2.0-py3-none-any.whl
Upload date: Mar 31, 2026
Size: 23.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lazypdf-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7e8bc1e2f23fa47d348fb3cc5a2b470aaaa0e947316bde895a254fba94288d1b`
MD5	`265b46ae09a611640780d67d259f132e`
BLAKE2b-256	`0fe7e7a2783805c9064130a1fb7f13521a6c58ca10374d9e320f5fb902b60e9c`

Provenance

The following attestation bundles were made for lazypdf-0.2.0-py3-none-any.whl:

Publisher: publish.yml on jmfeck/lazypdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lazypdf-0.2.0-py3-none-any.whl
- Subject digest: 7e8bc1e2f23fa47d348fb3cc5a2b470aaaa0e947316bde895a254fba94288d1b
- Sigstore transparency entry: 1203592083
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: jmfeck/lazypdf@1266d372cea7f27e1fda1ab120e59a0d5b8ae5ea
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/jmfeck
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1266d372cea7f27e1fda1ab120e59a0d5b8ae5ea
- Trigger Event: release
