lazypdf
A pandas-like Python wrapper for PDF operations. Read, transform, export.
Install
pip install lazypdf
Optional extras:
pip install lazypdf[ocr] # OCR support (pytesseract + Pillow)
pip install lazypdf[office] # DOCX/XLSX/PPTX export (python-docx, openpyxl, python-pptx)
pip install lazypdf[tables] # Table extraction (pdfplumber)
pip install lazypdf[html] # HTML to PDF (WeasyPrint)
pip install lazypdf[msoffice] # MS Office COM automation on Windows (pywin32)
pip install lazypdf[all] # Everything
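To see which extras are already usable in an environment, one option is to probe for the import names of the packages listed above. This is a minimal sketch and not part of lazypdf: `available_extras` and `EXTRA_MODULES` are hypothetical names, and the mapping simply pairs each extra with the import names of the third-party packages it is documented to install.

```python
from importlib.util import find_spec

# Import names of the packages each extra is documented to pull in.
# This mapping is an illustration, not something lazypdf exposes.
EXTRA_MODULES = {
    "ocr": ["pytesseract", "PIL"],
    "office": ["docx", "openpyxl", "pptx"],
    "tables": ["pdfplumber"],
    "html": ["weasyprint"],
    "msoffice": ["win32com"],
}


def available_extras() -> dict[str, bool]:
    """Return, per extra, whether all of its modules can be located."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:10s} {'installed' if ok else 'missing'}")
```

A module being importable does not guarantee the system-level tools it wraps (Tesseract, Pango, Cairo) are present, so this only covers the pip side.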
Quick Start
import lazypdf as lz
# Read -> Transform -> Export
lz.read("input.pdf").rotate(90).compress().to_pdf("output.pdf")
# Merge multiple PDFs
lz.merge("file1.pdf", "file2.pdf", "file3.pdf").to_pdf("merged.pdf")
# Convert images to PDF
lz.read_images("scan1.jpg", "scan2.jpg").to_pdf("scans.pdf")
# Read Office documents (requires MS Office or LibreOffice)
lz.read_docx("report.docx").add_watermark("DRAFT").to_pdf("draft.pdf")
lz.read_xlsx("data.xlsx").to_png("output/")
lz.read_pptx("slides.pptx").extract_pages([1, 3]).to_pdf("summary.pdf")
# Extract specific pages
lz.read("big.pdf").extract_pages([1, 3, 5]).to_pdf("selected.pdf")
# Add watermark and page numbers
(
    lz.read("report.pdf")
    .add_watermark("CONFIDENTIAL", opacity=0.2)
    .add_page_numbers(position="bottom-center")
    .to_pdf("final.pdf")
)
# Export to images
lz.read("slides.pdf").to_png("output_dir/", dpi=300)
# Extract text
text = lz.read("document.pdf").extract_text()
# Encrypt / decrypt
lz.read("doc.pdf").encrypt("password").to_pdf("protected.pdf")
lz.read("protected.pdf").decrypt("password").to_pdf("unlocked.pdf")
# Redact sensitive text (case-sensitive, exact match)
lz.read("doc.pdf").redact("SECRET-123").to_pdf("redacted.pdf")
# Split into individual pages
lz.read("doc.pdf").split("output_dir/", every=1)
# Chain anything
(
    lz.read("input.pdf")
    .merge("extra.pdf")
    .remove_pages([2, 4])
    .rotate(90, pages=[1])
    .crop(left=50, right=50)
    .add_watermark("DRAFT")
    .compress()
    .to_pdf("result.pdf")
)
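The chaining style above follows the usual fluent-interface pattern: each transform returns an object that accepts the next call, and a terminal method returns a plain value. The sketch below illustrates the pattern with a toy class over a list of page labels. `PageList` and `to_list` are hypothetical and exist only for this example. Whether lazypdf mutates in place or returns new objects is not stated here, so the in-place choice below is an assumption.

```python
class PageList:
    """Toy stand-in for a document: just a list of page labels."""

    def __init__(self, pages):
        self.pages = list(pages)

    def remove_pages(self, pages):
        # Positions are 1-indexed, matching the convention in this README.
        drop = set(pages)
        self.pages = [p for i, p in enumerate(self.pages, start=1) if i not in drop]
        return self  # returning self is what allows chaining

    def reverse(self):
        self.pages.reverse()
        return self

    def to_list(self):
        # Terminal operation: returns a plain value and ends the chain.
        return list(self.pages)


result = (
    PageList(["p1", "p2", "p3", "p4", "p5"])
    .remove_pages([2, 4])
    .reverse()
    .to_list()
)
print(result)  # ['p5', 'p3', 'p1']
```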
API Reference
Entry Points
| Function | Description | Dependency |
|---|---|---|
| `lz.read(path)` | Read a PDF file | pymupdf |
| `lz.read_pdf(path)` | Alias for `read()` | pymupdf |
| `lz.merge(*paths)` | Merge multiple PDFs | pymupdf |
| `lz.read_images(*paths)` | Create PDF from images | pymupdf |
| `lz.read_jpg(*paths)` | Create PDF from JPEGs | pymupdf |
| `lz.read_png(*paths)` | Create PDF from PNGs | pymupdf |
| `lz.read_html(path_or_url)` | Create PDF from HTML | weasyprint |
| `lz.read_docx(path)` | Read Word document | MS Office / LibreOffice |
| `lz.read_xlsx(path)` | Read Excel spreadsheet | MS Office / LibreOffice |
| `lz.read_pptx(path)` | Read PowerPoint presentation | MS Office / LibreOffice |
| `lz.read_csv(path)` | Read CSV file | MS Office / LibreOffice |
| `lz.from_bytes(data)` | Create PDF from raw bytes | pymupdf |
Chainable Operations
| Method | Description |
|---|---|
| `.merge(*others)` | Append more PDFs (paths, objects, or lists) |
| `.rotate(degrees, pages=)` | Rotate pages (multiple of 90) |
| `.crop(left=, top=, right=, bottom=, pages=)` | Crop page margins (in points) |
| `.compress()` | Reduce file size (deflate compression, dedup objects) |
| `.add_watermark(text, ...)` | Add text watermark |
| `.add_image_watermark(path, ...)` | Add image watermark (with opacity) |
| `.add_page_numbers(...)` | Insert page numbers |
| `.resize(size, pages=)` | Resize pages to standard paper size (a4, letter, etc.) |
| `.flatten(dpi=, pages=)` | Rasterize pages (burns annotations/forms into flat image) |
| `.extract_pages(pages)` | Keep only specified pages |
| `.remove_pages(pages)` | Remove specified pages |
| `.reorder(order)` | Reorder/duplicate pages |
| `.reverse()` | Reverse page order |
| `.encrypt(password)` | Add password protection (AES-256) |
| `.decrypt(password)` | Remove password protection |
| `.redact(text)` | Black out text permanently |
| `.repair()` | Fix corrupted PDFs |
| `.ocr(language=)` | Make scanned pages searchable |
| `.copy()` | Create independent copy |
All page parameters are 1-indexed (first page = 1).
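Because PyMuPDF itself addresses pages from 0, a 1-indexed API has to translate page numbers somewhere. The helper below is a minimal sketch of that translation with range checking. `to_zero_based` is a hypothetical name, and raising `ValueError` on out-of-range pages is an assumption, since the error behaviour of lazypdf is not documented here.

```python
def to_zero_based(pages, page_count):
    """Convert 1-indexed page numbers to 0-based indices, validating range."""
    indices = []
    for page in pages:
        if not 1 <= page <= page_count:
            raise ValueError(f"page {page} is outside 1..{page_count}")
        indices.append(page - 1)
    return indices


print(to_zero_based([1, 3, 5], page_count=5))  # [0, 2, 4]
```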
Export (Terminal Operations)
| Method | Returns |
|---|---|
| `.to_pdf(path)` | `str` (output path) |
| `.to_jpg(output_dir)` | `list[str]` (image paths) |
| `.to_png(output_dir)` | `list[str]` (image paths) |
| `.to_images(output_dir, fmt=)` | `list[str]` (image paths) |
| `.to_docx(path)` | `str` (output path) |
| `.to_xlsx(path)` | `str` (output path) |
| `.to_pdfa(path, level=)` | `str` (output path, requires Ghostscript) |
| `.to_bytes()` | `bytes` |
| `.split(output_dir, every=)` | `list[str]` (PDF paths) |
| `.split_at(output_dir, at=)` | `list[str]` (PDF paths) |
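The exact semantics of `every=` are not spelled out here. A plausible reading, given that `every=1` yields individual pages, is that pages are grouped into consecutive runs of `every` pages with a shorter final run. The sketch below shows that grouping on page numbers only. `chunk_pages` is a hypothetical helper, and the behaviour is an assumption rather than a description of lazypdf internals.

```python
def chunk_pages(page_count, every):
    """Group pages 1..page_count into consecutive runs of `every` pages."""
    if every < 1:
        raise ValueError("every must be at least 1")
    return [
        list(range(start, min(start + every, page_count + 1)))
        for start in range(1, page_count + 1, every)
    ]


print(chunk_pages(5, every=2))  # [[1, 2], [3, 4], [5]]
```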
Extraction & Info
| Method / Property | Returns |
|---|---|
| `.extract_text(pages=)` | `str` |
| `.extract_tables(pages=)` | `list[list[list[str]]]` |
| `.extract_images(output_dir, pages=)` | `list[str]` (image paths) |
| `.metadata` | `dict` |
| `.page_count` | `int` |
| `.page_sizes()` | `list[tuple[float, float]]` |
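The `list[list[list[str]]]` return type of `extract_tables()` reads most naturally as a list of tables, each a list of rows, each a list of cell strings. Under that assumption, the nested lists can be written out with the standard `csv` module. `tables_to_csv` is a hypothetical helper, and the sample data is hand-written rather than real extractor output.

```python
import csv
import io


def tables_to_csv(tables):
    """Render each table (a list of rows of cell strings) as CSV text."""
    rendered = []
    for table in tables:
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerows(table)
        rendered.append(buffer.getvalue())
    return rendered


# Hand-written sample in the assumed shape: one table with two rows.
sample = [[["name", "qty"], ["bolt", "12"]]]
print(tables_to_csv(sample)[0])
```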
Limitations
- Office reads (`read_docx`, `read_xlsx`, `read_pptx`, `read_csv`) require either Microsoft Office (Windows, auto-detected) or LibreOffice (any OS, must be on PATH). No pure-Python solution exists for reliable Office-to-PDF conversion.
- `to_docx()` extracts text only. Images, tables, and complex formatting are not preserved.
- `to_xlsx()` only exports tables found in the PDF. Requires the `[tables]` and `[office]` extras.
- OCR (`ocr()`) requires Tesseract to be installed on the system in addition to the `[ocr]` pip extra.
- `read_html()` requires WeasyPrint, which has system-level dependencies (Pango, Cairo). See WeasyPrint docs.
- Redaction (`redact()`) is a case-sensitive exact text match. Save the result with `to_pdf()` to persist.
- PDF/A (`to_pdfa()`) requires Ghostscript installed on the system (`gs` on Linux/Mac, `gswin64c` on Windows).
- Flatten (`flatten()`) rasterizes pages to images, so text becomes non-searchable. Use higher DPI for better quality.
- Image watermark (`add_image_watermark()`) requires Pillow (included in the `[ocr]` extra).
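Several of these limitations come down to an external program being on `PATH`. A quick pre-flight check can be sketched with `shutil.which`. The names `gs` and `gswin64c` come from the list above, while `soffice`, `libreoffice`, and `tesseract` are the usual executable names and are an assumption about what lazypdf actually looks for. `find_tools` is a hypothetical helper.

```python
import shutil

# Executable names to probe for each external tool.
# "gs"/"gswin64c" are named in this README; the others are assumed.
TOOLS = {
    "libreoffice": ["soffice", "libreoffice"],
    "ghostscript": ["gs", "gswin64c"],
    "tesseract": ["tesseract"],
}


def find_tools():
    """Map each tool to the first matching executable path, or None."""
    found = {}
    for tool, names in TOOLS.items():
        found[tool] = next((p for p in map(shutil.which, names) if p), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:12s} {path or 'not found on PATH'}")
```

This does not detect Microsoft Office on Windows, which is described above as auto-detected by other means.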
License
BSD-3-Clause