Shrink PDFs to email-safe sizes while preserving visual quality.

PDF Email Optimizer

PDF Email Optimizer is built for posters, brochures, reports, photo-heavy decks, and design-tool exports that need to fit under a target like 5-7 MB. It starts with structural cleanup, recompresses images only when needed, and reports when a requested size conflicts with visual quality.

Install

From a checkout:

python -m pip install -e ".[dev]"
pdf-email-optimizer --help

From PyPI:

pipx install pdf-email-optimizer
pdf-email-optimizer input.pdf output.pdf --target-mb 7 --profile quality

Also supported:

uvx pdf-email-optimizer input.pdf output.pdf --target 7mb
python -m pdf_email_optimizer input.pdf output.pdf --target-mb 7

Quick Start

# Ordinary email optimization
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7

# Preserve photos, screenshots, maps, and other detail
pdf-email-optimizer input.pdf output_email.pdf --target 7mb --quality

# Land inside a 5-7 MB range when possible
pdf-email-optimizer input.pdf output_email.pdf --range 5-7mb --quality

# Produce a Markdown report beside the output
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7 --report report.md

# Inspect without writing an optimized PDF
pdf-email-optimizer input.pdf --audit

The source PDF is never overwritten. Existing output files are rejected unless --force is supplied.
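
The two safety rules above could be enforced by a pre-flight check along these lines. This is a minimal sketch, not the tool's actual code: `check_output_path` and its `force` parameter are hypothetical names, and the sketch assumes that --force does not lift the rule protecting the source file.

```python
from pathlib import Path


def check_output_path(source: Path, output: Path, force: bool = False) -> None:
    """Raise if writing `output` would break either safety rule (hypothetical helper)."""
    # Rule 1: the source PDF is never overwritten (assumed to hold even with force).
    if output.resolve() == source.resolve():
        raise ValueError("output path must differ from the source PDF")
    # Rule 2: an existing output file is rejected unless force is set.
    if output.exists() and not force:
        raise FileExistsError(f"{output} already exists; pass --force to replace it")
```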

Profiles

| Profile | Use When | Behavior |
| --- | --- | --- |
| quality | Photos, screenshots, maps, product images, "do not degrade" requests | High JPEG floor, protects small images, runs render QA, does not use Ghostscript by default |
| balanced | General email delivery | Moderate recompression ladder and conservative structural cleanup |
| aggressive | Smallest file matters more than perfect fidelity | Lower quality floor, smaller long-edge caps, optional Ghostscript fallback |

If quality mode cannot hit the requested size, the tool keeps the smallest quality-preserving output and emits a direct warning with next steps.
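
One way to picture this behavior is a search over candidate outputs that never drops below a quality floor. The sketch below is an assumption about the general shape of such logic, not the tool's implementation; `Candidate`, `pick_output`, and the numbers in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    jpeg_quality: int   # quality setting used for this attempt
    size_mb: float      # resulting file size in MB


def pick_output(candidates: List[Candidate], target_mb: float,
                quality_floor: int) -> Tuple[Candidate, Optional[str]]:
    """Return the chosen candidate and a warning if the target was missed."""
    # Only attempts at or above the floor count as quality-preserving.
    allowed = [c for c in candidates if c.jpeg_quality >= quality_floor]
    if not allowed:
        raise ValueError("no candidate meets the quality floor")
    # Prefer the highest-quality attempt that already fits the target.
    fitting = [c for c in allowed if c.size_mb <= target_mb]
    if fitting:
        return max(fitting, key=lambda c: c.jpeg_quality), None
    # Otherwise keep the smallest quality-preserving output and warn.
    smallest = min(allowed, key=lambda c: c.size_mb)
    warning = (f"target {target_mb} MB not reached; kept {smallest.size_mb} MB "
               f"at quality {smallest.jpeg_quality}")
    return smallest, warning


# Hypothetical attempts: with a floor of 80, nothing fits under 7 MB,
# so the quality-85 attempt is kept and a warning is returned.
attempts = [Candidate(92, 9.1), Candidate(85, 7.8), Candidate(70, 6.2)]
chosen, warning = pick_output(attempts, target_mb=7, quality_floor=80)
```

Lowering the floor to 60 in the same call would admit the 6.2 MB attempt and return no warning, which mirrors the trade-off the aggressive profile makes.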

Output

Use --json for machine-readable summaries:

pdf-email-optimizer input.pdf output.pdf --target-mb 7 --json

The JSON contract is documented in docs/json-output.md and validated by schema/output-summary.schema.json. Important fields include input/output size, target status, strategy, page count, private payload removals, image statistics, render QA, quality status, and warnings.
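
For scripting, the --json summary can be captured and parsed. A minimal sketch, assuming the summary is printed to standard output as a single JSON object (the text above does not say which stream is used); `build_command` and `run_optimizer` are hypothetical wrapper names, and field names should be read from docs/json-output.md rather than guessed.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_command(src: str, dst: str, target_mb: float) -> List[str]:
    """Assemble the CLI call shown above, using only flags from this README."""
    return ["pdf-email-optimizer", src, dst, "--target-mb", str(target_mb), "--json"]


def run_optimizer(src: str, dst: str, target_mb: float) -> Dict[str, Any]:
    """Run the tool and parse its summary (assumes JSON arrives on stdout)."""
    # check=False because exit-code semantics are not documented here.
    result = subprocess.run(build_command(src, dst, target_mb),
                            capture_output=True, text=True, check=False)
    return json.loads(result.stdout)
```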

Gallery

Original page (left) vs. email copy (right). All inputs are synthetic, CC0 fixtures generated by benchmarks/make_fixtures.py; regenerate the images with python benchmarks/make_gallery.py.

InDesign-style export — 2.35 MB → 0.18 MB (92% smaller, PSNR 57.8 dB)

InDesign-style export before and after

Scanned document — 0.73 MB → 0.25 MB (66% smaller)

Scanned document before and after

Repeated images — 0.81 MB → 0.14 MB (83% smaller, lossless dedupe)

Repeated images before and after
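
Lossless deduplication of repeated images generally means storing byte-identical image streams once and pointing every use at the shared copy. The sketch below illustrates the idea on raw byte strings with hashlib; it is not the tool's code, and `dedupe_streams` is a hypothetical name. A real PDF pass would also need to rewrite object references.

```python
import hashlib
from typing import Dict, List, Tuple


def dedupe_streams(streams: List[bytes]) -> Tuple[List[bytes], List[int]]:
    """Return unique streams plus, for each input, the index of its stored copy."""
    unique: List[bytes] = []
    seen: Dict[str, int] = {}      # SHA-256 digest -> index into `unique`
    mapping: List[int] = []
    for data in streams:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen[digest] = len(unique)
            unique.append(data)
        mapping.append(seen[digest])
    return unique, mapping
```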

Benchmarks

The benchmark harness runs against the bundled redistributable fixtures:

python benchmarks/make_fixtures.py        # (re)generate CC0 sample PDFs
python benchmarks/run_benchmarks.py --manifest benchmarks/benchmark_manifest.yaml --output benchmarks/results/latest.json

It writes JSON plus a Markdown table. Missing fixtures are marked as skipped so published results stay honest. The table below is generated output (benchmarks/results/latest.md); PSNR/RMS compare the optimized copy against the original render, and inf/0.0 denote a pixel-identical (lossless) result.

| Case | Input | Target | Profile | Output | Reduction | Target Hit | Worst PSNR | Worst RMS | Strategy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| photo_brochure | 1.10 MB | 0.6 MB | quality | 1.10 MB | 0.1% | No | inf | 0.0 | pikepdf-structural |
| indesign_export | 2.35 MB | 1 MB | balanced | 0.18 MB | 92.3% | Yes | 57.822 | 0.327679 | image-recompress |
| illustrator_export | 0.01 MB | 7 MB | balanced | 0.01 MB | 18.6% | Yes | inf | 0.0 | structural-cleanup |
| private_payload_export | 0.16 MB | 7 MB | quality | 0.16 MB | 0.1% | Yes | inf | 0.0 | structural-cleanup |
| screenshot_report | 0.27 MB | 0.2 MB | quality | 0.09 MB | 66.4% | Yes | inf | 0.0 | structural-cleanup |
| text_vector_document | 0.00 MB | 7 MB | balanced | 0.00 MB | 12.2% | Yes | inf | 0.0 | structural-cleanup |
| scanned_pdf | 0.73 MB | 0.4 MB | balanced | 0.25 MB | 66.6% | Yes | inf | 0.0 | structural-cleanup |
| mixed_transparency | 1.75 MB | 1 MB | quality | 1.75 MB | -0.0% | No | inf | 0.0 | structural-cleanup |
| embedded_metadata | 0.12 MB | 7 MB | balanced | 0.12 MB | 0.1% | Yes | inf | 0.0 | structural-cleanup |
| repeated_images | 0.81 MB | 0.5 MB | balanced | 0.14 MB | 83.2% | Yes | inf | 0.0 | structural-cleanup |
| forms_annotations | 0.01 MB | 7 MB | quality | 0.01 MB | 3.9% | Yes | inf | 0.0 | structural-cleanup |
| encrypted_pdf | - | 7.0 MB | balanced | failed | - | - | - | - | Encrypted PDFs must be unlocked before optimization. |

The quality profile deliberately refuses to degrade photo_brochure and mixed_transparency below their targets, emitting a warning instead of shipping a blurry file.
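
The PSNR and RMS columns can be related through the standard definitions. The sketch below assumes 8-bit channel values (peak 255) and flat lists of pixel values. The README does not say how the harness computes these figures, although the indesign_export row (RMS 0.327679, PSNR 57.822) is consistent with this formula.

```python
import math
from typing import Sequence


def rms_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equal-length pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def psnr_from_rms(rms: float, peak: float = 255.0) -> float:
    """PSNR in dB; identical images (RMS 0) give infinity."""
    if rms == 0:
        return math.inf
    return 20 * math.log10(peak / rms)


print(psnr_from_rms(rms_error([10, 20, 30], [10, 20, 30])))  # inf
print(round(psnr_from_rms(0.327679), 3))                     # 57.822
```

This also shows why a lossless row reports inf and 0.0 together: a zero RMS makes the ratio inside the logarithm unbounded.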

See docs/benchmarking.md before adding fixtures.

Visual QA

Render and compare two PDFs:

pdf-email-render-compare original.pdf optimized.pdf --output-dir qa-renders

This reports page-level pixel differences and can write original, optimized, and amplified diff PNGs for review.
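
An amplified diff scales small per-pixel differences so they become visible to a reviewer. A minimal sketch on flat lists of 8-bit grayscale values; the gain factor and the function name `amplified_diff` are assumptions for illustration, not the tool's actual parameters.

```python
from typing import List, Sequence


def amplified_diff(original: Sequence[int], optimized: Sequence[int],
                   gain: int = 10) -> List[int]:
    """Absolute per-pixel difference, multiplied by `gain` and clipped to 0-255."""
    if len(original) != len(optimized):
        raise ValueError("images must have the same number of pixels")
    return [min(255, abs(a - b) * gain) for a, b in zip(original, optimized)]


print(amplified_diff([100, 100, 100], [100, 102, 160]))  # [0, 20, 255]
```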

Agent Usage

The repo includes SKILL.md for agent runtimes that load local skills. The short version:

  • Use quality when the user asks to preserve image fidelity.
  • Use balanced for ordinary email optimization.
  • Use aggressive only when visible quality loss is acceptable.
  • Report size, target status, strategy, and warnings.
  • Never overwrite the source PDF.
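
The profile rules in this list can be written as a small decision function. This is an illustrative sketch only; `choose_profile` and its parameters are hypothetical and are not part of the package or SKILL.md.

```python
def choose_profile(preserve_fidelity: bool, quality_loss_ok: bool = False) -> str:
    """Map a user's request onto one of the three documented profiles."""
    if preserve_fidelity:
        return "quality"      # user asked to preserve image fidelity
    if quality_loss_ok:
        return "aggressive"   # only when visible quality loss is acceptable
    return "balanced"         # ordinary email optimization
```

Checking fidelity first means a request that mentions both goals falls back to the safer profile.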

More examples are in docs/agent-usage.md.

Development

python -m pip install -e ".[dev]"
pytest
pytest --cov
ruff check .
python -m build

CI runs linting, tests, coverage, package build, and CLI smoke checks on Python 3.9-3.13.

License

MIT
