PDF Email Optimizer
Shrink PDFs to email-safe sizes while preserving visual quality.
PDF Email Optimizer is built for posters, brochures, reports, photo-heavy decks, and design-tool exports that need to fit under a target like 5-7 MB. It starts with structural cleanup, recompresses images only when needed, and reports when a requested size conflicts with visual quality.
Install
From a checkout:
python -m pip install -e ".[dev]"
pdf-email-optimizer --help
Once published to a package index:
pipx install pdf-email-optimizer
pdf-email-optimizer input.pdf output.pdf --target-mb 7 --profile quality
Also supported:
uvx pdf-email-optimizer input.pdf output.pdf --target 7mb
python -m pdf_email_optimizer input.pdf output.pdf --target-mb 7
Quick Start
# Ordinary email optimization
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7
# Preserve photos, screenshots, maps, and other detail
pdf-email-optimizer input.pdf output_email.pdf --target 7mb --quality
# Land inside a 5-7 MB range when possible
pdf-email-optimizer input.pdf output_email.pdf --range 5-7mb --quality
# Produce a Markdown report beside the output
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7 --report report.md
# Inspect without writing an optimized PDF
pdf-email-optimizer input.pdf --audit
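The commands above accept sizes in several spellings (--target-mb 7, --target 7mb, --range 5-7mb). As a rough illustration of how such values could be normalized, here is a minimal sketch. The function names and the decimal-megabyte constant are assumptions made for this example and are not taken from the tool's source.

```python
import re
from typing import Tuple

# Assumption: decimal megabytes. The real tool may use 1024 * 1024 instead.
BYTES_PER_MB = 1_000_000


def parse_size_mb(text: str) -> float:
    """Parse a size such as '7mb', '7 MB', or '7' into megabytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(?:mb)?\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1))


def parse_range_mb(text: str) -> Tuple[float, float]:
    """Parse a range such as '5-7mb' into (low, high) megabytes."""
    low_text, sep, high_text = text.partition("-")
    if not sep:
        raise ValueError(f"unrecognized range: {text!r}")
    low, high = parse_size_mb(low_text), parse_size_mb(high_text)
    if low > high:
        raise ValueError(f"range is reversed: {text!r}")
    return low, high


def within_range(size_bytes: int, low_mb: float, high_mb: float) -> bool:
    """Return True when a file size falls inside the inclusive MB range."""
    return low_mb * BYTES_PER_MB <= size_bytes <= high_mb * BYTES_PER_MB
```

With this sketch, parse_range_mb("5-7mb") yields a low and high bound that can be compared against the output file's size in bytes.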
The source PDF is never overwritten. Existing output files are rejected unless --force is supplied.
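The overwrite rules can be pictured as a small pre-flight check. The sketch below is illustrative only: check_output_path is a hypothetical helper, not a function exposed by the package, and the real CLI may word its errors differently.

```python
from pathlib import Path


def check_output_path(source: Path, output: Path, force: bool = False) -> None:
    """Raise if writing `output` would clobber the source or an existing file.

    Hypothetical helper mirroring the documented rules:
    - the source PDF is never overwritten, even with force=True
    - an existing output file is rejected unless force=True
    """
    # resolve() normalizes relative paths and symlinks, so "./a.pdf" and
    # "a.pdf" compare equal even if the output does not exist yet.
    if output.resolve() == source.resolve():
        raise ValueError("output path is the source PDF; refusing to overwrite it")
    if output.exists() and not force:
        raise FileExistsError(f"{output} already exists; pass --force to replace it")
```

The source check runs first so that --force can never be used to replace the input file.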
Profiles
| Profile | Use When | Behavior |
|---|---|---|
| quality | Photos, screenshots, maps, product images, "do not degrade" requests | High JPEG floor, protects small images, runs render QA, does not use Ghostscript by default |
| balanced | General email delivery | Moderate recompression ladder and conservative structural cleanup |
| aggressive | Smallest file matters more than perfect fidelity | Lower quality floor, smaller long-edge caps, optional Ghostscript fallback |
If quality mode cannot hit the requested size, the tool keeps the smallest quality-preserving output and emits a direct warning with next steps.
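One way to picture this fallback is as a selection over candidate outputs. The sketch below is not the tool's implementation: the Candidate class, its preserves_quality flag, and pick_output are hypothetical names, and choosing the smallest safe candidate even when the target is met is a simplification made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    strategy: str            # e.g. "structural-cleanup" or "image-recompress"
    size_bytes: int
    preserves_quality: bool  # hypothetical flag: passed the profile's quality checks


def pick_output(
    candidates: List[Candidate], target_bytes: int
) -> Tuple[Candidate, Optional[str]]:
    """Return the chosen candidate and a warning if the target was missed."""
    safe = [c for c in candidates if c.preserves_quality]
    if not safe:
        raise ValueError("no quality-preserving candidate available")
    # Keep the smallest output that still passes the quality checks.
    best = min(safe, key=lambda c: c.size_bytes)
    if best.size_bytes <= target_bytes:
        return best, None
    warning = (
        f"target of {target_bytes} bytes not reached; kept the smallest "
        f"quality-preserving output ({best.size_bytes} bytes, {best.strategy})"
    )
    return best, warning
```

A smaller candidate that fails the quality checks is ignored, which is why the result can stay above the requested size.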
Output
Use --json for machine-readable summaries:
pdf-email-optimizer input.pdf output.pdf --target-mb 7 --json
The JSON contract is documented in docs/json-output.md and validated by schema/output-summary.schema.json. Important fields include input/output size, target status, strategy, page count, private payload removals, image statistics, render QA, quality status, and warnings.
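For scripted use, the summary can be consumed from Python. The sketch below assumes the JSON object is written to standard output, which this page does not state explicitly, and it deliberately avoids naming any fields; consult docs/json-output.md for the actual keys. The helper names are made up for this example.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_command(source: str, output: str, target_mb: float) -> List[str]:
    """Build the CLI invocation shown above, with --json appended."""
    return [
        "pdf-email-optimizer", source, output,
        "--target-mb", f"{target_mb:g}", "--json",
    ]


def parse_summary(stdout_text: str) -> Dict[str, Any]:
    """Decode the summary and check that it is a JSON object."""
    summary = json.loads(stdout_text)
    if not isinstance(summary, dict):
        raise ValueError("expected a JSON object at the top level")
    return summary


def optimize(source: str, output: str, target_mb: float) -> Dict[str, Any]:
    """Run the CLI and return its summary (assumes JSON arrives on stdout)."""
    completed = subprocess.run(
        build_command(source, output, target_mb),
        capture_output=True, text=True, check=False,
    )
    return parse_summary(completed.stdout)
```

Listing sorted(optimize("input.pdf", "output.pdf", 7)) is a quick way to see which top-level keys a given version emits.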
Gallery
Original page (left) vs. email copy (right). All inputs are synthetic, CC0 fixtures generated by benchmarks/make_fixtures.py; regenerate the images with python benchmarks/make_gallery.py.
InDesign-style export — 2.35 MB → 0.18 MB (92% smaller, PSNR 57.8 dB)
Scanned document — 0.73 MB → 0.25 MB (66% smaller)
Repeated images — 0.81 MB → 0.14 MB (83% smaller, lossless dedupe)
Benchmarks
The benchmark harness runs against the bundled redistributable fixtures:
python benchmarks/make_fixtures.py # (re)generate CC0 sample PDFs
python benchmarks/run_benchmarks.py --manifest benchmarks/benchmark_manifest.yaml --output benchmarks/results/latest.json
It writes JSON plus a Markdown table. Missing fixtures are marked as skipped so published results stay honest. The table below is generated output (benchmarks/results/latest.md); PSNR/RMS compare the optimized copy against the original render, and inf/0.0 denote a pixel-identical (lossless) result.
| Case | Input | Target | Profile | Output | Reduction | Target Hit | Worst PSNR | Worst RMS | Strategy |
|---|---|---|---|---|---|---|---|---|---|
| photo_brochure | 1.10 MB | 0.6 MB | quality | 1.10 MB | 0.1% | No | inf | 0.0 | pikepdf-structural |
| indesign_export | 2.35 MB | 1 MB | balanced | 0.18 MB | 92.3% | Yes | 57.822 | 0.327679 | image-recompress |
| illustrator_export | 0.01 MB | 7 MB | balanced | 0.01 MB | 18.6% | Yes | inf | 0.0 | structural-cleanup |
| private_payload_export | 0.16 MB | 7 MB | quality | 0.16 MB | 0.1% | Yes | inf | 0.0 | structural-cleanup |
| screenshot_report | 0.27 MB | 0.2 MB | quality | 0.09 MB | 66.4% | Yes | inf | 0.0 | structural-cleanup |
| text_vector_document | 0.00 MB | 7 MB | balanced | 0.00 MB | 12.2% | Yes | inf | 0.0 | structural-cleanup |
| scanned_pdf | 0.73 MB | 0.4 MB | balanced | 0.25 MB | 66.6% | Yes | inf | 0.0 | structural-cleanup |
| mixed_transparency | 1.75 MB | 1 MB | quality | 1.75 MB | -0.0% | No | inf | 0.0 | structural-cleanup |
| embedded_metadata | 0.12 MB | 7 MB | balanced | 0.12 MB | 0.1% | Yes | inf | 0.0 | structural-cleanup |
| repeated_images | 0.81 MB | 0.5 MB | balanced | 0.14 MB | 83.2% | Yes | inf | 0.0 | structural-cleanup |
| forms_annotations | 0.01 MB | 7 MB | quality | 0.01 MB | 3.9% | Yes | inf | 0.0 | structural-cleanup |
| encrypted_pdf | - | 7.0 MB | balanced | failed | - | - | - | - | Encrypted PDFs must be unlocked before optimization. |
The quality profile deliberately refuses to degrade photo_brochure and mixed_transparency in order to reach their targets, emitting a warning instead of shipping a blurry file.
See docs/benchmarking.md before adding fixtures.
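To make the PSNR and RMS columns in the benchmark table concrete, here is a minimal sketch of the standard definitions for 8-bit samples. It is not the harness's code, and the peak value of 255 is an assumption, although the indesign_export row (RMS 0.327679, PSNR 57.822) is consistent with it.

```python
import math
from typing import Sequence


def rms_error(original: Sequence[int], optimized: Sequence[int]) -> float:
    """Root-mean-square difference between two equal-length sample lists."""
    if len(original) != len(optimized) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((a - b) ** 2 for a, b in zip(original, optimized))
    return math.sqrt(squared / len(original))


def psnr_db(rms: float, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the renders match."""
    if rms == 0.0:
        return math.inf  # pixel-identical, reported as "inf" in the table
    return 20.0 * math.log10(peak / rms)
```

A higher PSNR means the optimized render is closer to the original, and an RMS of 0.0 always pairs with a PSNR of inf.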
Visual QA
Render and compare two PDFs:
pdf-email-render-compare original.pdf optimized.pdf --output-dir qa-renders
This reports page-level pixel differences and can write original, optimized, and amplified diff PNGs for review.
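The idea behind an amplified diff is that small per-pixel differences are multiplied so they become visible. The following sketch works on flat lists of 8-bit samples and is only an illustration: the function names, the gain of 8, and the tolerance parameter are assumptions, not the behavior of pdf-email-render-compare.

```python
from typing import List, Sequence


def amplified_diff(
    original: Sequence[int], optimized: Sequence[int], gain: int = 8
) -> List[int]:
    """Absolute per-sample difference, scaled by `gain` and clamped to 255."""
    if len(original) != len(optimized):
        raise ValueError("inputs must be the same length")
    return [min(255, abs(a - b) * gain) for a, b in zip(original, optimized)]


def changed_fraction(
    original: Sequence[int], optimized: Sequence[int], tolerance: int = 0
) -> float:
    """Fraction of samples whose difference exceeds `tolerance`."""
    if len(original) != len(optimized) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    changed = sum(1 for a, b in zip(original, optimized) if abs(a - b) > tolerance)
    return changed / len(original)
```

A one-level difference becomes 8 in the amplified output, while large differences saturate at 255.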
Agent Usage
The repo includes SKILL.md for agent runtimes that load local skills. The short version:
- Use quality when the user asks to preserve image fidelity.
- Use balanced for ordinary email optimization.
- Use aggressive only when visible quality loss is acceptable.
- Report size, target status, strategy, and warnings.
- Never overwrite the source PDF.
More examples are in docs/agent-usage.md.
Development
python -m pip install -e ".[dev]"
pytest
pytest --cov
ruff check .
python -m build
CI runs linting, tests, coverage, package build, and CLI smoke checks on Python 3.9-3.13.
Documentation
- Installation
- Examples
- Benchmarking
- Compatibility
- JSON output
- Agent usage
- Known limitations
- Troubleshooting
License