pdf-email-optimizer

Shrink PDFs to email-safe sizes while preserving visual quality — CLI plus Claude and Codex agent skill.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

petehottelet

These details have not been verified by PyPI

Project description

PDF Email Optimizer

Optimize PDFs for email-safe sizes while preserving visual quality, available as a command-line tool and as a Claude and Codex agent skill.

PDF Email Optimizer is built for posters, brochures, reports, photo-heavy decks, and design-tool exports (Illustrator, InDesign) that need to fit under a target like 5-7 MB. It starts with structural cleanup, recompresses images only when needed, and reports when a requested size conflicts with visual quality. Agents load it via SKILL.md (Claude) and agents/openai.yaml (Codex).

Optimizing for fax instead of email? The sister project pdf-fax-optimizer targets fax-machine constraints (bilevel rendering, TIFF/G4 output, page-size discipline) rather than email size and visual fidelity.

Real-world results

Eight real documents — two PowerPoint decks starting from .pptx, two image-heavy PDFs, two government technical reports, and two archival document scans from 1976 — run end-to-end through the optimizer. Numbers are emitted by benchmarks/run_samples.py; the chart and gallery come from benchmarks/make_charts.py and benchmarks/make_gallery.py.

Real-world filesize reduction: original document vs email-safe PDF

Sample	Original	Email PDF	Reduction	PSNR
Photo brochure	138.74 MB `.pdf`	6.51 MB	95.3%	48.6 dB
Archival scan, 1976 (B)	88.68 MB `.pdf`	23.80 MB	73.2%	32.5 dB
Lossless image PDF	69.65 MB `.pdf`	2.93 MB	95.8%	54.6 dB
Financial services proposal	36.31 MB `.pptx`	4.97 MB	86.3%	41.3 dB
Archival scan, 1976 (A)	33.04 MB `.pdf`	20.58 MB	37.7%	∞ (lossless)
Bank report	30.16 MB `.pptx`	7.41 MB	75.5%	38.7 dB
Government report (2017)	12.69 MB `.pdf`	6.86 MB	45.9%	46.9 dB
Research paper (2024)	9.57 MB `.pdf`	6.59 MB	31.1%	38.8 dB

Average reduction across all eight: 67.6%. The headline samples (photo brochure, lossless image PDF, financial proposal, bank report) all land under 8 MB, in the Gmail-attachable range. Three of them clear the PSNR 40 dB "visually indistinguishable" threshold; the bank report sits just below it at 38.7 dB. The two archival 1976 NASA scans are the honest end of the spectrum: dense raster pages from a film-scan workflow, with little structural fat. The 606-page scan (1976 A) is rewritten losslessly (PSNR ∞) and still drops from 33.04 MB to 20.58 MB; the 192-page scan (1976 B) goes from 88.68 MB to 23.80 MB at PSNR 32.5 dB (visible compression, but legible at email zoom). Both now fit under Gmail's 25 MB attachment limit, where neither did before. The modern government report and research paper both land under 7 MB.
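The reduction column and the average can be re-derived from the sizes in the table. The sketch below is only arithmetic on the published numbers; it is not part of the benchmark scripts, and because it starts from sizes already rounded to two decimals, an individual row can differ from the table by 0.1 percentage points (the bank report comes out at 75.4% here rather than 75.5%).

```python
# Sizes in MB, copied from the results table above: (original, email PDF).
SAMPLES = {
    "Photo brochure": (138.74, 6.51),
    "Archival scan, 1976 (B)": (88.68, 23.80),
    "Lossless image PDF": (69.65, 2.93),
    "Financial services proposal": (36.31, 4.97),
    "Archival scan, 1976 (A)": (33.04, 20.58),
    "Bank report": (30.16, 7.41),
    "Government report (2017)": (12.69, 6.86),
    "Research paper (2024)": (9.57, 6.59),
}

GMAIL_LIMIT_MB = 25  # attachment limit quoted in the text above


def reduction_percent(original_mb: float, optimized_mb: float) -> float:
    """Percentage by which the file shrank."""
    return (1 - optimized_mb / original_mb) * 100


reductions = {name: reduction_percent(o, e) for name, (o, e) in SAMPLES.items()}
average = sum(reductions.values()) / len(reductions)
fits_gmail = [name for name, (_, e) in SAMPLES.items() if e < GMAIL_LIMIT_MB]

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}%")
print(f"Average reduction: {average:.1f}%")
print(f"Under {GMAIL_LIMIT_MB} MB: {len(fits_gmail)} of {len(SAMPLES)}")
```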

For PowerPoint and Excel starting points, the conversion is one command, followed by an ordinary optimizer run:

python benchmarks/convert_samples.py    # LibreOffice headless: .pptx/.xlsx -> .pdf
pdf-email-optimizer "Financial_Services_Proposal.pdf" "Financial_Services_Proposal_email.pdf" \
    --target-mb 5 --balanced --long-edge 2000 --image-quality 82
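For scripted use, the same two steps can be driven from Python. This is a minimal sketch, assuming the commands are run from the repository root and that `pdf-email-optimizer` is on `PATH`; `optimize_command` and `run_pipeline` are hypothetical helper names, and the flags are exactly the ones shown above.

```python
import subprocess
import sys
from pathlib import Path


def optimize_command(src: Path, dst: Path, target_mb: float = 5) -> list:
    """Build the optimizer argv using the flags from the example above."""
    return [
        "pdf-email-optimizer", str(src), str(dst),
        "--target-mb", str(target_mb),
        "--balanced", "--long-edge", "2000", "--image-quality", "82",
    ]


def run_pipeline(src: Path, dst: Path) -> None:
    """Convert the Office samples, then optimize one resulting PDF."""
    # Step 1: LibreOffice headless conversion (.pptx/.xlsx -> .pdf).
    subprocess.run([sys.executable, "benchmarks/convert_samples.py"], check=True)
    # Step 2: optimize the converted PDF; check=True raises on a non-zero exit.
    subprocess.run(optimize_command(src, dst), check=True)


if __name__ == "__main__":
    print(" ".join(optimize_command(Path("Financial_Services_Proposal.pdf"),
                                    Path("Financial_Services_Proposal_email.pdf"))))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.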

See the Gallery for before/after/diff renders and docs/comparisons.md for a side-by-side against Ghostscript and pikepdf-only on the same PDF.

Install

From a checkout:

python -m pip install -e ".[dev]"
pdf-email-optimizer --help

From PyPI:

pipx install pdf-email-optimizer
pdf-email-optimizer input.pdf output.pdf --target-mb 7 --profile quality

Also supported:

uvx pdf-email-optimizer input.pdf output.pdf --target 7mb
python -m pdf_email_optimizer input.pdf output.pdf --target-mb 7

Quick Start

# Ordinary email optimization
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7

# Preserve photos, screenshots, maps, and other detail
pdf-email-optimizer input.pdf output_email.pdf --target 7mb --quality

# Land inside a 5-7 MB range when possible
pdf-email-optimizer input.pdf output_email.pdf --range 5-7mb --quality

# Produce a Markdown report beside the output
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7 --report report.md

# Inspect without writing an optimized PDF
pdf-email-optimizer input.pdf --audit

The source PDF is never overwritten. Existing output files are rejected unless --force is supplied.
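The two path rules above amount to a small guard that runs before anything is written. The following is an illustrative sketch of such a guard, not the tool's actual code; `check_paths` is a hypothetical name.

```python
from pathlib import Path


def check_paths(source: Path, output: Path, force: bool = False) -> None:
    """Raise if writing `output` would break either rule described above."""
    # Rule 1: the source is never overwritten, even with force.
    if output.resolve() == source.resolve():
        raise ValueError("output path is the source PDF; refusing to overwrite it")
    # Rule 2: an existing output is only replaced when force is set.
    if output.exists() and not force:
        raise FileExistsError(f"{output} already exists; pass --force to replace it")
```

Resolving both paths first means `./input.pdf` and `input.pdf` are treated as the same file.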

Profiles

Profile	Use When	Behavior
`quality`	Photos, screenshots, maps, product images, "do not degrade" requests	High JPEG floor, protects small images, runs render QA, does not use Ghostscript by default
`balanced`	General email delivery	Moderate recompression ladder and conservative structural cleanup
`aggressive`	Smallest file matters more than perfect fidelity	Lower quality floor, smaller long-edge caps, optional Ghostscript fallback

If quality mode cannot hit the requested size, the tool keeps the smallest quality-preserving output and emits a direct warning with next steps.
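The fallback rule can be pictured as a selection over candidate outputs. This sketch is an assumption about the shape of that logic, not the tool's implementation: `Candidate`, `pick_output`, and the warning wording are hypothetical, and the README does not say which candidate is kept when several fit the target, so the sketch simply takes the smallest quality-preserving one.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    label: str                  # hypothetical name for one optimization attempt
    size_mb: float
    meets_quality_floor: bool   # True if the attempt stayed above the quality floor


def pick_output(candidates: List[Candidate],
                target_mb: float) -> Tuple[Candidate, Optional[str]]:
    """Return the smallest quality-preserving candidate and an optional warning."""
    safe = [c for c in candidates if c.meets_quality_floor]
    if not safe:
        raise ValueError("no candidate stayed above the quality floor")
    best = min(safe, key=lambda c: c.size_mb)
    if best.size_mb <= target_mb:
        return best, None
    warning = (
        f"quality profile reached {best.size_mb:.2f} MB, above the "
        f"{target_mb:g} MB target; consider a larger target or another profile"
    )
    return best, warning


attempts = [
    Candidate("structural cleanup only", 9.0, True),
    Candidate("mild recompression", 7.8, True),
    Candidate("heavy recompression", 5.0, False),  # fits, but breaks the floor
]
chosen, warning = pick_output(attempts, target_mb=7)
print(chosen.label)
print(warning)
```

The heavy recompression attempt is never returned even though it is the only one under 7 MB, which is the behavior the quality profile promises.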

Output

Use --json for machine-readable summaries:

pdf-email-optimizer input.pdf output.pdf --target-mb 7 --json

The JSON contract is documented in docs/json-output.md and validated by schema/output-summary.schema.json. Important fields include input/output size, target status, strategy, page count, creator metadata cleanup, image statistics, render QA, quality status, and warnings.
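A script can capture that summary with the standard library. This sketch assumes the summary is printed to stdout as a single JSON object; it deliberately does not hard-code field names, since the authoritative list is in docs/json-output.md. `parse_summary` and `optimize_with_summary` are hypothetical helper names.

```python
import json
import subprocess


def parse_summary(stdout: str) -> dict:
    """Parse the --json output, assuming one top-level JSON object."""
    summary = json.loads(stdout)
    if not isinstance(summary, dict):
        raise ValueError("expected a JSON object at the top level")
    return summary


def optimize_with_summary(src: str, dst: str, target_mb: float = 7) -> dict:
    """Run the optimizer with --json and return the parsed summary."""
    result = subprocess.run(
        ["pdf-email-optimizer", src, dst, "--target-mb", str(target_mb), "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_summary(result.stdout)


if __name__ == "__main__":
    summary = optimize_with_summary("input.pdf", "output.pdf")
    # List the keys rather than assuming their names.
    print(sorted(summary))
```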

Gallery

Before / after pairs from the real-world sample suite. Numbers match the Real-world results table. The right-hand image is the optimized "email copy" rendered at the same resolution as the original.

Photo brochure — 138.74 MB .pdf → 6.51 MB email PDF (95.3% smaller, PSNR 48.6 dB)

Photo brochure before and after

Lossless image PDF — 69.65 MB .pdf → 2.93 MB email PDF (95.8% smaller, PSNR 54.6 dB)

Lossless image PDF before and after

Financial services proposal — 36.31 MB .pptx → 4.97 MB email PDF (86.3% smaller, PSNR 41.3 dB)

Financial services proposal before and after

Bank report — 30.16 MB .pptx → 7.41 MB email PDF (75.5% smaller, PSNR 38.7 dB)

Bank report before and after

PSNR ≥ 40 dB is treated as visually indistinguishable. Three of the four gallery samples clear that threshold; the bank report sits just below it at 38.7 dB. Per-sample _before.png, _after.png, and _diff.png files live under docs/gallery/. The diff is amplified 8x so that even faint differences are visible; if it looks black, the change is invisible at normal zoom.
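For readers unfamiliar with the metric, PSNR for 8-bit images is 10·log10(255² / MSE), where MSE is the mean squared difference between corresponding pixel values. The sketch below applies that standard definition to flat sequences of 0-255 values; how the optimizer itself renders pages and aggregates PSNR is not described here, so treat this as a reference for the formula only.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], optimized: Sequence[int],
         max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(optimized):
        raise ValueError("inputs must have the same number of pixels")
    if not original:
        raise ValueError("inputs must not be empty")
    mse = sum((a - b) ** 2 for a, b in zip(original, optimized)) / len(original)
    if mse == 0:
        return math.inf  # pixel-identical, reported as PSNR ∞ in the tables
    return 10 * math.log10(max_value ** 2 / mse)


print(psnr([10, 20, 30], [10, 20, 30]))
print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))
```

At 40 dB the root-mean-square error is 255 / 100 = 2.55 levels out of 255, which is why that value is commonly used as a "hard to see" threshold.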

Synthetic brochure renders (built from CC0 stock images, no real people / places / trademarks — see benchmarks/demo_assets/PROVENANCE.md) are kept under docs/gallery/ as well; regenerate them with:

python benchmarks/make_demo_brochures.py   # build large CC0 source brochures (~10-14 MB each)
python benchmarks/make_demo_gallery.py     # optimize + render the before/after images

Smaller, fully synthetic fixtures (generated by benchmarks/make_fixtures.py, rendered by benchmarks/make_gallery.py) drive the regression suite below. To rebuild the real-world gallery and charts from scratch:

python benchmarks/convert_samples.py       # .pptx/.xlsx -> .pdf via LibreOffice
python benchmarks/run_samples.py           # optimize, write benchmarks/results/samples.json
python benchmarks/make_gallery.py          # before / after / diff PNGs
python benchmarks/make_charts.py           # RGBY-on-dark vertical bar chart (linear MB)

Regression suite

Eleven synthetic CC0 fixtures (each ≤ 2 MB) exercise specific shapes of PDF that real optimizers handle badly: duplicate-image PDFs, vector-only exports, scans, screenshots, forms, transparency, embedded metadata, and PowerPoint/InDesign exports. They're regression coverage for behavior, not magnitude — they ensure every release still chooses the right strategy per shape, still respects the quality floor, and never silently degrades a file that doesn't need it. Real-world headline numbers belong in Real-world results above.

python benchmarks/make_fixtures.py        # (re)generate CC0 sample PDFs
python benchmarks/run_benchmarks.py       # writes JSON, CSV, and Markdown

The full per-fixture table (input, optimized size, reduction, PSNR) is committed at benchmarks/results/latest.md and regenerated on every CI run; see docs/benchmarking.md before adding new fixtures.

How it compares

The same PDF, run through each tool, gives very different shapes of output:

Tool	Output	Reduction	Worst PSNR	Notes
pdf-email-optimizer (`--quality`)	3.48 MB	95.0%	55.8 dB	Visually lossless, hits target
pdf-email-optimizer (`--balanced`)	2.93 MB	95.8%	54.6 dB	Visually lossless, hits target
pdf-email-optimizer (`--aggressive`)	2.71 MB	96.1%	54.0 dB	Visually lossless, hits target
Ghostscript `/printer`	1.29 MB	98.2%	34.5 dB	Visible degradation, no quality floor
Ghostscript `/ebook`	0.29 MB	99.6%	31.6 dB	Severely degraded
Ghostscript `/screen`	0.12 MB	99.8%	27.2 dB	Severely degraded
pikepdf-only (lossless)	53.90 MB	22.6%	∞	Pixel-identical, but doesn't hit target

Source: 69.65 MB lossless image PDF, target 7 MB. Full table, methodology, and exact reproduction commands in docs/comparisons.md. Regenerate with python benchmarks/run_comparisons.py --source <pdf> --target-mb 7.
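The Notes column boils down to two checks per row: does the output fit the 7 MB target, and does the worst-page PSNR stay at or above the 40 dB threshold used elsewhere in this README. The sketch below applies both checks to the numbers in the table; it is arithmetic on the published results, not part of benchmarks/run_comparisons.py.

```python
# (tool, output size in MB, worst PSNR in dB), copied from the table above.
RESULTS = [
    ("pdf-email-optimizer --quality", 3.48, 55.8),
    ("pdf-email-optimizer --balanced", 2.93, 54.6),
    ("pdf-email-optimizer --aggressive", 2.71, 54.0),
    ("Ghostscript /printer", 1.29, 34.5),
    ("Ghostscript /ebook", 0.29, 31.6),
    ("Ghostscript /screen", 0.12, 27.2),
    ("pikepdf-only (lossless)", 53.90, float("inf")),
]

TARGET_MB = 7
PSNR_FLOOR_DB = 40

passing = []
for tool, size_mb, worst_psnr in RESULTS:
    hits_target = size_mb <= TARGET_MB
    keeps_quality = worst_psnr >= PSNR_FLOOR_DB
    if hits_target and keeps_quality:
        passing.append(tool)
    print(f"{tool}: target={'yes' if hits_target else 'no'}, "
          f"quality={'yes' if keeps_quality else 'no'}")
```

Only the three pdf-email-optimizer rows pass both checks: the Ghostscript presets fit the target but fall below 40 dB, and the pikepdf-only run is pixel-identical but far over 7 MB.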

Visual QA

Render and compare two PDFs:

pdf-email-render-compare original.pdf optimized.pdf --output-dir qa-renders

This reports page-level pixel differences and can write original, optimized, and amplified diff PNGs for review.
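To make "amplified diff" concrete: one common way to build such an image is to take the absolute per-pixel difference, multiply it by a gain, and clip to the 8-bit range. The sketch below does that on flat sequences of 0-255 values with the 8x gain mentioned in the Gallery section. It illustrates the idea only; the exact method used by pdf-email-render-compare is not documented here, and `amplified_diff` and `changed_fraction` are hypothetical names.

```python
from typing import List, Sequence


def amplified_diff(original: Sequence[int], optimized: Sequence[int],
                   gain: int = 8) -> List[int]:
    """Absolute per-pixel difference, scaled by `gain` and clipped to 0-255."""
    if len(original) != len(optimized):
        raise ValueError("inputs must have the same number of pixels")
    return [min(255, gain * abs(a - b)) for a, b in zip(original, optimized)]


def changed_fraction(original: Sequence[int], optimized: Sequence[int]) -> float:
    """Share of pixels whose value differs at all."""
    if len(original) != len(optimized):
        raise ValueError("inputs must have the same number of pixels")
    if not original:
        return 0.0
    changed = sum(1 for a, b in zip(original, optimized) if a != b)
    return changed / len(original)


print(amplified_diff([10, 10, 10], [10, 11, 60]))
print(changed_fraction([10, 10, 10, 10], [10, 11, 60, 10]))
```

A one-level difference becomes 8 in the diff image, so it is faintly visible, while unchanged pixels stay black.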

Agent Usage

The repo includes SKILL.md for agent runtimes that load local skills. The short version:

- Use quality when the user asks to preserve image fidelity.
- Use balanced for ordinary email optimization.
- Use aggressive only when visible quality loss is acceptable.
- Report size, target status, strategy, and warnings.
- Never overwrite the source PDF.

More examples are in docs/agent-usage.md.
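The profile rules above reduce to a small decision function. This is a sketch of how an agent wrapper might encode them, not code from SKILL.md; `choose_profile` and its two boolean inputs are hypothetical.

```python
def choose_profile(preserve_fidelity: bool, loss_acceptable: bool = False) -> str:
    """Map the user's stated priorities to a profile name."""
    if preserve_fidelity:
        # A fidelity request wins even if the user also wants a small file.
        return "quality"
    if loss_acceptable:
        return "aggressive"
    return "balanced"


print(choose_profile(preserve_fidelity=True))
print(choose_profile(preserve_fidelity=False))
print(choose_profile(preserve_fidelity=False, loss_acceptable=True))
```

Checking the fidelity request first means aggressive is only ever chosen when the user has explicitly accepted visible loss.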

Development

python -m pip install -e ".[dev]"
pytest
pytest --cov
ruff check .
python -m build

CI runs linting, tests, coverage, package build, and CLI smoke checks on Python 3.9-3.13.

Documentation

Related projects

pdf-fax-optimizer — sister project for shrinking PDFs to fax-machine constraints (bilevel rendering, TIFF/G4 output, page-size discipline) rather than email size and visual fidelity.

License

MIT

Project details

Release history

- 3.0.0: Jun 22, 2026
- 2.0.0: Jun 21, 2026
- 1.7.0: Jun 21, 2026
- 1.6.0: Jun 21, 2026
- 1.5.0: Jun 21, 2026 (this version)
- 1.0.0: Jun 21, 2026
- 0.1.0: Jun 21, 2026

Download files

Source Distribution

pdf_email_optimizer-1.5.0.tar.gz (609.5 kB)

Uploaded Jun 21, 2026 Source

Built Distribution

pdf_email_optimizer-1.5.0-py3-none-any.whl (24.5 kB)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file pdf_email_optimizer-1.5.0.tar.gz.

File metadata

Download URL: pdf_email_optimizer-1.5.0.tar.gz
Upload date: Jun 21, 2026
Size: 609.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pdf_email_optimizer-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`094b2c19a3a45155c3d34e516728b3e763fcbb29c1b332f7bc749bddf8f20eba`
MD5	`48d02744c23e60daa1e36eba8d9734ad`
BLAKE2b-256	`265ff9e6f3c9c3d76bf97c22c3e7ff8374e1729c99042b00bf28762bdce37a57`
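A downloaded copy can be checked against the SHA256 digest above with Python's standard `hashlib`. In this sketch the digest is copied from the table; `sha256_of` and `verify` are hypothetical helper names, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest for pdf_email_optimizer-1.5.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "094b2c19a3a45155c3d34e516728b3e763fcbb29c1b332f7bc749bddf8f20eba"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("pdf_email_optimizer-1.5.0.tar.gz")
    if sdist.exists():
        print("match" if verify(sdist) else "MISMATCH")
```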

Provenance

The following attestation bundles were made for pdf_email_optimizer-1.5.0.tar.gz:

Publisher: publish.yaml on petehottelet/pdf-email-optimizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdf_email_optimizer-1.5.0.tar.gz
- Subject digest: 094b2c19a3a45155c3d34e516728b3e763fcbb29c1b332f7bc749bddf8f20eba
- Sigstore transparency entry: 1894762774
- Sigstore integration time: Jun 21, 2026
Source repository:
- Permalink: petehottelet/pdf-email-optimizer@fa15b688b8e4bbd31533939e27a23053d5bb576b
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/petehottelet
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@fa15b688b8e4bbd31533939e27a23053d5bb576b
- Trigger Event: push

File details

Details for the file pdf_email_optimizer-1.5.0-py3-none-any.whl.

File metadata

Download URL: pdf_email_optimizer-1.5.0-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 24.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pdf_email_optimizer-1.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f0104ac95115ee94cf84aeda268586f7aab6c2bc44cbe9820c98e1a6f1cd8366`
MD5	`4322cceee4e3d40adc5a3e9a16420ec3`
BLAKE2b-256	`ce9cd01b6eef07a87687db82218b03537cd01d2f41adee97d80d08ddbcd31507`

Provenance

The following attestation bundles were made for pdf_email_optimizer-1.5.0-py3-none-any.whl:

Publisher: publish.yaml on petehottelet/pdf-email-optimizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdf_email_optimizer-1.5.0-py3-none-any.whl
- Subject digest: f0104ac95115ee94cf84aeda268586f7aab6c2bc44cbe9820c98e1a6f1cd8366
- Sigstore transparency entry: 1894762828
- Sigstore integration time: Jun 21, 2026
Source repository:
- Permalink: petehottelet/pdf-email-optimizer@fa15b688b8e4bbd31533939e27a23053d5bb576b
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/petehottelet
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@fa15b688b8e4bbd31533939e27a23053d5bb576b
- Trigger Event: push
