Local-first PDF compressor: shrink PDFs below a target size without sending them to any third-party service.

slimpdf

Get a PDF under a size limit (say 3 MB for an upload form) without sending it to some website. slimpdf is a command-line tool and Python library that compresses PDFs entirely on your own machine — no uploads, no account, no network calls.

pip install slimpdf
slimpdf compress big-scan.pdf --target 3mb -o small.pdf

[Demo animation: slimpdf compresses an 8.4 MB PDF to 484 KB, fully offline]

Why this exists

I kept hitting the same wall at work: a scanned hospital bill or KYC document would be 15–20 MB, the upload form capped at 3 MB, and the only quick fix was to drop the file into an online compressor. For medical and identity documents, handing them to a random website is exactly what you don't want to do.

Command-line options exist, but they have rough edges. Ghostscript's presets are a blunt instrument: /screen either over-compresses into mush or doesn't get small enough, and there's no "just get it under 3 MB" mode. It's also AGPL, so you can't bundle it into a closed-source product without buying a commercial license. slimpdf is the tool I wanted instead:

  • It runs locally. Your files never leave the machine. No telemetry either.
  • It aims at a size, not a vague preset. Tell it --target 3mb and it works out how much compression is actually needed, so it doesn't throw away quality it didn't have to.
  • It won't quietly wreck a file. Every result is re-opened and checked (page count, text still extractable) before it's kept, and if it can't beat the original it leaves the file alone instead of writing a bigger one.
  • MIT, with permissive dependencies only. No AGPL anywhere, so you can drop it into closed-source software without a lawyer conversation.
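
The "re-opened and checked" step can be pictured as a small predicate over before/after facts about the file. This is only a sketch of the idea, not slimpdf's code: `PdfStats`, `is_intact` and the 95% text-retention threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfStats:
    """Hypothetical summary of a PDF; not slimpdf's real data model."""
    page_count: int
    text_chars: int  # characters recoverable by text extraction


def is_intact(original: PdfStats, candidate: PdfStats,
              min_text_ratio: float = 0.95) -> bool:
    """Return True if the rewritten file still looks like the original."""
    # A changed page count means the rewrite damaged the document.
    if candidate.page_count != original.page_count:
        return False
    # If the original had a text layer, most of it must still be extractable.
    # (The 0.95 threshold is an assumption made for this sketch.)
    if original.text_chars > 0:
        if candidate.text_chars < min_text_ratio * original.text_chars:
            return False
    return True
```

A candidate that fails this kind of check would simply be discarded, which is what "won't quietly wreck a file" amounts to in practice.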

On a real set of scanned documents it held its own against Ghostscript — similar size reduction at noticeably better visual fidelity. Numbers and method are in docs/benchmark-results.md.

Install

pip install slimpdf
# or, from source with uv:
uv sync --extra dev

CLI

# Compress below 3 MB using the claim-upload preset
slimpdf compress input.pdf --target 3mb -o output.pdf --report report.json

# Inspect a PDF (read-only): pages, images, text layer, encryption
slimpdf inspect input.pdf --json

# Batch a folder and emit a benchmark table
slimpdf batch ./samples --target 3mb --out ./compressed --csv bench.csv --md bench.md

# Compare slimpdf against other engines on size AND quality (SSIM)
pip install "slimpdf[benchmark]"      # adds SSIM scoring
slimpdf compare ./samples --out ./out --min-mb 3 --csv compare.csv

compare picks up Ghostscript and mutool if they're on your PATH and scores each engine on size, visual fidelity (SSIM) and text retention, so you're not just trusting that smaller is better. There's a real run in docs/benchmark-results.md.
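
For readers who haven't met SSIM: it scores how structurally similar two images are, with 1.0 meaning identical. The sketch below computes a simplified single-window (global) SSIM over two equal-length lists of grayscale pixel values using the standard formula and constants. It is not how slimpdf's benchmark extra scores pages (the README doesn't show that), and real implementations normally average SSIM over local sliding windows.

```python
def global_ssim(a, b, max_value=255.0):
    """Single-window SSIM between two equal-length grayscale pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # Standard stabilising constants: K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    return ((2 * mean_a * mean_b + c1) * (2 * cov + c2)) / (
        (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    )
```

Comparing a rendered page against itself gives a score of 1.0, and any change in brightness or contrast pulls the score below that, which is why it is a useful counterweight to "smaller is better".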

Exit code is 2 when a target was requested but not achieved — handy in scripts.
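
A script can branch on that exit code. The wrapper below is a minimal sketch: it uses only the `compress`, `--target` and `-o` arguments shown above, and it assumes that exit code 0 means success and that any other non-zero code is a generic failure (the README only documents code 2). The `run` parameter is a hypothetical hook that makes the function easy to test without slimpdf installed.

```python
import subprocess

TARGET_MISSED = 2  # documented: a target was requested but not achieved


def compress_to_target(src, dst, target="3mb", run=subprocess.run):
    """Run the slimpdf CLI and classify the outcome by exit code."""
    cmd = ["slimpdf", "compress", src, "--target", target, "-o", dst]
    proc = run(cmd)
    if proc.returncode == 0:  # assumption: 0 means the target was met
        return "ok"
    if proc.returncode == TARGET_MISSED:
        return "target-missed"
    return "error"  # assumption: anything else is a failure
```

Calling `compress_to_target("big-scan.pdf", "small.pdf")` then lets an upload script decide whether to proceed, retry with different options, or stop.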

Presets

preset        target  max DPI  quality (start → min)  rasterize
claim-upload  3 MB    150      75 → 55                off
screen        -       120      70 → 45                off
archive       -       200      85 → 70                off

Override any knob: --max-dpi, --quality, --min-quality, --target, --allow-rasterize, --keep-metadata, --password, --force-output.
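
One way to think about presets plus overrides is "start from a row of the table, then replace only the fields the user set". The sketch below models that with a dataclass. The `Preset` class, its field names and `resolve` are hypothetical; only the numbers come from the preset table, and 3 MB is taken as 3_000_000 bytes to match the Python API example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Preset:
    target_bytes: Optional[int]  # None where the table lists no target
    max_dpi: int
    quality: int       # starting JPEG quality
    min_quality: int   # floor the search will not go below
    allow_rasterize: bool = False


# Values copied from the preset table (illustrative model only).
PRESETS = {
    "claim-upload": Preset(3_000_000, 150, 75, 55),
    "screen": Preset(None, 120, 70, 45),
    "archive": Preset(None, 200, 85, 70),
}


def resolve(preset_name, **overrides):
    """Start from a preset, then apply only the overrides that were given."""
    given = {k: v for k, v in overrides.items() if v is not None}
    return replace(PRESETS[preset_name], **given)
```

For example, `resolve("screen", target_bytes=2_000_000)` keeps the screen preset's DPI and quality range but adds a size target.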

Python API

from slimpdf import compress, inspect, CompressOptions

info = inspect("input.pdf")
print(info.page_count, info.images_found, info.text_layer_detected)

result = compress(
    "input.pdf", "output.pdf",
    CompressOptions(preset="claim-upload", target_bytes=3_000_000),
)
print(result.compressed_size_bytes, result.target_achieved, result.mode_used)

How it works

There are three compression passes. slimpdf tries them least-destructive first and keeps the gentlest one that gets you under the target:

  1. Structural — recompress streams, pack objects, drop junk metadata. Fully lossless; sometimes enough on its own for a bloated-but-not-scanned PDF.
  2. Image rewrite — the main worker. Downsamples and re-encodes the oversized embedded images, binary-searching JPEG quality and DPI until it hits the target. Text, forms and vector graphics are left intact.
  3. Raster fallback — off unless you pass --allow-rasterize. Flattens each page to an image and rebuilds the PDF. It'll hit almost any size, but you lose selectable text, form fields and signatures, so it's a last resort.
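
The two ideas in that list, "try passes gentlest first" and "binary-search the quality", can be sketched in a few lines. This is an illustration under stated assumptions rather than slimpdf's implementation: both function names are hypothetical, output size is assumed not to shrink as quality goes up, and only quality is searched here (the real image pass also varies DPI).

```python
def highest_quality_under(size_at, target_bytes, start_quality, min_quality):
    """Binary-search the highest quality whose output fits target_bytes.

    size_at(q) returns the output size in bytes at JPEG quality q and is
    assumed to grow (or stay flat) as q increases. Returns None if even
    min_quality is too large.
    """
    lo, hi = min_quality, start_quality
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if size_at(mid) <= target_bytes:
            best = mid      # fits: remember it and try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1    # too big: lower the quality
    return best


def first_pass_that_fits(passes, target_bytes):
    """Try (name, run) pairs in order, gentlest first.

    Each run() returns an output size in bytes, or None if the pass failed.
    Returns (name, size) for the first pass under the target; if none fits,
    returns the smallest result seen, or None if every pass failed.
    """
    smallest = None
    for name, run in passes:
        size = run()
        if size is None:
            continue
        if size <= target_bytes:
            return name, size
        if smallest is None or size < smallest[1]:
            smallest = (name, size)
    return smallest
```

Searching for the highest quality that still fits is what keeps the tool from throwing away more detail than the target requires.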

Demo

The animation at the top (docs/demo.gif) is generated from a real slimpdf run on a synthetic file — regenerate it any time with:

uv run python docs/make_demo.py     # writes docs/demo.gif

(A VHS tape, docs/demo.tape, is also included if you prefer recording a live terminal.)

Notes

It really is offline. slimpdf makes no network calls and writes nothing outside the paths you give it. Safe to run on documents you wouldn't upload.

What if it can't hit the target? It returns the smallest valid result it found and reports target_achieved: false (CLI exit code 2). It never ships a file larger than the input unless you pass --force-output.
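
Put as a decision rule, that behaviour looks roughly like the sketch below. `Outcome` and `decide` are hypothetical names, exit code 0 for success is an assumption (only code 2 is documented), and treating a same-size result as "not smaller" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    written_size: Optional[int]  # None means the input was left untouched
    target_achieved: bool
    exit_code: int


def decide(input_size, candidate_sizes, target_bytes, force_output=False):
    """Pick what to write from the sizes of the valid candidates."""
    best = min(candidate_sizes, default=None)
    # Without force_output, refuse anything that is not smaller than the input.
    if best is None or (best >= input_size and not force_output):
        achieved = input_size <= target_bytes
        return Outcome(None, achieved, 0 if achieved else 2)
    achieved = best <= target_bytes
    return Outcome(best, achieved, 0 if achieved else 2)
```

With a 10 MB input, a 3 MB target and a best candidate of 4 MB, this returns the 4 MB result with `target_achieved` set to false and exit code 2, which matches the behaviour described above.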

Scanned PDFs with no text layer compress on image content alone; there's nothing to preserve text-wise, so they tend to shrink the most.

Licensing

slimpdf is MIT. No runtime dependency is GPL or AGPL. pikepdf's MPL-2.0 is file-level copyleft that applies only to changes made to pikepdf's own source files; the rest are permissive:

dependency  license          role
pikepdf     MPL-2.0          parsing + structural rewrite (bundles qpdf, Apache-2.0)
pypdfium2   BSD-3/Apache     page rendering (bundles PDFium, BSD-3)
Pillow      MIT-CMU (HPND)   image encode/decode

So you can use it inside a proprietary product without your own code picking up copyleft obligations. PyMuPDF and Ghostscript would both have been easier in places, but they're AGPL, which is the whole reason they're not here.

Status

Alpha — works well on the documents I've thrown at it, but it hasn't seen the long tail yet. Two things to know: any rewrite invalidates a PDF's digital signature (that's unavoidable when you change the bytes), and a few unusual image filters/colorspaces are skipped with a warning rather than risk producing a broken file. Bug reports with a sample PDF are very welcome.
