

File Crusher

Compresses PDFs with PNG compression.

Tired of bumping against upload size limits? This tool compresses PDFs and PNGs by combining several of the best compression tools in one. It can be slow, but it really crushes your file size and helps you conquer the relentless 5 MB upload limit.

It works by splitting a PDF into PNG images and compressing them with advpng, pngcrush and pngquant. It then combines the images back into a PDF and applies a final round of lossless PDF compression. Optionally, it can run optical character recognition (OCR) to make a scanned PDF searchable. It also exposes its internal processors, so you can use it as a standalone PNG compressor or file converter.
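The PNG stage can be pictured as a chain of command-line calls: one lossy pass with pngquant followed by two lossless passes. This is a minimal sketch, assuming the three tools are on PATH. The helper names `build_png_commands` and `crush_png`, the pass order and the flags are illustrative assumptions, not File Crusher's actual settings.

```python
import shutil
import subprocess
from pathlib import Path


def build_png_commands(src: Path, dst: Path) -> list[list[str]]:
    """Return the command lines for one PNG (hypothetical helper).

    Order and flags are an illustrative assumption, not File Crusher's settings.
    """
    return [
        # Lossy: quantize the colors and write the result to dst.
        ["pngquant", "--force", "--output", str(dst), str(src)],
        # Lossless: try other filter/compression settings, overwriting dst.
        ["pngcrush", "-ow", str(dst)],
        # Lossless: recompress the deflate stream of dst in place ("insane" level).
        ["advpng", "-z", "-4", str(dst)],
    ]


def crush_png(src: Path, dst: Path) -> None:
    """Run every pass in sequence; raise if a tool is missing or fails."""
    commands = build_png_commands(src, dst)
    missing = [cmd[0] for cmd in commands if shutil.which(cmd[0]) is None]
    if missing:
        raise FileNotFoundError(f"not on PATH: {', '.join(missing)}")
    for cmd in commands:
        subprocess.run(cmd, check=True, capture_output=True)
```

For example, `crush_png(Path("page-1.png"), Path("page-1.small.png"))`. The lossy pass runs first so that the lossless passes optimize the final pixel data rather than data that is about to be discarded.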

Installation

1. Install the Python library

pip install filecrusher

2. Install the compression tools

Windows

No extra step is needed: the tools are already bundled in the compressor_lib directory.

Linux (Ubuntu)

sudo apt install -y pngquant advancecomp pngcrush

Then install Wine, which is needed to run cpdfsqueeze:

sudo apt install -y wine
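After installing, you can check that the tools are reachable. A minimal sketch for Linux, assuming the binaries are named pngquant, advpng (shipped in the advancecomp package), pngcrush and wine; `missing_tools` is a hypothetical helper, not part of File Crusher.

```python
import shutil

# Binary names expected on PATH after the apt commands above.
REQUIRED_TOOLS = ("pngquant", "advpng", "pngcrush", "wine")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `print(missing_tools() or "all tools found")` lists whatever still needs installing. The `which` parameter exists only so the lookup can be swapped out, e.g. in tests.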

3. Optionally, install Tesseract for OCR (used via pytesseract)

Windows (GUI installer)

Download and install Tesseract. During setup, select any additional languages you need (e.g. German under Additional Language Data).

Linux

sudo apt install -y tesseract-ocr

To add language packs:

sudo apt install -y tesseract-ocr-<language-shortform>

For example, for German:

sudo apt install -y tesseract-ocr-deu
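To confirm that a language pack was picked up, you can ask Tesseract itself via its `--list-langs` option. A minimal sketch; `parse_tesseract_langs` and `installed_tesseract_langs` are hypothetical helpers, and skipping the header line assumes the usual "List of available languages ..." output format.

```python
import subprocess


def parse_tesseract_langs(output: str) -> list[str]:
    """Turn `tesseract --list-langs` output into a list of language codes."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks and the header line ("List of available languages ...").
        if not line or line.lower().startswith("list of available languages"):
            continue
        codes.append(line)
    return codes


def installed_tesseract_langs() -> list[str]:
    """Ask the local tesseract binary which language packs it can load."""
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Fall back to stderr in case a tesseract version prints the list there.
    return parse_tesseract_langs(result.stdout or result.stderr)
```

After installing tesseract-ocr-deu, `"deu" in installed_tesseract_langs()` should be `True`.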

Usage

CLI Usage

# for pdfs
python3 -m file_crusher input.pdf output.pdf --pdfcompressor
# or for pngs
python3 -m file_crusher input.png output.png --pngcompressor
# list the other processors and their options
python3 -m file_crusher --help
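To crush a whole folder of PDFs, the CLI call can be wrapped in a small loop. A sketch that uses only the positional input/output arguments and the `--pdfcompressor` flag from the CLI example; `build_cli_jobs` and `crush_folder` are hypothetical helper names, and `sys.executable` stands in for `python3` so the current interpreter is used.

```python
import subprocess
import sys
from pathlib import Path


def build_cli_jobs(pdf_paths, output_dir):
    """Build one `python -m file_crusher` command per PDF (hypothetical helper)."""
    jobs = []
    for src in map(Path, pdf_paths):
        dst = Path(output_dir) / src.name
        # Same argument order as the CLI example: input, output, processor flag.
        jobs.append(
            [sys.executable, "-m", "file_crusher", str(src), str(dst), "--pdfcompressor"]
        )
    return jobs


def crush_folder(input_dir, output_dir):
    """Run the CLI for every *.pdf in input_dir; stops at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_cli_jobs(sorted(Path(input_dir).glob("*.pdf")), output_dir):
        subprocess.run(cmd, check=True)
```

For example, `crush_folder("scans", "scans_small")` writes each compressed PDF under the same file name into `scans_small`.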

Python Usage

from file_crusher import PNGCompressor, PDFCompressor

compressor = PNGCompressor()
compressor.process_file("input.png", "output.png")

# extreme mode
compressor = PNGCompressor(0)
compressor.process_file("input.png", "output.png")

# fast mode
compressor = PNGCompressor(5)
compressor.process_file("input.png", "output.png")

# also check the other options
compressor = PDFCompressor(default_pdf_dpi=200)
compressor.process_file("input.pdf", "output.pdf")
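For a folder that mixes PDFs and PNGs, the two classes can be dispatched by file extension. A sketch that relies only on the constructors and `process_file` calls shown above; `SUPPORTED`, `plan_outputs` and `compress_all` are hypothetical helpers, and `output_dir` is assumed to exist already. A fresh compressor is created per file because the example does not show whether instances can be reused.

```python
from pathlib import Path

# File types routed to each compressor (hypothetical mapping for this sketch).
SUPPORTED = {".pdf": "pdf", ".png": "png"}


def plan_outputs(paths, output_dir):
    """Pair each supported input with an output path; other files are skipped."""
    plan = []
    for src in map(Path, paths):
        kind = SUPPORTED.get(src.suffix.lower())
        if kind is not None:
            plan.append((src, Path(output_dir) / src.name, kind))
    return plan


def compress_all(paths, output_dir):
    """Compress every planned file with the matching File Crusher class."""
    # Imported here so plan_outputs works without file_crusher installed.
    from file_crusher import PDFCompressor, PNGCompressor

    for src, dst, kind in plan_outputs(paths, output_dir):
        if kind == "pdf":
            compressor = PDFCompressor(default_pdf_dpi=200)
        else:
            compressor = PNGCompressor()
        compressor.process_file(str(src), str(dst))
```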

Disclaimer

Keep in mind that lossy compression permanently discards some quality or data: pngquant reduces the number of colors, and PDF pages are converted to images along the way. Always check the output file to make sure it still meets your requirements.
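Part of that check can be automated: comparing file sizes tells you whether the output fits your upload limit, although visual quality still has to be judged by eye. A minimal sketch; `summarize`, `check_output` and `UPLOAD_LIMIT_BYTES` are hypothetical names, and "5 MB" is taken as 5 MiB even though some sites count 5,000,000 bytes.

```python
import os

# Assumption: "5 MB" taken as 5 MiB; some sites count 5,000,000 bytes instead.
UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024


def summarize(original_bytes: int, compressed_bytes: int,
              limit_bytes: int = UPLOAD_LIMIT_BYTES) -> dict:
    """Report the size reduction and whether the result fits the limit."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    saved = 1 - compressed_bytes / original_bytes
    return {
        "saved_percent": round(saved * 100, 1),
        "fits_limit": compressed_bytes <= limit_bytes,
    }


def check_output(input_path: str, output_path: str,
                 limit_bytes: int = UPLOAD_LIMIT_BYTES) -> dict:
    """Same report, with sizes read from the files on disk."""
    return summarize(os.path.getsize(input_path), os.path.getsize(output_path), limit_bytes)
```

For example, `check_output("input.pdf", "output.pdf")` returns the percentage saved and whether `output.pdf` fits under the limit.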

If you run into problems while using the library or have suggestions for improving it, please open an issue: https://github.com/pIlIp-d/FileCrusher/issues



Download files


Source Distribution

filecrusher-0.2.5.tar.gz (2.0 MB)

Uploaded Source

Built Distribution

filecrusher-0.2.5-py3-none-any.whl (2.1 MB)

Uploaded Python 3

File details

Details for the file filecrusher-0.2.5.tar.gz.

File metadata

  • Download URL: filecrusher-0.2.5.tar.gz
  • Upload date:
  • Size: 2.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.10

File hashes

Hashes for filecrusher-0.2.5.tar.gz

  • SHA256: 995fab718b3da92a7eeb1d50d9c771a39fdb8a5fc8163593da699a61213a5b2e
  • MD5: 217456bb935959a186da3376a206b3d6
  • BLAKE2b-256: 6c1856bbf426b4e3247adab3797625da134733e8bbfbc7b5f1fb0d6da5ac2296
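The published digests let you confirm that a downloaded file is intact before installing it. A minimal sketch using Python's `hashlib`; `sha256_of_chunks`, `file_sha256` and `verify_download` are hypothetical helpers, and only the SHA256 values listed on this page are used.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hashes listed on this page.
EXPECTED_SHA256 = {
    "filecrusher-0.2.5.tar.gz": "995fab718b3da92a7eeb1d50d9c771a39fdb8a5fc8163593da699a61213a5b2e",
    "filecrusher-0.2.5-py3-none-any.whl": "c689af18d8ff1e9ea20b01d40472ad75348fb5f46fb023ca96161e4def58ef45",
}


def sha256_of_chunks(chunks) -> str:
    """Hash an iterable of byte chunks incrementally."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files are not loaded at once."""
    with open(path, "rb") as fh:
        return sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def verify_download(path) -> bool:
    """Return True if the file's SHA256 matches the published value."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published hash for {path.name}")
    return file_sha256(path) == expected.lower()
```

`verify_download("filecrusher-0.2.5.tar.gz")` returns `True` for an intact download. pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file) performs the same check at install time.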


File details

Details for the file filecrusher-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: filecrusher-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 2.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.10

File hashes

Hashes for filecrusher-0.2.5-py3-none-any.whl

  • SHA256: c689af18d8ff1e9ea20b01d40472ad75348fb5f46fb023ca96161e4def58ef45
  • MD5: ab366a37f8594513059267967b6fa58d
  • BLAKE2b-256: 4295c9f27c476ba2401c3e47fbb0dd1758a26003f3dd9ee46167e882cd57208c

