File Crusher

Compresses PDFs with PNG compression.

Tired of bumping against upload size limits? This tool compresses PDFs and PNGs by combining several of the best compression tools in one. It can be slow, but it really crushes your file size and helps you get under the relentless 5 MB upload limit.

It works by splitting a PDF into PNGs and compressing these with advpng, pngcrush and pngquant. It then combines them back into a PDF and applies a round of lossless PDF compression. Optionally, it can apply OCR (Optical Character Recognition) to make a scanned PDF searchable. It also exposes its internal processors, so you can use it as a PNG compressor and file converter.
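The PNG stage described above can be pictured as a chain of external commands. The sketch below only builds the command lines, it does not run them, and it is not taken from File Crusher's source: the order of the tools, the chosen flags, the `.quant.png` temporary name and the function name `png_stage_commands` are all assumptions for illustration.

```python
from pathlib import Path


def png_stage_commands(src: str, dst: str) -> list[list[str]]:
    """Build (but do not run) one command line per PNG tool.

    Assumed order: lossy quantization first, then two lossless passes.
    """
    src_path, dst_path = Path(src), Path(dst)
    # Hypothetical intermediate file placed next to the destination.
    quantized = dst_path.with_name(dst_path.stem + ".quant.png")
    return [
        # Lossy: reduce the image to a colour palette.
        ["pngquant", "--force", "--output", str(quantized), str(src_path)],
        # Lossless: strip ancillary chunks and try colour-type reductions.
        ["pngcrush", "-rem", "alla", "-reduce", str(quantized), str(dst_path)],
        # Lossless: recompress the result in place at the highest level.
        ["advpng", "-z", "-4", str(dst_path)],
    ]


if __name__ == "__main__":
    for command in png_stage_commands("page.png", "page.small.png"):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the tools are installed.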

Installation

1. Install the Python library

pip install file-crusher

2. Install the compression tools

Windows

The tools are already included in the compressor_lib directory.

Linux (Ubuntu)

sudo apt install pngquant -y && sudo apt install advancecomp -y && sudo apt install pngcrush -y

and install Wine for cpdfsqueeze

sudo apt install wine -y

3. Optionally install Tesseract (used through pytesseract) for OCR

Windows (via GUI)

Download and install Tesseract. In the installer, select the additional languages you want (for example, German under Additional Language Data).

Linux

sudo apt install tesseract-ocr

Add additional language packs:

sudo apt install tesseract-ocr-<language-shortform> -y

Example for German:

sudo apt install tesseract-ocr-deu -y
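After installing on Linux, a quick way to confirm that the external programs are reachable is to look them up on the PATH. This is a minimal sketch, not part of File Crusher: the helper name `missing_tools` and the two lists are assumptions, and the executable names (`advpng` from the advancecomp package, `tesseract` from tesseract-ocr) are what the Ubuntu packages above normally install. On Windows the tools ship in compressor_lib, so a PATH check does not apply there.

```python
import shutil

# Executables installed by the apt packages listed above (assumed names).
REQUIRED_TOOLS = ["pngquant", "advpng", "pngcrush"]
OPTIONAL_TOOLS = ["wine", "tesseract"]


def missing_tools(names):
    """Return the subset of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
        missing = missing_tools(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} tools: {status}")
```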

Usage

CLI Usage

# for pdfs
python3 -m file_crusher input.pdf output.pdf --pdfcompressor
# or for pngs
python3 -m file_crusher input.png output.png --pngcompressor
# for other processors see
python3 -m file_crusher --help
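If you prefer to drive the CLI from a script, the commands above can be assembled programmatically. The sketch below uses only the two flags shown above and picks one based on the input file extension; the helper names `crusher_argv` and `run_crusher` are hypothetical and not part of the package.

```python
import subprocess
import sys
from pathlib import Path

# Flags taken from the CLI examples above, keyed by input extension.
FLAGS = {".pdf": "--pdfcompressor", ".png": "--pngcompressor"}


def crusher_argv(src: str, dst: str) -> list[str]:
    """Build the argument list for `python -m file_crusher`."""
    suffix = Path(src).suffix.lower()
    if suffix not in FLAGS:
        raise ValueError(f"unsupported input type: {suffix or '(none)'}")
    # sys.executable runs the module with the current interpreter.
    return [sys.executable, "-m", "file_crusher", src, dst, FLAGS[suffix]]


def run_crusher(src: str, dst: str) -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(crusher_argv(src, dst), check=True)
```

Calling `run_crusher("input.pdf", "output.pdf")` then behaves like the first CLI example, provided the package is installed for the current interpreter.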

Python Usage

from file_crusher import PNGCompressor, PDFCompressor

compressor = PNGCompressor()
compressor.process_file("input.png", "output.png")

# extreme mode
compressor = PNGCompressor(0)
compressor.process_file("input.png", "output.png")

# fast mode
compressor = PNGCompressor(5)
compressor.process_file("input.png", "output.png")

# PDFs: the constructor takes further options, for example default_pdf_dpi
compressor = PDFCompressor(default_pdf_dpi=200)
compressor.process_file("input.pdf", "output.pdf")
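To compress a whole folder, the `process_file` call shown above can be wrapped in a loop. This is a sketch under assumptions: `compress_directory` is a hypothetical helper, and it relies only on the `process_file(input, output)` method used in the examples, so any of the compressors can be passed in.

```python
from pathlib import Path


def compress_directory(compressor, src_dir, dst_dir, pattern="*.pdf"):
    """Run compressor.process_file on every matching file in src_dir.

    Returns the list of output paths, in sorted order of the input names.
    """
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    dst_root.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_root.glob(pattern)):
        dst = dst_root / src.name
        # Same call signature as in the usage examples above.
        compressor.process_file(str(src), str(dst))
        written.append(dst)
    return written
```

For example, `compress_directory(PDFCompressor(), "scans", "scans_small")` would process every PDF in a `scans` folder; pass `pattern="*.png"` together with a `PNGCompressor` for images.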

Disclaimer

Lossy compression discards quality or data. Always check the output file to make sure it still meets your requirements.
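One part of that check is easy to automate: comparing file sizes against your upload limit. The helper below is a hypothetical sketch, not part of the library, and it assumes the 5 MB limit means 5 * 1024 * 1024 bytes. It says nothing about visual quality, which still needs a look at the output itself.

```python
import os


def size_report(original, compressed, limit_bytes=5 * 1024 * 1024):
    """Compare two files on disk and check the result against a size limit."""
    before = os.path.getsize(original)
    after = os.path.getsize(compressed)
    # Guard against division by zero for an empty original file.
    saved_percent = 100.0 * (before - after) / before if before else 0.0
    return {
        "before": before,
        "after": after,
        "saved_percent": saved_percent,
        "under_limit": after <= limit_bytes,
    }
```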

If you run into problems while using the library or have suggestions for improving it, please create an issue: https://github.com/pIlIp-d/FileCrusher/issues
