Document Color Palette Compression

noteshrunk - Document Color Palette Compression

This Python script compresses images by reducing the number of colors and optimizing the image representation. It leverages KMeans clustering for color quantization and offers various options to customize the compression process. All supplied images are then saved as a multi-page PDF.

The program is intended for optimizing scanned documents. It is a complete, improved rewrite of mzucker's noteshrink.

Features

Color Quantization: Reduces the number of colors in the document using KMeans clustering, leading to smaller file sizes.
Background Detection and Removal: Identifies the dominant background color and can replace it with white.
Customizable Palette: Allows you to specify the number of colors in the output palette and choose between a global palette for all pages or individual palettes for each page.
Color Control: Offers the option to maximize saturation in the output image as well as to remove the background (replace with white), enhancing visual clarity.
Denoising Options: Provides median filtering and morphological operations to reduce noise and improve image quality.
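The color quantization step can be illustrated with a small, dependency-free sketch. noteshrunk itself relies on scikit-learn's KMeans; the code below is only a plain-Python version of the same idea (Lloyd's iteration), and the function names `kmeans_palette`, `nearest`, and `quantize` are hypothetical, not part of the tool.

```python
import random


def nearest(pixel, centers):
    """Index of the center closest to pixel (squared Euclidean distance in RGB)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, centers[i])),
    )


def kmeans_palette(pixels, n_colors=8, iterations=10, seed=0):
    """Return up to n_colors representative RGB tuples for a list of RGB pixels."""
    rng = random.Random(seed)
    # Start from distinct colors so no two centers coincide initially.
    unique = sorted(set(pixels))
    centers = rng.sample(unique, min(n_colors, len(unique)))
    for _ in range(iterations):
        # Assignment step: group each pixel with its nearest center.
        groups = [[] for _ in centers]
        for p in pixels:
            groups[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            tuple(sum(channel) / len(g) for channel in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return [tuple(round(v) for v in c) for c in centers]


def quantize(pixels, palette):
    """Replace every pixel with the closest palette color."""
    return [palette[nearest(p, palette)] for p in pixels]


# Toy "page": mostly paper, some ink, a little red pen.
page = [(250, 250, 250)] * 50 + [(10, 10, 10)] * 30 + [(200, 20, 20)] * 20
palette = kmeans_palette(page, n_colors=3)
print(sorted(palette))
```

Because every output pixel is one of only a few palette entries, the image can be stored as an indexed image, which is where the size reduction comes from.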

Requirements

Python 3
NumPy
Pillow (PIL Fork)
SciPy
scikit-learn
scikit-image

Optional

argcomplete (for command-line auto-completion)
Ghostscript (for merging the pages into a single PDF; without it, use the -k flag to keep the single-page PDFs)
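For reference, single-page PDFs kept with -k can also be merged by hand with Ghostscript. The following is a minimal sketch of such a call from Python; it uses Ghostscript's standard `pdfwrite` device, but whether noteshrunk invokes Ghostscript in exactly this way is not stated here, and the helper names and file names are hypothetical.

```python
import shutil
import subprocess


def build_merge_command(page_pdfs, output_pdf, gs_binary="gs"):
    """Build a Ghostscript command line that concatenates PDFs in order."""
    return [
        gs_binary,
        "-q",                 # suppress startup messages
        "-dBATCH",            # exit after processing the inputs
        "-dNOPAUSE",          # do not prompt between pages
        "-sDEVICE=pdfwrite",  # write a PDF
        f"-sOutputFile={output_pdf}",
        *page_pdfs,
    ]


def merge_pdfs(page_pdfs, output_pdf):
    """Run Ghostscript if it is available on PATH, otherwise raise an error."""
    gs = shutil.which("gs")  # on Windows the binary is usually gswin64c
    if gs is None:
        raise RuntimeError("Ghostscript not found on PATH")
    subprocess.run(build_merge_command(page_pdfs, output_pdf, gs), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(build_merge_command(["page_1.pdf", "page_2.pdf"], "output.pdf"))
```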

Installation

pipx install noteshrunk

Usage

noteshrunk [-h] [-o OUTPUT] [-w] [-g] [-s] [-n N_COLORS] [-d DPI]
           [-p PERCENTAGE] [-k] [-ts THRESHOLD_SATURATION]
           [-tv THRESHOLD_VALUE] [--denoise_median] [--denoise_closing]
           [--denoise_opening] [-ms MEDIAN_STRENGTH]
           [-os OPENING_STRENGTH] [-cs CLOSING_STRENGTH] [-v] [-y]
           files [files ...]

Arguments

files: The input image files (common formats such as PNG and JPG are supported).
-o, --output: Path to the output PDF file (default: output.pdf).
-w, --white_background: Use a white background instead of the dominant color.
-g, --global_palette: Use the same color palette for all images.
-s, --saturate: Maximize saturation in the output image.
-n, --n_colors: Number of colors in the palette (default: 8).
-d, --dpi: DPI value for the input and output images (default: 300).
-p, --percentage: Percentage of pixels to sample from each image for palette creation (default: 10).
-k, --keep_intermediate: Keep the intermediate single-page PDFs.
-ts, --threshold_saturation: HSV saturation threshold for background detection (default: 15).
-tv, --threshold_value: HSV value threshold for background detection (default: 25).
--denoise_median: Apply median filtering for denoising.
--denoise_closing: Apply morphological closing for denoising.
--denoise_opening: Apply morphological opening for denoising.
-ms, --median_strength: Strength of median filtering (default: 3).
-os, --opening_strength: Strength of opening filtering (default: 3).
-cs, --closing_strength: Strength of closing filtering (default: 3).
-v, --verbose: Enable verbose output.
-y, --overwrite: Overwrite existing files without prompting.
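To give an intuition for the two HSV thresholds, here is a rough sketch of threshold-based background detection using only the standard library. It assumes that the thresholds are percentages of the full saturation and value range and that the background is the most frequent color; both are assumptions of this sketch, and the helper names `dominant_color` and `is_background` are hypothetical rather than taken from noteshrunk.

```python
import colorsys
from collections import Counter


def dominant_color(pixels):
    """Most frequent RGB tuple, used here as the background estimate."""
    return Counter(pixels).most_common(1)[0][0]


def is_background(pixel, background, threshold_saturation=15, threshold_value=25):
    """True if pixel is within both HSV thresholds of the background color.

    Thresholds are treated as percentages of the 0..1 range (an assumption).
    """
    _, s_px, v_px = colorsys.rgb_to_hsv(*(c / 255 for c in pixel))
    _, s_bg, v_bg = colorsys.rgb_to_hsv(*(c / 255 for c in background))
    return (
        abs(s_px - s_bg) * 100 < threshold_saturation
        and abs(v_px - v_bg) * 100 < threshold_value
    )


# Toy "page": off-white paper, dark ink, red pen.
page = [(245, 245, 240)] * 90 + [(20, 20, 20)] * 5 + [(200, 30, 30)] * 5
paper = dominant_color(page)
foreground = [p for p in page if not is_background(p, paper)]
print(paper, len(foreground))
```

Under these assumptions, raising a threshold makes more pixels count as background, while lowering it preserves faint marks at the cost of keeping more paper noise.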

Examples

Compress a single image with default settings:
```
noteshrunk input.png
```
Compress multiple images with a white background and 16 colors:
```
noteshrunk -w -n 16 image1.jpg image2.png
```
Compress images using a global palette and keep intermediate files:
```
noteshrunk -g -k *.jpg
```

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests on the GitHub repository.

Acknowledgements

This project builds on open-source software from the Python community. Special thanks to the developers and maintainers of the required libraries, and to mzucker for the original noteshrink program.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Release history

1.6.0 (Apr 23, 2024)
1.5.1 (Apr 21, 2024)
1.5.0 (Apr 21, 2024)
1.4.2 (Apr 20, 2024)
1.4.1 (Apr 20, 2024)
1.4.0 (Apr 20, 2024)
1.3.0 (Apr 15, 2024)
1.2.0 (Apr 13, 2024)
1.1.1 (Apr 11, 2024)
1.1.0 (Apr 11, 2024, yanked)
1.0.2 (Apr 11, 2024)
1.0.1 (Apr 11, 2024, this version)
1.0.0 (Apr 11, 2024)

Download files

Source Distribution: noteshrunk-1.0.1.tar.gz (852.0 kB), uploaded Apr 11, 2024
Built Distribution: noteshrunk-1.0.1-py3-none-any.whl (42.2 kB), uploaded Apr 11, 2024, Python 3

Hashes for noteshrunk-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`f26fbb34df6247e3701051f690ebb336a6091123d2d99ea571d411d44176f9a9`
MD5	`cf1863d00827367e6d447a2d729e076d`
BLAKE2b-256	`ed5f14b7f03ac85d7d3e3310026168b1cac6fcf0ba850a6771d4ea96bcd31e3b`

Hashes for noteshrunk-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`15192534c9472b3889136e8ea7d84456692cb48be81f71e893280baf968fde83`
MD5	`092bbdb7317a9cda542e25ea878e6c0e`
BLAKE2b-256	`42b8c8e8e23f7e84af95e79fb899f155c5c9ce016d6e0698e09cd54f55c359b5`