Desktop and CLI tool for verifying the technical quality of raster images embedded in PDF documents.

PDFIMGQ — PDF Image Quality Checker

A Python desktop application and command-line tool that extracts the images embedded in PDF files and assesses their print readiness using visual quality indicators (effective DPI, shadow/highlight clipping, contrast, colorfulness, and color richness).

Designed primarily for academic papers, theses, and professional publications to ensure all embedded figures and images meet standard quality thresholds.


Features

  • Graphical User Interface (GUI): A user-friendly desktop app for visual inspection and filtering.
  • Command Line Interface (CLI): Automate analysis and generate CSV reports for multiple PDFs.
  • Data Extraction: Automatically extracts and saves all images from PDFs in their original formats.
  • Comprehensive Indicators: Computes effective DPI, shadow and highlight clipping, contrast, colorfulness, and color richness.
  • Smart Recommendations: Provides non-blocking, heuristic-based tags (e.g., Low DPI, High Shadow Clip) to quickly spot problematic figures.

Installation

pdfimgq requires Python 3.11 or higher.

Install via PyPI (Recommended):

pip install pdfimgq

Install from the source archive:

After extracting pdfimgq-source.zip, open a terminal in the extracted folder and run:

python -m pip install .

Usage

Graphical User Interface (GUI)

To launch the desktop application, run the following command in your terminal or command prompt:

pdfimgq --gui

Workflow:

  1. Click Select PDFs to load your documents.
  2. Select a PDF from the dropdown menu.
  3. Click Analyze to start the analysis.
  4. The table will populate with all detected images, their indicators, and recommendations. Click on any row to view the extracted image and detailed analysis.
  5. Output files (extracted images and CSVs) are saved by default to the PDF_Image_Quality_Outputs folder in your Documents directory.

Command Line Interface (CLI)

To run the tool directly from the terminal, use the pdfimgq command. Running pdfimgq without arguments prints help. For batch processing, pass --input and optionally --outdir.

# Check available options
pdfimgq --help

# Process a specific PDF and save results to a specific folder
pdfimgq --input ./MyThesis.pdf --outdir ./results

# Process all PDFs in a directory recursively
pdfimgq --input ./documents --outdir ./results --recursive
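
For scripted pipelines, the same batch invocation can be driven from Python. The following is a minimal sketch that only assembles the flags shown above (--input, --outdir, --recursive) and hands them to the installed pdfimgq command. The helper names build_pdfimgq_command and run_pdfimgq are hypothetical and not part of pdfimgq, and the sketch assumes the tool reports failure through a non-zero exit code.

```python
import subprocess


def build_pdfimgq_command(input_path, outdir=None, recursive=False):
    """Assemble the argument list for the documented CLI flags (hypothetical helper)."""
    cmd = ["pdfimgq", "--input", str(input_path)]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    if recursive:
        cmd.append("--recursive")
    return cmd


def run_pdfimgq(input_path, outdir=None, recursive=False):
    """Run pdfimgq as a subprocess; assumes it is installed and on PATH."""
    cmd = build_pdfimgq_command(input_path, outdir, recursive)
    # check=True raises CalledProcessError on a non-zero exit code
    # (assumption: pdfimgq signals failure that way).
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.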

Output Files

For each analyzed PDF (e.g., MyThesis.pdf), the tool generates:

  • OUTPUT_DIR/MyThesis.csv — A detailed CSV report with one row per image draw call.
  • OUTPUT_DIR/MyThesis/ — A directory containing all physically extracted images.
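
When post-processing results in a script, the two output locations can be derived from the PDF file name. This sketch only encodes the naming pattern listed above; expected_outputs is a hypothetical helper, not a pdfimgq function.

```python
from pathlib import Path


def expected_outputs(pdf_path, outdir):
    """Return (csv_report_path, image_directory) for one analyzed PDF.

    Follows the OUTPUT_DIR/<name>.csv and OUTPUT_DIR/<name>/ pattern
    described above; hypothetical helper for illustration.
    """
    stem = Path(pdf_path).stem  # "MyThesis" for "MyThesis.pdf"
    out = Path(outdir)
    return out / f"{stem}.csv", out / stem
```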

Understanding the Indicators

1. Effective DPI

The image’s effective resolution based on its pixel dimensions and physical size on the PDF page. Higher DPI generally means a sharper print.

  • < 200 DPI: Often too soft for quality print (Low DPI).
  • 200–299 DPI: Borderline; usually fine on screen and acceptable, but not ideal, for print (Borderline DPI).
  • ≥ 300 DPI: Commonly recommended for printed theses and high-quality figures.
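
To make the definition concrete, here is a minimal sketch of the arithmetic. It assumes the placed size is given in PDF points (1 point = 1/72 inch, the default PDF unit) and is not necessarily how pdfimgq computes the value internally; effective_dpi is a hypothetical name.

```python
def effective_dpi(pixel_w, pixel_h, placed_w_pt, placed_h_pt):
    """Return (dpi_x, dpi_y) for an image drawn on a PDF page.

    pixel_w, pixel_h: image size in pixels.
    placed_w_pt, placed_h_pt: size of the image on the page in points
    (assumed unit: 1/72 inch).
    """
    if placed_w_pt <= 0 or placed_h_pt <= 0:
        raise ValueError("placed size must be positive")
    dpi_x = pixel_w * 72.0 / placed_w_pt
    dpi_y = pixel_h * 72.0 / placed_h_pt
    return dpi_x, dpi_y


# A 1200 x 900 pixel image placed at 4 x 3 inches (288 x 216 points).
dpi_x, dpi_y = effective_dpi(1200, 900, 288, 216)
lowest = min(dpi_x, dpi_y)  # the lower of the two values drives the DPI tag
```

The same image placed at twice the size on the page would have half the effective DPI, which is why a high-resolution source file can still print soft when scaled up.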

2. Shadow & Highlight Clip [%]

Shadow clip is the percentage of pixels that are exactly pure black (0); highlight clip is the percentage that are exactly pure white (255). Each is tagged separately. High values may mean lost detail.

  • ≈ 0–0.5%: Very good; detail is preserved.
  • 0.5–5%: Moderate clipping (Moderate Shadow/Highlight Clip). Usually acceptable, but check if important details are lost.
  • > 5%: High clipping (High Shadow/Highlight Clip). Likely visible loss of detail.
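
The measurement itself is a simple count. The sketch below assumes the image has already been reduced to a flat sequence of 8-bit brightness values; how pdfimgq converts color images to brightness is not stated here, and clip_percentages is a hypothetical name.

```python
def clip_percentages(gray_pixels):
    """Return (shadow_clip, highlight_clip) in percent.

    gray_pixels: flat sequence of 8-bit brightness values (0-255).
    """
    total = len(gray_pixels)
    if total == 0:
        raise ValueError("image has no pixels")
    shadow = sum(1 for v in gray_pixels if v == 0)       # exactly pure black
    highlight = sum(1 for v in gray_pixels if v == 255)  # exactly pure white
    return 100.0 * shadow / total, 100.0 * highlight / total


# 200 pixels: 2 pure black, 10 pure white, the rest mid-gray.
sample = [0] * 2 + [255] * 10 + [128] * 188
shadow_clip, highlight_clip = clip_percentages(sample)
```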

3. Contrast (P1–P99) [%]

A simple contrast indicator based on the difference between the 1st and 99th percentile brightness levels, normalized to 0–100%.

  • < 5%: Image likely looks very flat (Low Contrast).
  • 5–10%: Close to the low-contrast threshold (Borderline Contrast).
  • ≥ 10%: Not flagged by this heuristic.
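
A minimal sketch of this indicator, assuming 8-bit brightness values and linearly interpolated percentiles. The exact percentile method used by pdfimgq is not documented here, and the function names are hypothetical.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in 0-100) of a sorted sequence."""
    position = (len(sorted_values) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def contrast_p1_p99(gray_pixels):
    """Spread between the 1st and 99th brightness percentiles, in percent of 255."""
    if not gray_pixels:
        raise ValueError("image has no pixels")
    ordered = sorted(gray_pixels)
    spread = percentile(ordered, 99) - percentile(ordered, 1)
    return 100.0 * spread / 255.0
```

Using percentiles instead of the absolute minimum and maximum keeps a handful of stray black or white pixels from hiding an otherwise flat image.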

4. Colorfulness (HS)

An estimate of perceived colorfulness using the Hasler–Süsstrunk opponent-channel measure. Higher values indicate more vibrant colors; grayscale images are near zero.

  • < 15: Not/slightly colorful; essentially grayscale or very muted (Low Colorfulness).
  • 15–33: Moderately colorful; clearly some color, but not vivid (Limited Colorfulness).
  • > 33: Not flagged by this heuristic.
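
For reference, the published Hasler–Süsstrunk measure combines the spread and the mean offset of two opponent channels, rg = R − G and yb = (R + G)/2 − B. The sketch below implements that formula for a list of 8-bit RGB tuples; it illustrates the measure and may differ in detail from pdfimgq's implementation. The name hs_colorfulness is hypothetical.

```python
import math
from statistics import fmean, pstdev


def hs_colorfulness(pixels):
    """Hasler-Suesstrunk colorfulness for an iterable of (R, G, B) tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    rg = [r - g for r, g, b in pixels]              # red-green opponent channel
    yb = [0.5 * (r + g) - b for r, g, b in pixels]  # yellow-blue opponent channel
    spread = math.hypot(pstdev(rg), pstdev(yb))     # combined standard deviation
    offset = math.hypot(fmean(rg), fmean(yb))       # combined mean magnitude
    return spread + 0.3 * offset
```

A pure grayscale image has R = G = B everywhere, so both opponent channels are zero and the score is 0.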

5. Color Richness [%]

An estimate of overall color diversity (0–100%), computed as normalized Shannon entropy of a quantized RGB histogram. Lower values suggest fewer effectively used colors.

  • < 20%: Extremely limited palette, roughly fewer than 8 effective colors (Low Color Richness).
  • 20–33.3%: Limited palette, roughly 8–32 effective colors (Limited Color Richness).
  • > 33.3%: Typical for most illustrations, photos, and visualizations.
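
A minimal sketch of an entropy-based richness score. It assumes each RGB channel is quantized to 5 bits (32 levels, 32,768 bins in total), which makes the maximum entropy 15 bits. That bin count is an assumption chosen because it is consistent with the thresholds above (3 bits of entropy, or 8 equally used colors, gives 20%; 5 bits, or 32 colors, gives 33.3%); the quantization actually used by pdfimgq is not stated here. The function names are hypothetical.

```python
import math
from collections import Counter


def color_entropy_bits(pixels, bits_per_channel=5):
    """Shannon entropy (in bits) of the quantized RGB histogram."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    shift = 8 - bits_per_channel  # drop the low bits of each 8-bit channel
    counts = Counter((r >> shift, g >> shift, b >> shift) for r, g, b in pixels)
    total = len(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def color_richness(pixels, bits_per_channel=5):
    """Entropy normalized by its maximum (3 * bits_per_channel), in percent."""
    entropy = color_entropy_bits(pixels, bits_per_channel)
    return abs(100.0 * entropy / (3 * bits_per_channel))


def effective_colors(pixels, bits_per_channel=5):
    """Number of equally used colors that would give the same entropy."""
    return 2.0 ** color_entropy_bits(pixels, bits_per_channel)
```

Under this assumption, an image that uses eight colors in equal amounts scores exactly 20%, while the same eight colors with one dominant background color score lower because the entropy drops.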

Recommendations (Tag meanings)

These tags are friendly suggestions, not strict errors. Always consider the actual content and purpose of your image. For instance, a purposefully minimalist diagram will naturally be flagged as having limited color richness.

Tag | Condition | Note
Low DPI | Under 200 in either dimension | May print blurry or pixelated.
Borderline DPI | 200–299 in either dimension | Consider higher resolution for professional print.
Moderate Shadow Clip | 0.5% – 5% | Check dark areas for lost detail.
High Shadow Clip | > 5% | Significant shadow detail loss.
Moderate Highlight Clip | 0.5% – 5% | Check bright areas for blown-out pixels.
High Highlight Clip | > 5% | Significant highlight detail loss.
Low Contrast | < 5% | Image lacks depth/punch.
Borderline Contrast | 5% – 10% | Image may appear somewhat flat.
Low Colorfulness | < 15 | Image is nearly grayscale.
Limited Colorfulness | 15 – 33 | Modest color intensity.
Low Color Richness | < 20% | Extremely limited color palette.
Limited Color Richness | 20% – 33.3% | Limited color palette.
All Good | None of the above | No notable technical issues detected by heuristics.
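
The thresholds in the table can be expressed as one small function. This is a sketch of the tagging rules as documented, not pdfimgq's source; recommend_tags is a hypothetical name, and the treatment of values that fall exactly on a boundary (for example, exactly 5% clipping) is an assumption.

```python
def recommend_tags(dpi_x, dpi_y, shadow_clip, highlight_clip,
                   contrast, colorfulness, richness):
    """Return the list of recommendation tags for one image."""
    tags = []

    lowest_dpi = min(dpi_x, dpi_y)  # "in either dimension"
    if lowest_dpi < 200:
        tags.append("Low DPI")
    elif lowest_dpi < 300:
        tags.append("Borderline DPI")

    for name, value in (("Shadow", shadow_clip), ("Highlight", highlight_clip)):
        if value > 5:
            tags.append(f"High {name} Clip")
        elif value >= 0.5:  # boundary handling is an assumption
            tags.append(f"Moderate {name} Clip")

    if contrast < 5:
        tags.append("Low Contrast")
    elif contrast < 10:
        tags.append("Borderline Contrast")

    if colorfulness < 15:
        tags.append("Low Colorfulness")
    elif colorfulness <= 33:  # boundary handling is an assumption
        tags.append("Limited Colorfulness")

    if richness < 20:
        tags.append("Low Color Richness")
    elif richness <= 33.3:
        tags.append("Limited Color Richness")

    return tags or ["All Good"]
```

Because the tags are independent, a single image can carry several of them at once; "All Good" appears only when no other tag applies.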

Dependencies & License

This project relies on several open-source libraries, most notably PyMuPDF, which is dual-licensed under the GNU AGPLv3 or a commercial license from Artifex. If you use pdfimgq under its open-source license, you must comply with the AGPLv3 requirements.

pdfimgq itself is released under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later). See the LICENSE file for more details.
