
Desktop and CLI tool for verifying the technical quality of raster images embedded in PDF documents.


PDFIMGQ — PDF Image Quality Checker

A Python desktop application and command-line tool that extracts images embedded in PDF files and evaluates their print-readiness and visual quality metrics (e.g., DPI, shadow/highlight clipping, contrast, colorfulness).

Designed primarily for academic papers, theses, and professional publications to ensure all embedded figures and images meet standard quality thresholds.


Features

  • Graphical User Interface (GUI): A user-friendly desktop app for visual inspection and filtering.
  • Command Line Interface (CLI): Automate analysis and generate CSV reports for multiple PDFs.
  • Data Extraction: Automatically extracts and saves all images from PDFs in their original formats.
  • Comprehensive Metrics: Computes effective DPI, tonal clipping, contrast, and color richness.
  • Smart Recommendations: Provides non-blocking, heuristic-based tags (e.g., Low DPI, High Shadow Clip) to quickly spot problematic figures.

Installation

pdfimgq requires Python 3.11 or higher.

Install via PyPI (Recommended):

pip install pdfimgq

Usage

Graphical User Interface (GUI)

To launch the desktop application, run the following command in your terminal or command prompt:

pdfimgq --gui

Workflow:

  1. Click Select PDFs to load your documents.
  2. Select a PDF from the dropdown menu.
  3. The table will populate with all detected images, their metrics, and recommendations. Click on any row to view the extracted image and detailed analysis.
  4. Output files (extracted images and CSVs) are saved by default to the PDF_Image_Quality_Outputs folder in your Documents directory.

Command Line Interface (CLI)

To run the tool directly from the terminal, use the pdfimgq command. Running pdfimgq without arguments prints help. For batch processing, pass --input and optionally --outdir.

# Check available options
pdfimgq --help

# Process a specific PDF and save results to a specific folder
pdfimgq --input ./MyThesis.pdf --outdir ./results

# Process all PDFs in a directory recursively
pdfimgq --input ./documents --outdir ./results --recursive
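For scripted batch runs, the same invocation can be assembled from Python. The sketch below uses only the flags shown above (--input, --outdir, --recursive); the helper names build_pdfimgq_command and run_pdfimgq are hypothetical and not part of pdfimgq.

```python
import subprocess


def build_pdfimgq_command(input_path, outdir=None, recursive=False):
    """Build the argument list for a pdfimgq CLI run (hypothetical helper)."""
    command = ["pdfimgq", "--input", str(input_path)]
    if outdir is not None:
        command += ["--outdir", str(outdir)]
    if recursive:
        command.append("--recursive")
    return command


def run_pdfimgq(input_path, outdir=None, recursive=False):
    """Run pdfimgq and raise CalledProcessError on a non-zero exit code."""
    command = build_pdfimgq_command(input_path, outdir, recursive)
    return subprocess.run(command, check=True)


# Example (requires pdfimgq to be installed and on PATH):
# run_pdfimgq("./documents", "./results", recursive=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.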

Output Files

For each analyzed PDF (e.g., MyThesis.pdf), the tool generates:

  • OUTPUT_DIR/MyThesis.csv — A detailed CSV report with one row per image draw call.
  • OUTPUT_DIR/MyThesis/ — A directory containing all physically extracted images.
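The naming rule above can be sketched in Python to locate the outputs of a run. The function names are hypothetical, and the CSV column names are not documented here, so the reader below returns rows keyed by whatever header the report contains.

```python
import csv
from pathlib import Path


def expected_outputs(pdf_path, outdir):
    """Return (csv_report_path, image_directory) for one analyzed PDF."""
    stem = Path(pdf_path).stem  # "MyThesis.pdf" -> "MyThesis"
    out = Path(outdir)
    return out / f"{stem}.csv", out / stem


def read_report_rows(csv_path):
    """Load the report as a list of dicts keyed by the CSV header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```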

Understanding the Metrics

1. Effective DPI

The image’s effective resolution, computed from its pixel dimensions and the physical size at which it is placed on the PDF page. Higher DPI generally means a sharper print.

  • < 200 DPI: Often too soft for quality print (Low DPI).
  • 200–299 DPI: Borderline; fine on screen and usually acceptable in print, but not ideal (Borderline DPI).
  • ≥ 300 DPI: Commonly recommended for printed theses and high-quality figures.
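As a worked example, effective DPI is the pixel count divided by the placed size in inches, and PDF page units are points (72 per inch). The sketch below is an illustration under that definition, not pdfimgq's actual code; the function names are hypothetical.

```python
POINTS_PER_INCH = 72.0  # PDF user space: 72 points per inch


def effective_dpi(pixel_width, pixel_height, placed_width_pt, placed_height_pt):
    """Return (horizontal_dpi, vertical_dpi) for an image placed on a page."""
    dpi_x = pixel_width / (placed_width_pt / POINTS_PER_INCH)
    dpi_y = pixel_height / (placed_height_pt / POINTS_PER_INCH)
    return dpi_x, dpi_y


def dpi_tag(dpi_x, dpi_y):
    """Apply the thresholds above to the lower of the two dimensions."""
    lowest = min(dpi_x, dpi_y)
    if lowest < 200:
        return "Low DPI"
    if lowest < 300:
        return "Borderline DPI"
    return None  # not flagged


# A 1200x900 px image placed at 4x3 inches (288x216 pt) is 300 DPI.
print(effective_dpi(1200, 900, 288, 216))  # (300.0, 300.0)
```

The same image stretched to 8x6 inches would drop to 150 DPI and be tagged Low DPI, which is why the placed size matters as much as the pixel dimensions.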

2. Shadow & Highlight Clip [%]

Percentage of pixels that are exactly pure black (0) or exactly pure white (255). High values may mean lost detail.

  • ≈ 0–0.5%: Very good; detail is preserved.
  • 0.5–5%: Moderate clipping (Moderate Shadow/Highlight Clip). Usually acceptable, but check if important details are lost.
  • > 5%: High clipping (High Shadow/Highlight Clip). Likely visible loss of detail.
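The clipping percentages can be illustrated with a few lines of Python. This is a minimal sketch assuming 8-bit grayscale values; how pdfimgq derives brightness from color images is not stated here, and clip_percentages is a hypothetical name.

```python
def clip_percentages(gray_pixels):
    """Return (shadow_clip, highlight_clip) in percent for 8-bit gray values."""
    pixels = list(gray_pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    total = len(pixels)
    shadow = 100.0 * sum(1 for p in pixels if p == 0) / total
    highlight = 100.0 * sum(1 for p in pixels if p == 255) / total
    return shadow, highlight


# 2 pure-black and 1 pure-white pixel out of 100:
sample = [0] * 2 + [255] * 1 + [128] * 97
print(clip_percentages(sample))  # (2.0, 1.0)
```

Both values in this sample fall in the 0.5–5% band, so each would count as moderate clipping.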

3. Contrast (P1–P99) [%]

A simple contrast indicator based on the difference between the 1st and 99th percentile brightness levels, normalized to 0–100%.

  • < 5%: Image likely looks very flat (Low Contrast).
  • 5–10%: Close to the low-contrast threshold (Borderline Contrast).
  • ≥ 10%: Not flagged by this heuristic.
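The P1–P99 measure can be sketched as follows. The nearest-rank percentile method and the 8-bit (0–255) normalization are assumptions made for this illustration; the tool may interpolate percentiles differently.

```python
import math


def percentile_nearest_rank(sorted_values, q):
    """Nearest-rank percentile (q in 0..100) of an ascending-sorted list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def contrast_p1_p99(gray_pixels):
    """Contrast as (P99 - P1) / 255, expressed in percent."""
    values = sorted(gray_pixels)
    if not values:
        raise ValueError("image has no pixels")
    p1 = percentile_nearest_rank(values, 1)
    p99 = percentile_nearest_rank(values, 99)
    return 100.0 * (p99 - p1) / 255.0


# A flat image whose brightness spans only 120..130 scores under 5%.
flat = [120] * 50 + [130] * 50
print(round(contrast_p1_p99(flat), 2))  # 3.92
```

Using percentiles rather than the absolute minimum and maximum keeps a handful of stray dark or bright pixels from hiding an otherwise flat image.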

4. Colorfulness (HS)

An estimate of perceived colorfulness using the Hasler–Süsstrunk opponent-channel measure. Higher values indicate more vibrant colors; grayscale images are near zero.

  • < 15: Not/slightly colorful; essentially grayscale or very muted (Low Colorfulness).
  • 15–33: Moderately colorful; clearly some color, but not vivid (Limited Colorfulness).
  • > 33: Not flagged by this heuristic.
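For reference, the Hasler–Süsstrunk measure combines the spread and the mean of two opponent channels, rg = R − G and yb = 0.5·(R + G) − B. The sketch below follows the published formula on 8-bit RGB tuples; whether pdfimgq rescales or preprocesses the image first is not stated here, so treat it as an illustration only.

```python
import math
from statistics import fmean, pstdev


def colorfulness_hs(rgb_pixels):
    """Hasler-Susstrunk colorfulness for an iterable of (R, G, B) tuples."""
    pixels = list(rgb_pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    rg = [r - g for r, g, b in pixels]              # red-green opponent channel
    yb = [0.5 * (r + g) - b for r, g, b in pixels]  # yellow-blue opponent channel
    std_root = math.hypot(pstdev(rg), pstdev(yb))   # sqrt(sigma_rg^2 + sigma_yb^2)
    mean_root = math.hypot(fmean(rg), fmean(yb))    # sqrt(mu_rg^2 + mu_yb^2)
    return std_root + 0.3 * mean_root


# Pure grays have R == G == B, so both opponent channels are zero.
print(colorfulness_hs([(10, 10, 10), (200, 200, 200)]))  # 0.0
```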

5. Color Richness [%]

An estimate of overall color diversity (0–100%), computed as normalized Shannon entropy of a quantized RGB histogram. Lower values suggest fewer effectively used colors.

  • < 20%: Extremely limited palette, ≈ < 8 effective colors (Low Color Richness).
  • 20–33.3%: Limited palette, ≈ 8–32 effective colors (Limited Color Richness).
  • > 33.3%: Typical for most illustrations, photos, and visualizations.
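The "effective colors" figures are consistent with a histogram quantized to 5 bits per channel (15 bits total), since 2^(0.20 × 15) = 8 and 2^(0.333 × 15) ≈ 32. That quantization is inferred from the thresholds, not confirmed by the project, so the sketch below exposes it as a parameter.

```python
import math
from collections import Counter


def color_richness(rgb_pixels, bits_per_channel=5):
    """Normalized Shannon entropy (percent) of a quantized RGB histogram.

    bits_per_channel=5 is an assumption inferred from the documented
    thresholds, not a confirmed pdfimgq setting.
    """
    shift = 8 - bits_per_channel  # drop the low bits of each 8-bit channel
    counts = Counter((r >> shift, g >> shift, b >> shift) for r, g, b in rgb_pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image has no pixels")
    entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
    max_entropy = 3 * bits_per_channel  # log2 of the number of histogram bins
    return 100.0 * entropy / max_entropy


# Eight equally frequent colors carry 3 bits of entropy: 3 / 15 = 20%.
eight_colors = [(i * 32, 0, 0) for i in range(8)] * 10
print(round(color_richness(eight_colors), 1))  # 20.0
```

Because entropy weights colors by how often they occur, an image with many colors that each appear in only a few pixels still scores low.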

Recommendations (Tag meanings)

These tags are friendly suggestions, not strict errors. Always consider the actual content and purpose of your image. For instance, a deliberately minimalist diagram will naturally be flagged as having limited color richness.

Tag                       Condition                      Note
Low DPI                   Under 200 in either dimension  May print blurry or pixelated.
Borderline DPI            200–299 in either dimension    Consider higher resolution for professional print.
Moderate Shadow Clip      0.5% – 5%                      Check dark areas for lost detail.
High Shadow Clip          > 5%                           Significant shadow detail loss.
Moderate Highlight Clip   0.5% – 5%                      Check bright areas for blown-out pixels.
High Highlight Clip       > 5%                           Significant highlight detail loss.
Low Contrast              < 5%                           Image lacks depth/punch.
Borderline Contrast       5% – 10%                       Image may appear somewhat flat.
Low Colorfulness          < 15                           Image is nearly grayscale.
Limited Colorfulness      15 – 33                        Modest color intensity.
Low Color Richness        < 20%                          Extremely limited color palette.
Limited Color Richness    20% – 33.3%                    Limited color palette.
All Good                  None of the above              No notable technical issues detected by heuristics.
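The tag rules can be summarized as one function. This is a sketch of the documented thresholds, not pdfimgq's source; in particular, how values exactly on a boundary (for example 33 for colorfulness or 10% for contrast) are treated is an assumption, and recommend_tags is a hypothetical name.

```python
def recommend_tags(dpi_x, dpi_y, shadow_clip, highlight_clip,
                   contrast, colorfulness, color_richness):
    """Map metric values to the tag names listed above."""
    tags = []

    lowest_dpi = min(dpi_x, dpi_y)  # "in either dimension"
    if lowest_dpi < 200:
        tags.append("Low DPI")
    elif lowest_dpi < 300:
        tags.append("Borderline DPI")

    for name, value in (("Shadow", shadow_clip), ("Highlight", highlight_clip)):
        if value > 5:
            tags.append(f"High {name} Clip")
        elif value >= 0.5:
            tags.append(f"Moderate {name} Clip")

    if contrast < 5:
        tags.append("Low Contrast")
    elif contrast < 10:
        tags.append("Borderline Contrast")

    if colorfulness < 15:
        tags.append("Low Colorfulness")
    elif colorfulness <= 33:
        tags.append("Limited Colorfulness")

    if color_richness < 20:
        tags.append("Low Color Richness")
    elif color_richness <= 33.3:
        tags.append("Limited Color Richness")

    return tags or ["All Good"]


print(recommend_tags(300, 300, 0.0, 0.0, 50.0, 40.0, 60.0))  # ['All Good']
```

An image can carry several tags at once, since each metric is checked independently.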

Dependencies & License

This project relies on several open-source libraries, most notably PyMuPDF, which is dual-licensed under the GNU AGPLv3 or a commercial license from Artifex. If you use pdfimgq under its open-source license, you must comply with the AGPLv3 requirements.

pdfimgq itself is released under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for more details.

