Desktop and CLI tool for verifying the technical quality of raster images embedded in PDF documents.
PDFIMGQ — PDF Image Quality Checker
A Python desktop application and command-line tool that extracts images embedded in PDF files and evaluates their print-readiness and visual quality metrics (e.g., DPI, shadow/highlight clipping, contrast, colorfulness).
Designed primarily for academic papers, theses, and professional publications to ensure all embedded figures and images meet standard quality thresholds.
Features
- Graphical User Interface (GUI): A user-friendly desktop app for visual inspection and filtering.
- Command Line Interface (CLI): Automate analysis and generate CSV reports for multiple PDFs.
- Data Extraction: Automatically extracts and saves all images from PDFs in their original formats.
- Comprehensive Metrics: Computes effective DPI, tonal clipping, contrast, and color richness.
- Smart Recommendations: Provides non-blocking, heuristic-based tags (e.g., Low DPI, High Shadow Clip) to quickly spot problematic figures.
Installation
pdfimgq requires Python 3.11 or higher.
Install via PyPI (Recommended):
```bash
pip install pdfimgq
```
Usage
Graphical User Interface (GUI)
To launch the desktop application, run the following command in your terminal or command prompt:
```bash
pdfimgq --gui
```
Workflow:
- Click Select PDFs to load your documents.
- Select a PDF from the dropdown menu.
- The table will populate with all detected images, their metrics, and recommendations. Click on any row to view the extracted image and detailed analysis.
- Output files (extracted images and CSVs) are saved by default to the PDF_Image_Quality_Outputs folder in your Documents directory.
Command Line Interface (CLI)
To run the tool directly from the terminal, use the pdfimgq command. Running pdfimgq without arguments prints help. For batch processing, pass --input and optionally --outdir.
```bash
# Check available options
pdfimgq --help

# Process a specific PDF and save results to a specific folder
pdfimgq --input ./MyThesis.pdf --outdir ./results

# Process all PDFs in a directory recursively
pdfimgq --input ./documents --outdir ./results --recursive
```
Output Files
For each analyzed PDF (e.g., MyThesis.pdf), the tool generates:
- OUTPUT_DIR/MyThesis.csv: A detailed CSV report with one row per image draw call.
- OUTPUT_DIR/MyThesis/: A directory containing all physically extracted images.
Understanding the Metrics
1. Effective DPI
The image’s effective resolution based on its pixel dimensions and physical size on the PDF page. Higher DPI generally means a sharper print.
- < 200 DPI: Often too soft for quality print (Low DPI).
- 200–299 DPI: Borderline; usually fine on screen, acceptable but not ideal for print (Borderline DPI).
- ≥ 300 DPI: Commonly recommended for printed theses and high-quality figures.
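The DPI arithmetic can be sketched as follows. This is an illustration, not pdfimgq's own code: the function names are hypothetical, the displayed size is assumed to be the image's bounding box on the page in PDF points (72 pt = 1 inch), and the placement is assumed to be unrotated. Taking the worse of the two axes mirrors the "in either dimension" wording used for the DPI tags.

```python
POINTS_PER_INCH = 72.0  # PDF user-space unit: 1 pt = 1/72 inch


def effective_dpi(px_width, px_height, shown_width_pt, shown_height_pt):
    """Return (dpi_x, dpi_y) for an image drawn at the given size in points.

    Hypothetical helper; assumes the placement is not rotated or skewed.
    """
    if shown_width_pt <= 0 or shown_height_pt <= 0:
        raise ValueError("displayed size must be positive")
    dpi_x = px_width / (shown_width_pt / POINTS_PER_INCH)
    dpi_y = px_height / (shown_height_pt / POINTS_PER_INCH)
    return dpi_x, dpi_y


def dpi_tag(dpi_x, dpi_y):
    """Map effective DPI to a tag; the worse axis decides."""
    worst = min(dpi_x, dpi_y)
    if worst < 200:
        return "Low DPI"
    if worst < 300:
        return "Borderline DPI"
    return None  # >= 300 DPI: not flagged


# A 1200 x 900 px photo shown at 4 x 3 in (288 x 216 pt) is 300 DPI on both axes.
print(effective_dpi(1200, 900, 288, 216))            # (300.0, 300.0)
# The same photo stretched to 6 in wide drops to 200 DPI horizontally.
print(dpi_tag(*effective_dpi(1200, 900, 432, 216)))  # Borderline DPI
```

Enlarging a figure on the page lowers its effective DPI without changing a single pixel, which is why the same image file can be fine in one document and flagged in another.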
2. Shadow & Highlight Clip [%]
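Shadow clip is the percentage of pixels that are pure black (value 0); highlight clip is the percentage of pixels that are pure white (value 255). High values may mean lost detail in the dark or bright areas, respectively.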
- ≈ 0–0.5%: Very good; detail is preserved.
- 0.5–5%: Moderate clipping (Moderate Shadow/Highlight Clip). Usually acceptable, but check if important details are lost.
- > 5%: High clipping (High Shadow/Highlight Clip). Likely visible loss of detail.
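A minimal sketch of the clipping measure, assuming the image has already been reduced to a single 8-bit gray channel and flattened into a list of values (pdfimgq may convert or evaluate channels differently). The helper names are hypothetical, and treating exactly 0.5% and exactly 5% as "moderate" is an assumption about how the range boundaries are handled. A real implementation would more likely use NumPy arrays; plain Python keeps the example dependency-free.

```python
def clip_percentages(gray_pixels):
    """Return (shadow_pct, highlight_pct) for a sequence of 8-bit gray values."""
    values = list(gray_pixels)
    if not values:
        raise ValueError("image has no pixels")
    shadow = 100.0 * sum(1 for v in values if v == 0) / len(values)
    highlight = 100.0 * sum(1 for v in values if v == 255) / len(values)
    return shadow, highlight


def clip_tag(pct, kind):
    """kind is "Shadow" or "Highlight"; boundary handling is an assumption."""
    if pct > 5.0:
        return f"High {kind} Clip"
    if pct >= 0.5:
        return f"Moderate {kind} Clip"
    return None


# 100 pixels: 10 pure black, 1 pure white, the rest mid-gray.
pixels = [0] * 10 + [255] + [128] * 89
shadow, highlight = clip_percentages(pixels)
print(shadow, highlight)                 # 10.0 1.0
print(clip_tag(shadow, "Shadow"))        # High Shadow Clip
print(clip_tag(highlight, "Highlight"))  # Moderate Highlight Clip
```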
3. Contrast (P1–P99) [%]
A simple contrast indicator based on the difference between the 1st and 99th percentile brightness levels, normalized to 0–100%.
- < 5%: Image likely looks very flat (Low Contrast).
- 5–10%: Close to the low-contrast threshold (Borderline Contrast).
- ≥ 10%: Not flagged by this heuristic.
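One way to compute the P1–P99 contrast indicator is sketched below with the standard library's `statistics.quantiles`. The percentile interpolation method and the division by 255 (the full 8-bit range) are assumptions about how "normalized to 0–100%" is done, and the function names are hypothetical.

```python
import statistics


def percentile_contrast(gray_pixels):
    """Return (P99 - P1) / 255 as a percentage for 8-bit gray values."""
    values = list(gray_pixels)
    if len(values) < 2:
        raise ValueError("need at least two pixels")
    # quantiles(n=100) returns the 99 cut points P1..P99 (data is sorted internally).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    p1, p99 = cuts[0], cuts[-1]
    return 100.0 * (p99 - p1) / 255.0


def contrast_tag(contrast_pct):
    if contrast_pct < 5.0:
        return "Low Contrast"
    if contrast_pct < 10.0:
        return "Borderline Contrast"
    return None  # >= 10%: not flagged


flat = [120] * 500 + [130] * 500  # two close gray levels
ramp = list(range(256))           # full 0..255 gradient
print(contrast_tag(percentile_contrast(flat)))  # Low Contrast
print(round(percentile_contrast(ramp), 1))      # 98.0
```

Using P1 and P99 instead of the absolute minimum and maximum keeps a few stray black or white pixels from making a washed-out image look high-contrast.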
4. Colorfulness (HS)
An estimate of perceived colorfulness using the Hasler–Süsstrunk opponent-channel measure. Higher values indicate more vibrant colors; grayscale images are near zero.
- < 15: Not or only slightly colorful; essentially grayscale or very muted (Low Colorfulness).
- 15–33: Moderately colorful; clearly some color, but not vivid (Limited Colorfulness).
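The Hasler–Süsstrunk measure builds two opponent channels, rg = R - G and yb = 0.5(R + G) - B, and combines their spread and mean as M = σ_rgyb + 0.3·μ_rgyb, where σ_rgyb and μ_rgyb are the root-sum-squares of the two channels' standard deviations and means. The sketch below follows that published formula on 8-bit (R, G, B) tuples; whether pdfimgq applies any preprocessing first is not stated, and the helper names are hypothetical.

```python
import math
import statistics


def hasler_susstrunk(rgb_pixels):
    """Colorfulness M = sigma_rgyb + 0.3 * mu_rgyb for (R, G, B) tuples."""
    pixels = list(rgb_pixels)  # iterated twice below
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    sigma = math.hypot(statistics.pstdev(rg), statistics.pstdev(yb))
    mu = math.hypot(statistics.fmean(rg), statistics.fmean(yb))
    return sigma + 0.3 * mu


def colorfulness_tag(m):
    # Treating exactly 33 as "not flagged" is an assumed boundary.
    if m < 15:
        return "Low Colorfulness"
    if m < 33:
        return "Limited Colorfulness"
    return None


gray = [(v, v, v) for v in range(0, 256, 5)]
red = [(255, 0, 0)] * 100
print(hasler_susstrunk(gray))                   # 0.0
print(colorfulness_tag(hasler_susstrunk(red)))  # None (M is about 85.5)
```

Any pixel with R = G = B contributes zero to both opponent channels, which is why grayscale images score exactly zero.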
5. Color Richness [%]
An estimate of overall color diversity (0–100%), computed as normalized Shannon entropy of a quantized RGB histogram. Lower values suggest fewer effectively used colors.
- < 20%: Extremely limited palette, roughly fewer than 8 effective colors (Low Color Richness).
- 20–33.3%: Limited palette, roughly 8–32 effective colors (Limited Color Richness).
- > 33.3%: Typical for most illustrations, photos, and visualizations.
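The color counts quoted in these thresholds are consistent with 2^(p × 15): 20% of 15 bits is 3 bits (2³ = 8 colors) and 33.3% is 5 bits (2⁵ = 32 colors). That suggests a histogram with 32 levels per channel (32³ bins, 15 bits of maximum entropy), which the sketch below assumes; the bin count is an inference rather than something the project documents, and the function names are hypothetical.

```python
import math
from collections import Counter

BITS_PER_CHANNEL = 5                 # assumed: 32 levels per channel
MAX_BITS = 3 * BITS_PER_CHANNEL      # log2(32**3) = 15 bits of maximum entropy


def color_richness(rgb_pixels):
    """Return (richness %, effective colors) from a quantized RGB histogram."""
    shift = 8 - BITS_PER_CHANNEL
    counts = Counter((r >> shift, g >> shift, b >> shift) for r, g, b in rgb_pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image has no pixels")
    # Shannon entropy in bits: sum of p * log2(1/p) over occupied bins.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return 100.0 * entropy / MAX_BITS, 2 ** entropy


def richness_tag(pct):
    if pct < 20.0:
        return "Low Color Richness"
    if pct <= 100.0 / 3:  # the 20-33.3% band
        return "Limited Color Richness"
    return None


palette = [(8 * i, 0, 0) for i in range(8)] * 10  # 8 bins, equally frequent
pct, eff = color_richness(palette)
print(pct, eff)           # 20.0 8.0
print(richness_tag(pct))  # Limited Color Richness
```

Because entropy rewards evenly used bins, a chart with eight equally prominent colors scores higher than a photo dominated by one flat background with a few accents.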
Recommendations (Tag meanings)
These tags are friendly suggestions, not strict errors. Always consider the actual content and purpose of your image. For instance, a deliberately minimalist diagram will naturally be flagged for limited color richness.
| Tag | Condition | Note |
|---|---|---|
| Low DPI | Under 200 DPI in either dimension | May print blurry or pixelated. |
| Borderline DPI | 200–299 DPI in either dimension | Consider higher resolution for professional print. |
| Moderate Shadow Clip | 0.5% – 5% | Check dark areas for lost detail. |
| High Shadow Clip | > 5% | Significant shadow detail loss. |
| Moderate Highlight Clip | 0.5% – 5% | Check bright areas for blown-out pixels. |
| High Highlight Clip | > 5% | Significant highlight detail loss. |
| Low Contrast | < 5% | Image lacks depth/punch. |
| Borderline Contrast | 5% – 10% | Image may appear somewhat flat. |
| Low Colorfulness | < 15 | Image is nearly grayscale. |
| Limited Colorfulness | 15 – 33 | Modest color intensity. |
| Low Color Richness | < 20% | Extremely limited color palette. |
| Limited Color Richness | 20% – 33.3% | Limited color palette. |
| All Good | None of the above | No notable technical issues detected by heuristics. |
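To show how the conditions in this table might combine into a per-image recommendation list, here is a hypothetical sketch operating on a dictionary of already-computed metrics. The key names are invented for illustration and need not match the columns of pdfimgq's CSV report; how values falling exactly on a threshold are tagged is also assumed.

```python
def recommend(m):
    """Return tags for a dict of metrics; key names are hypothetical."""
    tags = []
    worst_dpi = min(m["dpi_x"], m["dpi_y"])
    if worst_dpi < 200:
        tags.append("Low DPI")
    elif worst_dpi < 300:
        tags.append("Borderline DPI")
    for kind in ("Shadow", "Highlight"):
        pct = m[kind.lower() + "_clip_pct"]
        if pct > 5:
            tags.append(f"High {kind} Clip")
        elif pct >= 0.5:
            tags.append(f"Moderate {kind} Clip")
    if m["contrast_pct"] < 5:
        tags.append("Low Contrast")
    elif m["contrast_pct"] < 10:
        tags.append("Borderline Contrast")
    if m["colorfulness"] < 15:
        tags.append("Low Colorfulness")
    elif m["colorfulness"] < 33:
        tags.append("Limited Colorfulness")
    if m["richness_pct"] < 20:
        tags.append("Low Color Richness")
    elif m["richness_pct"] <= 100 / 3:
        tags.append("Limited Color Richness")
    return tags or ["All Good"]  # no condition matched


diagram = {"dpi_x": 600, "dpi_y": 600, "shadow_clip_pct": 0.0,
           "highlight_clip_pct": 62.0, "contrast_pct": 100.0,
           "colorfulness": 4.0, "richness_pct": 6.0}
print(recommend(diagram))
# ['High Highlight Clip', 'Low Colorfulness', 'Low Color Richness']
```

The example values describe a minimalist diagram with dark-gray lines on a white page: the white background counts as highlight clipping and the near-absence of color triggers both color tags, even though nothing is technically wrong with the figure.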
Dependencies & License
This project relies on several open-source libraries, most notably PyMuPDF, which is dual-licensed under GNU AGPLv3 or a commercial license from Artifex. By using pdfimgq under its open-source license, you must comply with the AGPLv3 requirements.
pdfimgq itself is released under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for more details.
File details
Details for the file pdfimgq-0.1.1.tar.gz.
File metadata
- Download URL: pdfimgq-0.1.1.tar.gz
- Upload date:
- Size: 53.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eeb367e0f6bdc26887eeff303ea940401df0b308e75859b74c994046be143acf |
| MD5 | 99d232a26856d446d6c4a3f625aa674c |
| BLAKE2b-256 | acdac17b7f3c37c1a95d6eeff01861cf4f04bc9b700ef8fe41c2a10e0cd1c8e2 |
File details
Details for the file pdfimgq-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pdfimgq-0.1.1-py3-none-any.whl
- Upload date:
- Size: 56.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97150bc6f8f2c4b1207d9523a1a44d3d4e2a6b5447a45c949cd9abbce5d1755c |
| MD5 | 87e183cc169a2dcbedb5b2686c4258d5 |
| BLAKE2b-256 | ac3d89743d58db45631b00e7c18a8feabe8d21d94850cb4cd8d842277e25cd13 |