Image quality scoring for OCR readiness
Doc-Qual
Doc-Qual is a lightweight Python package that estimates whether a document image is ready for OCR, before you spend time and money running OCR on it.
It computes a 0-100 OCR readiness score from interpretable image features:
| Feature | What it measures |
|---|---|
| Sharpness | Laplacian variance for blur detection |
| Noise | Gaussian residual noise estimate |
| Edges | Canny edge density for text-like structure |
| Skew | Hough-line-based rotation estimate |
| Brightness | Exposure and contrast balance |
| Ridges | Hessian response for stroke-like structures |
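To make the Sharpness row concrete, here is a minimal NumPy-only sketch of Laplacian variance as a blur indicator. It is an illustration of the general technique, not Doc-Qual's actual implementation; the function names `laplacian_variance` and `box_blur` and the synthetic test image are assumptions made for this example.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian of a 2-D grayscale image.

    Hypothetical helper for illustration; not part of the doc_qual API.
    """
    g = gray.astype(np.float64)
    # Discrete Laplacian on interior pixels:
    # up + down + left + right neighbours minus 4x the centre pixel.
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def box_blur(gray: np.ndarray) -> np.ndarray:
    """3x3 mean filter over interior pixels, used here only to simulate blur."""
    g = gray.astype(np.float64)
    h, w = g.shape
    acc = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            acc += g[dy:dy + h - 2, dx:dx + w - 2]
    return acc / 9.0


# Synthetic high-contrast "page": random black/white pixels (an assumption
# standing in for a real scan, so the example needs no image file).
rng = np.random.default_rng(0)
sharp = (rng.random((64, 64)) > 0.5).astype(np.float64) * 255.0
blurred = box_blur(sharp)

sharp_score = laplacian_variance(sharp)
blurred_score = laplacian_variance(blurred)
print(sharp_score > blurred_score)
```

Blurring removes high-frequency detail, so the Laplacian response flattens and its variance drops. A real scorer would still need to map this raw value onto a bounded scale before combining it with the other features.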
Installation
Install directly from GitHub (PyPI release coming soon):
pip install git+https://github.com/PradeepMadhavankutty/doc-qual.git
For development:
git clone https://github.com/PradeepMadhavankutty/doc-qual.git
cd doc-qual
pip install -e ".[dev]"
Python Usage
from doc_qual import compute_doc_qual_score
result = compute_doc_qual_score("scan.jpg", verbose=False)
print(result.ocr_score)
print(result.passed)
print(result.recommendations)
CLI Usage
doc-qual path/to/image.jpg
doc-qual path/to/image.jpg --format json
doc-qual path/to/image.jpg --threshold 60
When --threshold is provided, the CLI exits with code 1 if the image score is below the threshold. This makes it useful in CI and batch document-processing pipelines.
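The exit-code behaviour above can be used from a batch script. The following is a minimal sketch, assuming the `doc-qual` command is on `PATH` and that it exits with code 0 when the score meets the threshold (the README only states the code 1 case, so the 0 case is an assumption). The function name `split_by_readiness` and the injectable `run` parameter are hypothetical and exist only for this example.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple


def split_by_readiness(
    paths: Iterable[str],
    threshold: int,
    run: Callable = subprocess.run,
) -> Tuple[List[str], List[str]]:
    """Split image paths into (ready, rejected) using the CLI exit code.

    Hypothetical helper, not part of Doc-Qual. `run` is injectable so the
    logic can be exercised without the CLI installed.
    """
    ready: List[str] = []
    rejected: List[str] = []
    for path in paths:
        proc = run(
            ["doc-qual", str(path), "--threshold", str(threshold)],
            capture_output=True,
            text=True,
        )
        if proc.returncode == 0:
            # Assumption: 0 means the score met the threshold.
            ready.append(str(path))
        elif proc.returncode == 1:
            # Documented: 1 means the score is below the threshold.
            rejected.append(str(path))
        else:
            # Any other code is undocumented here, so fail loudly.
            raise RuntimeError(
                f"doc-qual exited with code {proc.returncode} for {path}"
            )
    return ready, rejected


# Example call (requires the doc-qual CLI to be installed):
# ready, rejected = split_by_readiness(["a.jpg", "b.jpg"], threshold=60)
```

Only the images in `ready` would then be sent on to the OCR engine. If the CLI also uses exit code 1 for other failures, such as a missing file, this sketch would count those as rejected, so check the CLI's behaviour before relying on it.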
Current Status
This is an alpha implementation with expert-calibrated default weights. The long-term research direction is to calibrate feature weights empirically against OCR character error rate across datasets and engines.
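As a rough illustration of what feature weights could mean here, the sketch below combines per-feature scores into a 0-100 value with a normalized weighted sum. The linear form, the weight values, and the names `EXAMPLE_WEIGHTS` and `weighted_score` are all assumptions for this example; they are not Doc-Qual's defaults or its scoring formula.

```python
from typing import Mapping

# Hypothetical weights for illustration only; NOT Doc-Qual's defaults.
EXAMPLE_WEIGHTS = {
    "sharpness": 0.30,
    "noise": 0.15,
    "edges": 0.15,
    "skew": 0.15,
    "brightness": 0.15,
    "ridges": 0.10,
}


def weighted_score(
    features: Mapping[str, float],
    weights: Mapping[str, float] = EXAMPLE_WEIGHTS,
) -> float:
    """Combine per-feature scores in [0, 1] into a 0-100 score.

    Assumes each feature has already been normalized so that 1.0 is best.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    acc = 0.0
    for name, weight in weights.items():
        # Clamp to [0, 1] so one out-of-range feature cannot dominate.
        value = min(1.0, max(0.0, features[name]))
        acc += weight * value
    return 100.0 * acc / total


example = {
    "sharpness": 0.9,
    "noise": 0.8,
    "edges": 0.7,
    "skew": 1.0,
    "brightness": 0.6,
    "ridges": 0.5,
}
print(round(weighted_score(example), 1))
```

Under this framing, empirical calibration would mean replacing the hand-picked weights with values fitted against measured OCR character error rate.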
Development
pytest
ruff check .
python -m build
License
MIT
File details
Details for the file doc_qual-0.1.0.tar.gz.
File metadata
- Download URL: doc_qual-0.1.0.tar.gz
- Upload date:
- Size: 7.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fefe8a34d0288f2efb3d8ac29d039e6151784929e3991a91256c8e0024fe5105 |
| MD5 | 79cff3de56a2ed3bbe3e14277269a0cb |
| BLAKE2b-256 | 683a5b6340df8f417ffd7b73e056408c522ffdbb1921cac25786307aaeee99a1 |
File details
Details for the file doc_qual-0.1.0-py3-none-any.whl.
File metadata
- Download URL: doc_qual-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 58a00f80e6914b0a85957a91b64bed80eca3749f39f458e15dfbbbfb9b012fc5 |
| MD5 | 13bdacfcba7b3825248119b2535947bc |
| BLAKE2b-256 | 53b478446e498c2a2bc6610145dbd28ef1727b30902e5d702de387f3e23a7882 |