Image quality scoring for OCR readiness

Doc-Qual

Doc-Qual is a lightweight Python package that estimates whether a document image is ready for OCR, so you can filter out poor scans before spending time and money on recognition.

It computes a 0-100 OCR readiness score from interpretable image features:

| Feature | What it measures |
|---|---|
| Sharpness | Laplacian variance for blur detection |
| Noise | Gaussian residual noise estimate |
| Edges | Canny edge density for text-like structure |
| Skew | Hough-line based rotation estimate |
| Brightness | Exposure and contrast balance |
| Ridges | Hessian response for stroke-like structures |
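
To make the idea of a single score built from several features concrete, here is a minimal sketch of a weighted average. It is not Doc-Qual's actual implementation: the function name `combine_feature_scores`, the weight values, and the assumption that each feature is first normalized to the range 0 to 1 are all hypothetical and chosen only for illustration.

```python
from typing import Mapping

# Hypothetical weights for illustration only; Doc-Qual's real defaults may differ.
EXAMPLE_WEIGHTS = {
    "sharpness": 0.30,
    "noise": 0.15,
    "edges": 0.15,
    "skew": 0.15,
    "brightness": 0.15,
    "ridges": 0.10,
}


def combine_feature_scores(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = EXAMPLE_WEIGHTS,
) -> float:
    """Return a 0-100 score as the weighted average of per-feature scores.

    Each per-feature score is assumed to be normalized to [0, 1], where 1
    means "ideal for OCR". Values outside that range are clamped.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing feature scores: {sorted(missing)}")

    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(
        weights[name] * min(max(scores[name], 0.0), 1.0) for name in weights
    )
    return 100.0 * weighted_sum / total_weight
```

Dividing by the total weight means the weights do not have to sum to exactly 1, which is convenient when experimenting with different weightings.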

Installation

Install directly from GitHub (PyPI release coming soon):

pip install git+https://github.com/PradeepMadhavankutty/doc-qual.git

For development:

git clone https://github.com/PradeepMadhavankutty/doc-qual.git
cd doc-qual
pip install -e ".[dev]"

Python Usage

from doc_qual import compute_doc_qual_score

result = compute_doc_qual_score("scan.jpg", verbose=False)

print(result.ocr_score)
print(result.passed)
print(result.recommendations)
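
For batch work, the same call can be wrapped in a small triage helper. The sketch below is an illustration, not part of Doc-Qual: `triage_images` and its `min_score` default of 60 are hypothetical. It relies only on the `ocr_score` attribute shown above and takes the scoring function as a parameter, so it can be tried without the package installed.

```python
from typing import Callable, Iterable, List, Tuple


def triage_images(
    paths: Iterable[str],
    scorer: Callable[[str], object],
    min_score: float = 60.0,  # arbitrary cutoff for illustration
) -> Tuple[List[str], List[str]]:
    """Split image paths into (ready, needs_review) lists.

    `scorer` is any callable that returns an object with an `ocr_score`
    attribute, such as the result of compute_doc_qual_score.
    """
    ready: List[str] = []
    needs_review: List[str] = []
    for path in paths:
        result = scorer(path)
        if result.ocr_score >= min_score:
            ready.append(path)
        else:
            needs_review.append(path)
    return ready, needs_review
```

With Doc-Qual installed, the scorer could be `lambda path: compute_doc_qual_score(path, verbose=False)`.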

CLI Usage

doc-qual path/to/image.jpg
doc-qual path/to/image.jpg --format json
doc-qual path/to/image.jpg --threshold 60

When --threshold is provided, the CLI exits with code 1 if the image score is below the threshold. This makes it useful in CI and batch document-processing pipelines.
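
A pipeline step written in Python can use that exit code as a gate. The following is a rough sketch, not part of Doc-Qual: the helper names `build_command` and `meets_threshold` are hypothetical. It assumes the CLI exits with code 0 when the score meets the threshold (the text above only documents code 1 for a score below it) and treats any other exit code as an error.

```python
import subprocess
from typing import Callable, List


def build_command(image_path: str, threshold: int) -> List[str]:
    """Build the doc-qual invocation shown in the CLI examples."""
    return ["doc-qual", image_path, "--threshold", str(threshold)]


def meets_threshold(
    image_path: str,
    threshold: int,
    run: Callable = subprocess.run,  # injectable so the gate can be tested
) -> bool:
    """Return True if the image passes, False if it scores below threshold."""
    completed = run(build_command(image_path, threshold))
    if completed.returncode == 0:  # assumed "pass" exit code
        return True
    if completed.returncode == 1:  # documented "below threshold" exit code
        return False
    raise RuntimeError(
        f"doc-qual exited with unexpected code {completed.returncode}"
    )
```

Passing `run` as a parameter keeps the default behavior (`subprocess.run`) while letting tests substitute a stub instead of invoking the real CLI.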

Current Status

This is an alpha implementation. The default feature weights are set by expert judgment rather than fitted to data. The long-term research direction is to calibrate them empirically against OCR character error rate across datasets and engines.
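
Character error rate (CER) is commonly defined as the Levenshtein edit distance between the OCR output and the reference text, divided by the length of the reference. The sketch below shows that standard definition in plain Python. It is not code from Doc-Qual, and the function names are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that CER can exceed 1.0 when the OCR output contains many more characters than the reference.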

Development

pytest
ruff check .
python -m build

License

MIT
