
Image cleaning and OCR improvement package in Python using OpenCV.

Project Description

# Athento-imaging

Athento-Imaging is a package developed with Python and OpenCV to improve OCR results on documents. It has been tested on passports, bills, delivery notes, budgets, and other common document types.

This package includes several functions to transform images:

  • Remove coloured background.
  • Remove “salt and pepper” noise.
  • Line detection in documents (two approaches).
  • Remove lines in documents.
  • Simple line analysis (which lines are horizontal or vertical, distance between lines, etc.).
  • Template matching improved using pyramid transformations.
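Salt-and-pepper noise consists of isolated very dark or very bright pixels, and the usual remedy is a median filter (OpenCV exposes one as `cv2.medianBlur`). The sketch below illustrates the technique in plain Python on a grayscale image stored as a list of lists; it is an assumption about the general approach, not Athento-Imaging's own implementation, and `median_filter` is a hypothetical name.

```python
def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k neighbourhood.

    img is a 2-D grayscale image given as a list of lists of ints (0-255).
    Near the border only the neighbours inside the image are used, and the
    upper median is taken so the output always contains existing pixel values.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


# A flat grey 5x5 patch with one "pepper" (0) and one "salt" (255) pixel.
noisy = [[200] * 5 for _ in range(5)]
noisy[2][2] = 0
noisy[1][3] = 255
clean = median_filter(noisy)
print(clean[2][2], clean[1][3])  # 200 200
```

Because a single outlier can never be the middle value of a 3x3 window, both noisy pixels are replaced by the surrounding grey level, while edges spanning several pixels are largely preserved.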
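A simple way to detect straight, axis-aligned lines is to look for rows or columns containing a long unbroken run of dark pixels; the gaps between detected rows then give the spacing between lines. The README does not say which two detection approaches Athento-Imaging uses (a Hough transform such as `cv2.HoughLinesP` and morphological filtering are common choices), so the plain-Python sketch below is only an illustration under that assumption. The function names, the `min_frac` threshold and the dark level of 128 are hypothetical.

```python
def longest_dark_run(values, dark=128):
    """Length of the longest run of consecutive pixels darker than `dark`."""
    best = run = 0
    for v in values:
        run = run + 1 if v < dark else 0
        best = max(best, run)
    return best


def find_lines(img, min_frac=0.8, dark=128):
    """Return indices of rows and columns whose longest dark run covers
    at least `min_frac` of the image width (rows) or height (columns)."""
    h, w = len(img), len(img[0])
    rows = [y for y in range(h) if longest_dark_run(img[y], dark) >= min_frac * w]
    cols = [
        x for x in range(w)
        if longest_dark_run([img[y][x] for y in range(h)], dark) >= min_frac * h
    ]
    return rows, cols


def remove_lines(img, rows, cols, background=255):
    """Paint detected rows and columns with the background value.
    Naive: any text pixel lying on a detected line is erased too."""
    out = [row[:] for row in img]
    for y in rows:
        out[y] = [background] * len(out[y])
    for x in cols:
        for y in range(len(out)):
            out[y][x] = background
    return out


# 6x10 white page: two horizontal rules, one vertical rule, one "text" pixel.
page = [[255] * 10 for _ in range(6)]
page[1] = [0] * 10
page[4] = [0] * 10
for y in range(6):
    page[y][7] = 0
page[2][3] = 0

rows, cols = find_lines(page)
gaps = [b - a for a, b in zip(rows, rows[1:])]  # spacing between horizontal lines
cleaned = remove_lines(page, rows, cols)
print(rows, cols, gaps)  # [1, 4] [7] [3]
```

The isolated "text" pixel survives removal because its row and column runs are far shorter than the page dimensions. Real scans need tolerance for skew and broken strokes, which is where Hough-based or morphological methods do better.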

The full documentation is available in the [Athento-Imaging Summary](<docs/SUMMARY.md>).

The quality of the output and its OCR performance will depend on:

  • The quality of the source document: the higher the quality, the better the OCR results.
  • The amount of noise in the document and where it’s located.
  • The location of the document’s watermarks (if any).
  • The colour of the document. Light background colours are easier to remove than dark ones, because the pixel values of a dark background are close to those of the text.
  • Your experience with image processing, since you may need to combine several operations or tune their parameters significantly.
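The colour point above can be made concrete with a global threshold placed midway between the text and background grey levels (0 is black, 255 is white). The sketch below uses made-up grey levels and a fixed list of noise offsets as a stand-in for scanner noise; all of these values are hypothetical. It counts how many noisy pixels land on the wrong side of the threshold.

```python
# Hypothetical, deterministic stand-in for scanner noise (grey-level offsets).
NOISE = [-20, -10, 0, 10, 20]


def misclassified(text_level, background_level, noise=NOISE):
    """Count noisy text/background pixels that a midpoint threshold gets wrong."""
    threshold = (text_level + background_level) / 2
    errors = 0
    for n in noise:
        if text_level + n >= threshold:        # text pixel read as background
            errors += 1
        if background_level + n < threshold:   # background pixel read as text
            errors += 1
    return errors


print(misclassified(40, 220))  # light background: 0
print(misclassified(40, 70))   # dark background: 2
```

With dark text (40) on a light background (220) the threshold sits 90 grey levels from each class, so the noise never crosses it. On a dark background (70) the margin shrinks to 15 levels and the two distributions overlap, which is why dark backgrounds usually need more careful, often local, thresholding.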
Release History

0.1 (this version)

Download Files


| File | Size | Type | Upload date |
| --- | --- | --- | --- |
| athentoimaging-0.1.tar.gz | 400.4 kB | Source | Apr 21, 2015 |
