
Meaningful Optical Character Recognition from identity cards with Deep Learning.

Project description

mocr



Introduction

mocr is a library for detecting meaningful optical characters on identity cards. The code base is pure Python and works with Python 3.x. It has some low-level dependencies, such as Tesseract. mocr uses a pre-trained EAST (Efficient and Accurate Scene Text) deep learning detector through OpenCV to locate text regions.

A pre-trained EAST detector ships inside the module, and a custom-trained model can be passed as a parameter instead.
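An EAST detector typically produces many overlapping candidate boxes for the same word, and EAST pipelines commonly prune them with non-maximum suppression (NMS) before running OCR. The sketch below is a plain IoU-based greedy NMS in pure Python, for illustration only. The function names and the 0.3 overlap threshold are assumptions, and mocr's own box filtering may use a different overlap measure or library.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (start_x, start_y, end_x, end_y)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        overlap_threshold: float = 0.3) -> List[Box]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= overlap_threshold for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


# Made-up candidates: the first two boxes cover the same word.
candidates = [(10, 10, 110, 40), (12, 11, 112, 42), (10, 60, 90, 85)]
confidences = [0.95, 0.80, 0.90]
print(non_max_suppression(candidates, confidences))
# [(10, 10, 110, 40), (10, 60, 90, 85)]
```

Greedy NMS keeps the most confident box of each overlapping cluster, so every word ideally reaches the OCR step exactly once.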

Prerequisites

  • Tesseract must be installed on your computer before using OCR. Please check the Tesseract installation instructions for details.

  • The other dependencies are listed in requirements.txt and are installed automatically when you install mocr with pip.
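A quick way to confirm that the Tesseract binary is reachable before using mocr is to look it up on PATH and ask it for its version. This is a minimal sketch: the helper name `tesseract_version` is hypothetical and not part of mocr, and it assumes the executable is named `tesseract`.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Older Tesseract releases may print the version banner to stderr.
    output = (completed.stdout or completed.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "Tesseract not found on PATH; install it before using OCR.")
```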

Installation

Install the module using pip:

$ pip install mocr

From source

Download the latest mocr library from: https://github.com/verifid/mocr

Then install it in editable mode from the repository root using pip:

$ pip install -e .

Alternatively, extract the source distribution and run:

$ python setup.py build
$ python setup.py install
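After installing, one way to check that Python can import mocr and to see which version pip installed is sketched below. The helper `installed_version` is hypothetical (not part of mocr) and relies on `importlib.metadata`, which is available from Python 3.8.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it cannot be imported."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable (e.g. a source checkout on sys.path) but not pip-installed.
        return "unknown (not installed as a distribution)"


if __name__ == "__main__":
    print("mocr:", installed_version("mocr") or "not installed")
```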

Running Tests

Running the test suite against a single Python version requires pytest (pip install pytest) and, optionally, pytest-cov (pip install pytest-cov) for coverage reports. Both are included if you installed the dependencies from requirements.testing.txt.

To run the unit tests with a single Python version:

$ py.test -v

To also generate a code coverage report:

$ py.test -v --cov-report html --cov=mocr

To run the unit tests against a set of Python versions:

$ tox

Sample Usage

  • text_recognition: initialize a TextRecognizer with an identity card image, then find the texts along with their bounding boxes:

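import os
from mocr import TextRecognizer

image_path = os.path.join('tests', 'data', 'sample_uk_identity_card.png')
east_path = os.path.join('mocr', 'model', 'frozen_east_text_detection.pb')

text_recognizer = TextRecognizer(image_path, east_path)
# Load the identity card image from image_path
(image, _, _) = text_recognizer.load_image()
# Resize to 320x320 for EAST (input sides must be multiples of 32);
# the returned ratios map detections back to the original image size
(resized_image, ratio_height, ratio_width, _, _) = text_recognizer.resize_image(image, 320, 320)
# Run the EAST detector to obtain text confidence scores and box geometry
(scores, geometry) = text_recognizer.geometry_score(east_path, resized_image)
# Turn the scores and geometry into candidate text boxes
boxes = text_recognizer.boxes(scores, geometry)
# Recognize the text inside each box of the original image
results = text_recognizer.get_results(boxes, image, ratio_height, ratio_width)

# results: meaningful texts with their bounding boxes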
  • face_detection: detect and crop a face from an identity card image or from a video:

from mocr import face_detection

image_path = 'YOUR_IDENTITY_IMAGE_PATH'
face_image = face_detection.detect_face(image_path)
# face_image: the detected face, cropped from the original image, as a byte array

from mocr import face_detection

video_path = 'YOUR_IDENTITY_VIDEO_PATH'
face_image = face_detection.detect_face_from_video(video_path)
# face_image: the detected face, cropped from the original video, as a byte array
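The text recognition example resizes the image to 320x320 and passes `ratio_height` and `ratio_width` to `get_results`, which suggests that boxes found on the resized image are scaled back to original-image coordinates. The sketch below illustrates that arithmetic plus one way to put results into reading order. It assumes, hypothetically, that the ratios are original size divided by resized size and that each entry in `results` is a `((start_x, start_y, end_x, end_y), text)` pair; neither is confirmed here, and `resize_ratios`, `scale_box`, and `reading_order` are not part of mocr.

```python
from typing import List, Tuple

# Hypothetical shapes: mocr's real return types are not documented here.
Box = Tuple[int, int, int, int]  # (start_x, start_y, end_x, end_y)
Result = Tuple[Box, str]         # (box, recognized text)


def resize_ratios(orig_h: int, orig_w: int,
                  new_h: int = 320, new_w: int = 320) -> Tuple[float, float]:
    """Ratios that map coordinates on the resized image back to the original."""
    # EAST requires input height and width to be multiples of 32 (320 = 10 * 32).
    if new_h % 32 or new_w % 32:
        raise ValueError("EAST input dimensions must be multiples of 32")
    return orig_h / new_h, orig_w / new_w


def scale_box(box: Box, ratio_h: float, ratio_w: float) -> Box:
    """Scale a box detected on the resized image to original-image pixels."""
    x1, y1, x2, y2 = box
    return (int(x1 * ratio_w), int(y1 * ratio_h),
            int(x2 * ratio_w), int(y2 * ratio_h))


def reading_order(results: List[Result], line_tolerance: int = 10) -> List[Result]:
    """Sort results top-to-bottom, then left-to-right within each text line.

    A box whose start_y lies within `line_tolerance` pixels of the first box
    of the current line is treated as part of that line.
    """
    lines: List[List[Result]] = []
    for result in sorted(results, key=lambda r: r[0][1]):  # by start_y
        if lines and abs(result[0][1] - lines[-1][0][0][1]) <= line_tolerance:
            lines[-1].append(result)
        else:
            lines.append([result])
    return [r for line in lines for r in sorted(line, key=lambda r: r[0][0])]


# Example with made-up values: a 960x640 scan resized to 320x320.
ratio_h, ratio_w = resize_ratios(640, 960)
print(scale_box((100, 50, 200, 70), ratio_h, ratio_w))  # (300, 100, 600, 140)

fake_results = [((300, 102, 400, 120), "SMITH"),
                ((40, 100, 150, 118), "SURNAME"),
                ((40, 160, 150, 178), "GIVEN NAMES")]
print([text for _, text in reading_order(fake_results)])
# ['SURNAME', 'SMITH', 'GIVEN NAMES']
```

Grouping boxes into lines before sorting by x keeps a label and its value (for example a field name and the name next to it) together even when their boxes start a few pixels apart vertically.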
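Both face detection functions return the cropped face as a byte array. A minimal sketch for writing it to disk follows. It assumes the bytes are an encoded image such as PNG or JPEG and that a missing face comes back as `None` or empty bytes; neither assumption is confirmed here. `save_face` and `guess_extension` are hypothetical helpers, and the file extension is chosen by checking the standard PNG and JPEG signatures.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def guess_extension(data: bytes) -> str:
    """Pick a file extension from the image's leading magic bytes."""
    if data.startswith(PNG_SIGNATURE):
        return ".png"
    if data.startswith(JPEG_SIGNATURE):
        return ".jpg"
    return ".bin"  # unknown format: keep the raw bytes as-is


def save_face(face_image: Optional[bytes],
              output_stem: Union[str, Path]) -> Optional[Path]:
    """Write face bytes to `output_stem` plus a sniffed extension.

    Returns the written path, or None when no face bytes were given
    (assumed to mean that no face was detected).
    """
    if not face_image:
        return None
    data = bytes(face_image)  # also accepts a bytearray
    path = Path(output_stem).with_suffix(guess_extension(data))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


# Example with stand-in bytes (a PNG signature, not a real image):
with tempfile.TemporaryDirectory() as tmp:
    saved = save_face(PNG_SIGNATURE + b"\x00" * 8, Path(tmp) / "face")
    print(saved.name)  # face.png
```

With mocr installed, this could be called as `save_face(face_detection.detect_face(image_path), "output/face")`.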

CLI

Sample command line usage

  • Optical Character Recognition

python -m mocr --image tests/data/sample_uk_identity_card.png --east tests/model/frozen_east_text_detection.pb
  • Face detection from image file

python -m mocr --image-face 'tests/data/sample_de_identity_card.jpg'
  • Face detection from video file

python -m mocr --video-face 'tests/data/face-demographics-walking.mp4'
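The three commands above select a mode through their flags. The following argparse sketch shows one way such a dispatch could look. Only the flag names come from the commands above; `build_parser`, `choose_mode`, the mode names, and treating `--east` as optional (since a pre-trained EAST model ships with the module) are assumptions, not mocr's actual `__main__`.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the flags used in the sample commands."""
    parser = argparse.ArgumentParser(prog="python -m mocr")
    parser.add_argument("--image", help="identity card image to run OCR on")
    parser.add_argument("--east", help="frozen EAST model (.pb) to use for OCR")
    parser.add_argument("--image-face", help="image file to detect a face in")
    parser.add_argument("--video-face", help="video file to detect a face in")
    return parser


def choose_mode(argv: Optional[List[str]] = None) -> Optional[str]:
    """Map parsed flags to a mode name, or None if no mode was requested."""
    args = build_parser().parse_args(argv)
    # argparse exposes --image-face as args.image_face, and so on.
    if args.image:
        # args.east may be None here; assumed to fall back to the bundled model.
        return "ocr"
    if args.image_face:
        return "face-from-image"
    if args.video_face:
        return "face-from-video"
    return None


print(choose_mode(["--image-face", "tests/data/sample_de_identity_card.jpg"]))
# face-from-image
```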

Screenshots

[Screenshots: before and after]



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

mocr-0.4.0-py3-none-any.whl (146.4 kB)

Uploaded Python 3

File details

Details for the file mocr-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: mocr-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 146.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for mocr-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 93cd61f801d9089b17e197343fc76812490244d48e4985d591c8f572f6a31c40
MD5 55ee4cebf0974d805c2df6fd14f35039
BLAKE2b-256 2fb6ab8b28447d353361c446b7a89228e0f9b004b56a3161012470e5678c9e63
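To check a downloaded wheel against the SHA256 digest above, hash the file in chunks and compare the hex digests. A minimal sketch; `sha256_of` and `verify_wheel` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for mocr-0.4.0-py3-none-any.whl.
EXPECTED_SHA256 = "93cd61f801d9089b17e197343fc76812490244d48e4985d591c8f572f6a31c40"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage: verify_wheel(Path("mocr-0.4.0-py3-none-any.whl"))
```

pip can enforce the same check itself in hash-checking mode, for example with a requirements line such as `mocr==0.4.0 --hash=sha256:93cd61f801d9089b17e197343fc76812490244d48e4985d591c8f572f6a31c40`.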

