
Extracts the Machine Readable Zone (MRZ) data from document images


Fast MRZ


This repository extracts the Machine Readable Zone (MRZ) from document images. The MRZ typically contains important information such as the document holder's name, nationality, document number, date of birth, etc.

Features:

  • Detects and extracts the MRZ region from document images
  • Contour detection to accurately identify the MRZ area
  • Custom trained models for ONNX and Tesseract
  • Includes checksum logic for data validation
  • Outputs the extracted MRZ region as text/json for further processing or analysis

Built With

OpenCV Tesseract OCR NumPy ONNX

Installation

  1. Install the Tesseract OCR engine and add its executable to the PATH variable.

  2. Install fastmrz

    pip install fastmrz
    

    If you prefer conda, create an environment with Tesseract from conda-forge, then run the pip command above inside it.

    conda create -n fastmrz tesseract -c conda-forge
    conda activate fastmrz
    
  3. Copy mrz.traineddata from the tessdata folder of this repository to the tessdata folder of your Tesseract installation.
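
The installation steps above can be sanity-checked before running FastMRZ. The following is a minimal sketch using only the Python standard library; the helper names `find_tesseract` and `has_mrz_model` are hypothetical and are not part of fastmrz, and the tessdata folder location depends on how Tesseract was installed.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def has_mrz_model(tessdata_dir: str, model_name: str = "mrz.traineddata") -> bool:
    """Return True if the MRZ model file exists in the given tessdata folder."""
    return (Path(tessdata_dir) / model_name).is_file()


if __name__ == "__main__":
    path = find_tesseract()
    print("tesseract:", path if path else "not found on PATH")
```

If `find_tesseract()` returns `None`, pass the executable location through the `tesseract_path` argument shown in the example below.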

Example

from fastmrz import FastMRZ
import json

fast_mrz = FastMRZ()
# Pass the path to the Tesseract executable if it has not been added to the PATH variable
# fast_mrz = FastMRZ(tesseract_path=r'/opt/homebrew/Cellar/tesseract/5.3.4_1/bin/tesseract') # Example Homebrew path on macOS
# fast_mrz = FastMRZ(tesseract_path=r'C:\Program Files\Tesseract-OCR\tesseract.exe') # Default path on Windows
passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg")
print("JSON:")
print(json.dumps(passport_mrz, indent=4))

print("\n")

passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg", raw=True)
print("TEXT:")
print(passport_mrz)

OUTPUT:

JSON:
{
    "mrz_type": "TD3",
    "document_type": "P",
    "country_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "nationality": "GBR",
    "date_of_birth": "1995-05-20",
    "sex": "M",
    "date_of_expiry": "2017-04-22",
    "status": "SUCCESS"
}


TEXT:
P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
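
The raw text above carries the check digits that the checksum validation relies on. As an illustration, here is a minimal sketch of the check digit scheme defined in ICAO Doc 9303 (weights 7, 3, 1 repeated; digits keep their value, A to Z map to 10 to 35, and `<` counts as 0). It is not fastmrz's own implementation, and the function names are hypothetical.

```python
WEIGHTS = (7, 3, 1)


def char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value per ICAO Doc 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum of the character values, modulo 10."""
    return sum(char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


# Second line of the sample passport, padded with 14 filler characters
line2 = "7077979792GBR9505209M1704224" + "<" * 14 + "00"

checks = {
    "document_number": check_digit(line2[0:9]) == int(line2[9]),
    "date_of_birth": check_digit(line2[13:19]) == int(line2[19]),
    "date_of_expiry": check_digit(line2[21:27]) == int(line2[27]),
    # The composite digit covers positions 1-10, 14-20 and 22-43
    "composite": check_digit(line2[0:10] + line2[13:20] + line2[21:43]) == int(line2[43]),
}
print(checks)
```

All four checks pass for the sample line, which is consistent with the `"status": "SUCCESS"` field in the JSON output.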

MRZ Wiki

MRZ Types & Format

The MRZ format is strictly regulated and must comply with Doc 9303, Machine Readable Travel Documents, published by the International Civil Aviation Organization (ICAO).

There are currently several types of ICAO standard machine-readable zones, which vary in the number of lines and characters in each line:

  • TD-1 (e.g. citizen’s identification card, EU ID card, US Green Card): consists of 3 lines, 30 characters each.
  • TD-2 (e.g. Romania ID, old type of German ID), and MRV-B (machine-readable visas type B, e.g. Schengen visa): consists of 2 lines, 36 characters each.
  • TD-3 (all international passports, also known as MRP), and MRV-A (machine-readable visas type A, issued by the USA, Japan, China, and others): consists of 2 lines, 44 characters each.
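
The line counts and lengths listed above are enough to guess the format of an already extracted MRZ. The sketch below is an illustration only, not how fastmrz determines `mrz_type`; the function name is hypothetical, and the rule that visa MRZs start with `V` comes from Doc 9303 rather than from this README.

```python
from typing import List


def guess_mrz_format(lines: List[str]) -> str:
    """Guess the MRZ format from the number of lines and their length."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return "unknown"
    width = len(lines[0])
    # Every line of a valid MRZ has the same length
    if any(len(ln) != width for ln in lines):
        return "unknown"
    shape = (len(lines), width)
    is_visa = lines[0].startswith("V")
    if shape == (3, 30):
        return "TD1"
    if shape == (2, 36):
        return "MRV-B" if is_visa else "TD2"
    if shape == (2, 44):
        return "MRV-A" if is_visa else "TD3"
    return "unknown"


sample = [
    "P<GBRPUDARSAN<<HENERT".ljust(44, "<"),
    "7077979792GBR9505209M1704224" + "<" * 14 + "00",
]
print(guess_mrz_format(sample))
```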

Now, based on the example of a national passport, let us take a closer look at the MRZ composition.

MRZ fields distribution

MRZ GIF
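
To make the field layout concrete, the following sketch slices a TD-3 MRZ at the fixed positions given in Doc 9303. It is a simplified illustration and not fastmrz's parser: the function name is hypothetical, the keys merely mirror the JSON output shown earlier, check digits are not verified, and dates are left in their raw YYMMDD form because the century cannot be derived from the MRZ alone.

```python
from typing import Dict


def parse_td3(line1: str, line2: str) -> Dict[str, str]:
    """Split the two 44-character lines of a TD-3 MRZ into named fields."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("TD-3 lines must be 44 characters each")
    # The name field separates surname and given names with '<<'
    surname, _, given = line1[5:44].partition("<<")
    return {
        "document_type": line1[0:2].replace("<", ""),
        "country_code": line1[2:5].replace("<", ""),
        "surname": surname.replace("<", " ").strip(),
        "given_name": given.replace("<", " ").strip(),
        "document_number": line2[0:9].replace("<", ""),
        "nationality": line2[10:13].replace("<", ""),
        "date_of_birth": line2[13:19],   # YYMMDD
        "sex": line2[20],
        "date_of_expiry": line2[21:27],  # YYMMDD
    }


fields = parse_td3(
    "P<GBRPUDARSAN<<HENERT".ljust(44, "<"),
    "7077979792GBR9505209M1704224" + "<" * 14 + "00",
)
print(fields)
```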

ToDo

  • Test with MRV-A and MRV-B documents
  • Add wiki page

License

Distributed under the AGPL-3.0 License. See LICENSE for more information.

Show your support

Give a ⭐️ if this project helped you!
