Extracts the Machine Readable Zone (MRZ) data from document images
Fast MRZ
This repository extracts the Machine Readable Zone (MRZ) from document images. The MRZ typically contains important information such as the document holder's name, nationality, document number, and date of birth.
Features:
- Detects and extracts the MRZ region from document images
- Contour detection to accurately identify the MRZ area
- Custom trained models for ONNX and Tesseract
- Includes checksum logic for data validation
- Outputs the extracted MRZ data as text or JSON for further processing or analysis
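The checksum validation mentioned in the list above can be illustrated with the check-digit scheme defined in ICAO Doc 9303: digits keep their value, the letters A to Z map to 10 to 35, the filler character < counts as 0, each value is multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The following is a minimal sketch of that scheme; the function name mrz_check_digit is illustrative and is not part of the fastmrz API.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO Doc 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)  # repeating weight pattern
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


# Document number from the sample passport in the Example section
print(mrz_check_digit("707797979"))  # 2
```

In the sample MRZ the document number 707797979 is followed by the digit 2, which matches the computed value.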
Installation
- Install the Tesseract OCR engine and add its executable to the PATH variable.
- Install fastmrz:
  pip install fastmrz
  This can be done through conda too if you prefer:
  conda create -n fastmrz tesseract -c conda-forge
  conda activate fastmrz
- Copy mrz.traineddata from the tessdata folder of the repo to the tessdata folder in the installed Tesseract location.
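Before running the example, it can help to confirm that the Tesseract executable is actually reachable through PATH. The sketch below uses only the Python standard library; the helper name find_tesseract is hypothetical and not part of fastmrz.

```python
import shutil
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; pass tesseract_path to FastMRZ instead")
else:
    print(f"tesseract found at {path}")
```

If the lookup returns None, the tesseract_path argument shown in the Example section is the alternative to editing PATH.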
Example
from fastmrz import FastMRZ
import json
fast_mrz = FastMRZ()
# Pass the path of the installed Tesseract OCR executable if it is not on the PATH variable
# fast_mrz = FastMRZ(tesseract_path=r'/opt/homebrew/Cellar/tesseract/5.3.4_1/bin/tesseract') # Default path on macOS
# fast_mrz = FastMRZ(tesseract_path=r'C:\Program Files\Tesseract-OCR\tesseract.exe') # Default path on Windows
passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg")
print("JSON:")
print(json.dumps(passport_mrz, indent=4))
print("\n")
passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg", raw=True)
print("TEXT:")
print(passport_mrz)
OUTPUT:
JSON:
{
    "mrz_type": "TD3",
    "document_type": "P",
    "country_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "nationality": "GBR",
    "date_of_birth": "1995-05-20",
    "sex": "M",
    "date_of_expiry": "2017-04-22",
    "status": "SUCCESS"
}
TEXT:
P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
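The JSON result is a plain dictionary, so it can be consumed directly. As a rough sketch, the snippet below works on a copy of the sample output rather than calling fastmrz, and checks whether the document has expired. The helper is_expired is hypothetical. Only the SUCCESS value of status appears in the sample, so treating every other value as a failure is an assumption.

```python
from datetime import date

# Copy of the sample JSON output shown above
result = {
    "mrz_type": "TD3",
    "document_type": "P",
    "country_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "nationality": "GBR",
    "date_of_birth": "1995-05-20",
    "sex": "M",
    "date_of_expiry": "2017-04-22",
    "status": "SUCCESS",
}


def is_expired(mrz: dict, today: date) -> bool:
    """Return True if the expiry date in the result is before `today`."""
    expiry = date.fromisoformat(mrz["date_of_expiry"])
    return expiry < today


# Assumption: any status other than "SUCCESS" means the fields are unreliable
if result["status"] == "SUCCESS":
    print(is_expired(result, date(2024, 1, 1)))  # True
else:
    print("MRZ extraction did not succeed")
```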
MRZ Wiki
MRZ Types & Format
The MRZ format is strictly regulated and must comply with Doc 9303, Machine Readable Travel Documents, published by the International Civil Aviation Organization (ICAO).
There are currently several types of ICAO standard machine-readable zones, which vary in the number of lines and characters in each line:
- TD-1 (e.g. citizen's identification card, EU ID card, US Green Card): consists of 3 lines, 30 characters each.
- TD-2 (e.g. Romania ID, old type of German ID) and MRV-B (machine-readable visas type B, e.g. Schengen visa): consist of 2 lines, 36 characters each.
- TD-3 (all international passports, also known as MRP) and MRV-A (machine-readable visas type A, issued by the USA, Japan, China, and others): consist of 2 lines, 44 characters each.
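Because each format in the list above has a distinct combination of line count and line length, the layout of a raw MRZ can be narrowed down from its dimensions alone. The following is a small sketch of that idea; the names MRZ_LAYOUTS and guess_mrz_layout are illustrative and do not describe how fastmrz classifies documents internally.

```python
from typing import List, Optional

# (number of lines, characters per line) -> candidate formats from the list above
MRZ_LAYOUTS = {
    (3, 30): "TD-1",
    (2, 36): "TD-2 or MRV-B",
    (2, 44): "TD-3 or MRV-A",
}


def guess_mrz_layout(lines: List[str]) -> Optional[str]:
    """Return the candidate format for the given MRZ lines, or None if unknown."""
    if not lines:
        return None
    lengths = {len(line) for line in lines}
    if len(lengths) != 1:
        return None  # lines of unequal length cannot be a valid MRZ
    return MRZ_LAYOUTS.get((len(lines), lengths.pop()))


# Sample passport MRZ, padded with the filler character to 44 characters
sample = [
    "P<GBRPUDARSAN<<HENERT".ljust(44, "<"),
    "7077979792GBR9505209M1704224" + "<" * 14 + "00",
]
print(guess_mrz_layout(sample))  # TD-3 or MRV-A
```

Dimensions alone cannot separate a visa from a travel document of the same size, which is why two of the entries name both candidates.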
Now, based on the example of a national passport, let us take a closer look at the MRZ composition.
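The composition of a TD-3 (passport) MRZ can be sketched by slicing its two 44-character lines at the fixed positions given in ICAO Doc 9303. The function parse_td3 below is a hypothetical helper written for illustration, not the fastmrz implementation. It leaves dates in their raw YYMMDD form and does not verify check digits.

```python
def parse_td3(line1: str, line2: str) -> dict:
    """Split the two lines of a TD-3 MRZ into their fixed-position fields."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("TD-3 lines must be 44 characters each")
    # Name field: surname and given names are separated by a double filler
    surname, _, given = line1[5:44].partition("<<")
    return {
        "document_type": line1[0:2].replace("<", ""),
        "country_code": line1[2:5],
        "surname": surname.replace("<", " ").strip(),
        "given_name": given.replace("<", " ").strip(),
        "document_number": line2[0:9].replace("<", ""),  # check digit at index 9
        "nationality": line2[10:13],
        "date_of_birth": line2[13:19],  # YYMMDD, check digit at index 19
        "sex": line2[20],
        "date_of_expiry": line2[21:27],  # YYMMDD, check digit at index 27
        "optional_data": line2[28:42].replace("<", ""),
    }


# Sample passport MRZ from the Example section
fields = parse_td3(
    "P<GBRPUDARSAN<<HENERT".ljust(44, "<"),
    "7077979792GBR9505209M1704224" + "<" * 14 + "00",
)
print(fields["surname"], fields["document_number"])  # PUDARSAN 707797979
```

The last two positions of the second line hold the check digit for the optional data and the composite check digit, which this sketch ignores.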
ToDo
- Test for MRV-A and MRV-B documents
- Add wiki page
License
Distributed under the AGPL-3.0 License. See LICENSE for more information.
Show your support
Give a ⭐️ if this project helped you!