
Extracts the Machine Readable Zone (MRZ) data from document images


Fast MRZ


FastMRZ is an open-source Python package that extracts the Machine Readable Zone (MRZ) from passports and other documents. It accepts several input formats: an image file, a Base64 string, an MRZ string, or a NumPy array.

Features | Built With | Prerequisites | Installation | Example | Wiki | ToDo | Contributing

✨Features

  • 👁️Detects and extracts the MRZ region from document images
  • 🔍Contour detection to accurately identify the MRZ area
  • 🎨Custom-trained models using ONNX
  • 🆗Contains checksum logic for data validation
  • 📤Outputs the extracted MRZ data as text or JSON
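The checksum logic mentioned above follows the check-digit rule from ICAO Doc 9303: each character is mapped to a number (digits keep their value, A to Z become 10 to 35, the filler `<` is 0), the values are multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The snippet below is a minimal standalone sketch of that rule, not FastMRZ's internal implementation; the helper names `char_value` and `check_digit` are hypothetical.

```python
# Minimal sketch of the ICAO Doc 9303 check-digit rule (not FastMRZ's own code).
WEIGHTS = (7, 3, 1)


def char_value(ch: str) -> int:
    """Map an MRZ character to its value: 0-9 -> 0-9, A-Z -> 10-35, '<' -> 0."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> str:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return str(total % 10)


# Second TD3 line from the UK passport example in this README (44 characters).
line2 = "7077979792GBR9505209M1704224" + "<" * 14 + "00"

print(check_digit(line2[0:9]), line2[9])    # document number -> prints: 2 2
print(check_digit(line2[13:19]), line2[19])  # birth date      -> prints: 9 9
print(check_digit(line2[21:27]), line2[27])  # expiry date     -> prints: 4 4
# Composite check digit covers document number, birth date, expiry and optional data.
print(check_digit(line2[0:10] + line2[13:20] + line2[21:43]), line2[43])  # prints: 0 0
```

Each printed pair is (computed digit, digit printed in the MRZ); a mismatch would point to an OCR error in that field.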

🛠️Built With

OpenCV Tesseract OCR NumPy ONNX

🚨Prerequisites

  • Install the Tesseract OCR engine, add the folder containing its executable to the PATH environment variable, and make sure the tesseract command can be run from the command line.
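A quick way to confirm this prerequisite from Python is to resolve the executable with `shutil.which` and read the version banner printed by `tesseract --version`. This is a hedged sketch: the regular expression assumes the banner starts with `tesseract` followed by the version (for example `tesseract 5.3.4`, or `tesseract v5.3.0.20221214` on some Windows builds), and both helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version number from `tesseract --version` output."""
    # Assumed banner format: "tesseract 5.3.4" or "tesseract v5.3.0.20221214".
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract is not on PATH; install it or pass tesseract_path to FastMRZ")
    else:
        result = subprocess.run([exe, "--version"], capture_output=True, text=True)
        # Some older releases wrote the banner to stderr rather than stdout.
        print(exe, parse_tesseract_version(result.stdout or result.stderr))
```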

⚙️Installation

  1. Install fastmrz

    pip install fastmrz
    

    If you prefer conda, create an environment that includes Tesseract from conda-forge, then run the pip command above inside it:

    conda create -n fastmrz tesseract -c conda-forge
    conda activate fastmrz
    
  2. Copy the mrz.traineddata file from the repository's tessdata folder into the tessdata folder of your local Tesseract installation.
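To check that step 2 worked, you can ask Tesseract which language models it can see: `tesseract --list-langs` should list `mrz` once `mrz.traineddata` is in the right folder. The sketch below assumes the output format of recent Tesseract releases, where the header line quotes the tessdata path; older releases omit the path, in which case only the language list is recovered. The function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional, Tuple


def parse_list_langs(output: str) -> Tuple[Optional[str], List[str]]:
    """Split `tesseract --list-langs` output into (tessdata folder, language names)."""
    tessdata_dir = None
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("List of available languages"):
            # Recent releases quote the tessdata folder in this header line.
            match = re.search(r'"([^"]+)"', line)
            if match:
                tessdata_dir = match.group(1)
            continue
        langs.append(line)
    return tessdata_dir, langs


def mrz_model_installed(tesseract_cmd: str = "tesseract") -> bool:
    """Return True if Tesseract reports an `mrz` language (i.e. it found mrz.traineddata)."""
    result = subprocess.run([tesseract_cmd, "--list-langs"], capture_output=True, text=True)
    tessdata_dir, langs = parse_list_langs(result.stdout or result.stderr)
    print("tessdata folder:", tessdata_dir)
    return "mrz" in langs
```

Call `mrz_model_installed()` directly, or pass a full executable path such as the ones shown in the example below. If the executable cannot be found at all, `subprocess.run` raises `FileNotFoundError`.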

💡Example

from fastmrz import FastMRZ
import json

fast_mrz = FastMRZ()
# Pass the path to the Tesseract executable if it is not on your PATH
# fast_mrz = FastMRZ(tesseract_path=r'/opt/homebrew/Cellar/tesseract/5.3.4_1/bin/tesseract') # Homebrew path on macOS (the version folder may differ)
# fast_mrz = FastMRZ(tesseract_path=r'C:\Program Files\Tesseract-OCR\tesseract.exe') # Default path on Windows
passport_mrz = fast_mrz.get_details("../data/passport_uk.jpg", include_checkdigit=False)
print("JSON:")
print(json.dumps(passport_mrz, indent=4))

print("\n")

passport_mrz = fast_mrz.get_details("../data/passport_uk.jpg", ignore_parse=True)
print("TEXT:")
print(passport_mrz)

OUTPUT:

JSON:
{
    "mrz_type": "TD3",
    "document_code": "P",
    "issuer_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "document_number_checkdigit": "2",
    "nationality_code": "GBR",
    "birth_date": "1995-05-20",
    "sex": "M",
    "expiry_date": "2017-04-22",
    "optional_data": "",
    "mrz_text": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<\n7077979792GBR9505209M1704224<<<<<<<<<<<<<<00",
    "status": "SUCCESS"
}


TEXT:
P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
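The JSON output appears to be a plain dictionary (it is serialized with `json.dumps`), so downstream checks are ordinary Python. The sketch below copies a few fields from the output above and checks expiry and age against a reference date. It assumes dates are always ISO `YYYY-MM-DD` strings as shown and that any `status` other than `"SUCCESS"` should be treated as a failed read; `is_expired` and `age_on` are hypothetical helpers, not part of FastMRZ.

```python
from datetime import date

# Copy of the JSON shown above, trimmed to the fields used here.
passport_mrz = {
    "mrz_type": "TD3",
    "document_number": "707797979",
    "birth_date": "1995-05-20",
    "expiry_date": "2017-04-22",
    "status": "SUCCESS",
}


def is_expired(details: dict, today: date) -> bool:
    """True if the ISO-formatted expiry_date lies strictly before the reference date."""
    return date.fromisoformat(details["expiry_date"]) < today


def age_on(details: dict, today: date) -> int:
    """Whole years between birth_date and the reference date."""
    born = date.fromisoformat(details["birth_date"])
    # Subtract one year if the birthday has not yet occurred in the reference year.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


if passport_mrz["status"] == "SUCCESS":
    reference = date(2017, 1, 1)
    print(is_expired(passport_mrz, reference), age_on(passport_mrz, reference))  # prints: False 21
```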

📃Wiki

MRZ Types & Format

The MRZ format is strictly regulated and must comply with Doc 9303, Machine Readable Travel Documents, published by the International Civil Aviation Organization (ICAO).

ICAO currently defines several types of machine-readable zone, which differ in the number of lines and the number of characters per line:

  • TD-1 (e.g. citizen’s identification card, EU ID card, US Green Card): consists of 3 lines, 30 characters each.
  • TD-2 (e.g. Romanian ID card, older German ID card) and MRV-B (machine-readable visas type B, e.g. Schengen visa): consist of 2 lines, 36 characters each.
  • TD-3 (all international passports, also known as MRP) and MRV-A (machine-readable visas type A, issued by the USA, Japan, China, and others): consist of 2 lines, 44 characters each.
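Because these layouts differ only in line count and line length, a first classification can be done on the raw text, for example the output of `get_details(..., ignore_parse=True)`. This sketch applies the list above and additionally assumes, following Doc 9303, that visa MRZs begin with `V`, which is how MRV-A/MRV-B are told apart from TD-3/TD-2 here. `mrz_format` is a hypothetical helper.

```python
def mrz_format(mrz_text: str) -> str:
    """Classify MRZ text by its number of lines and characters per line."""
    lines = [ln.strip() for ln in mrz_text.strip().splitlines() if ln.strip()]
    shape = (len(lines), {len(ln) for ln in lines})
    # Assumption (Doc 9303): machine-readable visas start with the document code "V".
    is_visa = bool(lines) and lines[0].startswith("V")
    if shape == (3, {30}):
        return "TD1"
    if shape == (2, {36}):
        return "MRV-B" if is_visa else "TD2"
    if shape == (2, {44}):
        return "MRV-A" if is_visa else "TD3"
    return "unknown"


# The TD3 text output from the example above.
td3 = "P<GBRPUDARSAN<<HENERT".ljust(44, "<") + "\n" + "7077979792GBR9505209M1704224" + "<" * 14 + "00"
print(mrz_format(td3))  # prints: TD3
```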

Using a national passport as an example, let us take a closer look at how the MRZ is composed.

MRZ fields distribution

[Figure: animated MRZ GIF illustrating the field layout]
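In a TD-3 passport every field sits at a fixed position, so the two 44-character lines can be split by plain slicing. The sketch below spells out those positions; the field names mirror the JSON keys in the example output where possible, while the extra `*_checkdigit` keys and the `parse_td3` name are hypothetical. Dates are left as raw `YYMMDD`, since turning `95` into `1995` needs a century rule that this sketch does not attempt (FastMRZ's own output shows full ISO dates).

```python
def parse_td3(line1: str, line2: str) -> dict:
    """Slice the two 44-character TD3 lines into their fixed-position fields."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("TD3 lines must be 44 characters each")
    # Name field: SURNAME<<GIVEN<NAMES, padded with '<' to the end of line 1.
    surname, _, given = line1[5:44].partition("<<")
    return {
        "document_code": line1[0:2].rstrip("<"),
        "issuer_code": line1[2:5],
        "surname": surname.replace("<", " ").strip(),
        "given_name": given.replace("<", " ").strip(),
        "document_number": line2[0:9].rstrip("<"),
        "document_number_checkdigit": line2[9],
        "nationality_code": line2[10:13],
        "birth_date": line2[13:19],  # YYMMDD
        "birth_date_checkdigit": line2[19],
        "sex": line2[20],
        "expiry_date": line2[21:27],  # YYMMDD
        "expiry_date_checkdigit": line2[27],
        "optional_data": line2[28:42].rstrip("<"),
        "optional_data_checkdigit": line2[42],
        "composite_checkdigit": line2[43],
    }


# The UK passport MRZ from the example output above.
line1 = "P<GBRPUDARSAN<<HENERT".ljust(44, "<")
line2 = "7077979792GBR9505209M1704224" + "<" * 14 + "00"
fields = parse_td3(line1, line2)
print(fields["surname"], fields["given_name"], fields["document_number"])  # prints: PUDARSAN HENERT 707797979
```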

✅ToDo

  • Include MRV-A and MRV-B documents
  • Add wiki page
  • Support NumPy array as input
  • Support MRZ text as input
  • Support Base64 as input
  • Support PDF as input
  • Function to return MRZ text as output
  • Bulk processing
  • Add function parameter - Image Enhancement Model
  • Add function parameter - Text Image Enhancement Model
  • Train Tesseract model with additional data
  • Add function parameter - include_checkdigit
  • Add function - get_mrz_image
  • Add function - validate_mrz
  • Add function - generate_mrz
  • Extract face image
  • Add documentation page
  • Add a status for every check digit

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Commit your changes (git commit -m 'feat: add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

⚖️License

Distributed under the AGPL-3.0 License. See LICENSE for more information.

🙏Show your support

Give a ⭐️ if this project helped you!

🚀Who's Using It?

We’d love to know who’s using fastmrz! If your company or project uses this package, feel free to share your story. You can:

  • Open an issue with the title "We are using fastmrz!" and include your project or company name.

Thank you for supporting fastmrz! 🤟



Download files


Source Distribution

fastmrz-2.1.2.tar.gz (4.9 MB)

Uploaded Source

Built Distribution


fastmrz-2.1.2-py3-none-any.whl (4.9 MB)

Uploaded Python 3

File details

Details for the file fastmrz-2.1.2.tar.gz.

File metadata

  • Download URL: fastmrz-2.1.2.tar.gz
  • Size: 4.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.8

File hashes

Hashes for fastmrz-2.1.2.tar.gz
Algorithm Hash digest
SHA256 dadba6d66a3f839a648e97de4d894c5dd2c5cdf4252fdec747110ea485f2a1a8
MD5 bbbdfe57b631e91be2f1c68f2b0c28d0
BLAKE2b-256 41cde22dc235dadcb2809950851e087c2cf2b94ca1a76dabee9df31e32c5fe64
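To confirm that a downloaded archive matches the SHA256 digest above, hash it locally with `hashlib` and compare. This is a minimal sketch; the local file name `fastmrz-2.1.2.tar.gz` in the current directory is an assumption about where you saved the download, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Compare the local digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    expected = "dadba6d66a3f839a648e97de4d894c5dd2c5cdf4252fdec747110ea485f2a1a8"
    archive = Path("fastmrz-2.1.2.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(str(archive), expected) else "HASH MISMATCH")
```

pip can also enforce this itself in hash-checking mode: put a line such as `fastmrz==2.1.2 --hash=sha256:<digest>` in a requirements file (listing the SHA256 of each file pip may pick) and run `pip install --require-hashes -r requirements.txt`. In that mode every dependency must be pinned with a hash as well.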


File details

Details for the file fastmrz-2.1.2-py3-none-any.whl.

File metadata

  • Download URL: fastmrz-2.1.2-py3-none-any.whl
  • Size: 4.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.8

File hashes

Hashes for fastmrz-2.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 5eb2cfaee2451d98a1ccd4ef539ea5bda426bf73bc34d49c1c00498688953ff3
MD5 3b0d89e065d67550d372bf3c52e4e494
BLAKE2b-256 e8419f3b1badd473b0978502f0af21725551e4e30f0bf0af76dee6a70ec00aa6

