A Python library for reading MRZ data from passport images using Tesseract OCR
passport_mrz_extractor is a Python library for extracting and validating Machine Readable Zone (MRZ) data from passport images. It uses Tesseract OCR to read MRZ text and validates it using the mrz library.
Features
Extract MRZ data from passport images.
Validate MRZ data fields, including document type, name, nationality, date of birth, and expiry date.
Automatic image processing for better OCR accuracy.
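To illustrate what validating an MRZ field involves, here is a minimal sketch of the check-digit calculation used in machine readable travel documents (weights 7, 3, 1 repeating, sum taken modulo 10). The function name `mrz_check_digit` is hypothetical and is not part of passport_mrz_extractor; the library delegates validation to the mrz package, whose internals may differ.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the check digit for one MRZ field (illustrative sketch)."""
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(field):
        if "0" <= char <= "9":
            value = int(char)                  # digits keep their numeric value
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10, B=11, ..., Z=35
        elif char == "<":
            value = 0                          # the filler character counts as zero
        else:
            raise ValueError(f"Invalid MRZ character: {char!r}")
        total += value * weights[index % 3]
    return total % 10


print(mrz_check_digit("740812"))     # 2
print(mrz_check_digit("L898902C3"))  # 6
```

A field is considered consistent when the digit printed after it in the MRZ equals the computed value.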
Installation
You can install passport_mrz_extractor using pip:
pip install passport_mrz_extractor
Requirements
Python >= 3.10
Tesseract OCR installed on your system
To install Tesseract:
Ubuntu: sudo apt install tesseract-ocr
macOS (using Homebrew): brew install tesseract
Windows: Download the installer from https://github.com/UB-Mannheim/tesseract/wiki
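After installing, it can help to confirm that the Tesseract executable is reachable before running any OCR. The following is a small sketch using only the standard library; `find_tesseract` is a hypothetical helper name, not something the library provides.

```python
import shutil


def find_tesseract(executable: str = "tesseract"):
    """Return the full path to the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("Tesseract was not found on PATH; check the installation steps above.")
else:
    print(f"Tesseract found at {path}")
```

If the binary is installed but not on PATH, which is common on Windows, pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.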
Dependencies
This library requires the following Python packages:
pytesseract - For performing OCR on images.
opencv-python - For image processing.
mrz - For MRZ data validation.
Pillow - For handling image files in Python.
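Two of these packages are imported under a different name than the one used with pip (opencv-python as `cv2`, Pillow as `PIL`). As a quick sanity check, the sketch below reports which of them can be imported in the current environment. The helper `check_dependencies` is a hypothetical name used for illustration only.

```python
from importlib.util import find_spec

# pip package name -> module name used in import statements
IMPORT_NAMES = {
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "mrz": "mrz",
    "Pillow": "PIL",
}


def check_dependencies() -> dict:
    """Map each pip package name to True if its module can be found."""
    return {
        package: find_spec(module) is not None
        for package, module in IMPORT_NAMES.items()
    }


for package, available in check_dependencies().items():
    print(f"{package}: {'ok' if available else 'missing'}")
```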
Usage
Here’s how to use passport_mrz_extractor to extract MRZ data from a passport image.
### Basic Example
This example demonstrates extracting all available MRZ fields from an image and handling potential errors.
    from passport_mrz_extractor import read_mrz

    # Path to the passport image
    image_path = 'path/to/passport_image.jpg'

    try:
        mrz_data = read_mrz(image_path)
        print("Extracted MRZ Data:")
        for key, value in mrz_data.items():
            print(f"{key}: {value}")
    except ValueError as e:
        print(f"Error reading MRZ: {e}")
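When several images need to be processed, the same try/except pattern can be wrapped in a small helper that collects successes and failures separately. This is a sketch that assumes the reader raises `ValueError` on failure, as in the example above; `extract_many` and its `reader` parameter are hypothetical names, not part of the library.

```python
from typing import Callable, Iterable


def extract_many(paths: Iterable[str], reader: Callable[[str], dict]):
    """Run `reader` on every path; return (results, errors) keyed by path."""
    results = {}
    errors = {}
    for path in paths:
        try:
            results[path] = reader(path)
        except ValueError as exc:
            # Keep going so that one unreadable image does not stop the batch
            errors[path] = str(exc)
    return results, errors
```

Passing the reader in as an argument keeps the helper independent of the OCR stack, so it could be called as `extract_many(image_paths, read_mrz)` or exercised with a stub in tests.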
### Example of Using Specific MRZ Fields
In this example, we extract specific fields such as the country, document number, and birth date, and print them in a formatted output.
    from passport_mrz_extractor import read_mrz

    # Path to the passport image
    image_path = 'path/to/passport_image.jpg'

    try:
        # Extract MRZ data (read_mrz is the name imported above)
        mrz_data = read_mrz(image_path)

        # Display specific fields; .get() returns None if a field is absent
        print("Country of Issue:", mrz_data.get("country"))
        print("Document Number:", mrz_data.get("document_number"))
        print("Name:", mrz_data.get("name"))
        print("Surname:", mrz_data.get("surname"))
        print("Date of Birth:", mrz_data.get("birth_date"))
        print("Expiry Date:", mrz_data.get("expiry_date"))
        print("Nationality:", mrz_data.get("nationality"))
        print("Sex:", mrz_data.get("sex"))
    except ValueError as e:
        print(f"Error reading MRZ: {e}")
Contributing
If you’d like to contribute, please fork the repository and use a feature branch. Pull requests are welcome.
Issues
If you encounter any issues, please report them on the project's GitHub repository.
License
passport_mrz_extractor is licensed under the MIT License.