Fast and robust MRZ extraction, parsing, and validation using PaddleOCR

OmniMRZ

OmniMRZ is a production-grade MRZ extraction and validation engine designed for KYC, identity verification, and document intelligence pipelines.

Unlike simple MRZ readers, OmniMRZ evaluates whether an MRZ is structurally correct, cryptographically valid, and logically plausible.

⭐ Show Your Support

If OmniMRZ helped you or saved development time, please consider starring the repository. It helps visibility and motivates continued development.

Features

Installation

Contributing

Features

At a glance

  • MRZ detection and extraction from images
  • Supports TD3 (passport) format
  • Checksum validation (ICAO 9303)
  • Logical and structural validation
  • Clean Python API

Detailed features

🔍 MRZ Extraction

  • PaddleOCR-based MRZ text extraction (robust on mobile & noisy images)
  • Intelligent MRZ line clustering & reconstruction
  • Automatic MRZ type detection (TD1 / TD2 / TD3)
  • OCR noise filtering & MRZ-safe character normalization
  • Works even with partially corrupted or misaligned MRZs

🧱 Structural Validation (ICAO 9303)

  • Exact line-length enforcement
  • Strict MRZ format verification
  • Field-level structural checks
  • Early-exit gating for invalid layouts
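
As a rough illustration of exact line-length enforcement, the sketch below classifies a set of MRZ lines by the ICAO 9303 layouts (TD1: three lines of 30 characters, TD2: two lines of 36, TD3: two lines of 44). The names `MRZ_LAYOUTS` and `detect_mrz_type` are hypothetical and are not part of the OmniMRZ API; the library's own checks may be stricter.

```python
from typing import Optional, Sequence

# (number of lines, characters per line) for each ICAO 9303 MRZ layout
MRZ_LAYOUTS = {
    "TD1": (3, 30),
    "TD2": (2, 36),
    "TD3": (2, 44),
}

# The MRZ alphabet: uppercase letters, digits, and the filler character
MRZ_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789<")


def detect_mrz_type(lines: Sequence[str]) -> Optional[str]:
    """Return 'TD1', 'TD2' or 'TD3' when the lines match a layout exactly, else None."""
    # Early exit: any character outside the MRZ alphabet invalidates the layout
    if any(ch not in MRZ_ALPHABET for line in lines for ch in line):
        return None
    for name, (count, length) in MRZ_LAYOUTS.items():
        if len(lines) == count and all(len(line) == length for line in lines):
            return name
    return None
```

Returning `None` instead of raising lets a caller treat an unknown layout as an early "structural fail" and skip the later validation stages.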

🔢 Checksum Validation

  • Fully ICAO-9303 compliant checksum algorithm
  • Field-level validation:
      • Document number
      • Date of birth
      • Expiry date
      • Composite checksum
  • OCR-error tolerant digit correction (O→0, S→5, B→8, etc.)
  • Detailed checksum failure diagnostics
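
For readers unfamiliar with the ICAO 9303 check digit, here is a minimal sketch of the calculation: each character is mapped to a number (digits keep their value, A to Z map to 10 to 35, the filler `<` is 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The helper names (`char_value`, `check_digit`, `verify_field`) and the exact substitution table are illustrative assumptions, not OmniMRZ internals.

```python
# OCR look-alike fixes for numeric-only fields (the three pairs named in the feature list)
DIGIT_FIXES = str.maketrans({"O": "0", "S": "5", "B": "8"})


def char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value for checksum purposes."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"not an MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit using the repeating weights 7, 3, 1."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def verify_field(field: str, expected: str, numeric: bool = False) -> bool:
    """Compare a field against its check digit, optionally fixing OCR look-alikes first."""
    if numeric:
        field = field.translate(DIGIT_FIXES)
    return check_digit(field) == char_value(expected)
```

For a TD3 second line, the composite check digit (position 44) covers positions 1 to 10, 14 to 20, and 21 to 43, so with a 44-character string `line2` the input would be `line2[0:10] + line2[13:20] + line2[21:43]`.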

🧠 Logical & Semantic Validation

  • Expired document detection
  • Future date-of-birth detection
  • Implausible age detection
  • DOB ≥ expiry detection
  • Gender value validation (M, F, X, <)
  • Cross-field consistency signals (issuer vs nationality)
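
A few of these checks can be sketched as follows. MRZ dates are six digits (YYMMDD), so the century has to be inferred; the pivot rule used here (expiry years are 20YY, birth years later than the current two-digit year are 19YY) is an assumption for illustration, not necessarily what OmniMRZ does. Apart from `DOCUMENT_EXPIRED`, which appears in the sample output, the error codes and function names are hypothetical.

```python
from datetime import date
from typing import List, Optional

VALID_GENDERS = {"M", "F", "X", "<"}


def parse_mrz_date(yymmdd: str, *, is_expiry: bool, today: date) -> date:
    """Parse a YYMMDD field, guessing the century (assumed pivot rule)."""
    yy, mm, dd = int(yymmdd[0:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    if is_expiry:
        year = 2000 + yy  # assumption: expiry dates fall in the 2000s
    else:
        # assumption: a birth year beyond the current two-digit year means 19YY
        year = 2000 + yy if yy <= today.year % 100 else 1900 + yy
    return date(year, mm, dd)


def logical_errors(dob: str, expiry: str, gender: str,
                   today: Optional[date] = None) -> List[str]:
    """Return a list of error codes; an empty list means no logical issue was found."""
    today = today or date.today()
    birth = parse_mrz_date(dob, is_expiry=False, today=today)
    expires = parse_mrz_date(expiry, is_expiry=True, today=today)
    errors = []
    if expires < today:
        errors.append("DOCUMENT_EXPIRED")
    if birth > today:
        errors.append("DOB_IN_FUTURE")          # hypothetical code
    if birth >= expires:
        errors.append("DOB_NOT_BEFORE_EXPIRY")  # hypothetical code
    if gender not in VALID_GENDERS:
        errors.append("INVALID_GENDER")         # hypothetical code
    return errors
```

Passing `today` explicitly keeps the checks deterministic, which matters when the same document must produce the same verdict in tests and audits.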

📤 Output

  • Clean MRZ text
  • Structured JSON
  • Deterministic pass / fail / warning signals
  • Human-readable error messages

Installation

pip install omnimrz

Note: PaddleOCR requires additional system dependencies. Please ensure PaddlePaddle installs correctly on your platform.

pip install paddleocr
pip install paddlepaddle

If that fails, install the CPU build of PaddlePaddle from the official package index:

python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

Quick Usage

from omnimrz import OmniMRZ

omni = OmniMRZ()
result = omni.process("ukpassport.jpg")

print(result)

Output Example

{
  "extraction": {
    "status": "SUCCESS(extraction of mrz)",
    "line1": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<",
    "line2": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
  },
  "structural_validation": {
    "status": "PASS",
    "mrz_type": "TD3",
    "errors": []
  },
  "checksum_validation": {
    "status": "PASS",
    "errors": []
  },
  "parsed_data": {
    "status": "PARSED",
    "data": {
      "document_type": "P",
      "issuing_country": "GBR",
      "surname": "PUDARSAN",
      "given_names": "HENERT",
      "document_number": "707797979",
      "nationality": "GBR",
      "date_of_birth": "1995-05-20",
      "gender": "M",
      "expiry_date": "2017-04-22",
      "personal_number": ""
    }
  },
  "logical_validation": {
    "status": "FAIL",
    "errors": [
      "DOCUMENT_EXPIRED"
    ]
  }
}
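
One way to act on this output in a pipeline is to collect the stages that did not pass. The sketch below assumes the result is available as a plain Python dict shaped like the example above (if the library returns a JSON string, it would need `json.loads` first); `failed_stages` is a hypothetical helper, not part of OmniMRZ.

```python
from typing import Any, Dict, List

# Validation sections present in the example output
STAGES = ("structural_validation", "checksum_validation", "logical_validation")


def failed_stages(result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each stage whose status is not 'PASS' to its list of error codes.

    A stage that is missing from the result is treated as not passed.
    """
    failures: Dict[str, List[str]] = {}
    for stage in STAGES:
        section = result.get(stage, {})
        if section.get("status") != "PASS":
            failures[stage] = list(section.get("errors", []))
    return failures
```

For the example above this would return `{"logical_validation": ["DOCUMENT_EXPIRED"]}`, and an empty dict would mean every stage passed.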

Contributing

Contributions are welcome!🤝

  1. Fork the repository
  2. Create your feature branch
     git checkout -b feature/amazing-feature
  3. Commit your changes
  4. Push to your branch
  5. Open a Pull Request
