
Minimal-footprint reimplementation of PassportEye's MRZ detection/reading (numpy + pillow only)


mrzmini

A minimal-footprint, drop-in reimplementation of PassportEye's MRZ (machine-readable zone) detection and reading pipeline.

PassportEye does one small, precise job — find and read the MRZ on an ID document — but drags in a heavy dependency stack to do it: numpy, scipy, scikit-image, scikit-learn, matplotlib, imageio, pdfminer, pytesseract.

mrzmini reproduces the exact same algorithm and output using only:

  • numpy
  • Pillow
  • the external tesseract binary (called via subprocess; required only for OCR)

Everything that skimage / sklearn / scipy / imageio / pdfminer / pytesseract provided is reimplemented on top of numpy (plus Pillow and a tesseract subprocess) in mrzmini/imageproc.py, mrzmini/geometry.py, mrzmini/ocr.py and mrzmini/pdf.py. The MRZ text parser (text.py) and the pipeline engine (pipeline.py) are pure Python and are copied verbatim from upstream (MIT).

Install

pip install mrzmini

mrzmini needs the external tesseract OCR binary on PATH for the OCR step (detection alone does not need it):

apt install tesseract-ocr        # Debian/Ubuntu
brew install tesseract           # macOS

Override the binary location with the TESSERACT_CMD environment variable.
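The binary lookup and subprocess call can be sketched as follows. This is a minimal sketch, not mrzmini's ocr.py. The helper names tesseract_command and run_tesseract are hypothetical. The only assumption about tesseract itself is its standard CLI form "tesseract <image> stdout [options]", which prints the recognized text to standard output.

```python
import os
import shlex
import shutil
import subprocess


def tesseract_command(image_path: str, extra_params: str = "") -> list[str]:
    """Build a tesseract argv list (hypothetical helper, not mrzmini's API).

    The binary comes from TESSERACT_CMD when set, otherwise "tesseract" on PATH.
    Using "stdout" as the output base makes tesseract print text instead of
    writing a .txt file.
    """
    binary = os.environ.get("TESSERACT_CMD", "tesseract")
    return [binary, image_path, "stdout", *shlex.split(extra_params)]


def run_tesseract(image_path: str, extra_params: str = "") -> str:
    """Run tesseract on an image file and return the recognized text."""
    cmd = tesseract_command(image_path, extra_params)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"tesseract binary not found: {cmd[0]!r}")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, with TESSERACT_CMD=/opt/tess/bin/tesseract, tesseract_command("roi.png", "--psm 6") yields ["/opt/tess/bin/tesseract", "roi.png", "stdout", "--psm", "6"].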

Usage

from mrzmini import read_mrz

mrz = read_mrz('passport.jpg')        # path, bytes, file-like, or a .pdf
if mrz is None:                       # read_mrz returns None if nothing was found
    print('No MRZ found')
else:
    print(mrz)
    print(mrz.to_dict())

The public surface mirrors PassportEye: read_mrz(file, save_roi=False, extra_cmdline_params='') returns an MRZ object with the same fields (mrz_type, valid, valid_score, number, names, surname, date_of_birth, expiration_date, nationality, sex, check digits, …), or None when no MRZ is found.
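The check digits follow the ICAO Doc 9303 scheme. Each character maps to a value (digits to 0 through 9, A through Z to 10 through 35, the filler "<" to 0), the values are weighted 7, 3, 1 repeating, and the check digit is the sum modulo 10. The sketch below shows that computation. The function name is hypothetical and is not part of mrzmini's API. The example values come from the ICAO 9303 specimen passport MRZ and were verified by hand.

```python
def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating; result is the sum mod 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = ord(ch) - ord("0")
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10   # A=10 ... Z=35
        elif ch == "<":
            value = 0                         # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


# Specimen values: document number, date of birth, expiry date.
print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
print(mrz_check_digit("120415"))     # 9
```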

Command-line demo:

mrzmini passport.jpg                  # installed console script
python -m mrzmini passport.jpg        # or via the module

Requirements

  • Python ≥ 3.12, numpy and Pillow (installed via uv sync in a source checkout).
  • The tesseract OCR binary on PATH (e.g. apt install tesseract-ocr). Override its location with the TESSERACT_CMD environment variable.

How it works

The pipeline is identical to PassportEye's (a lazy DAG of components):

| Step | What it does | Upstream dependency replaced |
| --- | --- | --- |
| Loader | read image → grayscale (color → float64 luma, gray → uint8) | skimage.io / imageio; pdfminer for PDFs |
| Scaler | downscale so width ≤ 250 (anti-aliased) | skimage.transform.rescale (→ scipy) |
| BooneTransform | threshold_otsu(closing(abs(sobel_v(black_tophat(img))))) → binary | skimage.morphology + skimage.filters |
| MRZBoxLocator | marching-squares contours → RotatedBox (PCA) → merge parallel boxes | skimage.measure + sklearn.PCA |
| extract_from_image | un-rotate + crop the ROI | skimage.transform.rotate |
| ocr | run Tesseract on the ROI | pytesseract |
| MRZ.from_ocr | clean up + parse + checksum-validate | none (pure Python) |
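The BooneTransform row can be written with plain numpy. The sketch below is not mrzmini's imageproc.py and is not claimed to be bit-exact with scikit-image. It makes three assumptions. First, it uses a square footprint of side 5, which we believe is PassportEye's default. Second, it pads with edge values, which for min/max filters is equivalent to ignoring out-of-bounds pixels but may differ from scikit-image at the border for the Sobel step. Third, it computes Otsu over a 256-bin histogram. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Vertical-edge Sobel kernel (responds to horizontal intensity changes).
SOBEL_V = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64) / 4.0


def _window_reduce(img, size, reducer):
    """Apply `reducer` (np.max / np.min) over each size x size neighborhood."""
    r = size // 2
    # Edge padding only repeats pixels already inside each border window,
    # so for max/min it is equivalent to ignoring out-of-bounds pixels.
    padded = np.pad(img, r, mode="edge")
    return reducer(sliding_window_view(padded, (size, size)), axis=(-2, -1))


def closing(img, size):
    """Grey closing with a square footprint: dilation (max) then erosion (min)."""
    return _window_reduce(_window_reduce(img, size, np.max), size, np.min)


def sobel_v(img):
    """3x3 vertical Sobel response (its sign is irrelevant once abs() is taken)."""
    windows = sliding_window_view(np.pad(img, 1, mode="edge"), (3, 3))
    return np.einsum("ijkl,kl->ij", windows, SOBEL_V)


def otsu_threshold(img, nbins=256):
    """Otsu threshold: the bin center that maximizes between-class variance."""
    values = np.asarray(img, dtype=np.float64).ravel()
    if values.min() == values.max():
        return float(values.min())  # constant image: nothing to separate
    hist, edges = np.histogram(values, bins=nbins)
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2
    w1 = np.cumsum(hist)                  # pixel count at or below each bin
    w2 = np.cumsum(hist[::-1])[::-1]      # pixel count at or above each bin
    m1 = np.cumsum(hist * centers) / w1
    m2 = (np.cumsum((hist * centers)[::-1]) / w2[::-1])[::-1]
    between = w1[:-1] * w2[1:] * (m1[:-1] - m2[1:]) ** 2
    return float(centers[np.argmax(between)])


def boone_transform(img_small, square_size=5):
    """threshold_otsu(closing(abs(sobel_v(black_tophat(img))))) -> boolean mask."""
    img = np.asarray(img_small, dtype=np.float64)
    tophat = closing(img, square_size) - img   # black top-hat: thin dark details
    edges = np.abs(sobel_v(tophat))
    closed = closing(edges, square_size)       # merge character strokes into blobs
    return closed > otsu_threshold(closed)
```

On a bright image with a band of thin dark vertical strokes (a crude stand-in for MRZ glyphs), the mask is True inside the band and False far away from it.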

Parity with PassportEye

parity_check.py compares mrzmini against the real PassportEye stage by stage. Across the entire PassportEye test corpus (36 images: TD1/TD2/TD3/MRVA/MRVB, scores 0–100, JPG/PNG/PDF):

  • img_binary: 0 pixel disagreements on every image
  • detected MRZ boxes: identical on every image
  • read_mrz(...).to_dict(): identical on every image

Run the check with:

uv run --group parity python parity_check.py            # scans testdata/
uv run --group parity python parity_check.py some.jpg   # specific files
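The per-stage checks listed earlier reduce to simple comparisons. The sketch below shows two of them. The helper names are hypothetical and do not reproduce parity_check.py itself. It assumes the binary images are same-shaped arrays and that to_dict() returns a plain dict.

```python
import numpy as np


def pixel_disagreements(a, b) -> int:
    """Number of pixels where two binary masks differ (shapes must match)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return int(np.count_nonzero(a != b))


def dict_diff(a: dict, b: dict) -> dict:
    """Keys whose values differ, mapped to (a_value, b_value).

    A missing key is treated as None, so {"x": None} and {} compare equal.
    """
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

A stage passes when pixel_disagreements(...) == 0 and dict_diff(...) == {}.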

Internally, every reimplemented primitive matches its scikit-image / scipy / scikit-learn counterpart to floating-point precision: bilinear resize and marching-squares contours are bit-exact, the bicubic OCR-retry resize matches scipy to ~1e-14, and the PCA box geometry matches to ~1e-12.
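The PCA box geometry can be sketched with an SVD of the centered contour points, which yields the same principal axes as sklearn.PCA up to the sign of each axis. This is an illustrative oriented bounding box, not mrzmini's RotatedBox. The function name, the (center, width, height, angle) return tuple, and the choice to fold the angle into [-π/2, π/2) are all assumptions. At least two points are required.

```python
import numpy as np


def oriented_box(points):
    """Return (center, width, height, angle) of a PCA-aligned bounding box."""
    pts = np.asarray(points, dtype=np.float64)
    if pts.ndim != 2 or pts.shape[0] < 2 or pts.shape[1] != 2:
        raise ValueError("need an (n, 2) array with n >= 2")
    mean = pts.mean(axis=0)
    centered = pts - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    angle = np.arctan2(vt[0, 1], vt[0, 0])
    # An axis and its negation describe the same box; fold into [-pi/2, pi/2).
    if angle >= np.pi / 2:
        angle -= np.pi
    elif angle < -np.pi / 2:
        angle += np.pi
    proj = centered @ vt.T                  # coordinates in the principal-axis frame
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    width, height = hi - lo                 # extent along major / minor axis
    center = mean + ((lo + hi) / 2) @ vt    # box midpoint mapped back to image coords
    return center, float(width), float(height), float(angle)
```

For the corners of a 10 × 2 rectangle rotated by 30°, this returns width ≈ 10, height ≈ 2 and angle ≈ π/6, regardless of which sign the SVD picks for each axis.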

Notes / limitations

  • PDF support is best-effort: like upstream it extracts the first embedded JFIF JPEG (\xff\xd8\xff\xe0); other embedded image encodings are not handled.
  • Tesseract is an external binary, not a Python library, so it remains a requirement for the OCR step (detection alone does not need it).

License

The reused PassportEye algorithm and copied modules are MIT (© Konstantin Tretyakov). Reimplemented primitives follow scikit-image / scipy behavior (BSD).



Download files


Source Distribution

mrzmini-0.1.0.tar.gz (27.7 kB)

Uploaded Source

Built Distribution


mrzmini-0.1.0-py3-none-any.whl (31.4 kB)

Uploaded Python 3

File details

Details for the file mrzmini-0.1.0.tar.gz.

File metadata

  • Download URL: mrzmini-0.1.0.tar.gz
  • Upload date:
  • Size: 27.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mrzmini-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | a313dec207bdd9500146c791692c12cf6668799add88cf4b636276b9383c0e2b |
| MD5 | b2a741a18916783ec05b96d2012417ea |
| BLAKE2b-256 | aba9925082eafe11b7ceee2072e7a8c3a0298a022de6ac02bff19e47d269983e |


Provenance

The following attestation bundles were made for mrzmini-0.1.0.tar.gz:

Publisher: release.yml on rbaks/mrzmini

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mrzmini-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mrzmini-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 31.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mrzmini-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 9c7b9ea97cc2d2d44681d599ef26fbb5bb4ff7931478311bf69ece5247fe3a44 |
| MD5 | 89ae14a15c98489b162a1fa034b00758 |
| BLAKE2b-256 | 1e0fce786c3227614190a6ee5ad2db1967185e75f48d27fc9cac4b4fe6c10e8c |


Provenance

The following attestation bundles were made for mrzmini-0.1.0-py3-none-any.whl:

Publisher: release.yml on rbaks/mrzmini

