Minimal-footprint reimplementation of PassportEye's MRZ detection/reading (numpy + pillow only)
mrzmini
A minimal-footprint, drop-in reimplementation of PassportEye's MRZ (machine-readable zone) detection and reading pipeline.
PassportEye does one small, precise job — find and read the MRZ on an ID document —
but drags in a heavy dependency stack to do it: numpy, scipy, scikit-image,
scikit-learn, matplotlib, imageio, pdfminer, pytesseract.
mrzmini reproduces the exact same algorithm and output using only:
- numpy
- Pillow
- the external tesseract binary (called via subprocess; required only for OCR)
Everything that skimage / sklearn / scipy / imageio / pdfminer / pytesseract
did is reimplemented on top of numpy in mrzmini/imageproc.py,
mrzmini/geometry.py, mrzmini/ocr.py and mrzmini/pdf.py. The MRZ text parser
(text.py) and the pipeline engine (pipeline.py) are pure Python and are
copied verbatim from upstream (MIT).
Install
```shell
pip install mrzmini
```
mrzmini needs the external tesseract OCR binary on PATH for the OCR step
(detection alone does not need it):
```shell
apt install tesseract-ocr   # Debian/Ubuntu
brew install tesseract      # macOS
```
Override the binary location with the TESSERACT_CMD environment variable.
Usage
```python
from mrzmini import read_mrz

mrz = read_mrz('passport.jpg')  # path, bytes, file-like, or a .pdf
print(mrz)                      # None if nothing was found
if mrz is not None:
    print(mrz.to_dict())
```
The public surface mirrors PassportEye: read_mrz(file, save_roi=False, extra_cmdline_params='') returns either None or an MRZ object with the same fields
(mrz_type, valid, valid_score, number, names, surname,
date_of_birth, expiration_date, nationality, sex, check digits, …).
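The check digits follow ICAO Doc 9303: digits count as themselves, A to Z as 10 to 35 and the filler < as 0; the values are weighted 7, 3, 1 (repeating) and summed modulo 10. A standalone sketch (mrz_check_digit is a hypothetical helper, not the function in the copied text.py), applied to the passport number, birth date and expiry date of the ICAO 9303 specimen:

```python
def mrz_check_digit(field: str) -> int:
    """Hypothetical helper: ICAO 9303 check digit of an MRZ field.

    Digits count as themselves, A-Z as 10-35 and the filler '<' as 0;
    values are weighted 7, 3, 1 (repeating) and summed modulo 10.
    """
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = ord(ch) - ord("0")
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"not an MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


print(mrz_check_digit("L898902C3"))  # 6  (document number)
print(mrz_check_digit("740812"))     # 2  (date of birth)
print(mrz_check_digit("120415"))     # 9  (expiration date)
```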
Command-line demo:
```shell
mrzmini passport.jpg             # installed console script
python -m mrzmini passport.jpg   # or via the module
```
Requirements
- Python ≥ 3.12, numpy, Pillow (installed via uv sync).
- The tesseract OCR binary on PATH (e.g. apt install tesseract-ocr). Override its location with the TESSERACT_CMD environment variable.
How it works
The pipeline is identical to PassportEye's (a lazy DAG of components):
| Step | What it does | Upstream dependency replaced |
|---|---|---|
| Loader | read image → grayscale (color → float64 luma, gray → uint8) | skimage.io / imageio; pdfminer for PDFs |
| Scaler | downscale so width ≤ 250 (anti-aliased) | skimage.transform.rescale (→ scipy) |
| BooneTransform | threshold_otsu(closing(\|sobel_v(black_tophat(img))\|)) → binary | skimage.morphology + skimage.filters |
| MRZBoxLocator | marching-squares contours → RotatedBox (PCA) → merge parallel boxes | skimage.measure + sklearn.PCA |
| extract_from_image | un-rotate + crop the ROI | skimage.transform.rotate |
| ocr | run Tesseract on the ROI | pytesseract |
| MRZ.from_ocr | clean up + parse + checksum-validate | (pure Python) |
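The BooneTransform row is the densest step. The sketch below spells it out with numpy only: grayscale closing as a sliding max followed by a sliding min over a rectangular window, black top-hat as closing minus image, a vertical-edge Sobel kernel ([[1, 0, -1], [2, 0, -2], [1, 0, -1]] / 4), and Otsu's threshold on a 256-bin histogram. It is not mrzmini's imageproc.py: the window sizes in boone_binary are illustrative guesses, and the border handling (padding with -inf/+inf for morphology, mirrored edges for Sobel) is an assumption that may not match scikit-image pixel for pixel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(img, size, reduce, pad_value):
    """Apply reduce (np.max / np.min) over a size=(rows, cols) window per pixel."""
    r, c = size
    pad = ((r // 2, r - 1 - r // 2), (c // 2, c - 1 - c // 2))
    padded = np.pad(img, pad, mode="constant", constant_values=pad_value)
    return reduce(sliding_window_view(padded, (r, c)), axis=(-2, -1))


def closing(img, size):
    """Grayscale closing: dilation (sliding max) then erosion (sliding min)."""
    dilated = _window_reduce(img, size, np.max, -np.inf)
    return _window_reduce(dilated, size, np.min, np.inf)


def black_tophat(img, size):
    """Dark details smaller than the window: closing(img) - img (>= 0)."""
    return closing(img, size) - img


def sobel_v(img):
    """Vertical-edge response, kernel [[1,0,-1],[2,0,-2],[1,0,-1]] / 4."""
    p = np.pad(img, 1, mode="symmetric")  # mirror edges (scipy's 'reflect')
    left = p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    right = p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
    return (left - right) / 4.0


def threshold_otsu(img, nbins=256):
    """Histogram threshold that maximizes the between-class variance."""
    lo, hi = float(img.min()), float(img.max())
    if lo == hi:
        return lo  # constant image: nothing to separate
    hist, edges = np.histogram(img, bins=nbins, range=(lo, hi))
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2
    w1 = np.cumsum(hist)              # pixel count in bins 0..i (>= 1)
    w2 = np.cumsum(hist[::-1])[::-1]  # pixel count in bins i..end (>= 1)
    m1 = np.cumsum(hist * centers) / w1
    m2 = (np.cumsum((hist * centers)[::-1]) / w2[::-1])[::-1]
    between = w1[:-1] * w2[1:] * (m1[:-1] - m2[1:]) ** 2
    return float(centers[np.argmax(between)])


def boone_binary(gray, tophat_size=(5, 5), closing_size=(3, 15)):
    """Hypothetical Boone-style binarization; window sizes are illustrative.

    The wide closing window is meant to join neighbouring characters
    into solid text-line blobs.
    """
    img = np.asarray(gray, dtype=np.float64)
    edges = np.abs(sobel_v(black_tophat(img, tophat_size)))
    closed = closing(edges, closing_size)
    return closed > threshold_otsu(closed)
```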
Parity with PassportEye
parity_check.py compares mrzmini against the real
PassportEye stage by stage. Across the entire PassportEye test corpus
(36 images: TD1/TD2/TD3/MRVA/MRVB, scores 0–100, JPG/PNG/PDF):
- img_binary: 0 pixel disagreements on every image
- detected MRZ boxes: identical on every image
- read_mrz(...).to_dict(): identical on every image
```shell
uv run --group parity python parity_check.py            # scans testdata/
uv run --group parity python parity_check.py some.jpg   # specific files
```
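A stage-by-stage comparison comes down to two kinds of check: exact agreement for boolean or integer outputs such as img_binary, and a maximum absolute difference for floating-point ones. A sketch of such a helper, assuming both implementations' stage outputs are available as numpy arrays (compare_stage is hypothetical; parity_check.py may be structured differently):

```python
import numpy as np


def compare_stage(name: str, ours: np.ndarray, theirs: np.ndarray) -> str:
    """Hypothetical helper: summarize how two stage outputs differ.

    Boolean/integer arrays (e.g. img_binary) are compared exactly; float
    arrays report their maximum absolute difference.
    """
    ours, theirs = np.asarray(ours), np.asarray(theirs)
    if ours.shape != theirs.shape:
        return f"{name}: shape mismatch {ours.shape} vs {theirs.shape}"
    if ours.dtype == np.bool_ or np.issubdtype(ours.dtype, np.integer):
        n = int(np.count_nonzero(ours != theirs))
        return f"{name}: {n} pixel disagreements"
    diff = np.abs(ours.astype(np.float64) - theirs.astype(np.float64))
    return f"{name}: max |diff| = {float(diff.max(initial=0.0)):.3g}"
```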
Internally every reimplemented primitive matches its scikit-image / scipy / scikit-learn counterpart to floating-point precision (bilinear resize and marching-squares contours are bit-exact; the bicubic OCR-retry resize matches scipy to ~1e-14, and the PCA box geometry to ~1e-12).
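The PCA box geometry can be illustrated as follows: center the contour points, take the principal axes (obtained here from numpy's SVD), project the points onto those axes, and read the box extents and center off the projections. A sketch under those assumptions; rotated_box_from_points and its return layout are hypothetical and are not mrzmini's RotatedBox API.

```python
import numpy as np


def rotated_box_from_points(points):
    """Hypothetical helper: oriented bounding box of 2-D points via PCA.

    Returns (center, width, height, angle): width is the extent along the
    first principal axis, height along the second, and angle is the first
    axis' direction in radians. SVD axes have arbitrary sign, so angle is
    only meaningful modulo pi. Needs at least two points.
    """
    pts = np.asarray(points, dtype=np.float64)
    mean = pts.mean(axis=0)
    centered = pts - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt.T                # point coordinates in the PCA frame
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    width, height = hi - lo
    center = mean + ((lo + hi) / 2) @ vt  # box center mapped back
    angle = float(np.arctan2(vt[0, 1], vt[0, 0]))
    return center, float(width), float(height), angle
```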
Notes / limitations
- PDF support is best-effort: like upstream, it extracts the first embedded JFIF JPEG (\xff\xd8\xff\xe0); other embedded image encodings are not handled.
- Tesseract is an external binary, not a Python library, so it remains a requirement for the OCR step (detection alone does not need it).
License
The reused PassportEye algorithm and copied modules are MIT (© Konstantin Tretyakov). Reimplemented primitives follow scikit-image / scipy behavior (BSD).
Download files
File details
Details for the file mrzmini-0.1.0.tar.gz.
File metadata
- Download URL: mrzmini-0.1.0.tar.gz
- Upload date:
- Size: 27.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a313dec207bdd9500146c791692c12cf6668799add88cf4b636276b9383c0e2b |
| MD5 | b2a741a18916783ec05b96d2012417ea |
| BLAKE2b-256 | aba9925082eafe11b7ceee2072e7a8c3a0298a022de6ac02bff19e47d269983e |
Provenance
The following attestation bundles were made for mrzmini-0.1.0.tar.gz:
Publisher: release.yml on rbaks/mrzmini
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mrzmini-0.1.0.tar.gz
- Subject digest: a313dec207bdd9500146c791692c12cf6668799add88cf4b636276b9383c0e2b
- Sigstore transparency entry: 1645561594
- Sigstore integration time:
- Permalink: rbaks/mrzmini@b2f8fb78d075cee85312d307a8397d80b190abfd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/rbaks
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b2f8fb78d075cee85312d307a8397d80b190abfd
- Trigger Event: push
File details
Details for the file mrzmini-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mrzmini-0.1.0-py3-none-any.whl
- Upload date:
- Size: 31.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c7b9ea97cc2d2d44681d599ef26fbb5bb4ff7931478311bf69ece5247fe3a44 |
| MD5 | 89ae14a15c98489b162a1fa034b00758 |
| BLAKE2b-256 | 1e0fce786c3227614190a6ee5ad2db1967185e75f48d27fc9cac4b4fe6c10e8c |
Provenance
The following attestation bundles were made for mrzmini-0.1.0-py3-none-any.whl:
Publisher: release.yml on rbaks/mrzmini
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mrzmini-0.1.0-py3-none-any.whl
- Subject digest: 9c7b9ea97cc2d2d44681d599ef26fbb5bb4ff7931478311bf69ece5247fe3a44
- Sigstore transparency entry: 1645561661
- Sigstore integration time:
- Permalink: rbaks/mrzmini@b2f8fb78d075cee85312d307a8397d80b190abfd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/rbaks
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b2f8fb78d075cee85312d307a8397d80b190abfd
- Trigger Event: push