MCP server for passport MRZ extraction and document parsing
Doc Parser MCP Server
A Model Context Protocol (MCP) server for passport MRZ (Machine Readable Zone) extraction and document parsing.
Features
- MRZ Extraction: Extract MRZ data from passport images
- MRZ Detection: Locate MRZ regions in passport images
- Text Parsing: Parse MRZ text into structured information
- Checksum Validation: Validate MRZ checksums
- Multiple Input Formats: Supports file paths and Base64-encoded images
- Fallback Processing: Works with or without the external passport_mrz_extractor library
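The fallback behaviour can be thought of as a backend-selection step at startup. The sketch below shows one way such a check might look. It is an assumption, not this package's code: it assumes the optional library is importable as `passport_mrz_extractor`, and that the OCR fallback needs both the `pytesseract` wrapper and a `tesseract` binary on PATH. `choose_backend` is a hypothetical name.

```python
import importlib.util
import shutil
from typing import Optional


def choose_backend() -> Optional[str]:
    """Hypothetical helper: prefer the dedicated library, else fall back to Tesseract OCR."""
    # Assumed import name of the optional passport-mrz-extractor package.
    if importlib.util.find_spec("passport_mrz_extractor") is not None:
        return "passport_mrz_extractor"
    # The OCR fallback needs the Python wrapper and the tesseract executable.
    if importlib.util.find_spec("pytesseract") is not None and shutil.which("tesseract"):
        return "tesseract"
    # Neither backend is available.
    return None
```

A real server would also honour `TESSERACT_CMD` when the binary is not on PATH. This sketch only checks PATH.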
Installation
From PyPI
pip install doc-parser-mcp
From Source
git clone https://github.com/yourusername/doc-parser-mcp.git
cd doc-parser-mcp
pip install -e .
Dependencies
The server requires the following dependencies:
- mcp>=1.0.0 - Model Context Protocol framework
- opencv-python>=4.5.0 - Image processing
- numpy>=1.21.0 - Numerical operations
- Pillow>=8.0.0 - Image manipulation
- pytesseract>=0.3.8 - OCR functionality (optional, for fallback)
Optional Dependencies
- passport-mrz-extractor>=1.0.0 - Specialized passport MRZ extraction library
System Dependencies
For OCR functionality (fallback mode), you need Tesseract:
macOS:
brew install tesseract
Ubuntu/Debian:
sudo apt-get install tesseract-ocr
Windows: Download and run the installer provided by Tesseract at UB Mannheim
Usage
As MCP Server
Run the server using the stdio transport:
doc-parser-mcp
Available Tools
The server provides the following tools:
1. extract_passport_mrz
Extract MRZ data from a passport image.
Parameters:
- image_path (string): Path to the passport image file
- image_data (string): Base64-encoded image data (alternative to image_path)
Example:
{
"image_path": "/path/to/passport.jpg"
}
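MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. Its `params` carry the tool `name` and its `arguments`. The sketch below builds such a message for `extract_passport_mrz` and sends raw image bytes through `image_data`. It assumes the server accepts plain Base64 with no `data:` URI prefix. `build_extract_request` is a hypothetical helper. A real client, such as the `mcp` SDK's `ClientSession`, also performs the `initialize` handshake before any tool call.

```python
import base64
import json
from typing import Optional


def build_extract_request(
    request_id: int,
    image_path: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
) -> str:
    """Hypothetical helper: serialize a tools/call message for extract_passport_mrz."""
    # The tool takes either a path or inline data, so require exactly one.
    if (image_path is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_path or image_bytes")
    if image_path is not None:
        arguments = {"image_path": image_path}
    else:
        # Assumption: plain Base64 text, no data URI prefix.
        arguments = {"image_data": base64.b64encode(image_bytes).decode("ascii")}
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract_passport_mrz", "arguments": arguments},
    }
    return json.dumps(message)
```

Consider `build_extract_request(1, image_bytes=Path("/path/to/passport.jpg").read_bytes())` (with `from pathlib import Path`). It sends the file's contents rather than its path, which is useful when the server cannot see the client's filesystem.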
2. detect_mrz_region
Detect and locate the MRZ region in a passport image.
Parameters:
- image_path (string): Path to the passport image file
- image_data (string): Base64-encoded image data (alternative to image_path)
3. parse_mrz_text
Parse MRZ text and extract structured information.
Parameters:
- mrz_text (string): Raw MRZ text (2-3 lines)
Example:
{
"mrz_text": "P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<\nL898902C36UTO7408122F1204159ZE184226B<<<<<10"
}
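The following sketch makes the TD3 field layout concrete. It parses a two-line passport MRZ into the keys used in the example output below. It is not the server's implementation. `parse_td3` and `_mrz_date` are hypothetical, the century rule for two-digit years is an assumption, and check digits are not verified here.

```python
import datetime
from typing import Any, Dict


def _mrz_date(yymmdd: str, is_birth: bool) -> Dict[str, Any]:
    """Expand an MRZ YYMMDD date into the year/month/day/formatted shape."""
    yy, mm, dd = int(yymmdd[0:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    # Assumed century rule: a birth year later than the current two-digit
    # year is in the 1900s; expiration dates are always taken as 20xx.
    current_yy = datetime.date.today().year % 100
    century = 1900 if (is_birth and yy > current_yy) else 2000
    year = century + yy
    return {"year": year, "month": mm, "day": dd,
            "formatted": f"{year:04d}-{mm:02d}-{dd:02d}"}


def parse_td3(mrz_text: str) -> Dict[str, Any]:
    """Hypothetical TD3 parser: 2 lines x 44 characters, fixed field positions."""
    lines = [line.strip() for line in mrz_text.strip().splitlines()]
    if len(lines) != 2 or any(len(line) != 44 for line in lines):
        raise ValueError("TD3 expects 2 lines of 44 characters")
    l1, l2 = lines
    # The name field separates surname and given names with '<<'.
    surname, _, given = l1[5:44].partition("<<")
    return {
        "format": "TD3",
        "document_type": l1[0:2].rstrip("<"),
        "country_code": l1[2:5].rstrip("<"),
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.replace("<", " ").strip(),
        "passport_number": l2[0:9].rstrip("<"),
        "nationality": l2[10:13].rstrip("<"),
        "birth_date": _mrz_date(l2[13:19], is_birth=True),
        "sex": l2[20],
        "expiration_date": _mrz_date(l2[21:27], is_birth=False),
        "personal_number": l2[28:42].rstrip("<"),
    }
```

Applied to the sample `mrz_text` above, this yields `ERIKSSON`, `ANNA MARIA`, `L898902C3`, and the dates `1974-08-12` and `2012-04-15`.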
4. validate_mrz_checksum
Validate the checksums (check digits) of an MRZ line.
Parameters:
- mrz_line (string): MRZ line to validate
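MRZ check digits follow ICAO Doc 9303. Each character maps to a value: digits as themselves, `A`-`Z` as 10-35, and the filler `<` as 0. The values are multiplied by the repeating weights 7, 3, 1, and the check digit is the sum modulo 10. The sketch below applies this rule to the second line of a TD3 MRZ. It also checks the composite digit, which covers positions 1-10, 14-20 and 22-43. The function names are hypothetical, and this is not necessarily how `validate_mrz_checksum` is implemented.

```python
from typing import Dict

DIGITS = "0123456789"
WEIGHTS = (7, 3, 1)


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its ICAO 9303 numeric value."""
    if ch in DIGITS:
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum with weights 7, 3, 1 (repeating), modulo 10."""
    total = sum(mrz_char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field))
    return total % 10


def validate_td3_line2(line: str) -> Dict[str, bool]:
    """Hypothetical validator for every check digit on TD3 line 2."""
    if len(line) != 44:
        raise ValueError("TD3 line 2 must be 44 characters")

    def ok(field: str, digit: str) -> bool:
        if digit == "<":
            # Assumption: a filler check digit is accepted only for an all-'<' field.
            return set(field) <= {"<"}
        return digit in DIGITS and check_digit(field) == int(digit)

    # Composite digit covers positions 1-10, 14-20 and 22-43 (1-based).
    composite = line[0:10] + line[13:20] + line[21:43]
    return {
        "passport": ok(line[0:9], line[9]),
        "birth": ok(line[13:19], line[19]),
        "expiration": ok(line[21:27], line[27]),
        "personal": ok(line[28:42], line[42]),
        "composite": ok(composite, line[43]),
    }
```

For the sample passport, `check_digit("L898902C3")` is 6 and `check_digit("740812")` is 2. These match the `check_digits` values in the example output.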
Example Output
MRZ Extraction Result
{
"format": "TD3",
"document_type": "P",
"country_code": "UTO",
"surname": "ERIKSSON",
"given_names": "ANNA MARIA",
"passport_number": "L898902C3",
"nationality": "UTO",
"birth_date": {
"year": 1974,
"month": 8,
"day": 12,
"formatted": "1974-08-12"
},
"sex": "F",
"expiration_date": {
"year": 2012,
"month": 4,
"day": 15,
"formatted": "2012-04-15"
},
"personal_number": "ZE184226B",
"check_digits": {
"passport": "6",
"birth": "2",
"expiration": "9",
"personal": "1"
}
}
Supported Formats
MRZ Formats
- TD3: 2-line format (44 characters per line) - Standard passports
- TD1: 3-line format (30 characters per line) - ID cards
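Per ICAO Doc 9303, a TD3 (passport) MRZ has two lines of 44 characters and a TD1 (ID card) MRZ has three lines of 30 characters. The format can therefore be distinguished from its shape alone. This is a minimal sketch that assumes the text has already been OCR'd and cleaned of stray spaces. `detect_mrz_format` is a hypothetical name.

```python
from typing import Optional

# (number of lines, characters per line) for each supported format.
MRZ_SHAPES = {"TD3": (2, 44), "TD1": (3, 30)}


def detect_mrz_format(mrz_text: str) -> Optional[str]:
    """Return "TD3", "TD1", or None if the text matches neither shape."""
    lines = [ln.strip() for ln in mrz_text.strip().splitlines() if ln.strip()]
    for name, (n_lines, width) in MRZ_SHAPES.items():
        if len(lines) == n_lines and all(len(ln) == width for ln in lines):
            return name
    return None
```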
Image Formats
- JPEG
- PNG
- BMP
- TIFF
Development
Setup Development Environment
git clone https://github.com/yourusername/doc-parser-mcp.git
cd doc-parser-mcp
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -e ".[dev]"
Running Tests
pytest
Code Formatting
black doc_parser_mcp/
Type Checking
mypy doc_parser_mcp/
Configuration
The server can be configured through environment variables:
- TESSERACT_CMD: Path to the Tesseract executable (if not in PATH)
- MRZ_DEBUG: Enable debug logging (set to "1")
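The following sketch shows how these two variables might be applied at startup. It assumes the server uses `pytesseract` for its fallback. `configure_from_env` is a hypothetical helper, and the real server may read the variables differently. `pytesseract.pytesseract.tesseract_cmd` is the module attribute that `pytesseract` uses to locate the executable.

```python
import logging
import os
import shutil
from typing import Optional


def configure_from_env() -> Optional[str]:
    """Hypothetical: apply MRZ_DEBUG and TESSERACT_CMD, return the Tesseract path used."""
    if os.environ.get("MRZ_DEBUG") == "1":
        logging.basicConfig(level=logging.DEBUG)
    # An explicit TESSERACT_CMD wins; otherwise look the binary up on PATH.
    tesseract_cmd = os.environ.get("TESSERACT_CMD") or shutil.which("tesseract")
    if tesseract_cmd:
        try:
            import pytesseract  # optional dependency
        except ImportError:
            pass
        else:
            pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    return tesseract_cmd
```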
Limitations
- Image quality affects extraction accuracy
- Works best with clear, high-resolution passport images
- MRZ should be visible and not obstructed
- Some passports may have non-standard formats
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- passport_mrz_extractor - Primary MRZ extraction library
- OpenCV - Computer vision functionality
- Tesseract - OCR engine
- Model Context Protocol - Protocol framework
Support
For issues and questions:
- Create an issue on GitHub
- Check the documentation
Download files
File details
Details for the file doc_parser_mcp-0.1.0.tar.gz.
File metadata
- Download URL: doc_parser_mcp-0.1.0.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.10 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f4581c8ef841ffe63081d27849a479e488aed5d6bac1cc579fc955b91cc1019 |
| MD5 | 8c92415e0268430ed08587c1d3bad583 |
| BLAKE2b-256 | 91d3fd9a658f85d9da8b75cd2139eecfdab9628ee8bc3ce094b7e8b3c65a71b6 |
File details
Details for the file doc_parser_mcp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: doc_parser_mcp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.10 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1887d4e7462e4297010772bf3a9bf9dbabe1752cf80c9c22ce9b85f597a236f0 |
| MD5 | 1d06712ab3a715d57a6984f5dd6f3cbf |
| BLAKE2b-256 | 08b5adfb22fdac29f370e2eaf9bfa5f937c630a4b7c3bb1129cda828dfb95cd1 |