Doc Parser MCP Server

A Model Context Protocol (MCP) server for passport MRZ (Machine Readable Zone) extraction and document parsing.

Features

  • MRZ Extraction: Extract MRZ data from passport images
  • MRZ Detection: Locate MRZ regions in passport images
  • Text Parsing: Parse MRZ text into structured information
  • Checksum Validation: Validate MRZ checksums
  • Multiple Input Formats: Accepts file paths or base64-encoded images
  • Fallback Processing: Works with or without the external passport_mrz_extractor library

Installation

From PyPI

pip install doc-parser-mcp

From Source

git clone https://github.com/yourusername/doc-parser-mcp.git
cd doc-parser-mcp
pip install -e .

Dependencies

The server depends on the following Python packages (pytesseract is only needed for the OCR fallback):

  • mcp>=1.0.0 - Model Context Protocol framework
  • opencv-python>=4.5.0 - Image processing
  • numpy>=1.21.0 - Numerical operations
  • Pillow>=8.0.0 - Image manipulation
  • pytesseract>=0.3.8 - OCR functionality (optional, for fallback)

Optional Dependencies

  • passport-mrz-extractor>=1.0.0 - Specialized passport MRZ extraction library
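
The choice between the specialized library and the OCR fallback can be made at import time. The following is a minimal sketch, not the server's actual logic: it assumes the optional library is importable as `passport_mrz_extractor`, and the function name `pick_backend` is hypothetical.

```python
import importlib.util


def pick_backend(find_spec=importlib.util.find_spec):
    """Return the name of the first extraction backend that is installed."""
    # Assumption: the optional library's import name is `passport_mrz_extractor`.
    if find_spec("passport_mrz_extractor") is not None:
        return "passport_mrz_extractor"
    # Fallback: OCR via pytesseract (also needs the Tesseract binary).
    if find_spec("pytesseract") is not None:
        return "pytesseract"
    return "none"


print(pick_backend())
```

Passing `find_spec` as a parameter keeps the check easy to exercise without installing or uninstalling packages.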

System Dependencies

For OCR functionality (fallback mode), you need Tesseract:

macOS:

brew install tesseract

Ubuntu/Debian:

sudo apt-get install tesseract-ocr

Windows: Download and run the installer from the Tesseract at UB Mannheim project.

Usage

As MCP Server

Run the server using stdio transport:

doc-parser-mcp
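
Many MCP clients register stdio servers through a JSON file containing an `mcpServers` mapping. The sketch below builds such an entry for this server. The exact file location and schema depend on the client, and the entry name `doc-parser` and the helper `mcp_client_entry` are hypothetical.

```python
import json


def mcp_client_entry(tesseract_cmd=None, debug=False):
    """Build an `mcpServers` entry that launches the server over stdio."""
    env = {}
    if tesseract_cmd:
        env["TESSERACT_CMD"] = tesseract_cmd  # path to the Tesseract executable
    if debug:
        env["MRZ_DEBUG"] = "1"  # enable debug logging
    return {
        "mcpServers": {
            # "doc-parser" is an arbitrary label chosen for this example.
            "doc-parser": {"command": "doc-parser-mcp", "args": [], "env": env}
        }
    }


print(json.dumps(mcp_client_entry(debug=True), indent=2))
```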

Available Tools

The server provides the following tools:

1. extract_passport_mrz

Extract MRZ data from a passport image.

Parameters:

  • image_path (string): Path to the passport image file
  • image_data (string): Base64 encoded image data (alternative to image_path)

Example:

{
  "image_path": "/path/to/passport.jpg"
}
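
When the image is only available in memory, the `image_data` parameter can be used instead of `image_path`. The sketch below prepares either form of the arguments. It assumes the server expects plain standard base64 without a data-URI prefix, which this README does not confirm, and `build_extract_arguments` is a hypothetical helper name.

```python
import base64
import os


def build_extract_arguments(source):
    """Build tool arguments from a file path or from raw image bytes."""
    if isinstance(source, (bytes, bytearray)):
        # Assumption: plain base64 text, no "data:image/...;base64," prefix.
        encoded = base64.b64encode(bytes(source)).decode("ascii")
        return {"image_data": encoded}
    return {"image_path": os.fspath(source)}


print(build_extract_arguments("/path/to/passport.jpg"))
```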

2. detect_mrz_region

Detect and locate the MRZ region in a passport image.

Parameters:

  • image_path (string): Path to the passport image file
  • image_data (string): Base64 encoded image data (alternative to image_path)

3. parse_mrz_text

Parse MRZ text and extract structured information.

Parameters:

  • mrz_text (string): Raw MRZ text (2-3 lines)

Example:

{
  "mrz_text": "P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<\nL898902C36UTO7408122F1204159ZE184226B<<<<<10"
}
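
For illustration, a passport (TD3) MRZ can be split into fields by fixed character positions, as laid out in ICAO Doc 9303. The following is a minimal sketch under that assumption, not the server's implementation. The function name `parse_td3` is hypothetical, and dates are left as raw YYMMDD strings.

```python
def parse_td3(mrz_text):
    """Split a two-line, 44-character-per-line MRZ into named fields."""
    lines = [ln.strip() for ln in mrz_text.strip().splitlines()]
    if len(lines) != 2 or any(len(ln) != 44 for ln in lines):
        raise ValueError("expected two lines of 44 characters")
    top, bottom = lines
    # Names: surname and given names are separated by "<<", words by "<".
    surname, _, given = top[5:].partition("<<")
    return {
        "format": "TD3",
        "document_type": top[0:2].replace("<", ""),
        "country_code": top[2:5],
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.replace("<", " ").strip(),
        "passport_number": bottom[0:9].replace("<", ""),
        "nationality": bottom[10:13],
        "birth_date": bottom[13:19],       # YYMMDD
        "sex": bottom[20],
        "expiration_date": bottom[21:27],  # YYMMDD
        "personal_number": bottom[28:42].replace("<", ""),
        "check_digits": {
            "passport": bottom[9],
            "birth": bottom[19],
            "expiration": bottom[27],
            "personal": bottom[42],
        },
    }


# The specimen used in the example above, built so both lines are 44 long.
SPECIMEN = (
    "P<UTOERIKSSON<<ANNA<MARIA" + "<" * 19 + "\n"
    "L898902C36UTO7408122F1204159ZE184226B" + "<" * 5 + "10"
)
print(parse_td3(SPECIMEN)["given_names"])
```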

4. validate_mrz_checksum

Validate MRZ checksums.

Parameters:

  • mrz_line (string): MRZ line to validate
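
MRZ check digits follow the scheme in ICAO Doc 9303: each character is mapped to a number (digits keep their value, A-Z map to 10-35, the filler `<` is 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below computes a check digit for a single field. The function name `mrz_check_digit` is hypothetical and not necessarily how the server exposes this.

```python
def mrz_check_digit(field):
    """Compute the ICAO 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for index, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[index % 3]
    return total % 10


# Fields from the specimen passport used elsewhere in this README.
print(mrz_check_digit("L898902C3"))       # 6 (passport number)
print(mrz_check_digit("740812"))          # 2 (birth date)
print(mrz_check_digit("120415"))          # 9 (expiration date)
print(mrz_check_digit("ZE184226B<<<<<"))  # 1 (personal number)
```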

Example Output

MRZ Extraction Result

{
  "format": "TD3",
  "document_type": "P",
  "country_code": "UTO",
  "surname": "ERIKSSON",
  "given_names": "ANNA MARIA",
  "passport_number": "L898902C3",
  "nationality": "UTO",
  "birth_date": {
    "year": 1974,
    "month": 8,
    "day": 12,
    "formatted": "1974-08-12"
  },
  "sex": "F",
  "expiration_date": {
    "year": 2012,
    "month": 4,
    "day": 15,
    "formatted": "2012-04-15"
  },
  "personal_number": "ZE184226B",
  "check_digits": {
    "passport": "6",
    "birth": "2",
    "expiration": "9",
    "personal": "1"
  }
}
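
The MRZ stores dates as YYMMDD, so the four-digit years in the result above have to be inferred. One plausible rule, shown here as a sketch rather than the server's confirmed behavior, is that a birth date cannot lie in the future and an expiration date is assumed to fall in the 2000s. The function name `expand_mrz_date` and the `kind` and `today` parameters are hypothetical.

```python
from datetime import date


def expand_mrz_date(yymmdd, kind, today=None):
    """Expand a YYMMDD MRZ date into the structured form used in the output."""
    today = today or date.today()
    yy, mm, dd = int(yymmdd[0:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    year = 2000 + yy
    if kind == "birth":
        # Assumption: a birth date in the future must belong to the 1900s.
        if (year, mm, dd) > (today.year, today.month, today.day):
            year -= 100
    elif kind != "expiration":
        raise ValueError("kind must be 'birth' or 'expiration'")
    parsed = date(year, mm, dd)  # raises ValueError for impossible dates
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "formatted": parsed.isoformat(),
    }


print(expand_mrz_date("740812", "birth", today=date(2025, 1, 1)))
```

Passing `today` explicitly keeps the result reproducible. Dates with unknown parts (filler characters) are not handled and raise `ValueError`.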

Supported Formats

MRZ Formats

  • TD3: 2-line format (44 characters per line) - Standard passports
  • TD1: 3-line format (30 characters per line) - ID cards

Image Formats

  • JPEG
  • PNG
  • BMP
  • TIFF

Development

Setup Development Environment

git clone https://github.com/yourusername/doc-parser-mcp.git
cd doc-parser-mcp

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

Running Tests

pytest

Code Formatting

black doc_parser_mcp/

Type Checking

mypy doc_parser_mcp/

Configuration

The server can be configured through environment variables:

  • TESSERACT_CMD: Path to Tesseract executable (if not in PATH)
  • MRZ_DEBUG: Enable debug logging (set to "1")
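
As an illustration of how these variables could be consumed, the sketch below reads them into a small settings object and applies them. It is an assumption about the wiring, not the server's code: `Settings`, `load_settings`, and `apply_settings` are hypothetical names, while `pytesseract.pytesseract.tesseract_cmd` is the attribute pytesseract uses to locate the executable.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    tesseract_cmd: Optional[str]
    debug: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read TESSERACT_CMD and MRZ_DEBUG from an environment mapping."""
    return Settings(
        tesseract_cmd=environ.get("TESSERACT_CMD") or None,
        debug=environ.get("MRZ_DEBUG") == "1",
    )


def apply_settings(settings: Settings) -> None:
    """Configure logging and, if installed, point pytesseract at Tesseract."""
    logging.basicConfig(level=logging.DEBUG if settings.debug else logging.INFO)
    if settings.tesseract_cmd:
        try:
            import pytesseract
        except ImportError:
            return  # OCR fallback is not installed; nothing to configure
        pytesseract.pytesseract.tesseract_cmd = settings.tesseract_cmd


print(load_settings({"MRZ_DEBUG": "1"}))
```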

Limitations

  • Image quality affects extraction accuracy
  • Works best with clear, high-resolution passport images
  • MRZ should be visible and not obstructed
  • Some passports may have non-standard formats

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Support

For issues and questions:
