A Python package for extracting information from Panamanian identity documents (Cédulas) and passports

Document Analyzer

Document Analyzer is a Python package that uses PaddleOCR to extract structured information from identity documents. It supports Panamanian ID cards (Cédulas) in Spanish, and passports with standard ICAO Machine Readable Zones (MRZ) in Spanish or English. The package detects the document type and language automatically and loads the matching OCR instance. It is designed for mobile phone photos of documents rather than scans or PDFs, and it preprocesses images automatically to improve extraction accuracy on lower-quality input.

Version

Current version: 0.1.0 — Initial release.

Features

Cédula Extraction — Extract ID number, date of birth, place of birth, expiry date, and handwritten signature detection from Panamanian identity cards
Passport Extraction — Extract ID number, date of birth, place of birth, nationality, and expiry date from passports with standard ICAO Machine Readable Zones (MRZ). Works with any country's passport that follows the ICAO standard format.
Automatic Document Detection — Intelligently detect whether an image contains a Cédula or Passport
Image Preprocessing — Automatically enhance poor quality images before OCR processing
CLI Support — Full command-line interface for document analysis without writing code
JSON Output — Structured JSON results for easy integration into other systems
Multi-Language Support — Cédulas are processed in Spanish only. Passports support automatic language detection between Spanish and English, with the appropriate PaddleOCR instance loaded based on detected language.

Requirements

Python 3.8 to 3.12
PaddleOCR 3.2.0

Installation

pip install document-analyzer

CLI Usage

The package includes a command-line interface accessible via the document-analyzer command.

Basic Usage with Auto-Detection

Analyze a document with automatic type detection:

document-analyzer analyze photo.jpg

The output is printed as JSON to stdout.

Specify Document Type

If you know the document type, you can skip auto-detection for faster processing:

document-analyzer analyze cedula.jpg --type cedula
document-analyzer analyze passport.jpg --type passport

Save Output to File

Save analysis results to a JSON file instead of printing to stdout:

document-analyzer analyze photo.jpg --save result.json

Verbose Mode

Enable debug-level logging to see detailed processing information:

document-analyzer analyze photo.jpg -v

Combine with --save for logging while saving results:

document-analyzer analyze photo.jpg --save result.json -v

Help

View all available options:

document-analyzer analyze --help
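
The CLI options above can also be driven from a script. The following is a minimal sketch, not part of the package: `build_command` and `run_analysis` are hypothetical helper names, and the sketch assumes that stdout carries only the JSON document and that failures produce a non-zero exit status, neither of which is confirmed here.

```python
import json
import subprocess
from typing import List, Optional


def build_command(image: str, doc_type: Optional[str] = None,
                  save: Optional[str] = None, verbose: bool = False) -> List[str]:
    """Assemble the argv list using only the flags documented above."""
    cmd = ["document-analyzer", "analyze", image]
    if doc_type is not None:
        if doc_type not in ("cedula", "passport"):
            raise ValueError(f"unsupported document type: {doc_type!r}")
        cmd += ["--type", doc_type]
    if save is not None:
        cmd += ["--save", save]
    if verbose:
        cmd.append("-v")
    return cmd


def run_analysis(image: str, doc_type: Optional[str] = None) -> dict:
    """Run the CLI and parse the JSON it prints to stdout.

    Assumption: stdout contains only JSON and a failure raises
    CalledProcessError through check=True.
    """
    completed = subprocess.run(
        build_command(image, doc_type),
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)
```

With the package installed, `run_analysis("photo.jpg")` would return the same dictionary that the CLI prints, provided the assumptions above hold.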

Library Usage

You can use Document Analyzer as a Python library in your own code. Here are examples for the main use cases.

Auto-Detection with DocumentAnalyzer

from document_analyzer import DocumentAnalyzer

# Initialize with image path
analyzer = DocumentAnalyzer("photo.jpg")

# Detect document type
doc_type = analyzer.detect_document_type()
print(f"Detected: {doc_type}")  # "cedula" or "passport" or "unknown"

Extract from Cédula

from document_analyzer import CedulaAnalyzer

# Initialize with image path
analyzer = CedulaAnalyzer("cedula.jpg")

# Analyze the document
results = analyzer.analyze_cedula()
print(results)

# Optional: provide user email for logging context
analyzer = CedulaAnalyzer("cedula.jpg", user_email="user@example.com")

Extract from Passport

from document_analyzer import PassportAnalyzer

# Initialize with image path
analyzer = PassportAnalyzer("passport.jpg")

# Analyze the document
results = analyzer.analyze_passport()
print(results)

# Optional: provide user email for logging context
analyzer = PassportAnalyzer("passport.jpg", user_email="user@example.com")

Convenience Functions

You can also use high-level functions for simpler code:

from document_analyzer import analyze_document, analyze_cedula, analyze_passport

# Auto-detect and analyze
result = analyze_document("photo.jpg")

# Analyze specific document type
cedula_result = analyze_cedula("cedula.jpg")
passport_result = analyze_passport("passport.jpg")
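
For several photos at once, the convenience functions can be wrapped in a small loop. This is a sketch only: `analyze_folder` is a hypothetical helper, and the analysis function is passed in as a parameter so the loop does not depend on how the package reports errors.

```python
from pathlib import Path
from typing import Callable, Dict


def analyze_folder(folder: str, analyze: Callable[[str], dict]) -> Dict[str, dict]:
    """Apply `analyze` to every regular file in `folder`, keyed by file name."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():  # skip subdirectories
            results[path.name] = analyze(str(path))
    return results
```

With the package installed, `analyze_folder("photos", analyze_document)` would run auto-detection on each file in a `photos` directory (the directory name is only an example).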

Output

Analysis results are returned as dictionaries containing structured information about the extracted data. Below are example outputs for both document types with realistic but fictional Panamanian data.

Cédula Output Example

{
    "success": "both",
    "cedula_info": {
        "type": "cedula",
        "id_number": "8-123-456",
        "dob": "15-May-1990",
        "pob": "Panama",
        "nationality": "Panamanian",
        "expiry": "22-Mar-2030"
    },
    "signature": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
}

The success field can be "both" (all info + signature), "cedula_info" (all info but no signature), "signature" (signature only), or "none" (extraction failed).
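
One way to act on the four success values is sketched below. `unpack_cedula_result` is a hypothetical helper, not a package function. It also assumes the signature string is a base64-encoded image: the sample value above decodes to a PNG header, but the encoding is not stated in this README.

```python
import base64
from typing import Optional, Tuple


def unpack_cedula_result(result: dict) -> Tuple[Optional[dict], Optional[bytes]]:
    """Return (cedula_info, signature_bytes); either may be None."""
    status = result.get("success", "none")
    has_info = status in ("both", "cedula_info")
    has_signature = status in ("both", "signature")

    info = result.get("cedula_info") if has_info else None
    signature = None
    if has_signature and result.get("signature"):
        # Assumption: the signature is base64-encoded image data.
        signature = base64.b64decode(result["signature"])
    return info, signature
```

The returned bytes could then be written to a file or passed to an image library for display.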

Passport Output Example

{
    "success": "passport_info",
    "passport_info": {
        "type": "passport",
        "id_number": "PA123456789",
        "dob": "20-Nov-1988",
        "pob": "Colón",
        "nationality": "PAN",
        "expiry": "10-Sep-2032"
    },
    "signature": null
}

The success field can be "passport_info" (extraction successful) or "none" (extraction failed).
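
Both sample outputs show dates shaped like `15-May-1990`. Assuming that format holds for every result (it is inferred from the samples only), the strings can be converted to `date` objects as sketched here. `parse_document_date` and `is_expired` are hypothetical helper names.

```python
from datetime import date, datetime


def parse_document_date(text: str) -> date:
    """Parse dates shaped like "15-May-1990".

    %b matches English month abbreviations under the default C locale.
    """
    return datetime.strptime(text, "%d-%b-%Y").date()


def is_expired(info: dict, today: date) -> bool:
    """True when the document's expiry date is before `today`."""
    return parse_document_date(info["expiry"]) < today
```

For example, `is_expired(result["passport_info"], date.today())` checks a passport result against the current date.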

Image Requirements

Document Analyzer is designed to work with mobile phone photos of documents. Here are the technical requirements:

Supported Formats — JPEG, PNG, BMP, TIFF, GIF
Orientation — Portrait orientation works best
Quality — Mobile phone camera quality is acceptable; the package includes automatic preprocessing to handle lower quality images
Coverage — Entire document should be visible in the frame
Lighting — Avoid strong shadows or glare across the document

The package includes automatic image preprocessing that attempts to enhance poor quality images before OCR processing. This can help improve accuracy for images with:

Low contrast
Poor lighting conditions
Motion blur
Dust or slight damage

Note on PDFs: PDF files have not been tested and are not officially supported, so they may not work as expected. Use image files (JPG, PNG, etc.) for best results.
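
A simple pre-check can reject unsupported files before any OCR work starts. The sketch below is not part of the package: `is_supported_image` is a hypothetical helper, and the extension set is an assumption mapped from the formats listed above. It checks only the file name, not the file contents.

```python
from pathlib import Path

# Assumption: common extensions for JPEG, PNG, BMP, TIFF, and GIF.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".gif"}


def is_supported_image(path: str) -> bool:
    """True when the file extension matches a listed image format."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```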

GPU Acceleration

PaddleOCR supports GPU acceleration via CUDA for significantly faster processing on NVIDIA GPUs. However, Document Analyzer has only been tested and validated on CPU hardware (Intel i5, 10th generation).

If you want to experiment with GPU acceleration, you will need to:

Configure PaddleOCR to use your CUDA-enabled GPU according to the PaddleOCR documentation
Ensure your system has CUDA and cuDNN properly configured
Test thoroughly in your environment before deploying to production

CPU processing is stable and recommended for production use.

Logging

Document Analyzer uses Python's standard logging module with the logger namespace document_analyzer. This allows you to configure logging behavior in your own applications.

Basic Configuration

import logging

# Enable debug logging on the root logger, which includes document_analyzer
logging.basicConfig(level=logging.DEBUG)
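
To get debug output from Document Analyzer without turning it on for every other library, the level can be set on the document_analyzer logger alone. This is a sketch using only the standard logging module. `enable_document_analyzer_debug` is a hypothetical helper name, and the format string is an arbitrary choice.

```python
import logging


def enable_document_analyzer_debug() -> logging.Logger:
    """Attach one console handler to the document_analyzer logger at DEBUG level."""
    logger = logging.getLogger("document_analyzer")
    logger.setLevel(logging.DEBUG)
    # Add the console handler only once, even if this function is called again.
    if not any(type(h) is logging.StreamHandler for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    # Avoid duplicate lines if the root logger also has handlers.
    logger.propagate = False
    return logger
```

Setting `propagate` to `False` is a design choice: it keeps these records out of the root logger's handlers, so remove that line if the application's global handlers should also receive them.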

Django Configuration

If you're using Django and want to capture logs from Document Analyzer, add this to your settings.py:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
        'file': {
            'class': 'logging.FileHandler',
            'filename': 'document_analyzer.log',
        },
    },
    'loggers': {
        'document_analyzer': {
            'handlers': ['console', 'file'],
            'level': 'DEBUG',
        },
    },
}

Flask Configuration

For Flask applications:

import logging
from logging.handlers import RotatingFileHandler

from flask import Flask

app = Flask(__name__)

if not app.debug:
    # Rotate at roughly 10 MB and keep 10 old log files
    handler = RotatingFileHandler('document_analyzer.log', maxBytes=10_000_000, backupCount=10)
    handler.setLevel(logging.DEBUG)
    app.logger.addHandler(handler)

    # Send document_analyzer logs to the same file
    doc_logger = logging.getLogger('document_analyzer')
    doc_logger.addHandler(handler)
    doc_logger.setLevel(logging.DEBUG)

Limitations

Be aware of the following limitations when using Document Analyzer:

Cédula Support — Cédula extraction is specifically designed for Panamanian identity cards in Spanish only. Non-Panamanian identity documents are not supported. Passport extraction works with any standard ICAO MRZ passport regardless of country.
Cédula Language — Panamanian Cédulas are processed in Spanish only. English or other languages are not supported for Cédulas.
Image Quality Dependency — Extraction accuracy depends on image quality. Very poor lighting, severe blur, or damaged documents may produce incomplete or inaccurate results. While the package includes preprocessing to improve poor quality images, there are limits to what can be recovered.
PDF Support Not Tested — PDFs are not officially supported and have not been tested. The package is designed for and tested with image files (JPG, PNG, etc.).
Passport MRZ Dependency — Passport extraction relies primarily on the Machine Readable Zone (MRZ) at the bottom of the document page. If the MRZ is obscured, cut off, or damaged in the photo, extraction accuracy will be significantly affected. Ensure the entire document including the bottom strip is clearly visible in the frame.
Place of Birth for Non-Panamanian Passports — Place of birth is the only passport field extracted from the document's written fields rather than the MRZ. This works reliably for Panamanian passports. For other countries it may be inaccurate or missing depending on how that country formats and labels the biographical page of their passport.
CPU Testing Only — The package has only been tested on CPU hardware (Intel i5, 10th generation). GPU acceleration via CUDA may work but is not officially supported or validated.
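
To illustrate why a cut-off or damaged MRZ hurts passport extraction, the sketch below slices the second line of a standard ICAO 9303 passport (TD3) MRZ into fields and verifies their check digits. It shows the public ICAO layout only and makes no claim about how Document Analyzer parses the MRZ internally. `mrz_check_digit` and `parse_td3_line2` are hypothetical names, and dates are left in the raw YYMMDD form.

```python
def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    total = 0
    for index, char in enumerate(field):
        if "0" <= char <= "9":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10
        elif char == "<":  # filler character counts as zero
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {char!r}")
        total += value * (7, 3, 1)[index % 3]
    return total % 10


def parse_td3_line2(line: str) -> dict:
    """Slice the second line of a passport (TD3) MRZ into fields."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line has exactly 44 characters")
    # Each entry is (field text, its check digit character).
    fields = {
        "id_number": (line[0:9], line[9]),
        "dob": (line[13:19], line[19]),
        "expiry": (line[21:27], line[27]),
    }
    parsed = {"nationality": line[10:13].replace("<", "")}
    for name, (value, check) in fields.items():
        parsed[name] = value.replace("<", "")
        parsed[name + "_valid"] = (
            "0" <= check <= "9" and mrz_check_digit(value) == int(check)
        )
    return parsed
```

A single misread character in any of these fields makes the corresponding check fail, which is why the whole bottom strip needs to be sharp and fully inside the frame.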

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Author

Name: Usman Ghani
GitHub: usman-369

Release history

0.1.2 (Jun 16, 2026)
0.1.1 (Jun 15, 2026), yanked
0.1.0 (Jun 14, 2026), yanked; this version

Download files

Source Distribution

document_analyzer-0.1.0.tar.gz (43.7 kB)

Uploaded Jun 14, 2026 Source

Built Distribution

document_analyzer-0.1.0-py3-none-any.whl (48.5 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file document_analyzer-0.1.0.tar.gz.

File metadata

Download URL: document_analyzer-0.1.0.tar.gz
Upload date: Jun 14, 2026
Size: 43.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for document_analyzer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2b818e10edaf2cfbd02518d747900984f444d05c8de7c4c66f9771922860d8b4`
MD5	`4fa14ee94de75e0637befbd556b3a7e5`
BLAKE2b-256	`10780277c543dab2366f5e1fa46260f5ba10baf30994834edfb8c8bbbdeeeb37`

File details

Details for the file document_analyzer-0.1.0-py3-none-any.whl.

File metadata

Download URL: document_analyzer-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 48.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for document_analyzer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`87a17f1a1f04b2d3a74c361a0567e758efc0fecc3d36df5c38539788a8e12553`
MD5	`fd827f26ba378a10f23cb4e3a10cf200`
BLAKE2b-256	`6bbc33cc0c96471d33efba939c585138c7ec2d437c526d219a987116976a4c19`
