
OCR extractor for PAN and Aadhaar card details


OCR Extractor


A simple and efficient OCR-based data extraction tool for Indian PAN and Aadhaar cards using Tesseract OCR.

🆕 What's New in v0.1.3

  • Corrected example usage in README:
    • print(pan_data.get_pan())
    • print(aadhaar_data.get_aadhaar())
  • Includes all features from v0.1.2:
    • Added tesseract_cmd parameter to ExtractAadhaarData and ExtractPanData for custom Tesseract paths.
    • Fixed the preprocess argument not being passed through correctly in the child classes.

(For full version history, see CHANGELOG.md)

✨ Features

  • Extract PAN card data with a single function call
  • Extract Aadhaar card data with a single function call
  • Built-in preprocessing option for better OCR accuracy
  • Cross-platform support (Windows, Linux, macOS) with configurable Tesseract path
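The README does not say what the built-in preprocessing actually does to the image. Global binarization with Otsu's method is a common step before running Tesseract. The dependency-free sketch below shows that technique on a flat list of 8-bit grayscale values. The names `otsu_threshold` and `binarize` are hypothetical and are not part of ocr-pro.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold (0-255) that maximizes between-class variance (Otsu's method)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    # Histogram of 8-bit grayscale intensities.
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0      # running sum of intensities in the background class (<= t)
    weight_bg = 0   # running pixel count of the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background; nothing left to split
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Map pixels above the Otsu threshold to white (255) and the rest to black (0)."""
    t = otsu_threshold(pixels)
    return [255 if value > t else 0 for value in pixels]
```

In a real pipeline, the pixel list would come from an image library (for example, Pillow's `Image.convert("L").getdata()`). That wiring is an assumption and is left out to keep the sketch dependency-free.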

📦 Installation

pip install ocr-pro

🚀 Usage

Extract PAN Card Data

from ocr import ExtractPanData

# Default usage (preprocess=False by default)
pan_data = ExtractPanData("pan_image.jpg", tesseract_cmd="/usr/bin/tesseract")

print(pan_data.get_pan())
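The README does not document the return type of get_pan(). If you have the raw OCR text (an assumption), you can sanity-check the output by searching for a token in the standard PAN layout: five uppercase letters, four digits, and one uppercase letter. The helper `find_pan_number` below is hypothetical and is not part of ocr-pro.

```python
import re
from typing import Optional

# Standard PAN layout: 5 letters, 4 digits, 1 letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def find_pan_number(ocr_text: str) -> Optional[str]:
    """Return the first PAN-shaped token in the OCR text, or None if there is none."""
    match = PAN_PATTERN.search(ocr_text.upper())
    return match.group(0) if match else None
```

A format match only shows that the string is PAN-shaped. It does not confirm that the PAN is genuine or that OCR read every character correctly.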

Extract Aadhaar Card Data

from ocr import ExtractAadhaarData

# You can also enable preprocessing
aadhaar_data = ExtractAadhaarData("aadhaar_image.jpg", tesseract_cmd="/usr/bin/tesseract", preprocess=True)

print(aadhaar_data.get_aadhaar())
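Aadhaar numbers have 12 digits, and the last digit is a Verhoeff check digit, so a single misread digit is always detected. The sketch below assumes you already have the number as a string, because the README does not document what get_aadhaar() returns. The helpers `verhoeff_check_digit` and `is_valid_aadhaar` are hypothetical and are not part of ocr-pro.

```python
import re

# Verhoeff tables: multiplication in the dihedral group D5, position permutation, inverses.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(digits: str) -> int:
    """Compute the Verhoeff check digit to append to a string of digits."""
    c = 0
    # Positions count from the right, shifted by one for the digit still to be appended.
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return _INV[c]


def is_valid_aadhaar(number: str) -> bool:
    """Check the 12-digit format (not starting with 0 or 1) and the Verhoeff checksum."""
    digits = re.sub(r"\s", "", number)  # OCR output often looks like "XXXX XXXX XXXX"
    if not re.fullmatch(r"[2-9][0-9]{11}", digits):
        return False
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0
```

Running is_valid_aadhaar on the extracted number lets you reject a bad OCR read before the result is stored or displayed.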

Arguments

  • filepath (str) → Path to the image file
  • tesseract_cmd (str, optional) → Path to the Tesseract executable (default: system auto-detection or "C:\Program Files\Tesseract-OCR\tesseract.exe" on Windows)
  • preprocess (bool, default=False) → Whether to apply preprocessing for better OCR results
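The README does not document the exact auto-detection behind the tesseract_cmd default. The sketch below assumes one plausible resolution order, for illustration only: use an explicit path if one is given, otherwise look up `tesseract` on `PATH`, otherwise fall back to the Windows install location quoted above. `resolve_tesseract_cmd` is a hypothetical helper and is not part of ocr-pro.

```python
import os
import shutil
import sys
from typing import Optional

# Default install location mentioned in the README for Windows.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(tesseract_cmd: Optional[str] = None) -> Optional[str]:
    """Return a path to the Tesseract executable, or None if none could be found."""
    # 1. An explicitly supplied path always wins.
    if tesseract_cmd:
        return tesseract_cmd
    # 2. Search the directories on PATH (works on Windows, Linux and macOS).
    found = shutil.which("tesseract")
    if found:
        return found
    # 3. On Windows, fall back to the standard installer location if it exists.
    if sys.platform.startswith("win") and os.path.isfile(WINDOWS_DEFAULT):
        return WINDOWS_DEFAULT
    return None
```

When the result is not None, you could pass it as tesseract_cmd= to ExtractPanData or ExtractAadhaarData instead of hard-coding a path.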

⚙️ Requirements

  • Tesseract OCR installed on your system (use tesseract_cmd if it is not in a default location)

📜 License

MIT License


Download files


Source Distribution

ocr_pro-0.1.3.tar.gz (4.0 kB)

Built Distribution


ocr_pro-0.1.3-py3-none-any.whl (4.6 kB)

File details

Details for the file ocr_pro-0.1.3.tar.gz.

File metadata

  • Download URL: ocr_pro-0.1.3.tar.gz
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for ocr_pro-0.1.3.tar.gz
  • SHA256: e4b99c5ce043fe0963b23430d268b9eec9f79e97dc6d7276e5a88dc55820b412
  • MD5: 69dd4c19d57c9c182714df06e754ec29
  • BLAKE2b-256: 6b3d710f36fa536fe7702ede8b3189d28bb5522115f2489d6bf8c499dba99ebd
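You can check a downloaded file against the SHA256 digest listed above by hashing it locally with Python's standard hashlib module. The helper names below are illustrative. The expected value is the digest published for ocr_pro-0.1.3.tar.gz.

```python
import hashlib

# SHA256 digest published for ocr_pro-0.1.3.tar.gz.
EXPECTED_SDIST_SHA256 = "e4b99c5ce043fe0963b23430d268b9eec9f79e97dc6d7276e5a88dc55820b412"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest("ocr_pro-0.1.3.tar.gz", EXPECTED_SDIST_SHA256)` should return True for an untampered download. pip can also enforce this automatically in hash-checking mode: run `pip install --require-hashes -r requirements.txt` and add a `--hash=sha256:...` entry for each file you allow.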


File details

Details for the file ocr_pro-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: ocr_pro-0.1.3-py3-none-any.whl
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for ocr_pro-0.1.3-py3-none-any.whl
  • SHA256: bcb8c25420f71d038214499e8cdf142aff237ff4d996bf30cf50ced73abb2081
  • MD5: af7475218efe5947b0e2bcb7a4f4c593
  • BLAKE2b-256: a72f23bc8db762918c3b01c315a9099970b8ffd0d57a7042f9369ad300c32ba5

