IQEdge.ai OCR

PDF to Text converter with multi-pass self-correction. Extracts text from scanned PDFs, images, and books in 63 languages, including Hindi, Telugu, Tamil, Arabic, Chinese, and Japanese. Also offers table detection and searchable PDF output.

Install

pip install iqedge-ai-ocr

Usage

# Basic OCR — PDF to text
iqedge-ai-ocr ocr input.pdf

# Structured JSON output
iqedge-ai-ocr ocr input.pdf -o output.json

# Multi-language (Telugu + English)
iqedge-ai-ocr ocr input.pdf --lang tel+eng

# Full 4-pass correction (best quality)
iqedge-ai-ocr ocr input.pdf --passes 4

# CSV output
iqedge-ai-ocr ocr input.pdf --format csv -o output.csv

# Specific pages
iqedge-ai-ocr ocr input.pdf --pages 1-5,10

# Use Google Cloud Vision (most accurate)
iqedge-ai-ocr ocr input.pdf --engine google_vision

# Train from your documents
iqedge-ai-ocr train corpus_dir/

# System info
iqedge-ai-ocr info
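
The --pages value above ("1-5,10") mixes ranges and single pages. As a rough illustration of how such a spec can be expanded, here is a minimal sketch. The helper name parse_pages is hypothetical, and the assumption that pages are 1-based with inclusive ranges is mine; the CLI's actual parsing rules are not documented here.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1-5,10" into a sorted list of page numbers.

    Assumes 1-based page numbers and inclusive ranges (not confirmed by the CLI docs).
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_pages("1-5,10"))  # [1, 2, 3, 4, 5, 10]
```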

Python API

from iqedge_ocr import OCRAgent

agent = OCRAgent(lang="tel+eng", max_passes=4)
doc = agent.process("input.pdf")

print(doc.text)           # full text
print(doc.confidence)     # 0.0 - 1.0
print(doc.word_count)     # total words
print(doc.to_dict())      # structured JSON
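
To process a whole folder, the same calls can be wrapped in a small loop. The sketch below is illustrative only: ocr_folder and the 0.8 review threshold are hypothetical, and the function relies solely on the process() method and the confidence and word_count attributes shown above. It takes the agent as a parameter, so it does not import the package itself.

```python
from pathlib import Path


def ocr_folder(agent, folder, min_confidence=0.8):
    """Run agent.process() on every PDF in a folder and collect a small report.

    `agent` is any object with a process(path) method whose result exposes
    .confidence and .word_count, as in the example above.
    """
    report = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        doc = agent.process(str(pdf))
        report.append({
            "file": pdf.name,
            "words": doc.word_count,
            "confidence": doc.confidence,
            # Flag documents that may need manual review (threshold is arbitrary).
            "needs_review": doc.confidence < min_confidence,
        })
    return report
```

With the package installed, this could be called as ocr_folder(OCRAgent(lang="tel+eng", max_passes=4), "scans/"), where scans/ is a placeholder directory name.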

Languages (63)

Indian (14): Hindi, Telugu, Tamil, Kannada, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia, Urdu, Assamese, Nepali, Sanskrit

Western Europe (9): English, Spanish, French, German, Italian, Portuguese, Dutch, Catalan, Greek

Scandinavia (4): Swedish, Norwegian, Danish, Finnish

Eastern Europe (13): Russian, Polish, Ukrainian, Czech, Hungarian, Romanian, Bulgarian, Croatian, Serbian, Slovak, Lithuanian, Latvian, Estonian

Caucasus (3): Georgian, Armenian, Azerbaijani

Middle East (4): Arabic, Hebrew, Persian, Turkish

East Asia (4): Chinese (Simplified), Chinese (Traditional), Japanese, Korean

Southeast Asia (9): Thai, Vietnamese, Indonesian, Malay, Filipino, Cebuano, Burmese, Khmer, Lao

Central Asia (1): Uzbek

Africa (2): Swahili, Amharic

OCR Engines

  • Tesseract (default) — free, offline, 63 languages
  • Google Cloud Vision — cloud, highest accuracy for complex scripts
  • EasyOCR — good for handwriting and curved text

How It Works

  1. Pass 1 (200 DPI): Fast scan, extract all text with per-word confidence scores
  2. Pass 2 (300 DPI): Re-OCR only low-confidence regions
  3. Pass 3 (400 DPI): Re-OCR remaining stubborn regions
  4. Pass 4: Post-processing with learned corrections, pattern matching, dictionary

Each pass re-renders only the regions that need it, not the whole page. Document structure is preserved: text stays text, images stay images, and tables stay tables.
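
The escalation in passes 1 to 3 can be sketched as a loop that retries only the regions still below a confidence threshold. This is a simplified illustration, not the package's implementation: the names Region, multi_pass_ocr, and recognize are hypothetical, the 0.85 threshold is an assumed value, confidence is tracked per region rather than per word, and the Pass 4 post-processing step is left out.

```python
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    text: str = ""
    confidence: float = 0.0


def multi_pass_ocr(region_ids, recognize, dpis=(200, 300, 400), threshold=0.85):
    """Re-OCR only the regions still below `threshold`, raising the DPI each pass.

    `recognize(region_id, dpi)` stands in for a real OCR engine call and must
    return a (text, confidence) pair.
    """
    regions = {rid: Region(rid) for rid in region_ids}
    pending = list(region_ids)
    for dpi in dpis:
        still_low = []
        for rid in pending:
            text, conf = recognize(rid, dpi)
            region = regions[rid]
            # Keep the better of the previous and the new reading.
            if conf >= region.confidence:
                region.text, region.confidence = text, conf
            if region.confidence < threshold:
                still_low.append(rid)
        pending = still_low
        if not pending:
            break  # every region is confident enough; skip the higher DPIs
    # `pending` now holds the regions that stayed below the threshold.
    return list(regions.values()), pending
```

Regions that pass at 200 DPI are never rendered again, which is where the saving over re-rendering whole pages would come from.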

Output Formats

  • Text — plain text (default)
  • JSON — structured with pages, regions, words, confidence scores
  • CSV — tabular output with per-line confidence
  • Searchable PDF — invisible text overlay on original pages

Training

Train IQEdge.ai OCR from your own documents to improve accuracy:

# Place PDF + ground-truth pairs in a directory:
#   corpus/document1.pdf + corpus/document1.txt
#   corpus/document2.pdf + corpus/document2.txt

iqedge-ai-ocr train corpus/

The training system learns character confusions and word corrections specific to your documents.
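
To make "character confusions" concrete, the sketch below counts which characters an OCR result swapped relative to its ground truth, using Python's standard difflib. It illustrates the general idea only and is not the algorithm that the train command uses; the function name learn_confusions is hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_confusions(pairs):
    """Count (ocr_char, true_char) substitutions over (ocr_text, truth_text) pairs."""
    confusions = Counter()
    for ocr_text, truth_text in pairs:
        matcher = SequenceMatcher(None, ocr_text, truth_text, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only equal-length replacements map cleanly onto per-character confusions.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                confusions.update(zip(ocr_text[i1:i2], truth_text[j1:j2]))
    return confusions


counts = learn_confusions([("he11o wor1d", "hello world")])
print(counts.most_common(1))  # [(('1', 'l'), 3)]
```

A table like this could then drive corrections, for example by preferring "l" over "1" inside alphabetic words.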

License

MIT

Download files

Source Distribution

iqedge_ai_ocr-0.1.0.tar.gz (30.1 kB view details)

Uploaded Source

Built Distribution

iqedge_ai_ocr-0.1.0-py3-none-any.whl (38.4 kB view details)

Uploaded Python 3

File details

Details for the file iqedge_ai_ocr-0.1.0.tar.gz.

File metadata

  • Download URL: iqedge_ai_ocr-0.1.0.tar.gz
  • Size: 30.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for iqedge_ai_ocr-0.1.0.tar.gz
  • SHA256: 0e2b688513e2d9a25ae06ffa6b514f9e560bc4b23f98b068d27e9ee4adb1166f
  • MD5: d88b3bd3b866f8dfc0ff7b35250c12da
  • BLAKE2b-256: 3dbd3c04f894bb10a82539bb9910324b08c398c15e5643a18e58ad8b8df9e69e
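
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib. This is a minimal sketch; the helper names sha256_of and matches are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("iqedge_ai_ocr-0.1.0.tar.gz", "<SHA256 from the list above>") should return True for an intact download.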

Provenance

The following attestation bundles were made for iqedge_ai_ocr-0.1.0.tar.gz:

Publisher: publish.yml on pi-ventures/iqedge-ai-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file iqedge_ai_ocr-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: iqedge_ai_ocr-0.1.0-py3-none-any.whl
  • Size: 38.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for iqedge_ai_ocr-0.1.0-py3-none-any.whl
  • SHA256: dba069d082f683bc5cc04641b01b1caf27403af7771c39cf11a401fbadd101ec
  • MD5: 98631108b077ae8de64371182acd89b4
  • BLAKE2b-256: e08691e29c35a6ce0a40b3f9afdd2d3de416c6029330e61f390fa9d09ea01c83

Provenance

The following attestation bundles were made for iqedge_ai_ocr-0.1.0-py3-none-any.whl:

Publisher: publish.yml on pi-ventures/iqedge-ai-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
