IQEdge.ai OCR

PDF to Text converter with multi-pass self-correction. Extract text from scanned PDFs, images, and books. 63 languages including Hindi, Telugu, Tamil, Arabic, Chinese, Japanese. Table detection, searchable PDF output.

Install

pip install iqedge-ai-ocr

Usage

# Basic OCR — PDF to text
iqedge-ai-ocr ocr input.pdf

# Structured JSON output
iqedge-ai-ocr ocr input.pdf -o output.json

# Multi-language (Telugu + English)
iqedge-ai-ocr ocr input.pdf --lang tel+eng

# Full 4-pass correction (best quality)
iqedge-ai-ocr ocr input.pdf --passes 4

# CSV output
iqedge-ai-ocr ocr input.pdf --format csv -o output.csv

# Specific pages
iqedge-ai-ocr ocr input.pdf --pages 1-5,10

# Use Google Cloud Vision (most accurate)
iqedge-ai-ocr ocr input.pdf --engine google_vision

# Train from your documents
iqedge-ai-ocr train corpus_dir/

# System info
iqedge-ai-ocr info
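
The --pages value above mixes ranges and single pages. As a rough illustration of how such a spec expands, here is a minimal sketch; parse_pages is a hypothetical helper, not part of the package, and it assumes pages are 1-based and ranges are inclusive.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1-5,10" into sorted 1-based page numbers.

    Hypothetical helper for illustration; the CLI's own parser may differ.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # inclusive on both ends
        else:
            pages.add(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_pages("1-5,10"))  # [1, 2, 3, 4, 5, 10]
```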

Python API

from iqedge_ocr import OCRAgent

agent = OCRAgent(lang="tel+eng", max_passes=4)
doc = agent.process("input.pdf")

print(doc.text)           # full text
print(doc.confidence)     # 0.0 - 1.0
print(doc.word_count)     # total words
print(doc.to_dict())      # structured output as a dict
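
If the structured output is a plain dict, it can presumably be written out with the standard json module. The sketch below uses a stand-in dict instead of a real result; the keys "text", "confidence" and "word_count" are assumptions modelled on the attributes shown above.

```python
import json

# Stand-in for doc.to_dict(); the real keys are not documented here,
# so "text", "confidence" and "word_count" are assumptions.
result = {"text": "తెలుగు and English", "confidence": 0.93, "word_count": 3}

# ensure_ascii=False keeps non-Latin scripts readable instead of \uXXXX escapes.
payload = json.dumps(result, ensure_ascii=False, indent=2)
print(payload)

# Round-trip check: the text survives serialization unchanged.
assert json.loads(payload)["text"] == result["text"]
```

When saving to disk, opening the file with encoding="utf-8" avoids platform-dependent defaults.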

Languages (63)

Indian (14): Hindi, Telugu, Tamil, Kannada, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia, Urdu, Assamese, Nepali, Sanskrit

Western Europe (9): English, Spanish, French, German, Italian, Portuguese, Dutch, Catalan, Greek

Nordic (4): Swedish, Norwegian, Danish, Finnish

Eastern Europe (13): Russian, Polish, Ukrainian, Czech, Hungarian, Romanian, Bulgarian, Croatian, Serbian, Slovak, Lithuanian, Latvian, Estonian

Caucasus (3): Georgian, Armenian, Azerbaijani

Middle East (4): Arabic, Hebrew, Persian, Turkish

East Asia (4): Chinese (Simplified), Chinese (Traditional), Japanese, Korean

Southeast Asia (9): Thai, Vietnamese, Indonesian, Malay, Filipino, Cebuano, Burmese, Khmer, Lao

Central Asia (1): Uzbek

Africa (2): Swahili, Amharic

OCR Engines

  • Tesseract (default) — free, offline, 63 languages
  • Google Cloud Vision — cloud, highest accuracy for complex scripts
  • EasyOCR — good for handwriting and curved text

How It Works

  1. Pass 1 (200 DPI): Fast scan, extract all text with per-word confidence scores
  2. Pass 2 (300 DPI): Re-OCR only low-confidence regions
  3. Pass 3 (400 DPI): Re-OCR remaining stubborn regions
  4. Pass 4: Post-processing with learned corrections, pattern matching, and dictionary lookup

Each later pass re-renders only the regions that still need it, not the whole page. Document structure is preserved: text stays text, images stay images, and tables stay tables.
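
The control flow of passes 1 to 3 can be sketched as a loop over a DPI schedule that shrinks its work list after every pass. This is a toy model, not the package's implementation: fake_ocr, Region, and the 0.80 threshold are all assumptions made for illustration, and pass 4 (post-processing) is left out.

```python
from dataclasses import dataclass

DPI_SCHEDULE = (200, 300, 400)   # passes 1-3 as listed above
THRESHOLD = 0.80                 # hypothetical cut-off for "low confidence"


@dataclass
class Region:
    name: str
    difficulty: float            # toy stand-in for how hard the region is to read
    text: str = ""
    confidence: float = 0.0
    dpi: int = 0


def fake_ocr(region: Region, dpi: int) -> tuple[str, float]:
    """Toy engine: confidence grows with DPI and shrinks with difficulty."""
    confidence = max(0.0, min(1.0, dpi / 400 - region.difficulty + 0.4))
    return f"{region.name}@{dpi}", round(confidence, 2)


def run_passes(regions: list[Region]) -> list[int]:
    """Run the DPI schedule; return how many regions each pass processed."""
    processed = []
    pending = list(regions)                      # pass 1 sees everything
    for dpi in DPI_SCHEDULE:
        for region in pending:
            text, confidence = fake_ocr(region, dpi)
            if confidence > region.confidence:   # keep only improvements
                region.text, region.confidence, region.dpi = text, confidence, dpi
        processed.append(len(pending))
        # Later passes revisit only what is still below the threshold.
        pending = [r for r in pending if r.confidence < THRESHOLD]
    return processed


page = [Region("clean", 0.0), Region("faded", 0.3), Region("smudged", 0.5)]
print(run_passes(page))          # [3, 2, 1]
print([r.dpi for r in page])     # [200, 300, 400]
```

The shrinking counts show the point of the design: higher-DPI rendering is spent only on regions the cheaper pass could not read confidently.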

Output Formats

  • Text — plain text (default)
  • JSON — structured with pages, regions, words, confidence scores
  • CSV — tabular output with per-line confidence
  • Searchable PDF — invisible text overlay on original pages
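
Per-line confidence in the CSV output makes it easy to route weak lines to manual review. The following sketch builds and reads back such a file with the standard csv module; the column names and the 0.9 review threshold are assumptions, since the actual CSV layout is not specified here.

```python
import csv
import io

# Hypothetical per-line results; the package's actual CSV columns may differ.
lines = [
    {"page": 1, "line": 1, "text": "Invoice No. 1042", "confidence": 0.97},
    {"page": 1, "line": 2, "text": "Total, incl. tax", "confidence": 0.81},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["page", "line", "text", "confidence"])
writer.writeheader()
writer.writerows(lines)          # text containing commas is quoted automatically
print(buffer.getvalue())

# Reading it back: flag lines below a review threshold (0.9 is arbitrary).
buffer.seek(0)
needs_review = [
    row["text"] for row in csv.DictReader(buffer) if float(row["confidence"]) < 0.9
]
print(needs_review)              # ['Total, incl. tax']
```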

Training

Train IQEdge.ai OCR on your own documents to improve accuracy:

# Place PDF + ground-truth pairs in a directory:
#   corpus/document1.pdf + corpus/document1.txt
#   corpus/document2.pdf + corpus/document2.txt

iqedge-ai-ocr train corpus/

The training system learns character confusions and word corrections specific to your documents.
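
To make "character confusions" concrete, here is one way such pairs could be counted from an OCR output and its ground truth. It uses difflib from the standard library as a stand-in alignment technique; how the package itself learns corrections is not described here, and char_confusions is a hypothetical helper.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_confusions(ocr_text: str, truth: str) -> Counter:
    """Count (ocr_char, true_char) pairs where equal-length spans differ."""
    confusions: Counter = Counter()
    matcher = SequenceMatcher(None, ocr_text, truth, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only same-length replacements map cleanly character to character.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            confusions.update(zip(ocr_text[i1:i2], truth[j1:j2]))
    return confusions


ocr = "The c1ient paid 1O0 dollars"
truth = "The client paid 100 dollars"
for (seen, actual), count in char_confusions(ocr, truth).items():
    print(f"{seen!r} read instead of {actual!r}: {count}")
```

Pairs that recur across a corpus (such as "1" for "l") are the kind of evidence a post-processing pass could use to propose corrections.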

License

MIT

Download files

Source Distribution

iqedge_ai_ocr-0.2.0.tar.gz (32.1 kB, source)

Built Distribution

iqedge_ai_ocr-0.2.0-py3-none-any.whl (40.6 kB, Python 3)

File details

Details for the file iqedge_ai_ocr-0.2.0.tar.gz.

File metadata

  • Download URL: iqedge_ai_ocr-0.2.0.tar.gz
  • Upload date:
  • Size: 32.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for iqedge_ai_ocr-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       84a961463174bab79e15b13a72d42434e0ada364106c9f4da9ed9c6fccb07223
MD5          1bd19e5d379a9e33c566cc94b8343057
BLAKE2b-256  86b52f4613207d145d43eec63ac325ca8bdb4234a1ace6f6215779733c4a159b
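
A downloaded archive can be checked against the SHA256 digest listed above with Python's hashlib. This is a minimal sketch that assumes the file sits in the current working directory; sha256_of is a helper defined here, not part of the package.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "84a961463174bab79e15b13a72d42434e0ada364106c9f4da9ed9c6fccb07223"
archive = Path("iqedge_ai_ocr-0.2.0.tar.gz")  # assumed to be in the working directory

if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```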

Provenance

The following attestation bundles were made for iqedge_ai_ocr-0.2.0.tar.gz:

Publisher: publish.yml on pi-ventures/iqedge-ai-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file iqedge_ai_ocr-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: iqedge_ai_ocr-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 40.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for iqedge_ai_ocr-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       147835ae829fcd15bbad48fb11dfc42affe2facae34dfec528ea7364370b130f
MD5          da555e839e1688dbfb20f52b19917377
BLAKE2b-256  868cb7ab47e11f73c4d5508ff0eb57b4f1ee45ac29b52423022f88502b2bb84f

Provenance

The following attestation bundles were made for iqedge_ai_ocr-0.2.0-py3-none-any.whl:

Publisher: publish.yml on pi-ventures/iqedge-ai-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
