IQEdge.ai OCR

PDF-to-text converter with multi-pass self-correction. Extracts text from scanned PDFs, images, and books in 63 languages, including Hindi, Telugu, Tamil, Arabic, Chinese, and Japanese. Supports table detection and searchable PDF output.

Install

pip install iqedge-ai-ocr

Usage

# Basic OCR — PDF to text
iqedge-ai-ocr ocr input.pdf

# Structured JSON output
iqedge-ai-ocr ocr input.pdf -o output.json

# Multi-language (Telugu + English)
iqedge-ai-ocr ocr input.pdf --lang tel+eng

# Full 4-pass correction (best quality)
iqedge-ai-ocr ocr input.pdf --passes 4

# CSV output
iqedge-ai-ocr ocr input.pdf --format csv -o output.csv

# Specific pages
iqedge-ai-ocr ocr input.pdf --pages 1-5,10

# Use Google Cloud Vision (most accurate)
iqedge-ai-ocr ocr input.pdf --engine google_vision

# Train from your documents
iqedge-ai-ocr train corpus_dir/

# System info
iqedge-ai-ocr info
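
The `--pages 1-5,10` form above mixes ranges and single pages. As a rough illustration of how such a spec expands, here is a minimal sketch. `parse_pages` is a hypothetical helper, not part of iqedge-ai-ocr, and the tool's own parsing may differ (for example in how it treats duplicates or reversed ranges).

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1-5,10" into sorted, unique 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"reversed range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers are 1-based")
    return sorted(pages)


print(parse_pages("1-5,10"))  # [1, 2, 3, 4, 5, 10]
```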

Python API

from iqedge_ocr import OCRAgent

agent = OCRAgent(lang="tel+eng", max_passes=4)
doc = agent.process("input.pdf")

print(doc.text)           # full text
print(doc.confidence)     # 0.0 - 1.0
print(doc.word_count)     # total words
print(doc.to_dict())      # structured JSON

Languages (63)

Indian (14): Hindi, Telugu, Tamil, Kannada, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia, Urdu, Assamese, Nepali, Sanskrit

Western Europe (9): English, Spanish, French, German, Italian, Portuguese, Dutch, Catalan, Greek

Scandinavia (4): Swedish, Norwegian, Danish, Finnish

Eastern Europe (13): Russian, Polish, Ukrainian, Czech, Hungarian, Romanian, Bulgarian, Croatian, Serbian, Slovak, Lithuanian, Latvian, Estonian

Caucasus (3): Georgian, Armenian, Azerbaijani

Middle East (4): Arabic, Hebrew, Persian, Turkish

East Asia (4): Chinese (Simplified/Traditional), Japanese, Korean

Southeast Asia (9): Thai, Vietnamese, Indonesian, Malay, Filipino, Cebuano, Burmese, Khmer, Lao

Central Asia (1): Uzbek

Africa (2): Swahili, Amharic
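
The `--lang tel+eng` example in Usage follows the Tesseract convention of joining language codes with `+`. Assuming the tool accepts standard Tesseract codes for the other languages as well (not confirmed here), a small helper can build the string from language names. `TESSERACT_CODES` is a partial table and `lang_spec` is a hypothetical helper.

```python
# Standard Tesseract language codes for a few of the languages listed above.
TESSERACT_CODES = {
    "Hindi": "hin",
    "Telugu": "tel",
    "Tamil": "tam",
    "English": "eng",
    "Arabic": "ara",
    "Japanese": "jpn",
    "Korean": "kor",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
}


def lang_spec(*names: str) -> str:
    """Join language names into a '+'-separated code string, skipping repeats."""
    codes: list[str] = []
    for name in names:
        code = TESSERACT_CODES[name]  # KeyError for names missing from this partial table
        if code not in codes:
            codes.append(code)
    return "+".join(codes)


print(lang_spec("Telugu", "English"))  # tel+eng
```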

OCR Engines

Tesseract (default) — free, offline, 63 languages
Google Cloud Vision — cloud, highest accuracy for complex scripts
EasyOCR — good for handwriting and curved text

How It Works

Pass 1 (200 DPI): Fast scan, extract all text with per-word confidence scores
Pass 2 (300 DPI): Re-OCR only low-confidence regions
Pass 3 (400 DPI): Re-OCR remaining stubborn regions
Pass 4: Post-processing with learned corrections, pattern matching, dictionary

Each pass re-renders only the regions that need it, not the whole page. Document structure is preserved: text stays text, images stay images, and tables stay tables.
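
The escalation in passes 1 to 3 can be sketched as a loop that retries only the regions still below a confidence threshold, at a higher DPI each time. This is an illustration of the idea, not the package's implementation: `fake_ocr`, `Region`, `multi_pass`, and the 0.9 threshold are all assumptions, and pass 4 (post-processing) is left out.

```python
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    text: str = ""
    confidence: float = 0.0


def fake_ocr(region_id: int, dpi: int) -> tuple[str, float]:
    """Stand-in OCR engine: regions with a higher id need a higher DPI."""
    confidence = min(1.0, dpi / (200 + 100 * region_id))
    return f"region{region_id}@{dpi}dpi", confidence


def multi_pass(region_ids, dpis=(200, 300, 400), threshold=0.9, ocr=fake_ocr):
    """Re-OCR only low-confidence regions at each successive DPI."""
    regions = {rid: Region(rid) for rid in region_ids}
    pending = list(region_ids)
    for dpi in dpis:
        for rid in pending:
            text, conf = ocr(rid, dpi)
            if conf > regions[rid].confidence:  # keep the best reading so far
                regions[rid] = Region(rid, text, conf)
        # Only regions still below the threshold go on to the next pass.
        pending = [rid for rid in pending if regions[rid].confidence < threshold]
        if not pending:
            break
    return regions, pending


regions, unresolved = multi_pass([0, 1, 2, 3])
for region in regions.values():
    print(region.region_id, region.text, round(region.confidence, 2))
print("still below threshold:", unresolved)
```

Region 0 is settled at 200 DPI and never re-rendered, while region 3 stays below the threshold after 400 DPI and would be left to the post-processing pass.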

Output Formats

Text — plain text (default)
JSON — structured with pages, regions, words, confidence scores
CSV — tabular output with per-line confidence
Searchable PDF — invisible text overlay on original pages

Training

Train IQEdge.ai OCR on your own documents to improve accuracy:

# Place PDF + ground-truth pairs in a directory:
#   corpus/document1.pdf + corpus/document1.txt
#   corpus/document2.pdf + corpus/document2.txt

iqedge-ai-ocr train corpus/

The training system learns character confusions and word corrections specific to your documents.
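
To make "character confusions" concrete, the sketch below aligns OCR output with its ground truth and counts which characters were substituted for which. It uses Python's `difflib` as a stand-in alignment method. How the package actually learns confusions is not documented here, and `char_confusions` is a hypothetical name.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_confusions(ocr_text: str, truth: str) -> Counter:
    """Count (ocr_char, true_char) substitutions between OCR output and ground truth."""
    confusions: Counter = Counter()
    matcher = SequenceMatcher(None, ocr_text, truth, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length replacements map cleanly onto per-character confusions.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for wrong, right in zip(ocr_text[i1:i2], truth[j1:j2]):
                confusions[(wrong, right)] += 1
    return confusions


counts = char_confusions("he1lo w0rld", "hello world")
print(counts)
```

Accumulated over a corpus of PDF and ground-truth pairs, counts like these show which substitutions (such as `1` for `l` or `0` for `o`) are frequent enough to be worth correcting automatically.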

License

MIT

Release history

0.2.0 (Apr 6, 2026)

0.1.0 (Apr 6, 2026, this version)

Download files

Source Distribution

iqedge_ai_ocr-0.1.0.tar.gz (30.1 kB)

Uploaded Apr 6, 2026 Source

Built Distribution

iqedge_ai_ocr-0.1.0-py3-none-any.whl (38.4 kB)

Uploaded Apr 6, 2026 Python 3

File details

Details for the file iqedge_ai_ocr-0.1.0.tar.gz.

File metadata

Download URL: iqedge_ai_ocr-0.1.0.tar.gz
Upload date: Apr 6, 2026
Size: 30.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for iqedge_ai_ocr-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0e2b688513e2d9a25ae06ffa6b514f9e560bc4b23f98b068d27e9ee4adb1166f`
MD5	`d88b3bd3b866f8dfc0ff7b35250c12da`
BLAKE2b-256	`3dbd3c04f894bb10a82539bb9910324b08c398c15e5643a18e58ad8b8df9e69e`
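
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib`. This is a minimal sketch: `sha256_of` and `verify` are illustrative helper names, and the local path passed in is whatever location the file was saved to.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for iqedge_ai_ocr-0.1.0.tar.gz
EXPECTED_SHA256 = "0e2b688513e2d9a25ae06ffa6b514f9e560bc4b23f98b068d27e9ee4adb1166f"


def sha256_of(path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("iqedge_ai_ocr-0.1.0.tar.gz")` returns `True` only if the local file matches the published digest.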

Provenance

The following attestation bundles were made for iqedge_ai_ocr-0.1.0.tar.gz:

Publisher: publish.yml on pi-ventures/iqedge-ai-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: iqedge_ai_ocr-0.1.0.tar.gz
- Subject digest: 0e2b688513e2d9a25ae06ffa6b514f9e560bc4b23f98b068d27e9ee4adb1166f
- Sigstore transparency entry: 1241898007
- Sigstore integration time: Apr 6, 2026
Source repository:
- Permalink: pi-ventures/iqedge-ai-ocr@89020b3bdd9857757b65dc766c45b236008792d8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pi-ventures
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@89020b3bdd9857757b65dc766c45b236008792d8
- Trigger Event: release

File details

Details for the file iqedge_ai_ocr-0.1.0-py3-none-any.whl.

File metadata

Download URL: iqedge_ai_ocr-0.1.0-py3-none-any.whl
Upload date: Apr 6, 2026
Size: 38.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for iqedge_ai_ocr-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dba069d082f683bc5cc04641b01b1caf27403af7771c39cf11a401fbadd101ec`
MD5	`98631108b077ae8de64371182acd89b4`
BLAKE2b-256	`e08691e29c35a6ce0a40b3f9afdd2d3de416c6029330e61f390fa9d09ea01c83`

Provenance

The following attestation bundles were made for iqedge_ai_ocr-0.1.0-py3-none-any.whl:

Publisher: publish.yml on pi-ventures/iqedge-ai-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: iqedge_ai_ocr-0.1.0-py3-none-any.whl
- Subject digest: dba069d082f683bc5cc04641b01b1caf27403af7771c39cf11a401fbadd101ec
- Sigstore transparency entry: 1241898076
- Sigstore integration time: Apr 6, 2026
Source repository:
- Permalink: pi-ventures/iqedge-ai-ocr@89020b3bdd9857757b65dc766c45b236008792d8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pi-ventures
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@89020b3bdd9857757b65dc766c45b236008792d8
- Trigger Event: release
