IQEdge.ai OCR: PDF-to-text converter with multi-pass self-correction. Extract text from scanned PDFs, images, and books in 63 languages, including Hindi, Telugu, Tamil, Arabic, Chinese, and Japanese. Table detection and searchable PDF output.
IQEdge.ai OCR
Install
pip install iqedge-ai-ocr
Usage
# Basic OCR — PDF to text
iqedge-ai-ocr ocr input.pdf
# Structured JSON output
iqedge-ai-ocr ocr input.pdf -o output.json
# Multi-language (Telugu + English)
iqedge-ai-ocr ocr input.pdf --lang tel+eng
# Full 4-pass correction (best quality)
iqedge-ai-ocr ocr input.pdf --passes 4
# CSV output
iqedge-ai-ocr ocr input.pdf --format csv -o output.csv
# Specific pages
iqedge-ai-ocr ocr input.pdf --pages 1-5,10
# Use Google Cloud Vision (most accurate)
iqedge-ai-ocr ocr input.pdf --engine google_vision
# Train from your documents
iqedge-ai-ocr train corpus_dir/
# System info
iqedge-ai-ocr info
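To drive the CLI from a script, the flags shown above can be assembled into an argument list and handed to `subprocess.run`. This is a minimal sketch: the helper names `build_ocr_command` and `run_ocr` are hypothetical, only the subcommand and flags from the examples above are used, and the page-spec check assumes the comma-separated range syntax of `--pages 1-5,10`.

```python
import re
import subprocess
from typing import List, Optional

# Page specs like "3", "1-5" or "1-5,10" (assumed syntax, based on the
# --pages example above).
_PAGE_SPEC = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def build_ocr_command(
    pdf_path: str,
    output: Optional[str] = None,
    lang: Optional[str] = None,
    passes: Optional[int] = None,
    fmt: Optional[str] = None,
    pages: Optional[str] = None,
    engine: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list for `iqedge-ai-ocr ocr`
    using only the flags shown in the usage examples."""
    cmd = ["iqedge-ai-ocr", "ocr", pdf_path]
    if lang:
        cmd += ["--lang", lang]
    if passes is not None:
        cmd += ["--passes", str(passes)]
    if fmt:
        cmd += ["--format", fmt]
    if pages:
        # Reject malformed specs before the CLI ever runs.
        if not _PAGE_SPEC.fullmatch(pages):
            raise ValueError(f"invalid page spec: {pages!r}")
        cmd += ["--pages", pages]
    if engine:
        cmd += ["--engine", engine]
    if output:
        cmd += ["-o", output]
    return cmd


def run_ocr(pdf_path: str, **options) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_ocr_command(pdf_path, **options), check=True)
```

For example, `build_ocr_command("input.pdf", fmt="csv", pages="1-5,10", output="output.csv")` combines the CSV and page-range examples. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.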
Python API
from iqedge_ocr import OCRAgent
agent = OCRAgent(lang="tel+eng", max_passes=4)
doc = agent.process("input.pdf")
print(doc.text) # full text
print(doc.confidence) # 0.0 - 1.0
print(doc.word_count) # total words
print(doc.to_dict()) # structured JSON
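The calls above extend naturally to a folder of documents. The sketch below relies only on `process()`, `confidence` and `to_dict()` as shown; the function name `ocr_folder`, the `0.8` review threshold, and passing the agent in as a parameter (so the loop can be exercised without the package installed) are assumptions for illustration. It also assumes `to_dict()` returns JSON-serializable data, as the "structured JSON" comment suggests.

```python
import json
from pathlib import Path
from typing import List


def ocr_folder(agent, in_dir: str, out_dir: str, review_below: float = 0.8) -> List[str]:
    """Hypothetical helper: OCR every PDF in `in_dir`, save each result as JSON
    in `out_dir`, and return the names of documents below `review_below`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    needs_review = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        doc = agent.process(str(pdf))
        # ensure_ascii=False keeps Telugu, Hindi, etc. readable in the saved file
        (out / f"{pdf.stem}.json").write_text(
            json.dumps(doc.to_dict(), ensure_ascii=False, indent=2), encoding="utf-8"
        )
        if doc.confidence < review_below:
            needs_review.append(pdf.name)
    return needs_review
```

With the package installed, a call such as `ocr_folder(OCRAgent(lang="tel+eng", max_passes=4), "scans/", "json_out/")` would return the file names whose overall confidence fell below the threshold.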
Languages (63)
Indian (14): Hindi, Telugu, Tamil, Kannada, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia, Urdu, Assamese, Nepali, Sanskrit
Western Europe (9): English, Spanish, French, German, Italian, Portuguese, Dutch, Catalan, Greek
Nordic (4): Swedish, Norwegian, Danish, Finnish
Eastern Europe (13): Russian, Polish, Ukrainian, Czech, Hungarian, Romanian, Bulgarian, Croatian, Serbian, Slovak, Lithuanian, Latvian, Estonian
Caucasus (3): Georgian, Armenian, Azerbaijani
Middle East (4): Arabic, Hebrew, Persian, Turkish
East Asia (4): Chinese (Simplified), Chinese (Traditional), Japanese, Korean
Southeast Asia (9): Thai, Vietnamese, Indonesian, Malay, Filipino, Cebuano, Burmese, Khmer, Lao
Central Asia (1): Uzbek
Africa (2): Swahili, Amharic
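The `--lang tel+eng` example joins language codes with `+`. Those codes match Tesseract's traineddata names, so the sketch below assumes the CLI accepts Tesseract-style codes. The mapping covers only a handful of the languages listed above, and the helper name `lang_arg` is hypothetical.

```python
# Subset of Tesseract language codes (assumption: the --lang flag takes
# Tesseract-style codes, as the "tel+eng" example suggests).
TESSERACT_CODES = {
    "English": "eng",
    "Hindi": "hin",
    "Telugu": "tel",
    "Tamil": "tam",
    "Kannada": "kan",
    "Malayalam": "mal",
    "Bengali": "ben",
    "Arabic": "ara",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
    "Japanese": "jpn",
    "Korean": "kor",
    "Russian": "rus",
    "French": "fra",
    "German": "deu",
}


def lang_arg(*names: str) -> str:
    """Hypothetical helper: turn language names into a '+'-joined --lang value.
    Order is preserved and duplicates are dropped."""
    codes = []
    for name in names:
        try:
            code = TESSERACT_CODES[name]
        except KeyError:
            raise ValueError(f"no language code known for {name!r}") from None
        if code not in codes:
            codes.append(code)
    return "+".join(codes)
```

For instance, `lang_arg("Telugu", "English")` produces the `tel+eng` value used in the usage examples.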
OCR Engines
- Tesseract (default) — free, offline, 63 languages
- Google Cloud Vision — cloud, highest accuracy for complex scripts
- EasyOCR — good for handwriting and curved text
How It Works
- Pass 1 (200 DPI): Fast scan, extract all text with per-word confidence scores
- Pass 2 (300 DPI): Re-OCR only low-confidence regions
- Pass 3 (400 DPI): Re-OCR remaining stubborn regions
- Pass 4: Post-processing with learned corrections, pattern matching, and a dictionary
Passes 2 and 3 re-render only the regions that need it, not the whole page each time. Document structure is preserved: text stays text, images stay images, and tables stay tables.
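The pass schedule above amounts to a loop that carries only the low-confidence regions forward to the next, higher-DPI pass. The sketch below illustrates that idea under stated assumptions: the `0.85` threshold, the `Region` type, and the `ocr_region(bbox, dpi)` callback (which would crop, re-render and re-OCR one region, returning text and confidence) are hypothetical and are not this package's internals. Coordinates are kept in PDF points (1/72 inch), so a region's pixel size at a given DPI is `points * dpi / 72`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

BBox = Tuple[float, float, float, float]  # x0, y0, x1, y1 in PDF points (1/72 inch)


@dataclass
class Region:
    bbox: BBox
    text: str
    confidence: float  # 0.0 to 1.0


def pixel_size(bbox: BBox, dpi: int) -> Tuple[int, int]:
    """Width and height in pixels of a region rendered at `dpi`."""
    x0, y0, x1, y1 = bbox
    return round((x1 - x0) * dpi / 72), round((y1 - y0) * dpi / 72)


def refine_regions(
    regions: Dict[int, Region],
    ocr_region: Callable[[BBox, int], Tuple[str, float]],
    dpis: Tuple[int, ...] = (300, 400),
    threshold: float = 0.85,
) -> Dict[int, Region]:
    """Passes 2..n: re-OCR only regions whose confidence is below `threshold`,
    at increasing DPI. `regions` holds the pass-1 (e.g. 200 DPI) results."""
    regions = dict(regions)  # shallow copy; entries are replaced, not mutated
    for dpi in dpis:
        pending = [rid for rid, r in regions.items() if r.confidence < threshold]
        if not pending:
            break  # nothing left to improve, so skip the remaining passes
        for rid in pending:
            old = regions[rid]
            text, conf = ocr_region(old.bbox, dpi)
            if conf > old.confidence:  # keep whichever reading scored higher
                regions[rid] = Region(old.bbox, text, conf)
    return regions
```

At 300 DPI a 72 x 144 pt region is 300 x 600 pixels, so each extra pass costs roughly in proportion to the area it revisits rather than the full page.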
Output Formats
- Text — plain text (default)
- JSON — structured with pages, regions, words, confidence scores
- CSV — tabular output with per-line confidence
- Searchable PDF — invisible text overlay on original pages
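The JSON output nests pages, regions and words with confidence scores, while the CSV output reports confidence per line. The sketch below shows one plausible way to derive the latter from the former. The key names (`pages`, `regions`, `lines`, `words`, `text`, `confidence`) and the use of the mean word confidence as the line confidence are assumptions, not the package's documented schema.

```python
import csv
import io
from statistics import fmean


def lines_to_csv(doc: dict) -> str:
    """Hypothetical: flatten
    {"pages": [{"regions": [{"lines": [{"words": [...]}]}]}]}
    into CSV rows of page number, line text and mean word confidence."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["page", "text", "confidence"])
    for page_no, page in enumerate(doc.get("pages", []), start=1):
        for region in page.get("regions", []):
            for line in region.get("lines", []):
                words = line.get("words", [])
                if not words:
                    continue  # skip empty lines rather than divide by zero
                text = " ".join(w["text"] for w in words)
                conf = fmean(w["confidence"] for w in words)
                writer.writerow([page_no, text, f"{conf:.3f}"])
    return buf.getvalue()
```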
Training
Train IQEdge.ai OCR from your own documents to improve accuracy:
# Place PDF + ground-truth pairs in a directory:
# corpus/document1.pdf + corpus/document1.txt
# corpus/document2.pdf + corpus/document2.txt
iqedge-ai-ocr train corpus/
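Before training, it can help to confirm that every PDF in the corpus has its ground-truth text file, following the `name.pdf` + `name.txt` convention above. The helper below is a hypothetical pre-flight check written for illustration; it is not part of the package.

```python
from pathlib import Path
from typing import List, Tuple


def find_training_pairs(corpus_dir: str) -> Tuple[List[Tuple[Path, Path]], List[Path]]:
    """Hypothetical helper: return (pairs, missing), where `pairs` are matched
    (pdf, txt) files and `missing` are PDFs with no ground-truth .txt."""
    pairs, missing = [], []
    for pdf in sorted(Path(corpus_dir).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")  # corpus/document1.pdf -> corpus/document1.txt
        if txt.is_file():
            pairs.append((pdf, txt))
        else:
            missing.append(pdf)
    return pairs, missing
```

Running it on `corpus/` before `iqedge-ai-ocr train corpus/` surfaces unmatched PDFs early.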
The training system learns character confusions and word corrections specific to your documents.
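As an illustration of what learning character confusions can involve, the sketch below aligns OCR output with ground truth using Python's `difflib.SequenceMatcher` and counts one-to-one character substitutions. This is an assumed, simplified technique chosen for illustration; the package's actual training method is not described here.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_confusions(ocr_text: str, truth: str) -> Counter:
    """Count (ocr_char, true_char) substitutions between OCR output and ground truth."""
    counts: Counter = Counter()
    matcher = SequenceMatcher(None, ocr_text, truth, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length replacements map one character onto another;
        # insertions, deletions and uneven replacements (e.g. "rn" -> "m") are skipped.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for a, b in zip(ocr_text[i1:i2], truth[j1:j2]):
                counts[(a, b)] += 1
    return counts
```

Summing these counters over every pair in the corpus gives a confusion table that a correction pass could consult.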
License
MIT
Download files
File details
Details for the file iqedge_ai_ocr-0.2.0.tar.gz.
File metadata
- Download URL: iqedge_ai_ocr-0.2.0.tar.gz
- Upload date:
- Size: 32.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84a961463174bab79e15b13a72d42434e0ada364106c9f4da9ed9c6fccb07223 |
| MD5 | 1bd19e5d379a9e33c566cc94b8343057 |
| BLAKE2b-256 | 86b52f4613207d145d43eec63ac325ca8bdb4234a1ace6f6215779733c4a159b |
Provenance
The following attestation bundles were made for iqedge_ai_ocr-0.2.0.tar.gz:
- Publisher: publish.yml on pi-ventures/iqedge-ai-ocr
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: iqedge_ai_ocr-0.2.0.tar.gz
- Subject digest: 84a961463174bab79e15b13a72d42434e0ada364106c9f4da9ed9c6fccb07223
- Sigstore transparency entry: 1242017469
- Sigstore integration time:
- Permalink: pi-ventures/iqedge-ai-ocr@d42b41f1d306f84750dabddf09f9e8ec72e0c0e8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/pi-ventures
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d42b41f1d306f84750dabddf09f9e8ec72e0c0e8
- Trigger Event: release
File details
Details for the file iqedge_ai_ocr-0.2.0-py3-none-any.whl.
File metadata
- Download URL: iqedge_ai_ocr-0.2.0-py3-none-any.whl
- Upload date:
- Size: 40.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 147835ae829fcd15bbad48fb11dfc42affe2facae34dfec528ea7364370b130f |
| MD5 | da555e839e1688dbfb20f52b19917377 |
| BLAKE2b-256 | 868cb7ab47e11f73c4d5508ff0eb57b4f1ee45ac29b52423022f88502b2bb84f |
Provenance
The following attestation bundles were made for iqedge_ai_ocr-0.2.0-py3-none-any.whl:
- Publisher: publish.yml on pi-ventures/iqedge-ai-ocr
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: iqedge_ai_ocr-0.2.0-py3-none-any.whl
- Subject digest: 147835ae829fcd15bbad48fb11dfc42affe2facae34dfec528ea7364370b130f
- Sigstore transparency entry: 1242017562
- Sigstore integration time:
- Permalink: pi-ventures/iqedge-ai-ocr@d42b41f1d306f84750dabddf09f9e8ec72e0c0e8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/pi-ventures
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d42b41f1d306f84750dabddf09f9e8ec72e0c0e8
- Trigger Event: release