Multi-engine OCR pipeline — beats Google Vision API

ocr-pipeline

Multi-engine OCR pipeline combining Tesseract, PaddleOCR, and EasyOCR with confidence-weighted voting and domain-aware spell correction. Output is compatible with the Google Vision API JSON schema.

Architecture

Image → [Preprocessor] → Tesseract ┐
                       → PaddleOCR ├─→ [Voter] → [Corrector] → Vision API JSON
                       → EasyOCR   ┘
| Stage | What it does | Your edge vs Vision API |
|---|---|---|
| 0 - Preprocessor | Hough deskew, CLAHE, binarize, optional 2× EDSR | Domain-specific prep; Vision API gets raw images |
| 1 - Three engines | Tesseract (LSTM), PaddleOCR (DBNet+SVTR), EasyOCR (CRAFT+CRNN) | Different failure modes → ensemble eliminates each |
| 2 - Voter | IoU spatial grouping + weighted confidence + agreement bonus | Cross-engine consensus exposed; Vision API hides this |
| 3 - Corrector | SymSpell + domain vocabulary protection | Tuned to your label vocabulary; Vision API uses general LM |
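
The voter stage (IoU spatial grouping, weighted confidence, agreement bonus) can be illustrated with a minimal sketch. This is not the code in voter.py: the `Detection` class, the engine weights, the 0.5 IoU threshold, and the 0.1 bonus are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    engine: str   # which OCR engine produced this reading
    text: str     # recognized text
    conf: float   # engine confidence in [0, 1]
    box: tuple    # (x1, y1, x2, y2) in pixels


# Hypothetical values; the real ones would live in ocr_pipeline/config.py.
ENGINE_WEIGHTS = {"tesseract": 1.0, "paddleocr": 1.2, "easyocr": 1.0}
IOU_THRESHOLD = 0.5
AGREEMENT_BONUS = 0.1


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_by_iou(detections, threshold=IOU_THRESHOLD):
    """Greedy grouping: a detection joins the first group whose seed box it overlaps."""
    groups = []
    for det in detections:
        for group in groups:
            if iou(det.box, group[0].box) >= threshold:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def vote(group):
    """Sum weighted confidences per candidate text, then reward cross-engine agreement."""
    scores, counts = {}, {}
    for det in group:
        weight = ENGINE_WEIGHTS.get(det.engine, 1.0)
        scores[det.text] = scores.get(det.text, 0.0) + weight * det.conf
        counts[det.text] = counts.get(det.text, 0) + 1
    for text, n in counts.items():
        scores[text] += AGREEMENT_BONUS * (n - 1)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up detections: two engines agree, one misreads "0" as "O".
detections = [
    Detection("tesseract", "VALVE-01", 0.80, (10, 10, 110, 40)),
    Detection("paddleocr", "VALVE-01", 0.90, (12, 11, 112, 41)),
    Detection("easyocr", "VALVE-O1", 0.95, (11, 10, 111, 40)),
    Detection("tesseract", "PUMP", 0.70, (200, 10, 260, 40)),
]
groups = group_by_iou(detections)
winners = [vote(group) for group in groups]
```

In this made-up input the single most confident reading is the wrong one, yet the two agreeing engines outscore it once their weighted confidences are summed.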

Prerequisites

Tesseract binary (Windows — required):

winget install UB-Mannheim.TesseractOCR

Python 3.10+ must be installed and on PATH.

Quick Start

# 1. Clone / copy the project
cd ocr-pipeline

# 2. Create virtual environment and install (CPU default)
setup_venv.bat

# For GPU (CUDA + paddlepaddle-gpu):
setup_venv.bat gpu

# 3. Activate
.venv\Scripts\activate

# 4. Run on an image
ocr-pipeline path\to\image.jpg --pretty

# Save output to file
ocr-pipeline path\to\image.jpg --output result.json

# Text only
ocr-pipeline path\to\image.jpg --text-only

# Skip super-resolution (faster, no EDSR model needed)
ocr-pipeline path\to\image.jpg --no-super-resolve

Python API

from ocr_pipeline import beat_vision_api

result = beat_vision_api("path/to/image.jpg")
print(result["responses"][0]["fullTextAnnotation"]["text"])
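
Because the output follows the Vision API JSON schema, per-word results can presumably be read the same way as from a Vision response. The sketch below uses a hand-written sample dictionary, not real pipeline output, and assumes the pipeline also fills in `textAnnotations`. The helper `extract_words` is hypothetical.

```python
import json

# Hand-written sample in the shape of a Vision API response (illustrative only).
sample = {
    "responses": [{
        "textAnnotations": [
            # In the Vision API, the first entry covers the whole detected text.
            {"description": "VALVE-01\nPUMP",
             "boundingPoly": {"vertices": [{"x": 10, "y": 10}, {"x": 260, "y": 10},
                                           {"x": 260, "y": 40}, {"x": 10, "y": 40}]}},
            {"description": "VALVE-01",
             "boundingPoly": {"vertices": [{"x": 10, "y": 10}, {"x": 110, "y": 10},
                                           {"x": 110, "y": 40}, {"x": 10, "y": 40}]}},
            {"description": "PUMP",
             "boundingPoly": {"vertices": [{"x": 200, "y": 10}, {"x": 260, "y": 10},
                                           {"x": 260, "y": 40}, {"x": 200, "y": 40}]}},
        ],
        "fullTextAnnotation": {"text": "VALVE-01\nPUMP"},
    }]
}


def extract_words(result):
    """Return (text, (x1, y1, x2, y2)) for each word-level annotation."""
    annotations = result["responses"][0].get("textAnnotations", [])
    words = []
    for ann in annotations[1:]:  # skip the whole-text entry
        vertices = ann["boundingPoly"]["vertices"]
        # Vision responses may omit a coordinate when it is 0, hence the default.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        words.append((ann["description"], (min(xs), min(ys), max(xs), max(ys))))
    return words


words = extract_words(sample)
# The structure is plain JSON, so it round-trips through json.dumps / json.loads.
round_tripped = json.loads(json.dumps(sample))
```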

Configuration

Copy .env.example to .env and edit:

TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
USE_GPU=false
USE_SUPER_RESOLVE=false
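
How the package reads these values is not shown here. As a rough sketch, assuming simple `KEY=VALUE` lines and case-insensitive booleans, a standard-library parser could look like this. The function names `parse_env_text` and `as_bool` are hypothetical.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def as_bool(value, default=False):
    """Interpret common truthy spellings; anything else is False."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Raw string so the Windows backslashes are kept literally.
env_text = r"""TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
USE_GPU=false
USE_SUPER_RESOLVE=false
"""

settings = parse_env_text(env_text)
use_gpu = as_bool(settings.get("USE_GPU"))
use_super_resolve = as_bool(settings.get("USE_SUPER_RESOLVE"))
```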

Add project-specific label codes to DOMAIN_TERMS in ocr_pipeline/config.py.
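
To illustrate why vocabulary protection matters, here is a small sketch. It does not use SymSpell: `difflib` from the standard library stands in as the fuzzy matcher. The contents of `DOMAIN_TERMS`, the tiny `DICTIONARY`, and the 0.8 cutoff are assumptions for illustration only.

```python
import difflib

# Hypothetical label codes; in the project these would go in ocr_pipeline/config.py.
DOMAIN_TERMS = {"PSV-101", "INLT"}

# Tiny stand-in vocabulary; a real corrector would load a full frequency dictionary.
DICTIONARY = ["pressure", "valve", "inlet", "outlet", "pump"]


def correct_token(token, cutoff=0.8):
    """Leave protected domain terms alone; otherwise snap to the closest dictionary word."""
    if token.upper() in DOMAIN_TERMS:
        return token  # protected: never "corrected"
    if token.lower() in DICTIONARY:
        return token
    matches = difflib.get_close_matches(token.lower(), DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text):
    return " ".join(correct_token(tok) for tok in text.split())


corrected = correct_text("presure valv INLT PSV-101")
# Without protection, the label code INLT would be pulled toward a dictionary word.
unprotected = difflib.get_close_matches("inlt", DICTIONARY, n=1, cutoff=0.8)
```

Here `INLT` is close enough to "inlet" that a general-purpose corrector would rewrite it, which is the failure that a domain vocabulary is meant to prevent.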

Running Tests

pytest tests/test_voter.py -v

All 7 tests run without GPU or model downloads.

Optional: Super-Resolution

EDSR 2× upscaling (~143 MB model) can significantly improve recognition of small text on diagram scans. To enable it:

  1. Run python download_models.py and choose y when prompted for EDSR
  2. Set USE_SUPER_RESOLVE=true in .env

Project Structure

ocr-pipeline/
├── ocr_pipeline/       ← Python package
│   ├── config.py       ← All tunable parameters
│   ├── preprocessor.py ← Stage 0
│   ├── engines.py      ← Stage 1
│   ├── voter.py        ← Stage 2
│   ├── corrector.py    ← Stage 3
│   └── pipeline.py     ← Orchestration
├── tests/
│   └── test_voter.py   ← Unit tests (no GPU)
├── models/             ← Downloaded model files
├── setup_venv.bat      ← Windows setup
└── download_models.py  ← Model downloader
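
The module layout mirrors the four stages, so the orchestration in pipeline.py presumably chains them in order. The sketch below shows one way such a flow could be wired, with stub functions in place of the real stages. All function names are hypothetical, and running the engines in parallel threads is an assumption, not something the project states.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(image):
    """Stage 0 stub: the real stage would deskew, equalize, and binarize."""
    return image


# Stage 1 stubs: each engine returns (text, confidence) readings.
def run_tesseract(image):
    return [("HELLO", 0.80)]


def run_paddleocr(image):
    return [("HELLO", 0.90)]


def run_easyocr(image):
    return [("HELL0", 0.95)]


def vote(readings):
    """Stage 2 stub: sum confidence per candidate text and pick the highest total."""
    totals = {}
    for text, conf in readings:
        totals[text] = totals.get(text, 0.0) + conf
    return max(totals, key=totals.get)


def correct(text):
    """Stage 3 stub: the real stage would apply spell correction."""
    return text.strip()


def run_pipeline(image):
    prepared = preprocess(image)
    engines = [run_tesseract, run_paddleocr, run_easyocr]
    # Run the engines concurrently; pool.map preserves engine order.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        per_engine = list(pool.map(lambda engine: engine(prepared), engines))
    readings = [reading for results in per_engine for reading in results]
    text = correct(vote(readings))
    # Wrap the result in the Vision-API-style envelope used by the Python API example.
    return {"responses": [{"fullTextAnnotation": {"text": text}}]}


result = run_pipeline("path/to/image.jpg")
```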

Download files

Source Distribution

trivision_ocr-1.0.2.tar.gz (31.5 MB)

Uploaded Source

Built Distribution

trivision_ocr-1.0.2-py3-none-any.whl (21.6 kB)

Uploaded Python 3
