Multi-engine OCR pipeline — beats Google Vision API

ocr-pipeline

Multi-engine OCR pipeline combining Tesseract, PaddleOCR, and EasyOCR with confidence-weighted voting and domain-aware spell correction. Output is compatible with the Google Vision API JSON schema.

Architecture

Image → [Preprocessor] → Tesseract ┐
                       → PaddleOCR ├─→ [Voter] → [Corrector] → Vision API JSON
                       → EasyOCR   ┘

| Stage | What it does | Your edge vs Vision API |
| --- | --- | --- |
| 0: Preprocessor | Hough deskew, CLAHE, binarize, optional 2× EDSR | Domain-specific prep; Vision API gets raw images |
| 1: Three engines | Tesseract (LSTM), PaddleOCR (DBNet+SVTR), EasyOCR (CRAFT+CRNN) | Different failure modes → ensemble eliminates each |
| 2: Voter | IoU spatial grouping + weighted confidence + agreement bonus | Cross-engine consensus exposed; Vision API hides this |
| 3: Corrector | SymSpell + domain vocabulary protection | Tuned to your label vocabulary; Vision API uses general LM |
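
To make the Stage 2 idea concrete, here is a minimal sketch of IoU grouping followed by confidence-weighted voting with an agreement bonus. It is not the code in voter.py: the `Detection` class, the function names, the 0.5 IoU threshold, the per-engine weights, and the 0.1 bonus are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    engine: str                              # e.g. "tesseract" (hypothetical label)
    text: str                                # recognized string
    conf: float                              # engine confidence in [0, 1]
    box: tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def group_by_iou(detections, threshold=0.5):
    """Greedy grouping: a detection joins the first group whose seed box it overlaps."""
    groups = []
    for det in detections:
        for group in groups:
            if iou(det.box, group[0].box) >= threshold:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def vote(group, weights, agreement_bonus=0.1):
    """Pick the text with the highest weighted confidence plus agreement bonus."""
    scores, votes = {}, {}
    for det in group:
        weight = weights.get(det.engine, 1.0)
        scores[det.text] = scores.get(det.text, 0.0) + weight * det.conf
        votes[det.text] = votes.get(det.text, 0) + 1
    for text in scores:
        # Each additional engine that agrees adds a fixed bonus.
        scores[text] += agreement_bonus * (votes[text] - 1)
    best = max(scores, key=scores.get)
    return best, scores[best]


detections = [
    Detection("tesseract", "V-101", 0.80, (10, 10, 60, 30)),
    Detection("paddleocr", "V-1O1", 0.90, (11, 9, 61, 31)),
    Detection("easyocr", "V-101", 0.70, (10, 11, 59, 30)),
    Detection("tesseract", "PUMP", 0.95, (100, 10, 160, 30)),
]
weights = {"tesseract": 1.0, "paddleocr": 1.0, "easyocr": 1.0}

groups = group_by_iou(detections)
results = [vote(g, weights) for g in groups]
for text, score in results:
    print(text, round(score, 2))
```

In this made-up input, two engines agreeing on "V-101" outscore the single higher-confidence "V-1O1" reading, which is the behaviour the agreement bonus is meant to encourage.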

Prerequisites

Tesseract binary (Windows — required):

winget install UB-Mannheim.TesseractOCR

Python 3.10+ must be installed and on PATH.

Quick Start

# 1. Clone / copy the project
cd ocr-pipeline

# 2. Create virtual environment and install (CPU default)
setup_venv.bat

# For GPU (CUDA + paddlepaddle-gpu):
setup_venv.bat gpu

# 3. Activate
.venv\Scripts\activate

# 4. Run on an image
ocr-pipeline path\to\image.jpg --pretty

# Save output to file
ocr-pipeline path\to\image.jpg --output result.json

# Text only
ocr-pipeline path\to\image.jpg --text-only

# Skip super-resolution (faster, no EDSR model needed)
ocr-pipeline path\to\image.jpg --no-super-resolve

Python API

from ocr_pipeline import beat_vision_api

result = beat_vision_api("path/to/image.jpg")
print(result["responses"][0]["fullTextAnnotation"]["text"])
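
Because the output follows the Vision API JSON layout, the result can be walked like any Vision response. The sketch below assumes the pipeline fills in the `pages → blocks → paragraphs → words → symbols` hierarchy and a per-word `confidence`, as the Vision schema defines them; the README does not confirm which optional fields are populated, so the helper reads everything defensively. `iter_words` and the `sample` dict are hypothetical and hand-built for illustration.

```python
def iter_words(response):
    """Yield (text, confidence) for each word in a Vision-style response dict."""
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    symbols = word.get("symbols", [])
                    text = "".join(s.get("text", "") for s in symbols)
                    yield text, word.get("confidence")


# Hand-written stand-in for a pipeline result (not real output).
sample = {
    "responses": [{
        "fullTextAnnotation": {
            "text": "V-101 PUMP",
            "pages": [{"blocks": [{"paragraphs": [{"words": [
                {"confidence": 0.97,
                 "symbols": [{"text": c} for c in "V-101"]},
                {"confidence": 0.62,
                 "symbols": [{"text": c} for c in "PUMP"]},
            ]}]}]}],
        }
    }]
}

words = list(iter_words(sample["responses"][0]))
# Flag words below an arbitrary review threshold of 0.8.
low_confidence = [text for text, conf in words if conf is not None and conf < 0.8]
print(words)
print(low_confidence)
```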

Configuration

Copy .env.example to .env and edit:

TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
USE_GPU=false
USE_SUPER_RESOLVE=false
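
For orientation, boolean settings such as USE_GPU are typically read from the environment as strings and then converted. The sketch below shows one common way to do that; `env_flag` is a hypothetical helper, and how ocr_pipeline/config.py actually parses these values is not shown in this README.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Simulated environment matching the example values above.
example_env = {"USE_GPU": "false", "USE_SUPER_RESOLVE": "true"}
use_gpu = env_flag("USE_GPU", environ=example_env)
use_sr = env_flag("USE_SUPER_RESOLVE", environ=example_env)
print(use_gpu, use_sr)
```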

Add project-specific label codes to DOMAIN_TERMS in ocr_pipeline/config.py.
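
The point of listing label codes in DOMAIN_TERMS is that the corrector should leave them alone even when they resemble a dictionary word. The sketch below illustrates that idea only. It swaps SymSpell for the standard library's `difflib.get_close_matches` to stay dependency-free, and the term set, word list, cutoff, and function names are all made up for the example rather than taken from corrector.py.

```python
import difflib

# Hypothetical values; the real DOMAIN_TERMS lives in ocr_pipeline/config.py.
DOMAIN_TERMS = {"V-101", "PSV", "INLT"}
DICTIONARY = ["pump", "valve", "pressure", "inlet", "outlet"]


def correct_token(token, domain_terms=DOMAIN_TERMS, dictionary=DICTIONARY, cutoff=0.8):
    """Return the token unchanged if protected or known, else the closest word."""
    if token in domain_terms or token.upper() in domain_terms:
        return token  # protected label code: never "corrected"
    if token.lower() in dictionary:
        return token  # already a known word
    matches = difflib.get_close_matches(token.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text):
    """Correct a whitespace-separated string token by token."""
    return " ".join(correct_token(t) for t in text.split())


corrected = correct_text("PSV presure valve V-101")
protected = correct_token("INLT")
unprotected = correct_token("INLT", domain_terms=set())
print(corrected)
print(protected, unprotected)
```

With an empty term set, "INLT" is rewritten to "inlet", which is exactly the kind of damage vocabulary protection is meant to prevent on label-heavy scans.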

Running Tests

pytest tests/test_voter.py -v

All 7 tests run without GPU or model downloads.

Optional: Super-Resolution

EDSR 2× upscaling (~143 MB model) improves recognition of small text on diagram scans, at the cost of extra processing time. To enable it:

  1. Run python download_models.py and choose y when prompted for EDSR
  2. Set USE_SUPER_RESOLVE=true in .env

Project Structure

ocr-pipeline/
├── ocr_pipeline/       ← Python package
│   ├── config.py       ← All tunable parameters
│   ├── preprocessor.py ← Stage 0
│   ├── engines.py      ← Stage 1
│   ├── voter.py        ← Stage 2
│   ├── corrector.py    ← Stage 3
│   └── pipeline.py     ← Orchestration
├── tests/
│   └── test_voter.py   ← Unit tests (no GPU)
├── models/             ← Downloaded model files
├── setup_venv.bat      ← Windows setup
└── download_models.py  ← Model downloader
