Multi-engine OCR pipeline — beats Google Vision API

Project description

ocr-pipeline

Multi-engine OCR pipeline combining Tesseract, PaddleOCR, and EasyOCR with confidence-weighted voting and domain-aware spell correction. Output is compatible with the Google Vision API JSON schema.

Architecture

Image → [Preprocessor] → Tesseract ┐
                       → PaddleOCR ├─→ [Voter] → [Corrector] → Vision API JSON
                       → EasyOCR   ┘
| Stage | What it does | Your edge vs Vision API |
|---|---|---|
| 0: Preprocessor | Hough deskew, CLAHE, binarize, optional 2× EDSR | Domain-specific prep; Vision API gets raw images |
| 1: Three engines | Tesseract (LSTM), PaddleOCR (DBNet+SVTR), EasyOCR (CRAFT+CRNN) | Different failure modes, so the ensemble can offset each engine's errors |
| 2: Voter | IoU spatial grouping + weighted confidence + agreement bonus | Cross-engine consensus exposed; Vision API hides this |
| 3: Corrector | SymSpell + domain vocabulary protection | Tuned to your label vocabulary; Vision API uses a general LM |
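
The voting stage can be illustrated with a minimal sketch. This is not the code in voter.py; the `Detection` type, the engine weights, the bonus value, and the IoU threshold below are all assumptions chosen for illustration. It shows the general idea: detections whose boxes overlap are grouped, then each candidate string is scored by weighted confidence plus a bonus for every additional engine that agrees.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    engine: str
    text: str
    confidence: float                        # 0.0 to 1.0
    box: tuple[float, float, float, float]   # (x1, y1, x2, y2)


# Hypothetical tuning values, not taken from the package.
ENGINE_WEIGHTS = {"tesseract": 1.0, "paddleocr": 1.2, "easyocr": 1.0}
AGREEMENT_BONUS = 0.1
IOU_THRESHOLD = 0.5


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def group_by_iou(detections, threshold=IOU_THRESHOLD):
    """Greedy grouping: a detection joins the first group whose seed box it overlaps."""
    groups = []
    for det in detections:
        for group in groups:
            if iou(det.box, group[0].box) >= threshold:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def vote(group):
    """Return the candidate text with the highest weighted score in one group."""
    scores, supporters = {}, {}
    for det in group:
        weight = ENGINE_WEIGHTS.get(det.engine, 1.0)
        scores[det.text] = scores.get(det.text, 0.0) + weight * det.confidence
        supporters.setdefault(det.text, set()).add(det.engine)
    for text, engines in supporters.items():
        # Each extra engine that produced the same text adds a fixed bonus.
        scores[text] += AGREEMENT_BONUS * (len(engines) - 1)
    return max(scores, key=scores.get)
```

In this sketch a confident but isolated reading (for example one engine reporting "R1O") can lose to a less confident reading that two engines share ("R10"), which is the effect the agreement bonus is meant to produce.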

Prerequisites

Tesseract binary (Windows — required):

winget install UB-Mannheim.TesseractOCR

Python 3.10+ must be installed and on PATH.

Quick Start

# 1. Clone / copy the project
cd ocr-pipeline

# 2. Create virtual environment and install (CPU default)
setup_venv.bat

# For GPU (CUDA + paddlepaddle-gpu):
setup_venv.bat gpu

# 3. Activate
.venv\Scripts\activate

# 4. Run on an image
ocr-pipeline path\to\image.jpg --pretty

# Save output to file
ocr-pipeline path\to\image.jpg --output result.json

# Text only
ocr-pipeline path\to\image.jpg --text-only

# Skip super-resolution (faster, no EDSR model needed)
ocr-pipeline path\to\image.jpg --no-super-resolve

Python API

from ocr_pipeline import beat_vision_api

result = beat_vision_api("path/to/image.jpg")
print(result["responses"][0]["fullTextAnnotation"]["text"])
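
When a result may be empty (for example a blank image), a small defensive accessor can avoid a KeyError or IndexError. The helper below is a hypothetical convenience, not part of the package; it relies only on the `responses[0]["fullTextAnnotation"]["text"]` path shown above and assumes that missing keys should yield an empty string.

```python
def extract_full_text(result: dict) -> str:
    """Return the full text from a Vision-API-style result, or "" if absent."""
    # Fall back to a single empty response when the list is missing or empty.
    responses = result.get("responses") or [{}]
    annotation = responses[0].get("fullTextAnnotation") or {}
    return annotation.get("text", "")


# Usage, assuming the package is installed:
#   from ocr_pipeline import beat_vision_api
#   text = extract_full_text(beat_vision_api("path/to/image.jpg"))
```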

Configuration

Copy .env.example to .env and edit:

TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
USE_GPU=false
USE_SUPER_RESOLVE=false
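
How config.py reads these values is not shown here. As a rough sketch, assuming the .env entries end up in the process environment (for example via a loader such as python-dotenv), boolean flags like USE_GPU could be parsed as follows. The function names and the default of "tesseract" for TESSERACT_CMD are assumptions.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ=None):
    """Collect the three settings from .env.example into a dict."""
    env = os.environ if environ is None else environ
    return {
        # Assumed fallback: rely on a tesseract executable found on PATH.
        "tesseract_cmd": env.get("TESSERACT_CMD", "tesseract"),
        "use_gpu": env_flag("USE_GPU", environ=env),
        "use_super_resolve": env_flag("USE_SUPER_RESOLVE", environ=env),
    }
```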

Add project-specific label codes to DOMAIN_TERMS in ocr_pipeline/config.py.
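
To show why domain vocabulary protection matters, here is a minimal sketch of the idea. It uses the standard library's difflib as a stand-in for SymSpell, so it is not how corrector.py works; the term set, the dictionary, and the cutoff are hypothetical values. Tokens found in the protected set are passed through untouched, while other tokens are snapped to the closest dictionary word.

```python
import difflib

# Hypothetical label codes; real ones would live in DOMAIN_TERMS in config.py.
DOMAIN_TERMS = {"R10", "C3", "GRND"}
# Tiny illustrative dictionary standing in for a SymSpell frequency list.
DICTIONARY = ["resistor", "capacitor", "ground", "voltage"]


def correct_token(token, protected=DOMAIN_TERMS, cutoff=0.75):
    """Return the token unchanged if protected, else its closest dictionary word."""
    if token.upper() in protected:
        return token
    matches = difflib.get_close_matches(token.lower(), DICTIONARY, n=1, cutoff=cutoff)
    # Note: a successful match returns the lowercase dictionary form.
    return matches[0] if matches else token


def correct_text(text, protected=DOMAIN_TERMS):
    """Correct a whitespace-separated string token by token."""
    return " ".join(correct_token(t, protected) for t in text.split())
```

Without protection, a label code such as "GRND" is close enough to "ground" to be rewritten by the fuzzy matcher, which is the kind of damage the domain vocabulary is meant to prevent.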

Running Tests

pytest tests/test_voter.py -v

All 7 tests run without GPU or model downloads.

Optional: Super-Resolution

EDSR 2× upscaling (~143 MB model) can improve recognition of small text on diagram scans. To enable it:

  1. Run python download_models.py and choose y when prompted for EDSR
  2. Set USE_SUPER_RESOLVE=true in .env

Project Structure

ocr-pipeline/
├── ocr_pipeline/       ← Python package
│   ├── config.py       ← All tunable parameters
│   ├── preprocessor.py ← Stage 0
│   ├── engines.py      ← Stage 1
│   ├── voter.py        ← Stage 2
│   ├── corrector.py    ← Stage 3
│   └── pipeline.py     ← Orchestration
├── tests/
│   └── test_voter.py   ← Unit tests (no GPU)
├── models/             ← Downloaded model files
├── setup_venv.bat      ← Windows setup
└── download_models.py  ← Model downloader

Download files

Source Distribution

trivision_ocr-1.0.5.tar.gz (28.2 kB)

Uploaded Source

Built Distribution

trivision_ocr-1.0.5-py3-none-any.whl (22.5 kB)

Uploaded Python 3

File details

Details for the file trivision_ocr-1.0.5.tar.gz.

File metadata

  • Download URL: trivision_ocr-1.0.5.tar.gz
  • Size: 28.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for trivision_ocr-1.0.5.tar.gz
Algorithm Hash digest
SHA256 ced3bdad97acbd2aa80a1ded862438a14688e9b490b42c4b1e56c0c1445450c2
MD5 cdb3400780d53cabef4e9e78d2733a99
BLAKE2b-256 f3b3436179bd32166ceb08e0881162f326a5548f3c11b67d8c0d24beda305831
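
A downloaded archive can be checked against the SHA256 digest listed above before installing. The following is a generic sketch using the standard library's hashlib; the function names are illustrative and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("trivision_ocr-1.0.5.tar.gz", "<SHA256 from the table>")` should return True for an intact download.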

File details

Details for the file trivision_ocr-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: trivision_ocr-1.0.5-py3-none-any.whl
  • Size: 22.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for trivision_ocr-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 cfb6f866de2d7e8069d8d597131c5ac6b00fcd49b340a0e8b9250c59015b583a
MD5 f4284105a919d2fb1d13cdde993ac9d7
BLAKE2b-256 263d7826eee929f59f293e451397372d4eefc53167bbe705917178f088cf58bd
