Multi-engine OCR pipeline — beats Google Vision API

Project description

ocr-pipeline

Multi-engine OCR pipeline combining Tesseract, PaddleOCR, and EasyOCR with confidence-weighted voting and domain-aware spell correction. Output is compatible with the Google Vision API JSON schema.

Architecture

Image → [Preprocessor] → Tesseract ┐
                       → PaddleOCR ├─→ [Voter] → [Corrector] → Vision API JSON
                       → EasyOCR   ┘
Stage | What it does | Your edge vs Vision API
0 — Preprocessor | Hough deskew, CLAHE, binarize, optional 2× EDSR | Domain-specific prep; Vision API gets raw images
1 — Three engines | Tesseract (LSTM), PaddleOCR (DBNet+SVTR), EasyOCR (CRAFT+CRNN) | Different failure modes, so the ensemble can outvote each engine's individual errors
2 — Voter | IoU spatial grouping + weighted confidence + agreement bonus | Cross-engine consensus exposed; Vision API hides this
3 — Corrector | SymSpell + domain vocabulary protection | Tuned to your label vocabulary; Vision API uses general LM
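
The voter stage (IoU spatial grouping, weighted confidence, agreement bonus) can be illustrated with a minimal sketch. This is not the code in voter.py: the Detection class, the function names, the 0.5 IoU threshold, the per-engine weights, and the 0.1 bonus are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One word-level result from one engine (hypothetical structure)."""
    engine: str
    text: str
    conf: float                               # confidence in [0, 1]
    box: tuple[float, float, float, float]    # (x1, y1, x2, y2)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_by_iou(detections, threshold=0.5):
    """Greedily group detections whose box overlaps a group's first box."""
    groups = []
    for det in detections:
        for group in groups:
            if iou(det.box, group[0].box) >= threshold:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def vote(group, weights, agreement_bonus=0.1):
    """Pick the text with the highest weighted confidence.

    Each extra engine that agrees on a text adds `agreement_bonus`.
    """
    scores, engines = {}, {}
    for det in group:
        weight = weights.get(det.engine, 1.0)
        scores[det.text] = scores.get(det.text, 0.0) + weight * det.conf
        engines.setdefault(det.text, set()).add(det.engine)
    for text in scores:
        scores[text] += agreement_bonus * (len(engines[text]) - 1)
    return max(scores, key=scores.get)


detections = [
    Detection("tesseract", "V-1O1", 0.90, (10, 10, 60, 30)),
    Detection("paddleocr", "V-101", 0.80, (11, 9, 61, 31)),
    Detection("easyocr", "V-101", 0.70, (9, 11, 59, 29)),
    Detection("tesseract", "PUMP", 0.95, (200, 200, 260, 220)),
]
weights = {"tesseract": 1.0, "paddleocr": 1.0, "easyocr": 1.0}
groups = group_by_iou(detections)
winners = [vote(g, weights) for g in groups]
print(winners)
```

In this example Tesseract is the most confident engine for the first word but misreads a digit as a letter; the two engines that agree outscore it once their confidences are summed and the bonus is applied.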

Prerequisites

Tesseract binary (Windows — required):

winget install UB-Mannheim.TesseractOCR

Python 3.10+ must be installed and on PATH.

Quick Start

# 1. Clone / copy the project
cd ocr-pipeline

# 2. Create virtual environment and install (CPU default)
setup_venv.bat

# For GPU (CUDA + paddlepaddle-gpu):
setup_venv.bat gpu

# 3. Activate
.venv\Scripts\activate

# 4. Run on an image
ocr-pipeline path\to\image.jpg --pretty

# Save output to file
ocr-pipeline path\to\image.jpg --output result.json

# Text only
ocr-pipeline path\to\image.jpg --text-only

# Skip super-resolution (faster, no EDSR model needed)
ocr-pipeline path\to\image.jpg --no-super-resolve

Python API

from ocr_pipeline import beat_vision_api

result = beat_vision_api("path/to/image.jpg")
print(result["responses"][0]["fullTextAnnotation"]["text"])
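
Because the output follows the Vision API JSON schema, it can be read with the same code you would use for a Vision API response. The sketch below works on a hand-built sample dict rather than a real pipeline result. It assumes the result also carries a textAnnotations list laid out as in the Vision API (first entry is the full text, later entries are individual words); whether ocr-pipeline populates that field is not confirmed here, and the helper names are hypothetical.

```python
def full_text(result: dict) -> str:
    """Return the full recognized text of the first response."""
    return result["responses"][0]["fullTextAnnotation"]["text"]


def word_boxes(result: dict) -> list:
    """Return (word, (x1, y1, x2, y2)) pairs from textAnnotations.

    Vision API JSON may omit a vertex coordinate when it is 0,
    so missing keys are treated as 0.
    """
    annotations = result["responses"][0].get("textAnnotations", [])
    words = []
    for ann in annotations[1:]:  # skip entry 0, which holds the whole text
        vertices = ann["boundingPoly"]["vertices"]
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        words.append((ann["description"], (min(xs), min(ys), max(xs), max(ys))))
    return words


# Hand-built sample in Vision API layout, for illustration only.
sample = {
    "responses": [{
        "textAnnotations": [
            {"description": "PSV-101 OUTLET",
             "boundingPoly": {"vertices": [{"y": 5}, {"x": 120, "y": 5},
                                           {"x": 120, "y": 25}, {"y": 25}]}},
            {"description": "PSV-101",
             "boundingPoly": {"vertices": [{"y": 5}, {"x": 60, "y": 5},
                                           {"x": 60, "y": 25}, {"y": 25}]}},
            {"description": "OUTLET",
             "boundingPoly": {"vertices": [{"x": 70, "y": 5}, {"x": 120, "y": 5},
                                           {"x": 120, "y": 25}, {"x": 70, "y": 25}]}},
        ],
        "fullTextAnnotation": {"text": "PSV-101 OUTLET"},
    }]
}

print(full_text(sample))
for word, box in word_boxes(sample):
    print(word, box)
```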

Configuration

Copy .env.example to .env and edit:

TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
USE_GPU=false
USE_SUPER_RESOLVE=false

Add project-specific label codes to DOMAIN_TERMS in ocr_pipeline/config.py.
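
The idea behind domain vocabulary protection is that tokens listed in DOMAIN_TERMS skip spell correction, so label codes are not "fixed" into ordinary words. The sketch below shows that idea only. It swaps in difflib from the standard library in place of SymSpell, and the term set, dictionary, cutoff, and function names are hypothetical rather than taken from config.py or corrector.py.

```python
import difflib

# Hypothetical values; the real DOMAIN_TERMS lives in ocr_pipeline/config.py.
DOMAIN_TERMS = {"PSV-101", "PMP"}
DICTIONARY = ["pressure", "valve", "outlet", "pump"]


def correct_token(token, domain_terms=DOMAIN_TERMS, dictionary=DICTIONARY, cutoff=0.8):
    """Correct one token unless it is a protected domain term."""
    if token in domain_terms:
        return token  # protected: never altered
    if token.lower() in dictionary:
        return token  # already a known word
    matches = difflib.get_close_matches(token.lower(), dictionary, n=1, cutoff=cutoff)
    # Note: a corrected token comes back lowercased in this simplified sketch.
    return matches[0] if matches else token


def correct_text(text):
    """Apply token-level correction to whitespace-separated text."""
    return " ".join(correct_token(t) for t in text.split())


print(correct_text("PSV-101 presure valve"))
print(correct_token("PMP"))                      # protected label code
print(correct_token("PMP", domain_terms=set()))  # same token without protection
```

Without protection, the label code PMP is close enough to "pump" to be rewritten, which is the kind of damage the protected vocabulary is meant to prevent.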

Running Tests

pytest tests/test_voter.py -v

All 7 tests run without GPU or model downloads.

Optional: Super-Resolution

EDSR 2× upscaling (~143 MB model) can noticeably improve recognition of small text on diagram scans. To enable it:

  1. Run python download_models.py and choose y when prompted for EDSR
  2. Set USE_SUPER_RESOLVE=true in .env

Project Structure

ocr-pipeline/
├── ocr_pipeline/       ← Python package
│   ├── config.py       ← All tunable parameters
│   ├── preprocessor.py ← Stage 0
│   ├── engines.py      ← Stage 1
│   ├── voter.py        ← Stage 2
│   ├── corrector.py    ← Stage 3
│   └── pipeline.py     ← Orchestration
├── tests/
│   └── test_voter.py   ← Unit tests (no GPU)
├── models/             ← Downloaded model files
├── setup_venv.bat      ← Windows setup
└── download_models.py  ← Model downloader

Download files

Source Distribution

trivision_ocr-1.0.4.tar.gz (28.1 kB)

Uploaded Source

Built Distribution

trivision_ocr-1.0.4-py3-none-any.whl (22.4 kB)

Uploaded Python 3

File details

Details for the file trivision_ocr-1.0.4.tar.gz.

File metadata

  • Download URL: trivision_ocr-1.0.4.tar.gz
  • Upload date:
  • Size: 28.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for trivision_ocr-1.0.4.tar.gz
Algorithm Hash digest
SHA256 48d67398d1b52d09ad45cb9fcc3d9077b0e86b91a76c6d4b32183233072df639
MD5 cadd25b80dad366c3f83a76b2b8f559a
BLAKE2b-256 29cd5f96587cb216efc751217f6cb7924e5166b22a9291f5ad6f9e7109a024f2

File details

Details for the file trivision_ocr-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: trivision_ocr-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 22.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for trivision_ocr-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 9848f6a51c5cdf3c2afaebcd09c326cd9a8717830b6f0fd2e3ebfabf69bec568
MD5 b1231d36aa99a359739584fc9fb865e4
BLAKE2b-256 7f2c2eb4cb8b942c81da7ed8d090db2be71a59758588b332e11adbffb0d6d182
