Multi-engine OCR pipeline — beats Google Vision API

Project description

ocr-pipeline

Multi-engine OCR pipeline combining Tesseract, PaddleOCR, and EasyOCR with confidence-weighted voting and domain-aware spell correction. Output is compatible with the Google Vision API JSON schema.

Architecture

Image → [Preprocessor] → Tesseract ┐
                       → PaddleOCR ├─→ [Voter] → [Corrector] → Vision API JSON
                       → EasyOCR   ┘
Stage | What it does | Your edge vs Vision API
0: Preprocessor | Hough deskew, CLAHE, binarize, optional 2× EDSR | Domain-specific prep; Vision API gets raw images
1: Three engines | Tesseract (LSTM), PaddleOCR (DBNet+SVTR), EasyOCR (CRAFT+CRNN) | Engines fail in different ways, so one engine's error can be outvoted by the other two
2: Voter | IoU spatial grouping + weighted confidence + agreement bonus | Cross-engine consensus is exposed; Vision API hides this
3: Corrector | SymSpell + domain vocabulary protection | Tuned to your label vocabulary; Vision API uses a general language model
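
As a rough illustration of stage 2, the sketch below groups detections from different engines by bounding-box IoU and then picks the text with the highest confidence-weighted score, adding a small bonus for each extra engine that agrees. It is not the code in voter.py: the `Detection` class, the function names, the 0.5 IoU threshold, the 0.1 bonus, and the per-engine weights are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical container; the real package may use a different structure.
    engine: str
    text: str
    conf: float                               # confidence in [0, 1]
    box: tuple[float, float, float, float]    # (x1, y1, x2, y2)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_by_iou(detections, threshold=0.5):
    """Greedy grouping: a detection joins the first group whose seed box overlaps enough."""
    groups = []
    for det in detections:
        for group in groups:
            if iou(det.box, group[0].box) >= threshold:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def vote(group, weights, agreement_bonus=0.1):
    """Return (text, score) for the candidate with the best weighted score."""
    scores, engines = {}, {}
    for det in group:
        weight = weights.get(det.engine, 1.0)
        scores[det.text] = scores.get(det.text, 0.0) + weight * det.conf
        engines.setdefault(det.text, set()).add(det.engine)
    for text in scores:
        # Reward candidates that more than one engine produced.
        scores[text] += agreement_bonus * (len(engines[text]) - 1)
    best = max(scores, key=scores.get)
    return best, scores[best]


detections = [
    Detection("tesseract", "VALVE-01", 0.80, (10, 10, 110, 40)),
    Detection("paddleocr", "VALVE-01", 0.90, (12, 11, 112, 41)),
    Detection("easyocr", "VALVE-O1", 0.95, (11, 9, 109, 39)),
    Detection("paddleocr", "PUMP", 0.70, (200, 10, 260, 40)),
]
weights = {"tesseract": 1.0, "paddleocr": 1.0, "easyocr": 1.0}
groups = group_by_iou(detections)
results = [vote(group, weights) for group in groups]
print(results)
```

In this example the single high-confidence misread ("VALVE-O1") loses to the two engines that agree on "VALVE-01", which is the behaviour the agreement bonus is meant to encourage.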

Prerequisites

Tesseract binary (Windows — required):

winget install UB-Mannheim.TesseractOCR

Python 3.10+ must be installed and on PATH.

Quick Start

# 1. Clone / copy the project
cd ocr-pipeline

# 2. Create virtual environment and install (CPU default)
setup_venv.bat

# For GPU (CUDA + paddlepaddle-gpu):
setup_venv.bat gpu

# 3. Activate
.venv\Scripts\activate

# 4. Run on an image
ocr-pipeline path\to\image.jpg --pretty

# Save output to file
ocr-pipeline path\to\image.jpg --output result.json

# Text only
ocr-pipeline path\to\image.jpg --text-only

# Skip super-resolution (faster, no EDSR model needed)
ocr-pipeline path\to\image.jpg --no-super-resolve

Python API

from ocr_pipeline import beat_vision_api

result = beat_vision_api("path/to/image.jpg")
print(result["responses"][0]["fullTextAnnotation"]["text"])
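
Because the output follows the Vision API JSON schema, per-word text and confidence can in principle be read from `fullTextAnnotation.pages[].blocks[].paragraphs[].words[]`. The sketch below assumes the pipeline populates that hierarchy the way Vision does; the source only confirms the `text` field, so treat the rest as an assumption. It runs on a hand-built sample dictionary rather than real pipeline output, and `iter_words` is a hypothetical helper, not part of the package.

```python
def iter_words(response):
    """Yield (text, confidence) for each word in a Vision-style response dict."""
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # In the Vision schema a word is a list of symbols (characters).
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    yield text, word.get("confidence")


def make_word(text, confidence):
    """Build a minimal Vision-style word entry for the sample below."""
    return {"confidence": confidence, "symbols": [{"text": ch} for ch in text]}


# Hand-written sample shaped like a Vision API response (not real output).
sample = {
    "responses": [
        {
            "fullTextAnnotation": {
                "text": "PUMP P-101\n",
                "pages": [
                    {"blocks": [{"paragraphs": [{"words": [
                        make_word("PUMP", 0.97),
                        make_word("P-101", 0.91),
                    ]}]}]}
                ],
            }
        }
    ]
}

words = list(iter_words(sample["responses"][0]))
low_confidence = [text for text, conf in words if conf is not None and conf < 0.95]
print(words, low_confidence)
```

The `.get(...)` calls with defaults keep the helper from raising if a level of the hierarchy is missing from a given response.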

Configuration

Copy .env.example to .env and edit:

TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
USE_GPU=false
USE_SUPER_RESOLVE=false
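
How config.py reads these values is not shown here. As a minimal sketch, assuming the .env entries end up as environment variables, boolean flags such as USE_GPU could be parsed like this. The `env_bool` and `load_settings` helpers and the fallback value `"tesseract"` are hypothetical.

```python
import os


def env_bool(name, default=False, environ=None):
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ=None):
    """Collect the three settings from .env.example into a plain dict."""
    environ = os.environ if environ is None else environ
    return {
        # Fall back to whatever "tesseract" resolves to on PATH (assumption).
        "tesseract_cmd": environ.get("TESSERACT_CMD", "tesseract"),
        "use_gpu": env_bool("USE_GPU", environ=environ),
        "use_super_resolve": env_bool("USE_SUPER_RESOLVE", environ=environ),
    }


# A dict stands in for the process environment so the example is repeatable.
settings = load_settings({
    "TESSERACT_CMD": r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "USE_GPU": "false",
    "USE_SUPER_RESOLVE": "TRUE",
})
print(settings)
```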

Add project-specific label codes to DOMAIN_TERMS in ocr_pipeline/config.py.
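
To illustrate what vocabulary protection means in stage 3: a token that is a known domain term is passed through untouched, and only other tokens go to the spell corrector. The sketch below uses `difflib` from the standard library as a stand-in for SymSpell, and the term set, the word list, and the 0.8 cutoff are made up for the example. The actual structure of DOMAIN_TERMS in config.py may differ.

```python
import difflib

# Hypothetical label codes; real projects would list their own in config.py.
DOMAIN_TERMS = {"PMP", "P-101", "FV-2A"}

# Tiny stand-in dictionary; a real corrector would load a full word list.
GENERAL_WORDS = ["valve", "pump", "pressure", "inlet", "outlet"]


def correct_token(token, domain_terms=DOMAIN_TERMS, vocabulary=GENERAL_WORDS, cutoff=0.8):
    """Spell-correct a token unless it is a protected domain term."""
    if token in domain_terms:
        return token  # protected: never "corrected" into a dictionary word
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


tokens = ["PMP", "presure", "valve", "XK-9"]
corrected = [correct_token(t) for t in tokens]
unprotected = correct_token("PMP", domain_terms=set())
print(corrected, unprotected)
```

Without protection, the label code "PMP" is close enough to "pump" to be rewritten, which is the kind of damage a general-purpose corrector can do to equipment tags. Note that this simplified version returns corrections in lower case.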

Running Tests

pytest tests/test_voter.py -v

All 7 tests run without GPU or model downloads.

Optional: Super-Resolution

EDSR 2× upscaling (~143 MB model) can improve recognition of small text on diagram scans. To enable it:

  1. Run python download_models.py and choose y when prompted for EDSR
  2. Set USE_SUPER_RESOLVE=true in .env

Project Structure

ocr-pipeline/
├── ocr_pipeline/       ← Python package
│   ├── config.py       ← All tunable parameters
│   ├── preprocessor.py ← Stage 0
│   ├── engines.py      ← Stage 1
│   ├── voter.py        ← Stage 2
│   ├── corrector.py    ← Stage 3
│   └── pipeline.py     ← Orchestration
├── tests/
│   └── test_voter.py   ← Unit tests (no GPU)
├── models/             ← Downloaded model files
├── setup_venv.bat      ← Windows setup
└── download_models.py  ← Model downloader

Download files

Source distribution: trivision_ocr-1.0.3.tar.gz (27.4 kB)
Built distribution: trivision_ocr-1.0.3-py3-none-any.whl (21.6 kB, Python 3)

File details: trivision_ocr-1.0.3.tar.gz

  • Size: 27.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

Hashes for trivision_ocr-1.0.3.tar.gz

Algorithm | Hash digest
SHA256 | 651eae4706f1bd06b927a150b3b040e9636b607309666c02bec61a24c9344fc7
MD5 | c1f4483684a1585dbfde285ae5ae964d
BLAKE2b-256 | 3ab6687e681247fa35603e394400b7eb00647a0794e8f0d5b482a2c0bff19d49

File details: trivision_ocr-1.0.3-py3-none-any.whl

  • Size: 21.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

Hashes for trivision_ocr-1.0.3-py3-none-any.whl

Algorithm | Hash digest
SHA256 | e6d6622e1199d651bcf8c479bc6307dd447365f29e52f6e4fd45066a2f8213e7
MD5 | f55f834db7729ed68c295b7ab3616037
BLAKE2b-256 | 6d9c71ce1a008307fdbaad6273348ae0a67a70120a4f0c881305f0935522086f
