Multi-engine OCR pipeline — beats Google Vision API
ocr-pipeline
Multi-engine OCR pipeline combining Tesseract, PaddleOCR, and EasyOCR with confidence-weighted voting and domain-aware spell correction. Output is compatible with the Google Vision API JSON schema.
Architecture
Image → [Preprocessor] → Tesseract ┐
                       → PaddleOCR ├─→ [Voter] → [Corrector] → Vision API JSON
                       → EasyOCR   ┘
| Stage | What it does | Your edge vs Vision API |
|---|---|---|
| 0 — Preprocessor | Hough deskew, CLAHE, binarize, optional 2× EDSR | Domain-specific prep; Vision API gets raw images |
| 1 — Three engines | Tesseract (LSTM), PaddleOCR (DBNet+SVTR), EasyOCR (CRAFT+CRNN) | Different failure modes, so the ensemble can outvote a single engine's errors |
| 2 — Voter | IoU spatial grouping + weighted confidence + agreement bonus | Cross-engine consensus exposed; Vision API hides this |
| 3 — Corrector | SymSpell + domain vocabulary protection | Tuned to your label vocabulary; Vision API uses general LM |
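The voter stage in the table can be illustrated with a small, self-contained sketch. It is not the code in voter.py: the `Detection` class, the engine weights, the IoU threshold, and the agreement bonus are all hypothetical values chosen for illustration. It groups boxes that overlap by IoU, sums weighted confidences per candidate text, and adds a bonus when several engines agree.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    engine: str
    text: str
    confidence: float                        # engine-reported score in [0, 1]
    box: tuple[float, float, float, float]   # (x1, y1, x2, y2)


# Hypothetical tuning values; the real ones would live in config.py.
ENGINE_WEIGHTS = {"tesseract": 1.0, "paddleocr": 1.2, "easyocr": 1.0}
IOU_THRESHOLD = 0.5
AGREEMENT_BONUS = 0.25


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def group_by_overlap(detections):
    """Greedy grouping: a box joins the first group whose seed box it overlaps."""
    groups = []
    for det in detections:
        for group in groups:
            if iou(det.box, group[0].box) >= IOU_THRESHOLD:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def vote(group):
    """Pick the text with the highest weighted score plus agreement bonus."""
    scores, voters = {}, {}
    for det in group:
        weight = ENGINE_WEIGHTS.get(det.engine, 1.0)
        scores[det.text] = scores.get(det.text, 0.0) + weight * det.confidence
        voters.setdefault(det.text, set()).add(det.engine)
    for text, engines in voters.items():
        scores[text] += AGREEMENT_BONUS * (len(engines) - 1)
    return max(scores, key=scores.get)


detections = [
    Detection("tesseract", "VALVE-01", 0.80, (10, 10, 110, 40)),
    Detection("paddleocr", "VALVE-01", 0.70, (12, 11, 108, 41)),
    Detection("easyocr", "VALVE-O1", 0.95, (11, 9, 111, 39)),
    Detection("paddleocr", "PUMP", 0.90, (200, 10, 260, 40)),
]
results = [vote(g) for g in group_by_overlap(detections)]
print(results)
```

In this toy input, two engines agreeing on a reading outweigh a single engine that is more confident in a different reading, which is the behavior the "agreement bonus" column describes.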
Prerequisites
Tesseract binary (Windows — required):
winget install UB-Mannheim.TesseractOCR
Python 3.10+ must be installed and on PATH.
Quick Start
# 1. Clone / copy the project
cd ocr-pipeline
# 2. Create virtual environment and install (CPU default)
setup_venv.bat
# For GPU (CUDA + paddlepaddle-gpu):
setup_venv.bat gpu
# 3. Activate
.venv\Scripts\activate
# 4. Run on an image
ocr-pipeline path\to\image.jpg --pretty
# Save output to file
ocr-pipeline path\to\image.jpg --output result.json
# Text only
ocr-pipeline path\to\image.jpg --text-only
# Skip super-resolution (faster, no EDSR model needed)
ocr-pipeline path\to\image.jpg --no-super-resolve
Python API
from ocr_pipeline import beat_vision_api
result = beat_vision_api("path/to/image.jpg")
print(result["responses"][0]["fullTextAnnotation"]["text"])
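Because the output follows the Vision API JSON schema, per-word results can usually be read the same way as a Vision response. The sketch below assumes the pipeline also fills `textAnnotations` in the Vision style (entry 0 holds the full text, later entries hold single words with a `boundingPoly`); the README only confirms `fullTextAnnotation.text`, so treat the rest as an assumption. `extract_words` and the sample dictionary are hypothetical.

```python
def extract_words(response: dict) -> list[dict]:
    """Return word texts and bounding boxes from a Vision-style response."""
    first = response["responses"][0]
    annotations = first.get("textAnnotations", [])
    words = []
    # Vision convention: the first annotation is the whole text block.
    for ann in annotations[1:]:
        vertices = ann.get("boundingPoly", {}).get("vertices", [])
        # Vision omits a coordinate when it is 0, so default to 0.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        box = (min(xs), min(ys), max(xs), max(ys)) if vertices else None
        words.append({"text": ann["description"], "box": box})
    return words


# Hand-written sample in the Vision response shape (not real pipeline output).
sample = {
    "responses": [{
        "textAnnotations": [
            {"description": "PUMP P-101"},
            {"description": "PUMP", "boundingPoly": {"vertices": [
                {"y": 5}, {"x": 40, "y": 5}, {"x": 40, "y": 20}, {"y": 20}]}},
            {"description": "P-101", "boundingPoly": {"vertices": [
                {"x": 50, "y": 5}, {"x": 95, "y": 5},
                {"x": 95, "y": 20}, {"x": 50, "y": 20}]}},
        ],
        "fullTextAnnotation": {"text": "PUMP P-101"},
    }]
}

words = extract_words(sample)
for word in words:
    print(word["text"], word["box"])
```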
Configuration
Copy .env.example to .env and edit:
TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
USE_GPU=false
USE_SUPER_RESOLVE=false
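As a rough illustration of how such flags might be interpreted once they reach the process environment, here is a minimal sketch. The function names, the set of truthy strings, and the `tesseract` fallback are assumptions, not the package's actual loader.

```python
import os
from collections.abc import Mapping

# Strings treated as "on"; this set is an assumption for the sketch.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Interpret an environment value such as USE_GPU=false as a boolean."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the three settings shown above into one dictionary."""
    return {
        # Fall back to whatever "tesseract" resolves to on PATH.
        "tesseract_cmd": env.get("TESSERACT_CMD", "tesseract"),
        "use_gpu": env_flag(env, "USE_GPU"),
        "use_super_resolve": env_flag(env, "USE_SUPER_RESOLVE"),
    }


settings = load_settings({"USE_GPU": "false", "USE_SUPER_RESOLVE": "True"})
print(settings)
```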
Add project-specific label codes to DOMAIN_TERMS in ocr_pipeline/config.py.
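The idea behind domain vocabulary protection can be sketched in a few lines. This example swaps SymSpell for the standard library's `difflib.get_close_matches` so it runs without extra installs; the term set, the toy dictionary, the cutoff, and the function names are all hypothetical and do not reflect the contents of config.py or corrector.py.

```python
import difflib

# Hypothetical label codes that must never be "corrected".
DOMAIN_TERMS = {"VLV", "P-101"}
# Toy dictionary standing in for a real frequency dictionary.
DICTIONARY = ["pump", "valve", "inlet", "outlet"]


def correct_token(token: str, protected=DOMAIN_TERMS) -> str:
    """Fix a token against DICTIONARY unless it is a protected domain term."""
    if token.upper() in protected:
        return token  # protected: leave exactly as the OCR engines read it
    if token.lower() in DICTIONARY:
        return token
    # difflib is a stand-in here for SymSpell's edit-distance lookup.
    matches = difflib.get_close_matches(token.lower(), DICTIONARY, n=1, cutoff=0.7)
    return matches[0] if matches else token


def correct_line(line: str) -> str:
    """Apply token-level correction to a whitespace-separated line."""
    return " ".join(correct_token(tok) for tok in line.split())


corrected = correct_line("pvmp VLV va1ve")
print(corrected)
```

Without the protection set, a short code such as `VLV` is close enough to `valve` to be rewritten, which is the failure this stage is meant to prevent.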
Running Tests
pytest tests/test_voter.py -v
All 7 tests run without GPU or model downloads.
Optional: Super-Resolution
EDSR 2× upscale (~143 MB model) significantly improves small text on diagram scans. Enable it:
- Run python download_models.py and choose y when prompted for EDSR
- Set USE_SUPER_RESOLVE=true in .env
Project Structure
ocr-pipeline/
├── ocr_pipeline/ ← Python package
│ ├── config.py ← All tunable parameters
│ ├── preprocessor.py ← Stage 0
│ ├── engines.py ← Stage 1
│ ├── voter.py ← Stage 2
│ ├── corrector.py ← Stage 3
│ └── pipeline.py ← Orchestration
├── tests/
│ └── test_voter.py ← Unit tests (no GPU)
├── models/ ← Downloaded model files
├── setup_venv.bat ← Windows setup
└── download_models.py ← Model downloader
Download files
File details
Details for the file trivision_ocr-1.0.1.tar.gz.
File metadata
- Download URL: trivision_ocr-1.0.1.tar.gz
- Upload date:
- Size: 31.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 33b3253f2578a4ea791f568493c0816da73056a0f656a3797db749ac86902d3d |
| MD5 | f923032aa5847e56c987facd001b30d3 |
| BLAKE2b-256 | aa0a79c078a52b867776275bb87eba8c573368d26726ede10e59b2d1f69e03bd |
File details
Details for the file trivision_ocr-1.0.1-py3-none-any.whl.
File metadata
- Download URL: trivision_ocr-1.0.1-py3-none-any.whl
- Upload date:
- Size: 21.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66904e984ffe52610647d900f8e92388464549f712a476e53582be48ab76faee |
| MD5 | 8623f2c98b7f64ca8f63b45a4877f189 |
| BLAKE2b-256 | 16d0203846d75b94fc043f90b692b2ad84320b285366141663b308eaee41c0bd |