UniOCR - Unified multilingual OCR abstraction layer with pluggable engine support.
One API. Multiple engines. Zero friction.
UniOCR is a unified, multilingual OCR abstraction layer that wraps best-in-class OCR engines behind a single, clean interface. Throw any image or PDF at it — get back structured text, Markdown, and layout blocks — regardless of which engine runs under the hood.
Built for developers, AI agents, and automation pipelines (n8n, Dify, Telegram bots, etc.).
✨ Highlights
- 🔌 Pluggable engines: PaddleOCR-VL (deep document AI) and Apple Vision (native macOS) with automatic priority fallback
- ⚡ Zero-config acceleration: Auto-detects Apple Silicon → launches MLX-VLM → Neural Engine speedup. No manual setup.
- 📄 Accepts anything: file paths, URLs, Base64 data URIs, multi-page PDFs (auto-flattened)
- 📦 Unified output: Document → Pages → Blocks with .text, .markdown, .to_dict()
- 🌐 REST API: FastAPI with Swagger docs, batch processing, request tracking; ready for n8n / Dify / any HTTP client
- 🐳 Docker ready: single command deployment via Docker Compose
- 🖥️ CLI: uniocr extract, uniocr engines, uniocr serve
🏗️ Architecture
┌──────────────────────────────────────────────────────────────┐
│                     User Interface Layer                     │
│                     SDK · CLI · REST API                     │
├──────────────────────────────────────────────────────────────┤
│                       Input Processor                        │
│          URL → File · PDF → Images · Base64 → File           │
├──────────────────────────────────────────────────────────────┤
│                   Engine Dispatcher (auto)                   │
│            PaddleOCR-VL → Apple Vision → fallback            │
├─────────────────────┬────────────────────────────────────────┤
│    PaddleOCR-VL     │              Apple Vision              │
│      + MLX-VLM      │             (native macOS)             │
│   (auto-accelerated)│                                        │
├─────────────────────┴────────────────────────────────────────┤
│                     Standardised Output                      │
│                  Document → Pages → Blocks                   │
│                .text · .markdown · .to_dict()                │
└──────────────────────────────────────────────────────────────┘
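The Input Processor row can be pictured as a small classification step that runs before any engine is called. The sketch below is illustrative only: `classify_input` and `decode_data_uri` are hypothetical helpers, not functions exported by UniOCR, and the real normalisation in `processors/input.py` may work differently.

```python
import base64
from pathlib import Path
from urllib.parse import urlparse


def classify_input(source: str) -> str:
    """Return 'data_uri', 'url', 'pdf', or 'image' for a raw input string."""
    if source.startswith("data:"):
        return "data_uri"
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    if Path(source).suffix.lower() == ".pdf":
        return "pdf"
    # Anything else is treated as a local image path in this sketch.
    return "image"


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a Base64 data URI into (mime_type, raw_bytes)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data:<mime>;base64,<payload> URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

A URL would then be downloaded to a temporary file and a PDF split into page images, as the diagram indicates; those steps are left out here because they depend on libraries the README does not name.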
🚀 Quick Start
Option 1: pip install
# Core only (lightweight, includes PDF flattening)
pip install uniocr
# With PaddleOCR-VL (powerful document AI, ~1.8 GB model download on first run)
pip install "uniocr[paddle]"
# With Apple Vision (macOS only, uses built-in system OCR)
pip install "uniocr[apple]"
# With REST API server
pip install "uniocr[api]"
# Everything
pip install "uniocr[all]"
Option 2: Docker (recommended for servers)
# Quick start — pull and run in detached mode
docker run -d --name uniocr -p 8000:8000 ghcr.io/yuanweize/uni-ocr:latest
# Or use Docker Compose (recommended)
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml
docker compose up -d
# Check it's running
curl http://localhost:8000/health
📖 Usage
Python SDK
from uniocr import UniOCR

ocr = UniOCR(engine="auto")  # Auto-selects best available engine
doc = ocr.extract("invoice.pdf")
print(doc.text)       # Plain text
print(doc.markdown)   # Structured Markdown
print(doc.to_dict())  # JSON-serialisable dict

# Access individual blocks with layout info
for page in doc.pages:
    for block in page.blocks:
        print(f"[{block.block_type}] {block.text}")
        print(f"  bbox: {block.bbox}, confidence: {block.confidence}")
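Because `to_dict()` returns plain data, post-processing can be written without importing UniOCR at all. The sketch below flags blocks whose confidence falls under a threshold. It assumes the dict has the same `pages` / `blocks` / `confidence` keys as the REST response shown in the Response format section, which this README does not state explicitly; `low_confidence_blocks` and the sample data are made up for illustration.

```python
def low_confidence_blocks(doc_dict: dict, threshold: float = 0.9) -> list:
    """Collect (page_number, text, confidence) for blocks below the threshold."""
    flagged = []
    for page in doc_dict.get("pages", []):
        for block in page.get("blocks", []):
            confidence = block.get("confidence")
            if confidence is not None and confidence < threshold:
                flagged.append((page["page_number"], block["text"], confidence))
    return flagged


# Hand-written sample in the assumed shape, not real engine output.
sample = {
    "pages": [
        {
            "page_number": 1,
            "blocks": [
                {"block_type": "text", "text": "Invoice #12345", "confidence": 0.98},
                {"block_type": "text", "text": "Tota1: 1,234.56", "confidence": 0.71},
            ],
        }
    ]
}

print(low_confidence_blocks(sample))
```

In a real pipeline, `sample` would be replaced by `doc.to_dict()`, and the flagged blocks could be routed to manual review.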
CLI
# List available engines
uniocr engines
# Output:
# Available engines:
# • paddle
# • apple
# Extract text (default: Markdown output to stdout)
uniocr extract document.pdf
# Specify engine, format, and output file
uniocr extract scan.png --engine apple --format json -o result.json
# Extract from a URL
uniocr extract "https://example.com/receipt.png" --format text
# Start the API server (single worker)
uniocr serve --port 8000
# Production: multiple workers
uniocr serve --port 8000 --workers 4
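Scripts that shell out to the CLI may find it convenient to assemble the argument list in one place. This is a minimal sketch that uses only the flags shown above (`--engine`, `--format`, `-o`); the helper names are hypothetical, and it assumes the `uniocr` executable is on `PATH` and exits non-zero on failure.

```python
import subprocess


def build_extract_command(source, engine=None, fmt=None, output=None):
    """Assemble an argv list for `uniocr extract` from the documented flags."""
    command = ["uniocr", "extract", source]
    if engine:
        command += ["--engine", engine]
    if fmt:
        command += ["--format", fmt]
    if output:
        command += ["-o", output]
    return command


def run_extract(source, **options):
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_extract_command(source, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs and file names that contain spaces.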
REST API
Start the server:
uniocr serve --port 8000
# Or via Docker:
docker compose up -d
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check (includes engine list) |
| GET | /engines | List available OCR engines |
| GET | /docs | Interactive Swagger UI |
| POST | /extract | Extract from uploaded file |
| POST | /extract/url | Extract from a public URL |
| POST | /extract/batch | Batch process multiple files |
Examples
# Health check
curl http://localhost:8000/health
# → {"status":"ok","version":"0.2.2","engines":["paddle","apple"]}
# Upload a file
curl -X POST http://localhost:8000/extract \
-F "file=@invoice.pdf" \
-F "engine=auto"
# Extract from URL
curl -X POST http://localhost:8000/extract/url \
-F "url=https://example.com/scan.png" \
-F "engine=apple"
# Batch processing (multiple files in one request)
curl -X POST http://localhost:8000/extract/batch \
-F "files=@page1.png" \
-F "files=@page2.png" \
-F "engine=auto"
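The same batch upload can be made from Python with only the standard library. The sketch below mirrors the `curl` call above: a repeated `files` field plus an `engine` field, sent as multipart/form-data. `build_multipart` and `extract_batch` are hypothetical helper names, and the shape of the batch response is not documented here, so the decoded JSON is returned as-is.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict, files: list) -> tuple:
    """Encode form fields and (field, filename, data) triples as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, filename, data in files:
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_batch(paths, base_url="http://localhost:8000", engine="auto"):
    """POST several files to /extract/batch and return the decoded JSON."""
    files = [("files", Path(p).name, Path(p).read_bytes()) for p in paths]
    body, content_type = build_multipart({"engine": engine}, files)
    request = urllib.request.Request(
        f"{base_url}/extract/batch",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, `extract_batch(["page1.png", "page2.png"])` would send the same request as the last `curl` example.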
Response format
{
  "request_id": "ab07767c-331f-4f26-be01-2fcb75d36149",
  "engine": "PaddleOCRVLAdapter",
  "page_count": 1,
  "text": "Invoice #12345\nTotal: €1,234.56",
  "markdown": "# Invoice #12345\n\nTotal: €1,234.56",
  "pages": [
    {
      "page_number": 1,
      "text": "...",
      "markdown": "...",
      "blocks": [
        {
          "block_type": "text",
          "text": "Invoice #12345",
          "bbox": [0.05, 0.02, 0.45, 0.06],
          "confidence": 0.98
        }
      ]
    }
  ],
  "elapsed_seconds": 2.35
}
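The `bbox` values in the example look like fractions of the page size. Assuming they are normalised `[x0, y0, x1, y1]` coordinates (an interpretation the README does not confirm), they can be scaled to pixels for cropping or drawing overlays. `bbox_to_pixels` is a hypothetical helper.

```python
def bbox_to_pixels(bbox, width, height):
    """Scale a normalised [x0, y0, x1, y1] box to integer pixel coordinates."""
    x0, y0, x1, y1 = bbox
    return (
        round(x0 * width),
        round(y0 * height),
        round(x1 * width),
        round(y1 * height),
    )


# The block from the example response, on a 1000 x 2000 pixel page image.
print(bbox_to_pixels([0.05, 0.02, 0.45, 0.06], width=1000, height=2000))
# → (50, 40, 450, 120)
```

If an engine reports absolute pixel coordinates instead, this scaling step should be skipped.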
🐳 Docker
Quick Run
# Run in background (detached mode)
docker run -d \
--name uniocr \
-p 8000:8000 \
-v uniocr-models:/root/.paddlex \
ghcr.io/yuanweize/uni-ocr:latest
Docker Compose (recommended)
# Download compose file
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml
# Start in background
docker compose up -d
# View logs
docker compose logs -f
# Stop
docker compose down
Build locally
git clone https://github.com/yuanweize/uni-ocr.git
cd uni-ocr
docker compose up -d --build
🔧 Engine Priority
When engine="auto", UniOCR selects the best available engine:
| Priority | Engine | Best for | Speed |
|---|---|---|---|
| 1 | PaddleOCR-VL + MLX-VLM | Complex layouts, tables, formulas, 109 languages | ⚡⚡ |
| 2 | PaddleOCR-VL (CPU) | Same capabilities, without MLX acceleration | ⚡ |
| 3 | Apple Vision | Simple text, macOS only, instant | ⚡⚡⚡ |
Apple Silicon users: when mlx-vlm is installed, UniOCR automatically starts an MLX-VLM server for Neural Engine acceleration. No configuration needed. The server is cleaned up on exit.
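The priority table boils down to "take the first engine in a fixed order that is actually installed". Here is a minimal sketch of that rule, using the engine names printed by `uniocr engines`. `PRIORITY` and `pick_engine` are illustrative names, not UniOCR's real dispatcher, and the MLX-accelerated and CPU variants of PaddleOCR-VL are collapsed into the single name `paddle`.

```python
PRIORITY = ["paddle", "apple"]  # order taken from the table above


def pick_engine(requested: str, available: list) -> str:
    """Resolve 'auto' to the first available engine in priority order."""
    if requested != "auto":
        # An explicit choice is honoured only if that engine is installed.
        if requested not in available:
            raise ValueError(f"engine {requested!r} is not available")
        return requested
    for name in PRIORITY:
        if name in available:
            return name
    raise RuntimeError("no OCR engine is installed")
```

For example, `pick_engine("auto", ["apple"])` falls through to Apple Vision on a Mac without the `paddle` extra, while `pick_engine("auto", ["apple", "paddle"])` prefers PaddleOCR-VL.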
🔗 Integration Examples
UniOCR is designed to be called by automation tools and AI agents.
n8n Workflow
Use the HTTP Request node to call UniOCR:
Telegram Trigger → HTTP Request (UniOCR /extract) → AI Agent → ERPNext API
Configuration:
- Method: POST
- URL: http://uniocr:8000/extract
- Body: Form-Data, file={{ $binary.data }}
Dify Tool
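Add UniOCR as a custom tool in Dify by importing its OpenAPI schema. FastAPI serves the schema at /openapi.json by default; /docs is the interactive Swagger UI for the same spec.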
Bob (macOS OCR Plugin)
UniOCR can serve as the OCR backend for Bob:
# Start UniOCR on the default port
uniocr serve --port 8000
# Bob → Preferences → OCR → Custom API → http://localhost:8000/extract
Shell / Scripts
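# Quick OCR from a clipboard image (macOS)
# pbpaste only emits text, so this assumes the third-party pngpaste tool
# (brew install pngpaste), which writes the clipboard image to stdout as PNG
pngpaste - | curl -s -X POST http://localhost:8000/extract \
  -F "file=@-;filename=clipboard.png" | jq -r .text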
⚙️ Configuration
UniOCR works out of the box with zero configuration. For advanced use cases:
| Environment Variable | Description | Default |
|---|---|---|
| UNIOCR_PORT | API server port (Docker Compose) | 8000 |
| UNIOCR_MLX_VLM_URL | Override MLX-VLM server URL | Auto-detected |
| UNIOCR_MLX_VLM_MODEL | MLX-VLM model identifier | PaddlePaddle/PaddleOCR-VL-1.6 |
Copy .env.example to .env to customise:
cp .env.example .env
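A wrapper script that needs the same values can read them with the defaults from the table. This is a sketch under assumptions: the `Settings` dataclass and `load_settings` are hypothetical, and UniOCR itself may read these variables differently (`UNIOCR_PORT`, for instance, is documented only for Docker Compose).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    mlx_vlm_url: Optional[str]  # None means "leave it to auto-detection"
    mlx_vlm_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        port=int(env.get("UNIOCR_PORT", "8000")),
        mlx_vlm_url=env.get("UNIOCR_MLX_VLM_URL") or None,
        mlx_vlm_model=env.get("UNIOCR_MLX_VLM_MODEL", "PaddlePaddle/PaddleOCR-VL-1.6"),
    )
```

Taking the environment as a parameter keeps the function easy to test with a plain dict instead of the real process environment.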
📁 Project Structure
uni-ocr/
├── src/uniocr/
│ ├── __init__.py # UniOCR main class & public API
│ ├── models.py # Document / Page / Block dataclasses
│ ├── cli.py # CLI: extract · engines · serve
│ ├── api.py # FastAPI REST service
│ ├── engines/
│ │ ├── __init__.py # Engine registry & auto-dispatcher
│ │ ├── base.py # BaseOCREngine ABC
│ │ ├── apple_vision.py # macOS Vision adapter
│ │ └── paddle.py # PaddleOCR-VL + MLX-VLM adapter
│ └── processors/
│ └── input.py # URL / Base64 / PDF normalisation
├── assets/
│ └── logo.svg # Project logo
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── pyproject.toml
├── CLAUDE.md # Development guidelines
├── LICENSE # MIT
├── README.md # English docs (this file)
└── README_zh.md # 中文文档
🤝 Contributing
Contributions are welcome! Please open an issue or pull request.
- Fork the repo
- Create a feature branch (git checkout -b feat/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feat/amazing-feature)
- Open a Pull Request
📄 License
MIT © 2026 Weize Yuan