
UniOCR - Unified multilingual OCR abstraction layer with pluggable engine support.


One API. Multiple engines. Zero friction.



UniOCR is a unified, multilingual OCR abstraction layer that wraps best-in-class OCR engines behind a single, clean interface. Throw any image or PDF at it — get back structured text, Markdown, and layout blocks — regardless of which engine runs under the hood.

Built for developers, AI agents, and automation pipelines (n8n, Dify, Telegram bots, etc.).

✨ Highlights

  • 🔌 Pluggable engines — PaddleOCR-VL (deep document AI) and Apple Vision (native macOS) with automatic priority fallback
  • Zero-config acceleration: auto-detects Apple Silicon → launches MLX-VLM → GPU-accelerated inference via Metal. No manual setup.
  • 📄 Accepts anything — file paths, URLs, Base64 data URIs, multi-page PDFs (auto-flattened)
  • 📦 Unified output: Document → Pages → Blocks with .text, .markdown, .to_dict()
  • 🌐 REST API — FastAPI with Swagger docs, batch processing, request tracking — ready for n8n / Dify / any HTTP client
  • 🐳 Docker ready — single command deployment via Docker Compose
  • 🖥️ CLI: uniocr extract, uniocr engines, uniocr serve
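Before any engine runs, the input types listed above have to be told apart and normalised to files. The sketch below shows one way to classify a source string and decode a Base64 data URI into a temporary file using only the standard library. It is an illustration, not UniOCR's `processors/input.py`: `classify_source`, `data_uri_to_file`, and the MIME-to-suffix table are hypothetical.

```python
import base64
import binascii
import re
import tempfile
from pathlib import Path
from urllib.parse import urlparse

# Matches e.g. "data:image/png;base64,iVBORw0..." (MIME type optional).
_DATA_URI = re.compile(r"^data:(?P<mime>[\w.+-]+/[\w.+-]+)?;base64,(?P<payload>.*)$", re.DOTALL)

# Hypothetical mapping from MIME type to file suffix; unknown types get ".bin".
_SUFFIXES = {"image/png": ".png", "image/jpeg": ".jpg", "application/pdf": ".pdf"}


def classify_source(source: str) -> str:
    """Return 'data_uri', 'url', or 'path' for a raw input string."""
    if _DATA_URI.match(source):
        return "data_uri"
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    return "path"


def data_uri_to_file(source: str) -> Path:
    """Decode a Base64 data URI into a temporary file and return its path."""
    match = _DATA_URI.match(source)
    if match is None:
        raise ValueError("not a Base64 data URI")
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        payload = base64.b64decode(match.group("payload"), validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid Base64 payload") from exc
    suffix = _SUFFIXES.get(match.group("mime") or "", ".bin")
    # delete=False keeps the file after the handle closes so an engine can open it.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(payload)
    return Path(tmp.name)
```

The caller owns the temporary file and should delete it once OCR has finished.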

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                    User Interface Layer                       │
│              SDK  ·  CLI  ·  REST API                        │
├──────────────────────────────────────────────────────────────┤
│                    Input Processor                           │
│         URL → File  ·  PDF → Images  ·  Base64 → File       │
├──────────────────────────────────────────────────────────────┤
│                Engine Dispatcher (auto)                      │
│         PaddleOCR-VL → Apple Vision → fallback               │
├─────────────────────┬────────────────────────────────────────┤
│   PaddleOCR-VL      │        Apple Vision                    │
│   + MLX-VLM         │        (native macOS)                  │
│   (auto-accelerated)│                                        │
├─────────────────────┴────────────────────────────────────────┤
│                 Standardised Output                          │
│          Document → Pages → Blocks                           │
│          .text  ·  .markdown  ·  .to_dict()                  │
└──────────────────────────────────────────────────────────────┘
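The standardised output layer can be pictured as three nested dataclasses. This is a minimal sketch whose field names mirror the JSON response documented in the REST API section; the join separators and the rule that `title` blocks render as `#` headings are assumptions, and UniOCR's own `models.py` may differ.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Block:
    block_type: str  # e.g. "text", "title", "table"
    text: str
    bbox: tuple[float, float, float, float] | None = None  # assumed normalised x0, y0, x1, y1
    confidence: float | None = None

    @property
    def markdown(self) -> str:
        # Hypothetical rendering rule: titles become level-1 headings.
        return f"# {self.text}" if self.block_type == "title" else self.text


@dataclass
class Page:
    page_number: int
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(b.text for b in self.blocks)

    @property
    def markdown(self) -> str:
        return "\n\n".join(b.markdown for b in self.blocks)


@dataclass
class Document:
    pages: list[Page] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n\n".join(p.text for p in self.pages)

    @property
    def markdown(self) -> str:
        return "\n\n".join(p.markdown for p in self.pages)

    def to_dict(self) -> dict[str, Any]:
        # asdict() recurses into nested dataclasses but skips properties,
        # so the derived text/markdown fields are added explicitly.
        return {
            "page_count": len(self.pages),
            "text": self.text,
            "markdown": self.markdown,
            "pages": [{**asdict(p), "text": p.text, "markdown": p.markdown} for p in self.pages],
        }
```

Keeping `text` and `markdown` as derived properties means they can never drift out of sync with the blocks they are built from.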

🚀 Quick Start

Option 1: pip install

# Core only (lightweight, includes PDF flattening)
pip install uniocr

# With PaddleOCR-VL (powerful document AI, ~1.8 GB model download on first run)
pip install "uniocr[paddle]"

# With Apple Vision (macOS only, uses built-in system OCR)
pip install "uniocr[apple]"

# With REST API server
pip install "uniocr[api]"

# Everything
pip install "uniocr[all]"

Option 2: Docker (recommended for servers)

# Quick start — pull and run in detached mode
docker run -d --name uniocr -p 8000:8000 ghcr.io/yuanweize/uni-ocr:latest

# Or use Docker Compose (recommended)
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml
docker compose up -d

# Check it's running
curl http://localhost:8000/health
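Right after `docker compose up -d` the server may not be accepting requests yet, so scripts may want to poll `/health` before sending work. Below is a minimal standard-library sketch; `wait_until_healthy` is a hypothetical helper that assumes the `{"status": "ok", ...}` body shown in the REST API health-check example.

```python
from __future__ import annotations

import json
import time
import urllib.request


def wait_until_healthy(base_url: str = "http://localhost:8000",
                       timeout: float = 60.0,
                       interval: float = 1.0) -> dict:
    """Poll GET /health until it reports status "ok" or the timeout expires."""
    deadline = time.monotonic() + timeout
    last_error: Exception | None = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                body = json.load(resp)
            if body.get("status") == "ok":
                return body
        except (OSError, ValueError) as exc:
            # OSError covers refused connections, socket timeouts and HTTP errors
            # (URLError subclasses OSError); ValueError covers a non-JSON body.
            last_error = exc
        time.sleep(interval)
    raise TimeoutError(f"UniOCR not healthy after {timeout}s (last error: {last_error})")
```

The returned dict also carries the `engines` list, which a script can check before choosing an explicit `engine=` value.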

📖 Usage

Python SDK

from uniocr import UniOCR

ocr = UniOCR(engine="auto")          # Auto-selects best available engine
doc = ocr.extract("invoice.pdf")

print(doc.text)                       # Plain text
print(doc.markdown)                   # Structured Markdown
print(doc.to_dict())                  # JSON-serialisable dict

# Access individual blocks with layout info
for page in doc.pages:
    for block in page.blocks:
        print(f"[{block.block_type}] {block.text}")
        print(f"  bbox: {block.bbox}, confidence: {block.confidence}")

CLI

# List available engines
uniocr engines
# Output:
#   Available engines:
#     • paddle
#     • apple

# Extract text (default: Markdown output to stdout)
uniocr extract document.pdf

# Specify engine, format, and output file
uniocr extract scan.png --engine apple --format json -o result.json

# Extract from a URL
uniocr extract "https://example.com/receipt.png" --format text

# Start the API server (single worker)
uniocr serve --port 8000

# Production: multiple workers
uniocr serve --port 8000 --workers 4

REST API

Start the server:

uniocr serve --port 8000
# Or via Docker:
docker compose up -d

Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check (includes engine list) |
| GET | /engines | List available OCR engines |
| GET | /docs | Interactive Swagger UI |
| POST | /extract | Extract from uploaded file |
| POST | /extract/url | Extract from a public URL |
| POST | /extract/batch | Batch process multiple files |

Examples

# Health check
curl http://localhost:8000/health
# → {"status":"ok","version":"0.2.1","engines":["paddle","apple"]}

# Upload a file
curl -X POST http://localhost:8000/extract \
  -F "file=@invoice.pdf" \
  -F "engine=auto"

# Extract from URL
curl -X POST http://localhost:8000/extract/url \
  -F "url=https://example.com/scan.png" \
  -F "engine=apple"

# Batch processing (multiple files in one request)
curl -X POST http://localhost:8000/extract/batch \
  -F "files=@page1.png" \
  -F "files=@page2.png" \
  -F "engine=auto"
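Python callers that prefer not to add a dependency can reach the upload endpoint with the standard library alone. This sketch hand-builds a `multipart/form-data` body with the same `file` and `engine` fields as the curl upload example; `build_multipart` and `extract_file` are hypothetical helpers with minimal error handling (for instance, filenames containing quotes are not escaped).

```python
from __future__ import annotations

import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str],
                    files: list[tuple[str, str, bytes]]) -> tuple[bytes, str]:
    """Encode form fields and (field_name, filename, content) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    for name, filename, content in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        ).encode()
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_file(path: str, engine: str = "auto",
                 base_url: str = "http://localhost:8000") -> dict:
    """POST one file to /extract and return the decoded JSON response."""
    p = Path(path)
    body, content_type = build_multipart({"engine": engine}, [("file", p.name, p.read_bytes())])
    req = urllib.request.Request(f"{base_url}/extract", data=body, method="POST",
                                 headers={"Content-Type": content_type})
    # Generous timeout: a first PaddleOCR-VL run may be slow.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

For `/extract/batch`, pass several `("files", name, content)` tuples to `build_multipart`, matching the repeated `-F "files=@..."` fields in the batch curl example.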

Response format

{
  "request_id": "ab07767c-331f-4f26-be01-2fcb75d36149",
  "engine": "PaddleOCRVLAdapter",
  "page_count": 1,
  "text": "Invoice #12345\nTotal: €1,234.56",
  "markdown": "# Invoice #12345\n\nTotal: €1,234.56",
  "pages": [
    {
      "page_number": 1,
      "text": "...",
      "markdown": "...",
      "blocks": [
        {
          "block_type": "text",
          "text": "Invoice #12345",
          "bbox": [0.05, 0.02, 0.45, 0.06],
          "confidence": 0.98
        }
      ]
    }
  ],
  "elapsed_seconds": 2.35
}
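Downstream code often needs only part of this payload. The sketch below walks `pages[].blocks[]`, drops low-confidence blocks, and scales `bbox` to pixels. It assumes `bbox` is normalised `[x0, y0, x1, y1]` relative to the page (consistent with the 0 to 1 values above, but not stated explicitly) and that `confidence` may be absent; `high_confidence_blocks` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


def high_confidence_blocks(response: dict[str, Any], min_confidence: float = 0.9,
                           page_size: tuple[int, int] = (1000, 1000)) -> Iterator[dict[str, Any]]:
    """Yield blocks at or above min_confidence, with bbox scaled to pixel coordinates."""
    width, height = page_size
    for page in response.get("pages", []):
        for block in page.get("blocks", []):
            confidence = block.get("confidence")
            if confidence is None or confidence < min_confidence:
                continue  # treat a missing confidence as "not trusted"
            x0, y0, x1, y1 = block["bbox"]
            yield {
                "page": page["page_number"],
                "type": block["block_type"],
                "text": block["text"],
                "bbox_px": (round(x0 * width), round(y0 * height),
                            round(x1 * width), round(y1 * height)),
            }
```

The response shown above does not include page dimensions, so `page_size` has to come from the caller, for example the pixel size of the uploaded image.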

🐳 Docker

Quick Run

# Run in background (detached mode)
docker run -d \
  --name uniocr \
  -p 8000:8000 \
  -v uniocr-models:/root/.paddlex \
  ghcr.io/yuanweize/uni-ocr:latest

Docker Compose (recommended)

# Download compose file
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml

# Start in background
docker compose up -d

# View logs
docker compose logs -f

# Stop
docker compose down

Build locally

git clone https://github.com/yuanweize/uni-ocr.git
cd uni-ocr
docker compose up -d --build

🔧 Engine Priority

When engine="auto", UniOCR selects the best available engine:

| Priority | Engine | Best for | Speed |
|----------|--------|----------|-------|
| 1 | PaddleOCR-VL + MLX-VLM | Complex layouts, tables, formulas, 109 languages | ⚡⚡ |
| 2 | PaddleOCR-VL (CPU) | Same capabilities, without MLX acceleration | |
| 3 | Apple Vision | Simple text, macOS only, instant | ⚡⚡⚡ |

Apple Silicon users: when mlx-vlm is installed, UniOCR automatically starts an MLX-VLM server so PaddleOCR-VL runs with GPU acceleration (MLX executes on the Apple Silicon GPU via Metal). No configuration is needed, and the server is cleaned up on exit.
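The priority order above amounts to "try each available engine in turn and use the first one that succeeds". A minimal sketch of that dispatch logic follows; it is not UniOCR's `engines/__init__.py`, and the `Engine` base class, its `is_available()`/`extract()` methods, and the numeric `priority` field are assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Engine(ABC):
    """Hypothetical engine interface; UniOCR's real BaseOCREngine may differ."""
    name: str = "base"
    priority: int = 100  # lower value = tried first

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def extract(self, path: str) -> str: ...


def dispatch(engines: list[Engine], path: str, engine: str = "auto") -> tuple[str, str]:
    """Run the requested engine, or the best available one when engine == "auto".

    Returns (engine_name, text). In auto mode an engine that raises is skipped
    and the next one in priority order is tried.
    """
    if engine != "auto":
        chosen = next((e for e in engines if e.name == engine), None)
        if chosen is None or not chosen.is_available():
            raise RuntimeError(f"engine {engine!r} is not available")
        return chosen.name, chosen.extract(path)

    errors: list[str] = []
    for candidate in sorted(engines, key=lambda e: e.priority):
        if not candidate.is_available():
            continue
        try:
            return candidate.name, candidate.extract(path)
        except Exception as exc:  # fall back to the next engine
            errors.append(f"{candidate.name}: {exc}")
    raise RuntimeError("no OCR engine succeeded: " + "; ".join(errors or ["none available"]))
```

Only auto mode falls back; an explicitly requested engine fails loudly, so a caller who asked for `apple` never silently gets PaddleOCR-VL output.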

🔗 Integration Examples

UniOCR is designed to be called by automation tools and AI agents.

n8n Workflow

Use the HTTP Request node to call UniOCR:

Telegram Trigger → HTTP Request (UniOCR /extract) → AI Agent → ERPNext API

Configuration:

  • Method: POST
  • URL: http://uniocr:8000/extract
  • Body: Form-Data, file = {{ $binary.data }}

Dify Tool

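Add UniOCR as a custom tool in Dify by importing its OpenAPI schema. As a FastAPI app, UniOCR should serve the schema at /openapi.json by default; /docs is the interactive Swagger UI rendered from it.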

Bob (macOS OCR Plugin)

UniOCR can serve as the OCR backend for Bob:

# Start UniOCR on the default port
uniocr serve --port 8000
# Bob → Preferences → OCR → Custom API → http://localhost:8000/extract

Shell / Scripts

# Quick OCR from clipboard image (macOS)
# pbpaste only outputs text, so use pngpaste (brew install pngpaste) to stream the image to stdout
pngpaste - | curl -s -X POST http://localhost:8000/extract \
  -F "file=@-;filename=clipboard.png" | jq -r .text

⚙️ Configuration

UniOCR works out of the box with zero configuration. For advanced use cases:

| Environment variable | Description | Default |
|----------------------|-------------|---------|
| UNIOCR_PORT | API server port (Docker Compose) | 8000 |
| UNIOCR_MLX_VLM_URL | Override MLX-VLM server URL | Auto-detected |
| UNIOCR_MLX_VLM_MODEL | MLX-VLM model identifier | PaddlePaddle/PaddleOCR-VL-1.6 |

Copy .env.example to .env to customise:

cp .env.example .env
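Python code that wraps UniOCR can read the same variables with their documented defaults. This is a sketch only: `Settings` and `load_settings` are hypothetical, `mlx_vlm_url` stays `None` when unset to stand in for "auto-detected", and per the table `UNIOCR_PORT` is consumed by Docker Compose rather than by the library itself.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int = 8000
    mlx_vlm_url: str | None = None  # None means "let UniOCR auto-detect"
    mlx_vlm_model: str = "PaddlePaddle/PaddleOCR-VL-1.6"


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from UNIOCR_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        port=int(env.get("UNIOCR_PORT", defaults.port)),
        mlx_vlm_url=env.get("UNIOCR_MLX_VLM_URL") or None,  # empty string counts as unset
        mlx_vlm_model=env.get("UNIOCR_MLX_VLM_MODEL", defaults.mlx_vlm_model),
    )
```

Accepting an explicit mapping keeps the loader testable without touching the real process environment.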

📁 Project Structure

uni-ocr/
├── src/uniocr/
│   ├── __init__.py          # UniOCR main class & public API
│   ├── models.py            # Document / Page / Block dataclasses
│   ├── cli.py               # CLI: extract · engines · serve
│   ├── api.py               # FastAPI REST service
│   ├── engines/
│   │   ├── __init__.py      # Engine registry & auto-dispatcher
│   │   ├── base.py          # BaseOCREngine ABC
│   │   ├── apple_vision.py  # macOS Vision adapter
│   │   └── paddle.py        # PaddleOCR-VL + MLX-VLM adapter
│   └── processors/
│       └── input.py         # URL / Base64 / PDF normalisation
├── assets/
│   └── logo.svg             # Project logo
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── pyproject.toml
├── CLAUDE.md                # Development guidelines
├── LICENSE                  # MIT
├── README.md                # English docs (this file)
└── README_zh.md             # Chinese documentation

🤝 Contributing

Contributions are welcome! Please open an issue or pull request.

  1. Fork the repo
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feat/amazing-feature)
  5. Open a Pull Request

📄 License

MIT © 2026 Weize Yuan



Download files


Source Distribution

uniocr-0.2.1.tar.gz (16.2 kB)

Built Distribution

uniocr-0.2.1-py3-none-any.whl (20.7 kB)

File details

Details for the file uniocr-0.2.1.tar.gz.

File metadata

  • Download URL: uniocr-0.2.1.tar.gz
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uniocr-0.2.1.tar.gz
Algorithm Hash digest
SHA256 9d7d73dd7474d73e9d6cf40b29b52b48aee50c71e05aa576262c2ea5542a7529
MD5 f5a87b62c97ae05cf2068db7847774a4
BLAKE2b-256 3c76b1a60db361ffc3061e7185e9c6e2dc07ea4158ddfc1490d5e7c5203c1595


Provenance

The following attestation bundles were made for uniocr-0.2.1.tar.gz:

Publisher: publish.yml on yuanweize/uni-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file uniocr-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: uniocr-0.2.1-py3-none-any.whl
  • Size: 20.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uniocr-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1ee64681be8196cb57f202a26248c9b96f6d84b2ed5798499b866e649b1a8eeb
MD5 beebc38710988ad0b13de2861d05574f
BLAKE2b-256 adf3f0e2644f8915468e341cc3aad2d8248db853e78cf90ea9e31f5bdc4d8e00


Provenance

The following attestation bundles were made for uniocr-0.2.1-py3-none-any.whl:

Publisher: publish.yml on yuanweize/uni-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
