
Unified multilingual OCR tool based on PaddleOCR-VL and Apple Vision.


One API. Multiple engines. Zero friction.


UniOCR is a unified, multilingual OCR abstraction layer that wraps best-in-class OCR engines behind a single, clean interface. Throw any image or PDF at it — get back structured text, Markdown, and layout blocks — regardless of which engine runs under the hood.

Built for developers, AI agents, and automation pipelines (n8n, Dify, Telegram bots, etc.).

✨ Highlights

  • 🔌 Pluggable engines — PaddleOCR-VL (deep document AI) and Apple Vision (native macOS) with automatic priority fallback
  • Zero-config acceleration — Auto-detects Apple Silicon → launches MLX-VLM → Neural Engine speedup. No manual setup.
  • 📄 Accepts Anything — File paths, URLs, Base64, multi-page PDFs (auto-flattened).
  • 📦 Unified Output — Supports .text, .markdown, .json, and Searchable Dual-Layer PDFs.
  • 🌐 Built-in REST API — FastAPI powered, Swagger docs, batch processing — directly consumable by n8n / Dify / any HTTP client.
  • 🐳 Docker ready — single command deployment via Docker Compose
  • 🖥️ CLI: uniocr extract, uniocr engines, uniocr serve
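
As a rough illustration of the "accepts anything" input handling, the sketch below decides whether a string is a URL, a local file, or a Base64 payload. It is not UniOCR's actual implementation: `classify_input` and its return labels are hypothetical names, and the order of the checks is an assumption.

```python
import base64
import binascii
from pathlib import Path
from urllib.parse import urlparse


def _is_existing_file(source: str) -> bool:
    """True if source names an existing file; odd or overlong strings count as False."""
    try:
        return Path(source).is_file()
    except (OSError, ValueError):
        return False


def classify_input(source: str) -> str:
    """Return 'url', 'file', or 'base64' for a string input (hypothetical helper)."""
    if not source.strip():
        raise ValueError("Empty input")
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if _is_existing_file(source):
        return "file"
    # Accept an optional data-URI prefix such as "data:image/png;base64,"
    if source.startswith("data:") and ";base64," in source:
        source = source.split(";base64,", 1)[1]
    try:
        # validate=True rejects characters outside the Base64 alphabet
        base64.b64decode(source, validate=True)
    except (binascii.Error, ValueError):
        raise ValueError(f"Unrecognised input: {source[:40]!r}") from None
    return "base64"
```

One caveat of this approach: a short name such as `scan` is itself valid Base64, so a missing file with that name would be misclassified. A real implementation would likely need a stricter check, for example on decoded magic bytes.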

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                    User Interface Layer                       │
│              SDK  ·  CLI  ·  REST API                        │
├──────────────────────────────────────────────────────────────┤
│                    Input Processor                           │
│         URL → File  ·  PDF → Images  ·  Base64 → File       │
├──────────────────────────────────────────────────────────────┤
│                Engine Dispatcher (auto)                      │
│         PaddleOCR-VL → Apple Vision → fallback               │
├─────────────────────┬────────────────────────────────────────┤
│   PaddleOCR-VL      │        Apple Vision                    │
│   + MLX-VLM         │        (native macOS)                  │
│   (auto-accelerated)│                                        │
├─────────────────────┴────────────────────────────────────────┤
│                 Standardised Output                          │
│          Document → Pages → Blocks                           │
│          .text  ·  .markdown  ·  .to_dict()                  │
└──────────────────────────────────────────────────────────────┘
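
The Document → Pages → Blocks output layer in the diagram can be pictured with a few dataclasses. This is a minimal sketch, not the contents of `models.py`: the field names are taken from the response format documented further down, while the defaults and the way `.text` joins blocks are assumptions. The `.markdown` property is left out because its rendering rules are not described here.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Block:
    block_type: str
    text: str
    bbox: Optional[List[float]] = None      # layout box, if the engine reports one
    confidence: Optional[float] = None      # recognition confidence, if reported


@dataclass
class Page:
    page_number: int
    blocks: List[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: one block per line
        return "\n".join(block.text for block in self.blocks)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: pages separated by a blank line
        return "\n\n".join(page.text for page in self.pages)

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses and lists
        return asdict(self)
```

Because `to_dict()` returns only plain dicts, lists, strings, and numbers, the result can be passed straight to `json.dumps`.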

🚀 Quick Start

Option 1: pip install

# Core only (lightweight, includes PDF flattening)
pip install uniocr

# With PaddleOCR-VL (powerful document AI, ~1.8 GB model download on first run)
pip install "uniocr[paddle]"

# With Apple Vision (macOS only, uses built-in system OCR)
pip install "uniocr[apple]"

# With REST API server
pip install "uniocr[api]"

# Everything
pip install "uniocr[all]"

Option 2: Docker (recommended for servers)

# Quick start — pull and run in detached mode
docker run -d --name uniocr -p 8000:8000 ghcr.io/yuanweize/uni-ocr:latest

# Or use Docker Compose (recommended)
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml
docker compose up -d

# Check it's running
curl http://localhost:8000/health

📖 Usage

Python SDK

from uniocr import UniOCR

ocr = UniOCR(engine="auto")          # Auto-selects best available engine
doc = ocr.extract("invoice.pdf")

print(doc.text)                       # Plain text
print(doc.markdown)                   # Structured Markdown
print(doc.to_dict())                  # JSON-serialisable dict

# Access individual blocks with layout info
for page in doc.pages:
    for block in page.blocks:
        print(f"[{block.block_type}] {block.text}")
        print(f"  bbox: {block.bbox}, confidence: {block.confidence}")

CLI

# List available engines
uniocr engines
# Output:
#   Available engines:
#     • paddle
#     • apple

# Extract text (outputs Markdown by default)
uniocr extract document.pdf -o result.md

# Generate a Searchable PDF (automatically triggered by .pdf extension)
uniocr extract input_image.jpg -o output_searchable.pdf

# Specify engine and output format
uniocr extract scan.png --engine apple --format json -o result.json

# Extract from a URL
uniocr extract "https://example.com/receipt.png" --format text

# Start the API server (single worker)
uniocr serve --port 8000

# Production: multiple workers
uniocr serve --port 8000 --workers 4
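
The two output rules mentioned in the comments above (Markdown by default, Searchable PDF when the output path ends in .pdf) could be expressed as in the sketch below. `resolve_output_format` is a hypothetical helper, and letting an explicit `--format` value take precedence over the extension is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional


def resolve_output_format(output_path: Optional[str], explicit: Optional[str] = None) -> str:
    """Pick an output format from an explicit choice or the output file extension."""
    if explicit:
        # Assumption: an explicit --format value wins over the extension
        return explicit
    if output_path and Path(output_path).suffix.lower() == ".pdf":
        return "pdf"
    # Markdown is the documented default
    return "markdown"
```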

REST API

Start the server:

uniocr serve --port 8000
# Or via Docker:
docker compose up -d

Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check & engine list |
| GET | /engines | List available OCR engines |
| GET | /docs | Interactive Swagger docs |
| POST | /extract | Extract text from uploaded file (JSON/Markdown) |
| POST | /extract/pdf | Extract text and return a Searchable PDF file |
| POST | /extract/url | Extract text from URL |
| POST | /extract/batch | Process multiple files |

Examples

# Health check
curl http://localhost:8000/health
# → {"status":"ok","version":"0.2.2","engines":["paddle","apple"]}

# Upload a file
curl -X POST http://localhost:8000/extract \
  -F "file=@invoice.pdf" \
  -F "engine=auto"

# Extract via URL
curl -X POST http://localhost:8000/extract/url \
  -F "url=https://example.com/image.png"

# Return a Searchable PDF directly
curl -X POST http://localhost:8000/extract/pdf \
  -F "file=@scan.png" -o searchable.pdf

# Batch processing
curl -X POST http://localhost:8000/extract/batch \
  -F "files=@page1.png" -F "files=@page2.png" \
  -F "engine=auto"
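
The file upload can also be done from Python using only the standard library. The sketch below mirrors the `curl -F "file=@invoice.pdf" -F "engine=auto"` call. The endpoint and the `file` and `engine` field names come from the examples above; the helper names `build_multipart` and `extract_file` are made up for illustration.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request


def build_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_file(path: str, base_url: str = "http://localhost:8000", engine: str = "auto") -> dict:
    """POST a local file to /extract and return the parsed JSON response."""
    file_path = Path(path)
    body, content_type = build_multipart(
        {"engine": engine}, "file", file_path.name, file_path.read_bytes()
    )
    req = request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With a server running locally, `extract_file("invoice.pdf")["text"]` would return the plain-text result. A library such as `requests` makes this shorter but adds a dependency.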

Response format

{
  "request_id": "ab07767c-331f-4f26-be01-2fcb75d36149",
  "engine": "PaddleOCRVLAdapter",
  "page_count": 1,
  "text": "Invoice #12345\nTotal: €1,234.56",
  "markdown": "# Invoice #12345\n\nTotal: €1,234.56",
  "pages": [
    {
      "page_number": 1,
      "text": "...",
      "markdown": "...",
      "blocks": [
        {
          "block_type": "text",
          "text": "Invoice #12345",
          "bbox": [0.05, 0.02, 0.45, 0.06],
          "confidence": 0.98
        }
      ]
    }
  ],
  "elapsed_seconds": 2.35
}
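
A consumer of this response will often want to filter blocks by confidence and map boxes back onto the source image. The sketch below shows one way to do that. It assumes `bbox` is `[x0, y0, x1, y1]` normalised to the 0..1 range, which the sample values suggest but the README does not state. Both function names are illustrative.

```python
from typing import Iterator, Sequence, Tuple


def confident_blocks(response: dict, min_confidence: float = 0.9) -> Iterator[Tuple[int, dict]]:
    """Yield (page_number, block) for blocks at or above the confidence threshold."""
    for page in response.get("pages", []):
        for block in page.get("blocks", []):
            confidence = block.get("confidence")
            if confidence is not None and confidence >= min_confidence:
                yield page["page_number"], block


def bbox_to_pixels(bbox: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalised box to pixel coordinates (assumed x0, y0, x1, y1 order)."""
    x0, y0, x1, y1 = bbox
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))
```

For the sample block above and a 1000 × 2000 pixel page, `bbox_to_pixels([0.05, 0.02, 0.45, 0.06], 1000, 2000)` gives `(50, 40, 450, 120)` under that assumption.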

🐳 Docker

Quick Run

# Run in background (detached mode)
docker run -d \
  --name uniocr \
  -p 8000:8000 \
  -v uniocr-models:/root/.paddlex \
  ghcr.io/yuanweize/uni-ocr:latest

Docker Compose (recommended)

# Download compose file
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml

# Start in background
docker compose up -d

# View logs
docker compose logs -f

# Stop
docker compose down

Build locally

git clone https://github.com/yuanweize/uni-ocr.git
cd uni-ocr
docker compose up -d --build

🔧 Engine Priority

When engine="auto", UniOCR selects the best available engine:

| Priority | Engine | Best for | Speed |
|----------|--------|----------|-------|
| 1 | PaddleOCR-VL + MLX-VLM | Complex layouts, tables, formulas, 109 languages | ⚡⚡ |
| 2 | PaddleOCR-VL (CPU) | Same capabilities, without MLX acceleration | |
| 3 | Apple Vision | Simple text, macOS only, instant | ⚡⚡⚡ |

Apple Silicon users: when mlx-vlm is installed, UniOCR automatically starts an MLX-VLM server for Neural Engine acceleration. No configuration needed. The server is cleaned up on exit.
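
The priority fallback can be sketched as an ordered list of availability checks, where the first engine whose check passes is selected. This is an illustration under assumptions, not the code in `engines/__init__.py`: the probed module names (`paddleocr`, `mlx_vlm`, `Vision`) and the `paddle+mlx` label are guesses about how availability might be detected.

```python
import importlib.util
import platform
import sys
from typing import Callable, List, Tuple


def is_apple_silicon() -> bool:
    """True on macOS running natively on an arm64 CPU."""
    return sys.platform == "darwin" and platform.machine() == "arm64"


def has_module(name: str) -> bool:
    """True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def select_engine(candidates: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Return the name of the first candidate whose availability check passes."""
    for name, available in candidates:
        try:
            if available():
                return name
        except Exception:
            # A failing probe counts as "not available"; try the next engine
            continue
    raise RuntimeError("No OCR engine is available")


# Hypothetical priority list mirroring the table above
DEFAULT_PRIORITY = [
    ("paddle+mlx", lambda: is_apple_silicon() and has_module("paddleocr") and has_module("mlx_vlm")),
    ("paddle", lambda: has_module("paddleocr")),
    ("apple", lambda: sys.platform == "darwin" and has_module("Vision")),
]
```

Calling `select_engine(DEFAULT_PRIORITY)` returns the highest-priority engine that looks usable on the current machine and raises `RuntimeError` when none is.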

🔗 Integration Examples

UniOCR is designed to be called by automation tools and AI agents.

n8n Workflow

Use the HTTP Request node to call UniOCR:

Telegram Trigger → HTTP Request (UniOCR /extract) → AI Agent → ERPNext API

Configuration:

  • Method: POST
  • URL: http://uniocr:8000/extract
  • Body: Form-Data, file = {{ $binary.data }}

Dify Tool

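Add UniOCR as a custom tool in Dify by importing the server's OpenAPI schema. FastAPI serves the interactive Swagger UI at /docs and, by default, the raw schema at /openapi.json.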

Bob (macOS OCR Plugin)

UniOCR can serve as the OCR backend for Bob:

# Start UniOCR on the default port
uniocr serve --port 8000
# Bob → Preferences → OCR → Custom API → http://localhost:8000/extract

Shell / Scripts

# Quick OCR from clipboard image (macOS)
# pbpaste only emits text, so this uses the third-party pngpaste tool (brew install pngpaste)
pngpaste - | curl -s -X POST http://localhost:8000/extract \
  -F "file=@-;filename=clipboard.png" | jq .text

⚙️ Configuration

UniOCR works out of the box with zero configuration. For advanced use cases:

| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| UNIOCR_PORT | API server port (Docker Compose) | 8000 |
| UNIOCR_MLX_VLM_URL | Override MLX-VLM server URL | Auto-detected |
| UNIOCR_MLX_VLM_MODEL | MLX-VLM model identifier | PaddlePaddle/PaddleOCR-VL-1.6 |

Copy .env.example to .env to customise:

cp .env.example .env
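
For scripts that wrap UniOCR, the same variables can be read into one settings object. The sketch below is hypothetical: the variable names and defaults come from the table above, but the `Settings` class and `load_settings` function are not part of UniOCR. The README documents UNIOCR_PORT only for Docker Compose, so reading it from Python is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int = 8000
    mlx_vlm_url: Optional[str] = None   # None stands for "auto-detect"
    mlx_vlm_model: str = "PaddlePaddle/PaddleOCR-VL-1.6"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the documented defaults."""
    return Settings(
        port=int(env.get("UNIOCR_PORT", "8000")),
        mlx_vlm_url=env.get("UNIOCR_MLX_VLM_URL") or None,
        mlx_vlm_model=env.get("UNIOCR_MLX_VLM_MODEL", "PaddlePaddle/PaddleOCR-VL-1.6"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, for example `load_settings({"UNIOCR_PORT": "9000"}).port`.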

📁 Project Structure

uni-ocr/
├── src/uniocr/
│   ├── __init__.py          # UniOCR main class & public API
│   ├── models.py            # Document / Page / Block dataclasses
│   ├── cli.py               # CLI: extract · engines · serve
│   ├── api.py               # FastAPI REST service
│   ├── engines/
│   │   ├── __init__.py      # Engine registry & auto-dispatcher
│   │   ├── base.py          # BaseOCREngine ABC
│   │   ├── apple_vision.py  # macOS Vision adapter
│   │   └── paddle.py        # PaddleOCR-VL + MLX-VLM adapter
│   └── processors/
│       └── input.py         # URL / Base64 / PDF normalisation
├── assets/
│   └── logo.svg             # Project logo
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── pyproject.toml
├── CLAUDE.md                # Development guidelines
├── LICENSE                  # MIT
├── README.md                # English docs (this file)
└── README_zh.md             # 中文文档

🤝 Contributing

Contributions are welcome! Please open an issue or pull request.

  1. Fork the repo
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feat/amazing-feature)
  5. Open a Pull Request

📄 License

MIT © 2026 Weize Yuan

