uniocr

Unified multilingual OCR tool based on PaddleOCR-VL and Apple Vision.

One API. Multiple engines. Zero friction.

UniOCR is a unified, multilingual OCR abstraction layer that wraps PaddleOCR-VL and Apple Vision behind a single, clean interface. Pass it any image or PDF and get back structured text, Markdown, and layout blocks, regardless of which engine runs under the hood.

Built for developers, AI agents, and automation pipelines (n8n, Dify, Telegram bots, etc.).

✨ Highlights

🔌 Pluggable engines: PaddleOCR-VL (deep document AI) and Apple Vision (native macOS), with automatic priority fallback.
⚡ Zero-config acceleration: detects Apple Silicon and, when mlx-vlm is installed, launches an MLX-VLM server for accelerated inference. No manual setup.
📄 Accepts anything: file paths, URLs, Base64 strings, and multi-page PDFs (auto-flattened into page images).
📦 Unified output: .text, .markdown, .to_dict() (JSON-serialisable), and searchable dual-layer PDFs.
🌐 Built-in REST API: FastAPI-powered, with Swagger docs and batch processing; directly consumable by n8n, Dify, or any HTTP client.
🐳 Docker ready: single-command deployment via Docker Compose.
🖥️ CLI: uniocr extract, uniocr engines, uniocr serve.

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                    User Interface Layer                       │
│              SDK  ·  CLI  ·  REST API                        │
├──────────────────────────────────────────────────────────────┤
│                    Input Processor                           │
│         URL → File  ·  PDF → Images  ·  Base64 → File       │
├──────────────────────────────────────────────────────────────┤
│                Engine Dispatcher (auto)                      │
│         PaddleOCR-VL → Apple Vision → fallback               │
├─────────────────────┬────────────────────────────────────────┤
│   PaddleOCR-VL      │        Apple Vision                    │
│   + MLX-VLM         │        (native macOS)                  │
│   (auto-accelerated)│                                        │
├─────────────────────┴────────────────────────────────────────┤
│                 Standardised Output                          │
│          Document → Pages → Blocks                           │
│          .text  ·  .markdown  ·  .to_dict()                  │
└──────────────────────────────────────────────────────────────┘
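
The Input Processor row in the diagram implies a step that decides whether a string is a URL, a Base64 payload, or a local path before anything is loaded. The sketch below shows one way such a step could work. `InputKind` and `classify_input` are hypothetical names chosen for illustration, not UniOCR's actual internals, and the length threshold is an arbitrary assumption.

```python
import base64
import binascii
from enum import Enum
from urllib.parse import urlparse


class InputKind(Enum):
    URL = "url"
    BASE64 = "base64"
    PATH = "path"


def classify_input(source: str) -> InputKind:
    """Guess how a string input should be loaded (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return InputKind.URL
    # Data URIs such as "data:image/png;base64,...." carry Base64 after the comma.
    if source.startswith("data:") and ";base64," in source:
        return InputKind.BASE64
    # Heuristic: a long string that decodes cleanly is treated as raw Base64.
    # File names normally contain "." which is outside the Base64 alphabet.
    if len(source) > 64:
        try:
            base64.b64decode(source, validate=True)
            return InputKind.BASE64
        except (binascii.Error, ValueError):
            pass
    return InputKind.PATH
```

A heuristic like this can misclassify unusual paths, so a real implementation would likely also check whether the path exists on disk.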

🚀 Quick Start

Option 1: pip install

# Core only (lightweight, includes PDF flattening)
pip install uniocr

# With PaddleOCR-VL (powerful document AI, ~1.8 GB model download on first run)
pip install "uniocr[paddle]"

# With Apple Vision (macOS only, uses built-in system OCR)
pip install "uniocr[apple]"

# With REST API server
pip install "uniocr[api]"

# Everything
pip install "uniocr[all]"

Option 2: Docker (recommended for servers)

# Quick start — pull and run in detached mode
docker run -d --name uniocr -p 8000:8000 ghcr.io/yuanweize/uni-ocr:latest

# Or use Docker Compose (recommended)
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml
docker compose up -d

# Check it's running
curl http://localhost:8000/health
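
In scripts it can help to wait for the container before sending work, since the first start may take a while. The sketch below polls `/health` with the Python standard library and relies only on the `"status": "ok"` field shown in the health check example later in this document. The function names, retry count, and delay are assumptions for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(url):
    """Return the parsed JSON body of the health endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(url, attempts=30, delay=2.0,
                       fetch=fetch_health, sleep=time.sleep):
    """Poll `url` until it reports status "ok"; return False if it never does."""
    for _ in range(attempts):
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError, ValueError):
            pass  # server not up yet, or body was not valid JSON
        sleep(delay)
    return False


# Example call (requires a running server):
# wait_until_healthy("http://localhost:8000/health")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a network or real delays.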

📖 Usage

Python SDK

from uniocr import UniOCR

ocr = UniOCR(engine="auto")          # Auto-selects best available engine
doc = ocr.extract("invoice.pdf")

print(doc.text)                       # Plain text
print(doc.markdown)                   # Structured Markdown
print(doc.to_dict())                  # JSON-serialisable dict

# Access individual blocks with layout info
for page in doc.pages:
    for block in page.blocks:
        print(f"[{block.block_type}] {block.text}")
        print(f"  bbox: {block.bbox}, confidence: {block.confidence}")
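
To make the Document → Pages → Blocks hierarchy concrete, here is a minimal sketch of how such a model could be expressed with dataclasses. It is not UniOCR's `models.py`. The field names are taken from the response format shown in this README, while the classes themselves and the way text is joined are assumptions.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Block:
    block_type: str
    text: str
    bbox: tuple[float, float, float, float]
    confidence: float


@dataclass
class Page:
    page_number: int
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: page text is the block texts joined by newlines.
        return "\n".join(block.text for block in self.blocks)


@dataclass
class Document:
    pages: list[Page] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: pages are separated by a blank line.
        return "\n\n".join(page.text for page in self.pages)

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses and lists.
        return asdict(self)
```

Keeping `text` as a derived property means the blocks remain the single source of truth, which is one reasonable design for a layered output model.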

CLI

# List available engines
uniocr engines
# Output:
#   Available engines:
#     • paddle
#     • apple

# Extract text (outputs Markdown by default)
uniocr extract document.pdf -o result.md

# Generate a searchable PDF (selected automatically when the output file ends in .pdf)
uniocr extract input_image.jpg -o output_searchable.pdf

# Specify engine and output format
uniocr extract scan.png --engine apple --format json -o result.json

# Extract from a URL
uniocr extract "https://example.com/receipt.png" --format text

# Start the API server (single worker)
uniocr serve --port 8000

# Production: multiple workers
uniocr serve --port 8000 --workers 4

REST API

Start the server:

uniocr serve --port 8000
# Or via Docker:
docker compose up -d

Endpoints

Method	Endpoint	Description
`GET`	`/health`	Health check & engine list
`GET`	`/engines`	List available OCR engines
`GET`	`/docs`	Interactive Swagger docs
`POST`	`/extract`	Extract text from uploaded file (JSON/Markdown)
`POST`	`/extract/pdf`	Extract text and return a Searchable PDF file
`POST`	`/extract/url`	Extract text from URL
`POST`	`/extract/batch`	Process multiple files

Examples

# Health check
curl http://localhost:8000/health
# → {"status":"ok","version":"0.2.2","engines":["paddle","apple"]}

# Upload a file
curl -X POST http://localhost:8000/extract \
  -F "file=@invoice.pdf" \
  -F "engine=auto"

# Extract via URL
curl -X POST http://localhost:8000/extract/url \
  -F "url=https://example.com/image.png"

# Return a Searchable PDF directly
curl -X POST http://localhost:8000/extract/pdf \
  -F "file=@scan.png" -o searchable.pdf

# Batch processing
curl -X POST http://localhost:8000/extract/batch \
  -F "files=@page1.png" -F "files=@page2.png" \
  -F "engine=auto"
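
The upload call can also be made from Python without third-party packages. The sketch below builds the same multipart form as the `curl -F` upload example (fields `file` and `engine`, endpoint `/extract`) using only the standard library. `build_extract_request` is a hypothetical helper name, and the generic `application/octet-stream` content type is an assumption.

```python
import json
import urllib.request
import uuid


def build_extract_request(base_url, filename, data, engine="auto"):
    """Build a multipart/form-data POST for the /extract endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain form field: engine
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="engine"\r\n\r\n'
        f"{engine}\r\n".encode(),
        # File field: headers, then the raw bytes
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode(),
        data,
        # Closing boundary
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Example call (requires a running server and a local file):
# with open("invoice.pdf", "rb") as fh:
#     req = build_extract_request("http://localhost:8000", "invoice.pdf", fh.read())
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text"])
```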

Response format

{
  "request_id": "ab07767c-331f-4f26-be01-2fcb75d36149",
  "engine": "PaddleOCRVLAdapter",
  "page_count": 1,
  "text": "Invoice #12345\nTotal: €1,234.56",
  "markdown": "# Invoice #12345\n\nTotal: €1,234.56",
  "pages": [
    {
      "page_number": 1,
      "text": "...",
      "markdown": "...",
      "blocks": [
        {
          "block_type": "text",
          "text": "Invoice #12345",
          "bbox": [0.05, 0.02, 0.45, 0.06],
          "confidence": 0.98
        }
      ]
    }
  ],
  "elapsed_seconds": 2.35
}
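
A common next step is to keep only confident blocks and map their boxes onto the source image. The sketch below assumes that `bbox` is `[x0, y0, x1, y1]` normalised to the 0..1 range. That reading is inferred from the sample values above and is not confirmed by this README. The function name and the `page_sizes` argument are hypothetical.

```python
def confident_blocks_in_pixels(response, page_sizes, min_confidence=0.9):
    """Yield (page_number, text, pixel_bbox) for blocks above a threshold.

    `page_sizes` maps page_number -> (width_px, height_px).
    Assumes bbox is [x0, y0, x1, y1] normalised to 0..1.
    """
    for page in response["pages"]:
        width, height = page_sizes[page["page_number"]]
        for block in page["blocks"]:
            if block["confidence"] < min_confidence:
                continue  # skip low-confidence text
            x0, y0, x1, y1 = block["bbox"]
            pixel_bbox = (
                round(x0 * width),
                round(y0 * height),
                round(x1 * width),
                round(y1 * height),
            )
            yield page["page_number"], block["text"], pixel_bbox
```

If the engine reports boxes in another convention (for example pixels, or x/y/width/height), only the conversion inside the loop would need to change.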

🐳 Docker

Quick Run

# Run in background (detached mode)
docker run -d \
  --name uniocr \
  -p 8000:8000 \
  -v uniocr-models:/root/.paddlex \
  ghcr.io/yuanweize/uni-ocr:latest

Docker Compose (recommended)

# Download compose file
curl -O https://raw.githubusercontent.com/yuanweize/uni-ocr/master/docker-compose.yml

# Start in background
docker compose up -d

# View logs
docker compose logs -f

# Stop
docker compose down

Build locally

git clone https://github.com/yuanweize/uni-ocr.git
cd uni-ocr
docker compose up -d --build

🔧 Engine Priority

When engine="auto", UniOCR tries the engines in the order below and uses the first one that is available:

Priority	Engine	Best for	Speed
1	PaddleOCR-VL + MLX-VLM	Complex layouts, tables, formulas, 109 languages	⚡⚡
2	PaddleOCR-VL (CPU)	Same capabilities, without MLX acceleration	⚡
3	Apple Vision	Simple text, macOS only, instant	⚡⚡⚡

Apple Silicon users: when mlx-vlm is installed, UniOCR automatically starts an MLX-VLM server to accelerate PaddleOCR-VL inference (MLX runs on the Apple GPU through Metal rather than on the Neural Engine). No configuration is needed, and the server is shut down on exit.
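
The priority order amounts to "try each engine in turn and return the first result". The sketch below illustrates that pattern in isolation. `EngineUnavailable` and `run_with_fallback` are hypothetical names, and the engines are plain callables rather than UniOCR's adapter classes.

```python
from typing import Callable, Sequence


class EngineUnavailable(Exception):
    """Raised by a (hypothetical) engine that cannot run on this machine."""


def run_with_fallback(
    engines: Sequence[tuple[str, Callable[[str], str]]],
    source: str,
) -> tuple[str, str]:
    """Try engines in priority order; return (engine_name, text) from the first that works."""
    errors = []
    for name, engine in engines:
        try:
            return name, engine(source)
        except EngineUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why it was skipped
    raise RuntimeError("no OCR engine available (" + "; ".join(errors) + ")")
```

Only `EngineUnavailable` triggers the fallback here, so a genuine OCR failure inside an available engine would still surface to the caller instead of being silently masked.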

🔗 Integration Examples

UniOCR is designed to be called by automation tools and AI agents.

n8n Workflow

Use the HTTP Request node to call UniOCR:

Telegram Trigger → HTTP Request (UniOCR /extract) → AI Agent → ERPNext API

Configuration:

Method: POST
URL: http://uniocr:8000/extract
Body: Form-Data, file = {{ $binary.data }}

Dify Tool

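Add UniOCR as a custom tool in Dify by importing the server's OpenAPI schema. FastAPI serves the schema at /openapi.json by default; /docs is the interactive Swagger UI built on it.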

Bob (macOS OCR Plugin)

UniOCR can serve as the OCR backend for Bob:

# Start UniOCR on the default port
uniocr serve --port 8000
# Bob → Preferences → OCR → Custom API → http://localhost:8000/extract

Shell / Scripts

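# Quick OCR from a clipboard image (macOS)
# pbpaste only emits text, so this assumes the third-party pngpaste tool
# (brew install pngpaste), which writes the clipboard image to stdout as PNG.
pngpaste - | curl -s -X POST http://localhost:8000/extract \
  -F "file=@-;filename=clipboard.png" | jq -r .text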

⚙️ Configuration

UniOCR works out of the box with zero configuration. For advanced use cases:

Environment Variable	Description	Default
`UNIOCR_PORT`	API server port (Docker Compose)	`8000`
`UNIOCR_MLX_VLM_URL`	Override MLX-VLM server URL	Auto-detected
`UNIOCR_MLX_VLM_MODEL`	MLX-VLM model identifier	`PaddlePaddle/PaddleOCR-VL-1.6`

Copy .env.example to .env to customise:

cp .env.example .env
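
For code that needs the same settings outside the server, the variables in the table can be read with their documented defaults. The sketch below is illustrative only. `Settings` and `load_settings` are hypothetical names, and treating an unset or empty `UNIOCR_MLX_VLM_URL` as "auto-detect" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int = 8000
    mlx_vlm_url: Optional[str] = None  # None stands for "auto-detect"
    mlx_vlm_model: str = "PaddlePaddle/PaddleOCR-VL-1.6"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read UNIOCR_* variables, falling back to the defaults in the table."""
    return Settings(
        port=int(env.get("UNIOCR_PORT", "8000")),
        mlx_vlm_url=env.get("UNIOCR_MLX_VLM_URL") or None,
        mlx_vlm_model=env.get("UNIOCR_MLX_VLM_MODEL", Settings.mlx_vlm_model),
    )
```

Passing the environment as a mapping keeps the function easy to test with a plain dictionary.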

📁 Project Structure

uni-ocr/
├── src/uniocr/
│   ├── __init__.py          # UniOCR main class & public API
│   ├── models.py            # Document / Page / Block dataclasses
│   ├── cli.py               # CLI: extract · engines · serve
│   ├── api.py               # FastAPI REST service
│   ├── engines/
│   │   ├── __init__.py      # Engine registry & auto-dispatcher
│   │   ├── base.py          # BaseOCREngine ABC
│   │   ├── apple_vision.py  # macOS Vision adapter
│   │   └── paddle.py        # PaddleOCR-VL + MLX-VLM adapter
│   └── processors/
│       └── input.py         # URL / Base64 / PDF normalisation
├── assets/
│   └── logo.svg             # Project logo
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── pyproject.toml
├── CLAUDE.md                # Development guidelines
├── LICENSE                  # MIT
├── README.md                # English docs (this file)
└── README_zh.md             # 中文文档

🤝 Contributing

Contributions are welcome! Please open an issue or pull request.

1. Fork the repo.
2. Create a feature branch (git checkout -b feat/amazing-feature).
3. Commit your changes (git commit -m 'feat: add amazing feature').
4. Push to the branch (git push origin feat/amazing-feature).
5. Open a Pull Request.

📄 License

MIT. See the LICENSE file in the repository.

Project details

Release history

This version

0.2.3

Jun 8, 2026

0.2.2

Jun 8, 2026

0.2.1

Jun 8, 2026

Download files

Source Distribution

uniocr-0.2.3.tar.gz (17.7 kB)

Uploaded Jun 8, 2026 Source

Built Distribution

uniocr-0.2.3-py3-none-any.whl (23.0 kB)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file uniocr-0.2.3.tar.gz.

File metadata

Download URL: uniocr-0.2.3.tar.gz
Upload date: Jun 8, 2026
Size: 17.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uniocr-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`af0ede51afbda5b426a3f1e53c0ba0bb4d1f36604301cdf59d91e5896bf40dea`
MD5	`49afb80c9e894e5ddb7f8a41c311a1c4`
BLAKE2b-256	`a6b0f89bdf2e60cfcd36e44d0163e7a2d5b73372786c4b662786b194e066339a`

Provenance

The following attestation bundles were made for uniocr-0.2.3.tar.gz:

Publisher: publish.yml on yuanweize/uni-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: uniocr-0.2.3.tar.gz
- Subject digest: af0ede51afbda5b426a3f1e53c0ba0bb4d1f36604301cdf59d91e5896bf40dea
- Sigstore transparency entry: 1758648324
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: yuanweize/uni-ocr@f7585c35da17bfdb3cc215d323e56c5895e8f65c
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/yuanweize
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f7585c35da17bfdb3cc215d323e56c5895e8f65c
- Trigger Event: release

File details

Details for the file uniocr-0.2.3-py3-none-any.whl.

File metadata

Download URL: uniocr-0.2.3-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 23.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uniocr-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ec1f3f890c46014a0fdb8b09556ec50a9e1eb9601f306198c2bddd71f51fab70`
MD5	`89b53159bb93f4ae7bf4bca4c9971268`
BLAKE2b-256	`05b2ee7f1179e4498539c395bbbb9d773c54f170d69f79f5faca10fd7b3ec990`

Provenance

The following attestation bundles were made for uniocr-0.2.3-py3-none-any.whl:

Publisher: publish.yml on yuanweize/uni-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: uniocr-0.2.3-py3-none-any.whl
- Subject digest: ec1f3f890c46014a0fdb8b09556ec50a9e1eb9601f306198c2bddd71f51fab70
- Sigstore transparency entry: 1758648337
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: yuanweize/uni-ocr@f7585c35da17bfdb3cc215d323e56c5895e8f65c
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/yuanweize
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f7585c35da17bfdb3cc215d323e56c5895e8f65c
- Trigger Event: release
