Maintainers

yz3440

PanoOCR

PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images with automatic perspective projection and deduplication.

https://github.com/user-attachments/assets/57507c48-ec88-4d4a-bf68-067eefc9d42f

Features

- Multiple OCR Engines: Support for MacOCR (Apple Vision), RapidOCR, EasyOCR, PaddleOCR, Florence-2, Google Cloud Vision, Gemini, and more
- Automatic Perspective Projection: Converts equirectangular panoramas to multiple perspective views for better OCR accuracy
- Deduplication: Automatically removes duplicate text detections across overlapping perspective views
- Spherical Coordinates: Returns OCR results in yaw/pitch coordinates that map directly to the panorama
- Preview Tool: Interactive 3D preview of OCR results on the panorama
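
To make the projection and spherical-coordinate features more concrete, here is a minimal sketch of how a pixel in one perspective view can be mapped back to yaw/pitch on the panorama. It assumes a pinhole camera model with yaw increasing to the right and pitch increasing upward; the function name and conventions are illustrative and are not taken from PanoOCR's source.

```python
import math


def view_pixel_to_yaw_pitch(u, v, width, height, fov_deg, view_yaw_deg, view_pitch_deg):
    """Map a pixel in a perspective view to (yaw, pitch) in degrees on the panorama."""
    # Focal length in pixels, derived from the horizontal field of view.
    focal = (width / 2) / math.tan(math.radians(fov_deg) / 2)

    # Ray through the pixel in camera space: x right, y up, z forward.
    x = u - width / 2
    y = -(v - height / 2)
    z = focal

    # Tilt the ray by the view's pitch (rotation about the x axis).
    p = math.radians(view_pitch_deg)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)

    # Turn the ray by the view's yaw (rotation about the vertical axis).
    w = math.radians(view_yaw_deg)
    x, z = x * math.cos(w) + z * math.sin(w), -x * math.sin(w) + z * math.cos(w)

    yaw = math.degrees(math.atan2(x, z))
    pitch = math.degrees(math.asin(y / math.sqrt(x * x + y * y + z * z)))
    return yaw, pitch
```

Under these assumptions the center pixel of a view maps to the view's own yaw and pitch, and the right edge of a 90° view centered at yaw 0 maps to yaw 45°.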

Installation

Install the base package:

pip install panoocr

Install with OCR engine dependencies:

# macOS (Apple Vision Framework)
pip install "panoocr[macocr]"

# RapidOCR (PP-OCRv4/v5 via ONNX Runtime, cross-platform)
pip install "panoocr[rapidocr]"

# EasyOCR (cross-platform)
pip install "panoocr[easyocr]"

# PaddleOCR (cross-platform)
pip install "panoocr[paddleocr]"

# Florence-2 via transformers + torch (GPU recommended)
pip install "panoocr[florence2]"

# MLX VLM engines: Florence-2 MLX, GLM-OCR, DOTS.OCR (macOS Apple Silicon)
pip install "panoocr[mlx-vlm]"

# Google Cloud Vision API (requires API key)
pip install "panoocr[google-vision]"

# Gemini API (requires API key)
pip install "panoocr[gemini]"

# Cross-platform local engines + visualization
# (excludes macOS-only macocr, Apple Silicon mlx-vlm, cloud APIs, and experimental trocr)
pip install "panoocr[full]"

Using uv (recommended):

uv add panoocr
uv sync --extra macocr      # or other extras
uv sync --extra rapidocr
uv sync --extra mlx-vlm     # Florence-2 MLX, GLM-OCR, DOTS.OCR

Quick Start

from panoocr import PanoOCR
from panoocr.engines.macocr import MacOCREngine  # or other engines

# Create an OCR engine
engine = MacOCREngine()

# Create the PanoOCR pipeline
pano = PanoOCR(engine)

# Run OCR on a panorama
result = pano.recognize("panorama.jpg")

# Save results as JSON
result.save_json("results.json")

# Access individual results
for r in result.results:
    print(f"Text: {r.text}")
    print(f"Position: yaw={r.yaw}°, pitch={r.pitch}°")
    print(f"Confidence: {r.confidence}")
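
Once results are saved, the JSON file can be post-processed without PanoOCR installed. The sketch below assumes the file follows the layout shown under Output Format (a top-level "results" list whose items carry "text", "confidence", "yaw", and "pitch"); the helper name and the 0.8 threshold are illustrative choices.

```python
import json


def confident_texts(json_path, min_confidence=0.8):
    """Return (text, yaw, pitch) tuples for detections at or above a confidence threshold."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    return [
        (r["text"], r["yaw"], r["pitch"])
        for r in data["results"]
        if r["confidence"] >= min_confidence
    ]
```

For example, `confident_texts("results.json", 0.9)` would keep only detections with confidence of at least 0.9.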

Available OCR Engines

Structured engines (return per-word bounding boxes)

MacOCREngine (macOS only)

Uses Apple's Vision Framework for fast, accurate OCR on macOS.

from panoocr.engines.macocr import MacOCREngine

engine = MacOCREngine()

RapidOCREngine

Runs PaddleOCR PP-OCRv4/v5 models via ONNX Runtime. Supports both v4 (2023) and v5 (2025) models, with multilingual coverage including CJK.

from panoocr.engines.rapidocr_engine import RapidOCREngine

engine_v4 = RapidOCREngine()                                     # default: PP-OCRv4
engine_v5 = RapidOCREngine(config={"ocr_version": "PP-OCRv5"})   # PP-OCRv5

EasyOCREngine

Cross-platform OCR supporting 80+ languages.

from panoocr.engines.easyocr import EasyOCREngine

engine = EasyOCREngine(config={"language_preference": ["en"], "gpu": True})

PaddleOCREngine

PaddlePaddle-based OCR supporting multiple languages with automatic model management.

from panoocr.engines.paddleocr import PaddleOCREngine

engine = PaddleOCREngine()

GoogleVisionEngine

Google Cloud Vision API (TEXT_DETECTION). Requires GOOGLE_VISION_API_KEY in environment or .env.

from panoocr.engines.google_vision import GoogleVisionEngine

engine = GoogleVisionEngine()

Florence2OCREngine (transformers + torch)

Microsoft's Florence-2 vision-language model via transformers.

from panoocr.engines.florence2 import Florence2OCREngine

engine = Florence2OCREngine()

Florence2MLXEngine (mlx-vlm, macOS Apple Silicon)

Florence-2 via mlx-vlm with <OCR_WITH_REGION> for structured quad-box output. The only VLM engine that returns per-word bounding boxes.

from panoocr.engines.florence2_mlx import Florence2MLXEngine

engine = Florence2MLXEngine()

Unstructured engines (return text without bounding boxes)

These engines return text only. Because they provide no per-word positions, each detection is assigned a bounding box covering the full image it was recognized in, so the panoocr pipeline can attribute the text at crop level.

GeminiEngine

Google Gemini API. Supports multiple model variants. Requires GOOGLE_GEMINI_API_KEY in environment or .env.

from panoocr.engines.gemini import GeminiEngine

engine_flash = GeminiEngine(config={"model": "gemini-2.5-flash"})
engine_pro = GeminiEngine(config={"model": "gemini-2.5-pro"})
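
Both cloud engines read their key from the environment, so a quick pre-flight check can catch a missing key before any request is made. This is a minimal sketch using only the standard library; the helper name is hypothetical, and it inspects the process environment only, so a key that exists solely in a .env file that has not been loaded will be reported as missing.

```python
import os


def missing_api_keys(required=("GOOGLE_VISION_API_KEY", "GOOGLE_GEMINI_API_KEY")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Calling `missing_api_keys(("GOOGLE_GEMINI_API_KEY",))` before constructing a GeminiEngine returns an empty list when the key is present.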

GlmOCREngine (mlx-vlm, macOS Apple Silicon)

GLM-OCR (0.9B) via mlx-vlm. A document-focused VLM with limited effectiveness on scene text.

from panoocr.engines.glm_ocr import GlmOCREngine

engine = GlmOCREngine()

DotsOCREngine (mlx-vlm, macOS Apple Silicon)

DOTS.OCR (2.9B) via mlx-vlm. A document layout parser with limited effectiveness on scene text.

from panoocr.engines.dots_ocr import DotsOCREngine

engine = DotsOCREngine()

TrOCREngine (experimental)

Microsoft's TrOCR transformer-based single-line OCR. It does not detect text regions and treats the entire image as one text line. Experimental; consider other engines for panorama OCR.

from panoocr.engines.trocr import TrOCREngine

engine = TrOCREngine()

Advanced Usage

Custom Perspectives

from panoocr import PanoOCR, PerspectivePreset, generate_perspectives

# Use a preset
pano = PanoOCR(engine, perspectives=PerspectivePreset.ZOOMED_IN)

# Or create custom perspectives
custom_perspectives = generate_perspectives(
    fov=30,              # Horizontal FOV in degrees
    resolution=1024,     # Pixel width/height
    overlap=0.5,         # 50% overlap between adjacent views
    pitch_angles=[0, 15, -15],  # Multiple rows
)
pano = PanoOCR(engine, perspectives=custom_perspectives)
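
Smaller FOVs and higher overlap increase the number of views, and therefore the number of OCR calls. The sketch below gives a rough estimate, assuming each pitch row is tiled around the full 360° with a yaw step of fov × (1 − overlap); that tiling rule is an assumption for illustration, not a description of how generate_perspectives is implemented.

```python
import math


def estimate_view_count(fov_deg, overlap, pitch_angles):
    """Estimate how many views cover 360 degrees of yaw for every pitch row."""
    step = fov_deg * (1 - overlap)        # assumed yaw advance between adjacent views
    per_row = math.ceil(360 / step)       # views needed to close the circle
    return per_row * len(pitch_angles)


print(estimate_view_count(30, 0.5, [0, 15, -15]))  # 72
```

With the parameters above, the estimate is 24 views per row across 3 rows.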

Multi-Scale Detection

from panoocr import PanoOCR, PerspectivePreset

pano = PanoOCR(engine)

# Run OCR at multiple scales to catch both small and large text
result = pano.recognize_multi(
    "panorama.jpg",
    presets=[
        PerspectivePreset.ZOOMED_IN,
        PerspectivePreset.DEFAULT,
    ],
)

Custom Deduplication Settings

from panoocr import PanoOCR, DedupOptions

pano = PanoOCR(
    engine,
    dedup_options=DedupOptions(
        min_text_similarity=0.6,
        min_intersection_ratio=0.2,
    ),
)
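
To give a feel for what these two thresholds control, here is a hedged sketch of a duplicate test that combines a text-similarity score with a box-overlap ratio. It is not PanoOCR's implementation: it assumes yaw/pitch mark the box center, treats angular boxes as flat rectangles, ignores yaw wraparound at ±180°, uses difflib for similarity, and measures overlap relative to the smaller box.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Case-insensitive similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def _overlap(center_a, size_a, center_b, size_b):
    """Length of the overlap between two 1D intervals given by center and size."""
    low = max(center_a - size_a / 2, center_b - size_b / 2)
    high = min(center_a + size_a / 2, center_b + size_b / 2)
    return max(0.0, high - low)


def intersection_ratio(a, b):
    """Overlap area divided by the smaller box's area."""
    inter = (_overlap(a["yaw"], a["width"], b["yaw"], b["width"])
             * _overlap(a["pitch"], a["height"], b["pitch"], b["height"]))
    smaller = min(a["width"] * a["height"], b["width"] * b["height"])
    return inter / smaller if smaller > 0 else 0.0


def is_duplicate(a, b, min_text_similarity=0.6, min_intersection_ratio=0.2):
    """Two detections count as duplicates only if both thresholds are met."""
    return (text_similarity(a["text"], b["text"]) >= min_text_similarity
            and intersection_ratio(a, b) >= min_intersection_ratio)
```

In this sketch, raising min_text_similarity keeps more near-identical strings as separate results, while raising min_intersection_ratio requires boxes to overlap more before they are merged.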

Using the Protocol for Custom Engines

You can create your own OCR engine by implementing the OCREngine protocol:

from panoocr import PanoOCR, OCREngine, FlatOCRResult
from PIL import Image

class MyCustomEngine:
    def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
        # Your OCR implementation here
        # Return results with normalized bounding boxes (0-1 range)
        ...

# No inheritance required - just implement the method
engine = MyCustomEngine()
pano = PanoOCR(engine)
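
The "no inheritance required" behavior is how Python's typing.Protocol works: any class with a matching method satisfies the protocol. The self-contained sketch below demonstrates this with hypothetical stand-ins (Detection and Recognizer) rather than PanoOCR's real FlatOCRResult and OCREngine, whose exact definitions are not shown here.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Detection:  # hypothetical stand-in for FlatOCRResult
    text: str
    confidence: float


@runtime_checkable
class Recognizer(Protocol):  # hypothetical stand-in for OCREngine
    def recognize(self, image: object) -> list[Detection]: ...


class FixedTextEngine:
    """Toy engine that ignores the image; it does not inherit from Recognizer."""

    def recognize(self, image: object) -> list[Detection]:
        return [Detection(text="EXIT", confidence=1.0)]


def run(engine: Recognizer, image: object) -> list[str]:
    """Accept any object whose shape matches the protocol."""
    return [d.text for d in engine.recognize(image)]


print(isinstance(FixedTextEngine(), Recognizer))  # True
print(run(FixedTextEngine(), image=None))         # ['EXIT']
```

A static type checker accepts FixedTextEngine wherever Recognizer is expected, and the runtime isinstance check passes because the class defines a recognize method.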

Preview Tool

The package includes an interactive HTML preview tool for visualizing OCR results on the panorama. Open preview/index.html in a browser and drag & drop your panorama image and JSON results file.

Output Format

Each detection is returned as a SphereOCRResult with spherical coordinates. Saved as JSON, the results look like this:

{
  "results": [
    {
      "text": "HELLO WORLD",
      "confidence": 0.95,
      "yaw": 45.0,
      "pitch": 0.0,
      "width": 10.5,
      "height": 3.2,
      "engine": "APPLE_VISION_FRAMEWORK"
    }
  ],
  "image_path": "panorama.jpg",
  "perspective_preset": "default"
}

- yaw: Horizontal angle in degrees (-180 to 180)
- pitch: Vertical angle in degrees (-90 to 90)
- width, height: Angular dimensions in degrees
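
To draw a result back onto the source image, the angles need to be converted to pixel coordinates. A minimal sketch, assuming yaw 0 sits at the horizontal center of the equirectangular image and increases to the right, and pitch 0 sits on the horizon and increases upward; verify these conventions against your own data before relying on them.

```python
def yaw_pitch_to_pixel(yaw_deg, pitch_deg, pano_width, pano_height):
    """Convert spherical degrees to (x, y) pixel coordinates in an equirectangular image."""
    # Assumed convention: yaw -180..180 spans left to right, pitch 90..-90 spans top to bottom.
    x = (yaw_deg + 180.0) / 360.0 * pano_width
    y = (90.0 - pitch_deg) / 180.0 * pano_height
    return x, y


print(yaw_pitch_to_pixel(45.0, 0.0, 4096, 2048))  # (2560.0, 1024.0)
```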

License

MIT License - see LICENSE for details.

Release history

- 0.5.0 (this version): Mar 31, 2026
- 0.4.1: Feb 12, 2026
- 0.4.0: Feb 12, 2026
- 0.3.1: Jan 27, 2026
- 0.3.0: Jan 27, 2026
- 0.2.1: Jan 27, 2026
- 0.2.0: Jan 27, 2026

Download files

Source Distribution

panoocr-0.5.0.tar.gz (14.1 MB)

Uploaded Mar 31, 2026 Source

Built Distribution

panoocr-0.5.0-py3-none-any.whl (51.4 kB)

Uploaded Mar 31, 2026 Python 3

File details

Details for the file panoocr-0.5.0.tar.gz.

File metadata

Download URL: panoocr-0.5.0.tar.gz
Upload date: Mar 31, 2026
Size: 14.1 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for panoocr-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`8949541ae6a4ceef35c62444046632a1a9723cffce571a53f6e80575ce079340`
MD5	`484bf538d1defa4c0cb4ff7fcc112701`
BLAKE2b-256	`1e20ca0bd010afba136066465dd09f2798bb538041945d7e3c783dc21b3caba9`
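
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below is one way to do it; the helper names are illustrative, and the file is read in chunks so that the 14.1 MB archive does not need to fit in memory at once.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For this release, `matches_digest("panoocr-0.5.0.tar.gz", "8949541ae6a4ceef35c62444046632a1a9723cffce571a53f6e80575ce079340")` should return True for an intact download.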


Provenance

The following attestation bundles were made for panoocr-0.5.0.tar.gz:

Publisher: publish.yml on yz3440/panoocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: panoocr-0.5.0.tar.gz
- Subject digest: 8949541ae6a4ceef35c62444046632a1a9723cffce571a53f6e80575ce079340
- Sigstore transparency entry: 1203495091
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: yz3440/panoocr@fb515183c8fccdb4c062dc0d201d2a3ea3f82ec1
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/yz3440
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fb515183c8fccdb4c062dc0d201d2a3ea3f82ec1
- Trigger Event: release

File details

Details for the file panoocr-0.5.0-py3-none-any.whl.

File metadata

Download URL: panoocr-0.5.0-py3-none-any.whl
Upload date: Mar 31, 2026
Size: 51.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for panoocr-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0b3bda9e7b05630d103973f129ae629108a74661c5e12719f6918cf28d701430`
MD5	`0e0fa5effdd3703f5e6142165dab7504`
BLAKE2b-256	`5850e07f9142878940f2fb8a88cf440adbc4830c8e89b946aaa433054efdd79f`


Provenance

The following attestation bundles were made for panoocr-0.5.0-py3-none-any.whl:

Publisher: publish.yml on yz3440/panoocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: panoocr-0.5.0-py3-none-any.whl
- Subject digest: 0b3bda9e7b05630d103973f129ae629108a74661c5e12719f6918cf28d701430
- Sigstore transparency entry: 1203495094
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: yz3440/panoocr@fb515183c8fccdb4c062dc0d201d2a3ea3f82ec1
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/yz3440
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fb515183c8fccdb4c062dc0d201d2a3ea3f82ec1
- Trigger Event: release
