PanoOCR

PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images with automatic perspective projection and deduplication.

https://github.com/user-attachments/assets/57507c48-ec88-4d4a-bf68-067eefc9d42f

Features

  • Multiple OCR Engines: Support for MacOCR (Apple Vision), EasyOCR, PaddleOCR, Florence-2, and TrOCR
  • Automatic Perspective Projection: Converts equirectangular panoramas to multiple perspective views for better OCR accuracy
  • Deduplication: Automatically removes duplicate text detections across overlapping perspective views
  • Spherical Coordinates: Returns OCR results in yaw/pitch coordinates that map directly to the panorama
  • Preview Tool: Interactive 3D preview of OCR results on the panorama
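The geometry behind the projection and spherical-coordinate features can be sketched with a standard pinhole (gnomonic) model. This is an illustrative sketch, not PanoOCR's implementation: the function name `view_pixel_to_yaw_pitch`, the axis conventions (u grows rightward, v grows downward, positive pitch looks up), and the rotation order are all assumptions.

```python
import math


def view_pixel_to_yaw_pitch(u, v, view_yaw, view_pitch, hfov, vfov):
    """Map normalized view coordinates (u, v in 0..1) to panorama yaw/pitch in degrees.

    Hypothetical helper: assumes a pinhole view centered at (view_yaw, view_pitch)
    with the given horizontal and vertical fields of view, all in degrees.
    """
    # Point on the image plane at unit distance in front of the camera.
    x = (2.0 * u - 1.0) * math.tan(math.radians(hfov) / 2.0)
    y = (1.0 - 2.0 * v) * math.tan(math.radians(vfov) / 2.0)
    z = 1.0

    # Tilt the camera up by view_pitch (rotation about the horizontal axis).
    p = math.radians(view_pitch)
    y2 = y * math.cos(p) + z * math.sin(p)
    z2 = -y * math.sin(p) + z * math.cos(p)

    # Turn the camera by view_yaw (rotation about the vertical axis).
    w = math.radians(view_yaw)
    x3 = x * math.cos(w) + z2 * math.sin(w)
    z3 = -x * math.sin(w) + z2 * math.cos(w)

    # Convert the direction vector back to spherical angles.
    yaw = math.degrees(math.atan2(x3, z3))
    pitch = math.degrees(math.atan2(y2, math.hypot(x3, z3)))
    return yaw, pitch
```

Under these assumptions the center of a view maps back to the view's own yaw and pitch, and the right edge of a 60° view centered at yaw 0 lands at yaw 30°.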

Installation

Install the base package:

pip install panoocr

Install with OCR engine dependencies:

# macOS (Apple Vision Framework)
pip install "panoocr[macocr]"

# EasyOCR (cross-platform)
pip install "panoocr[easyocr]"

# PaddleOCR (cross-platform)
pip install "panoocr[paddleocr]"

# Florence-2 (GPU recommended)
pip install "panoocr[florence2]"

# All engines (excluding platform-specific macocr)
pip install "panoocr[full]"

Using uv (recommended):

uv add panoocr
uv add "panoocr[macocr]"  # or other extras

Quick Start

from panoocr import PanoOCR
from panoocr.engines.macocr import MacOCREngine  # or other engines

# Create an OCR engine
engine = MacOCREngine()

# Create the PanoOCR pipeline
pano = PanoOCR(engine)

# Run OCR on a panorama
result = pano.recognize("panorama.jpg")

# Save results as JSON
result.save_json("results.json")

# Access individual results
for r in result.results:
    print(f"Text: {r.text}")
    print(f"Position: yaw={r.yaw}°, pitch={r.pitch}°")
    print(f"Confidence: {r.confidence}")
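Once results are saved, the JSON file can be post-processed without PanoOCR installed. The sketch below assumes the JSON layout shown under Output Format (a top-level "results" list whose entries carry "text", "confidence", "yaw", and "pitch"). The helper name `load_confident_texts` and the 0.8 threshold are hypothetical.

```python
import json


def load_confident_texts(path, min_confidence=0.8):
    """Return (text, yaw, pitch) tuples for detections at or above min_confidence.

    Hypothetical helper; assumes the JSON layout shown under Output Format.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [
        (r["text"], r["yaw"], r["pitch"])
        for r in data["results"]
        if r["confidence"] >= min_confidence
    ]
```

For example, `load_confident_texts("results.json", min_confidence=0.9)` would keep only the detections the engine was most sure about.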

Available OCR Engines

MacOCREngine (macOS only)

Uses Apple's Vision Framework for fast, accurate OCR on macOS.

from panoocr.engines.macocr import MacOCREngine, MacOCRLanguageCode

engine = MacOCREngine(config={
    "language_preference": [MacOCRLanguageCode.ENGLISH_US],
})

EasyOCREngine

Cross-platform OCR supporting 80+ languages.

from panoocr.engines.easyocr import EasyOCREngine, EasyOCRLanguageCode

engine = EasyOCREngine(config={
    "language_preference": [EasyOCRLanguageCode.ENGLISH],
    "gpu": True,
})

PaddleOCREngine

PaddlePaddle-based OCR with optional V4 server model for Chinese text.

from panoocr.engines.paddleocr import PaddleOCREngine, PaddleOCRLanguageCode

engine = PaddleOCREngine(config={
    "language_preference": PaddleOCRLanguageCode.CHINESE,
    "use_v4_server": True,
})

Florence2OCREngine

Microsoft's Florence-2 vision-language model for OCR.

from panoocr.engines.florence2 import Florence2OCREngine

engine = Florence2OCREngine(config={
    "model_id": "microsoft/Florence-2-large",
})

Advanced Usage

Custom Perspectives

from panoocr import PanoOCR, PerspectivePreset, generate_perspectives

# Use a preset
pano = PanoOCR(engine, perspectives=PerspectivePreset.ZOOMED_IN)

# Or create custom perspectives
custom_perspectives = generate_perspectives(
    pixel_size=1024,
    horizontal_fov=30,
    vertical_fov=30,
    pitch_offsets=[0, 15, -15],  # Multiple rows
)
pano = PanoOCR(engine, perspectives=custom_perspectives)
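A narrower field of view means more views are needed to cover the full 360° of yaw, which affects run time. How generate_perspectives actually lays out its views is not described here; the sketch below only illustrates the trade-off, assuming views evenly spaced in yaw with a hypothetical `overlap` fraction between neighbors.

```python
import math


def yaw_centers(horizontal_fov, overlap=0.5):
    """Evenly spaced yaw centers (degrees) covering 360 with the requested overlap.

    Hypothetical helper: `overlap` is the fraction of each view shared with
    its neighbor and is an assumption, not a PanoOCR parameter.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = horizontal_fov * (1.0 - overlap)  # desired spacing between centers
    count = math.ceil(360.0 / step)          # round up so no gap is left
    return [-180.0 + i * 360.0 / count for i in range(count)]
```

With a 30° field of view and 50% overlap this gives 24 views per row, so three pitch rows would mean 72 OCR passes per panorama under these assumptions.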

Multi-Scale Detection

from panoocr import PanoOCR, PerspectivePreset

pano = PanoOCR(engine)

# Run OCR at multiple scales to catch both small and large text
result = pano.recognize_multi(
    "panorama.jpg",
    presets=[
        PerspectivePreset.ZOOMED_IN,
        PerspectivePreset.DEFAULT,
    ],
)

Custom Deduplication Settings

from panoocr import PanoOCR, DedupOptions

pano = PanoOCR(
    engine,
    dedup_options=DedupOptions(
        min_text_similarity=0.6,
        min_intersection_ratio=0.2,
    ),
)
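To give a feel for what the two thresholds might control, here is a minimal sketch of a duplicate test. It is an assumption, not PanoOCR's algorithm: text similarity is approximated with `difflib.SequenceMatcher`, boxes are taken as (yaw, pitch, width, height) in degrees centered on (yaw, pitch), the intersection ratio is overlap area divided by the smaller box's area, and wraparound at ±180° yaw is ignored. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Case-insensitive similarity in [0, 1] (stand-in metric, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def intersection_ratio(box_a, box_b):
    """Overlap area divided by the smaller box's area.

    Boxes are (yaw, pitch, width, height) in degrees, centered on (yaw, pitch).
    Ignores wraparound at +/-180 degrees yaw for simplicity.
    """
    def overlap_1d(c1, s1, c2, s2):
        lo = max(c1 - s1 / 2.0, c2 - s2 / 2.0)
        hi = min(c1 + s1 / 2.0, c2 + s2 / 2.0)
        return max(0.0, hi - lo)

    w = overlap_1d(box_a[0], box_a[2], box_b[0], box_b[2])
    h = overlap_1d(box_a[1], box_a[3], box_b[1], box_b[3])
    smaller = min(box_a[2] * box_a[3], box_b[2] * box_b[3])
    return (w * h) / smaller if smaller > 0 else 0.0


def is_duplicate(text_a, box_a, text_b, box_b,
                 min_text_similarity=0.6, min_intersection_ratio=0.2):
    """Two detections count as duplicates only if both thresholds are met."""
    return (
        text_similarity(text_a, text_b) >= min_text_similarity
        and intersection_ratio(box_a, box_b) >= min_intersection_ratio
    )
```

Under this reading, lowering either threshold merges detections more aggressively, while raising them keeps more near-duplicates in the output.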

Using the Protocol for Custom Engines

You can create your own OCR engine by implementing the OCREngine protocol:

from panoocr import PanoOCR, OCREngine, FlatOCRResult
from PIL import Image

class MyCustomEngine:
    def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
        # Your OCR implementation here
        # Return results with normalized bounding boxes (0-1 range)
        ...

# No inheritance required - just implement the method
engine = MyCustomEngine()
pano = PanoOCR(engine)
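The "no inheritance required" behavior comes from Python's structural typing via `typing.Protocol`. The standalone sketch below shows the mechanism with hypothetical stand-ins (`Recognizer` for OCREngine, `Detection` for FlatOCRResult); it does not reproduce PanoOCR's actual class definitions.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Detection:
    """Hypothetical stand-in for FlatOCRResult."""
    text: str
    confidence: float


@runtime_checkable
class Recognizer(Protocol):
    """Hypothetical stand-in for OCREngine: anything with a matching recognize()."""
    def recognize(self, image: object) -> list[Detection]: ...


class EchoEngine:
    """Does not inherit from Recognizer, but matches its shape."""
    def recognize(self, image: object) -> list[Detection]:
        return [Detection(text=str(image), confidence=1.0)]


def run(engine: Recognizer, image: object) -> list[str]:
    """Accepts any object that satisfies the protocol."""
    return [d.text for d in engine.recognize(image)]
```

A static type checker accepts `run(EchoEngine(), ...)` because the method signature matches, and `isinstance(EchoEngine(), Recognizer)` is true at runtime thanks to `@runtime_checkable` (which checks only that the method exists, not its signature).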

Preview Tool

The package includes an interactive HTML preview tool for visualizing OCR results on the panorama. Open preview/index.html in a browser and drag & drop your panorama image and JSON results file.

Output Format

OCR results are returned as SphereOCRResult objects with spherical coordinates. Serialized to JSON, a result set looks like this:

{
  "results": [
    {
      "text": "HELLO WORLD",
      "confidence": 0.95,
      "yaw": 45.0,
      "pitch": 0.0,
      "width": 10.5,
      "height": 3.2,
      "engine": "APPLE_VISION_FRAMEWORK"
    }
  ],
  "image_path": "panorama.jpg",
  "perspective_preset": "default"
}

  • yaw: Horizontal angle in degrees (-180 to 180)
  • pitch: Vertical angle in degrees (-90 to 90)
  • width, height: Angular dimensions in degrees
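Because yaw and pitch span the same ranges as an equirectangular image, a result can be mapped back to pixel coordinates with a linear formula. The sketch below assumes yaw = 0 sits at the horizontal center of the image, yaw increases to the right, and pitch = +90 is the top row; PanoOCR's actual origin convention is not stated here, so verify it against the preview tool. The function name is hypothetical.

```python
def yaw_pitch_to_pixel(yaw, pitch, image_width, image_height):
    """Map spherical angles in degrees to equirectangular pixel coordinates.

    Hypothetical helper. Assumes yaw=0 at the image's horizontal center,
    yaw increasing rightward, and pitch=+90 at the top row.
    """
    x = (yaw + 180.0) / 360.0 * image_width
    y = (90.0 - pitch) / 180.0 * image_height
    return x, y
```

With these conventions, the sample result at yaw 45°, pitch 0° falls at pixel (2560, 1024) in a 4096×2048 panorama.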

License

MIT License - see LICENSE for details.
