PanoOCR
PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images with automatic perspective projection and deduplication.
https://github.com/user-attachments/assets/57507c48-ec88-4d4a-bf68-067eefc9d42f
Features
- Multiple OCR Engines: Support for MacOCR (Apple Vision), EasyOCR, PaddleOCR, Florence-2, and TrOCR
- Automatic Perspective Projection: Converts equirectangular panoramas to multiple perspective views for better OCR accuracy
- Deduplication: Automatically removes duplicate text detections across overlapping perspective views
- Spherical Coordinates: Returns OCR results in yaw/pitch coordinates that map directly to the panorama
- Preview Tool: Interactive 3D preview of OCR results on the panorama
Installation
Install the base package:
pip install panoocr
Install with OCR engine dependencies:
# macOS (Apple Vision Framework)
pip install "panoocr[macocr]"
# EasyOCR (cross-platform)
pip install "panoocr[easyocr]"
# PaddleOCR (cross-platform)
pip install "panoocr[paddleocr]"
# Florence-2 (GPU recommended)
pip install "panoocr[florence2]"
# All engines (excluding platform-specific macocr)
pip install "panoocr[full]"
Using uv (recommended):
uv add panoocr
uv add "panoocr[macocr]" # or other extras
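Because each engine lives behind an optional extra, it can help to check which backends are importable before constructing an engine. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the module names passed to it are assumptions for illustration, not part of PanoOCR.

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed module names for the optional backends; adjust to your installation.
candidates = ["easyocr", "paddleocr", "transformers"]
print("Not installed:", missing_modules(candidates))
```

An empty list means every candidate backend is importable.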
Quick Start
from panoocr import PanoOCR
from panoocr.engines.macocr import MacOCREngine # or other engines
# Create an OCR engine
engine = MacOCREngine()
# Create the PanoOCR pipeline
pano = PanoOCR(engine)
# Run OCR on a panorama
result = pano.recognize("panorama.jpg")
# Save results as JSON
result.save_json("results.json")
# Access individual results
for r in result.results:
    print(f"Text: {r.text}")
    print(f"Position: yaw={r.yaw}°, pitch={r.pitch}°")
    print(f"Confidence: {r.confidence}")
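The saved JSON can also be post-processed without PanoOCR installed. The sketch below assumes the file has the shape shown under Output Format; the sample values, the helper name `confident_results`, and the 0.5 threshold are illustrative choices, not library defaults.

```python
import json

# Sample payload mirroring the documented output format; values are illustrative.
SAMPLE = """
{
  "results": [
    {"text": "HELLO WORLD", "confidence": 0.95, "yaw": 45.0, "pitch": 0.0,
     "width": 10.5, "height": 3.2, "engine": "APPLE_VISION_FRAMEWORK"},
    {"text": "H3LL0", "confidence": 0.41, "yaw": -120.0, "pitch": 12.0,
     "width": 4.0, "height": 1.5, "engine": "APPLE_VISION_FRAMEWORK"}
  ],
  "image_path": "panorama.jpg",
  "perspective_preset": "default"
}
"""


def confident_results(payload: dict, min_confidence: float = 0.5) -> list[dict]:
    """Keep detections at or above min_confidence, ordered left to right by yaw."""
    kept = [r for r in payload["results"] if r["confidence"] >= min_confidence]
    return sorted(kept, key=lambda r: r["yaw"])


payload = json.loads(SAMPLE)
for r in confident_results(payload):
    print(f'{r["text"]!r} at yaw={r["yaw"]}, pitch={r["pitch"]}')
```

To use it on a real run, replace `json.loads(SAMPLE)` with `json.load(open("results.json"))`.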
Available OCR Engines
MacOCREngine (macOS only)
Uses Apple's Vision Framework for fast, accurate OCR on macOS.
from panoocr.engines.macocr import MacOCREngine, MacOCRLanguageCode
engine = MacOCREngine(config={
    "language_preference": [MacOCRLanguageCode.ENGLISH_US],
})
EasyOCREngine
Cross-platform OCR supporting 80+ languages.
from panoocr.engines.easyocr import EasyOCREngine, EasyOCRLanguageCode
engine = EasyOCREngine(config={
    "language_preference": [EasyOCRLanguageCode.ENGLISH],
    "gpu": True,
})
PaddleOCREngine
PaddlePaddle-based OCR supporting multiple languages with automatic model management. Uses PP-OCRv5 by default.
from panoocr.engines.paddleocr import PaddleOCREngine, PaddleOCRLanguageCode
engine = PaddleOCREngine(config={
    "language_preference": PaddleOCRLanguageCode.CHINESE,
})
Florence2OCREngine
Microsoft's Florence-2 vision-language model for OCR.
from panoocr.engines.florence2 import Florence2OCREngine
engine = Florence2OCREngine(config={
    "model_id": "microsoft/Florence-2-large",
})
Advanced Usage
Custom Perspectives
from panoocr import PanoOCR, PerspectivePreset, generate_perspectives
# Use a preset
pano = PanoOCR(engine, perspectives=PerspectivePreset.ZOOMED_IN)
# Or create custom perspectives
custom_perspectives = generate_perspectives(
    fov=30,                     # Horizontal FOV in degrees
    resolution=1024,            # Pixel width/height
    overlap=0.5,                # 50% overlap between adjacent views
    pitch_angles=[0, 15, -15],  # Multiple rows
)
pano = PanoOCR(engine, perspectives=custom_perspectives)
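To get a feel for how `fov` and `overlap` affect the amount of OCR work, the view count can be estimated by hand. This is a rough sketch that assumes adjacent views are spaced `fov * (1 - overlap)` degrees apart and spread evenly around 360° of yaw; `generate_perspectives` may place views differently, and the helper names below are hypothetical.

```python
import math


def views_per_row(fov: float, overlap: float) -> int:
    """Views needed to cover 360 degrees of yaw.

    Assumes neighbouring views are fov * (1 - overlap) degrees apart.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = fov * (1 - overlap)
    return math.ceil(360 / step)


def yaw_centers(fov: float, overlap: float) -> list[float]:
    """Evenly spaced yaw centers in [-180, 180), one per view."""
    n = views_per_row(fov, overlap)
    return [-180 + i * 360 / n for i in range(n)]


pitch_rows = [0, 15, -15]
n = views_per_row(fov=30, overlap=0.5)
print(n, n * len(pitch_rows))  # 24 views per row, 72 views in total
```

Under these assumptions, halving the FOV or raising the overlap quickly multiplies the number of images sent to the OCR engine.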
Multi-Scale Detection
from panoocr import PanoOCR, PerspectivePreset
pano = PanoOCR(engine)
# Run OCR at multiple scales to catch both small and large text
result = pano.recognize_multi(
    "panorama.jpg",
    presets=[
        PerspectivePreset.ZOOMED_IN,
        PerspectivePreset.DEFAULT,
    ],
)
Custom Deduplication Settings
from panoocr import PanoOCR, DedupOptions
pano = PanoOCR(
    engine,
    dedup_options=DedupOptions(
        min_text_similarity=0.6,
        min_intersection_ratio=0.2,
    ),
)
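To illustrate what the two thresholds control, here is one way such a deduplication pass could work. It is a simplified sketch, not PanoOCR's actual algorithm: the `Detection` class is a hypothetical stand-in, text similarity uses `difflib.SequenceMatcher`, and boxes are treated as flat rectangles in yaw/pitch space, ignoring wrap-around at ±180° and spherical distortion.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Detection:  # hypothetical stand-in, not a PanoOCR class
    text: str
    confidence: float
    yaw: float      # box center, degrees
    pitch: float    # box center, degrees
    width: float    # angular width, degrees
    height: float   # angular height, degrees


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def _overlap_1d(c1: float, s1: float, c2: float, s2: float) -> float:
    """Overlap length of two intervals given by center and size."""
    lo = max(c1 - s1 / 2, c2 - s2 / 2)
    hi = min(c1 + s1 / 2, c2 + s2 / 2)
    return max(0.0, hi - lo)


def intersection_ratio(a: Detection, b: Detection) -> float:
    """Intersection area divided by the area of the smaller box."""
    inter = (_overlap_1d(a.yaw, a.width, b.yaw, b.width)
             * _overlap_1d(a.pitch, a.height, b.pitch, b.height))
    smaller = min(a.width * a.height, b.width * b.height)
    return inter / smaller if smaller > 0 else 0.0


def deduplicate(dets, min_text_similarity=0.6, min_intersection_ratio=0.2):
    """Greedy pass: keep the most confident detection of each duplicate group."""
    kept: list[Detection] = []
    for d in sorted(dets, key=lambda d: d.confidence, reverse=True):
        is_dup = any(
            text_similarity(d.text, k.text) >= min_text_similarity
            and intersection_ratio(d, k) >= min_intersection_ratio
            for k in kept
        )
        if not is_dup:
            kept.append(d)
    return kept


dets = [
    Detection("HELLO WORLD", 0.95, 45.0, 0.0, 10.5, 3.2),
    Detection("HELLO W0RLD", 0.80, 46.0, 0.2, 10.0, 3.0),  # same sign, other view
    Detection("EXIT", 0.90, -90.0, 5.0, 4.0, 2.0),
]
print([d.text for d in deduplicate(dets)])
```

In this sketch, raising `min_text_similarity` or `min_intersection_ratio` makes the pass more conservative, so fewer detections are merged.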
Using the Protocol for Custom Engines
You can create your own OCR engine by implementing the OCREngine protocol:
from panoocr import OCREngine, FlatOCRResult
from PIL import Image
class MyCustomEngine:
    def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
        # Your OCR implementation here
        # Return results with normalized bounding boxes (0-1 range)
        ...
# No inheritance required - just implement the method
engine = MyCustomEngine()
pano = PanoOCR(engine)
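The "no inheritance required" behavior is standard Python structural typing. The self-contained sketch below shows the idea with stand-in types: `Box` and `Recognizer` are hypothetical and only mimic the roles of `FlatOCRResult` and `OCREngine`, whose real fields and definitions may differ. It also shows one way to normalize pixel boxes to the 0-1 range.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Box:  # hypothetical result type; FlatOCRResult's real fields may differ
    text: str
    confidence: float
    left: float
    top: float
    right: float
    bottom: float


@runtime_checkable
class Recognizer(Protocol):  # hypothetical stand-in for the OCREngine protocol
    def recognize(self, image: object) -> list[Box]: ...


class FixedTextEngine:
    """Toy engine that reports one string covering the middle of any image."""

    def recognize(self, image: object) -> list[Box]:
        return [Box("HELLO", 1.0, 0.25, 0.25, 0.75, 0.75)]


def to_normalized(x0, y0, x1, y1, width, height):
    """Convert a pixel-space box to the 0-1 range used for results."""
    return (x0 / width, y0 / height, x1 / width, y1 / height)


engine = FixedTextEngine()
print(isinstance(engine, Recognizer))  # True: matched by method name, not base class
print(to_normalized(256, 512, 768, 640, 1024, 1024))  # (0.25, 0.5, 0.75, 0.625)
```

Note that `isinstance` against a runtime-checkable protocol only checks that the method exists, not that its signature or return type is correct.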
Preview Tool
The package includes an interactive HTML preview tool for visualizing OCR results on the panorama. Open preview/index.html in a browser and drag & drop your panorama image and JSON results file.
Output Format
OCR results are returned as SphereOCRResult objects with spherical coordinates:
{
  "results": [
    {
      "text": "HELLO WORLD",
      "confidence": 0.95,
      "yaw": 45.0,
      "pitch": 0.0,
      "width": 10.5,
      "height": 3.2,
      "engine": "APPLE_VISION_FRAMEWORK"
    }
  ],
  "image_path": "panorama.jpg",
  "perspective_preset": "default"
}
- yaw: Horizontal angle in degrees (-180 to 180)
- pitch: Vertical angle in degrees (-90 to 90)
- width, height: Angular dimensions in degrees
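To draw a result back onto the source image, yaw and pitch can be converted to equirectangular pixel coordinates. The sketch below assumes a common convention that the source does not spell out: yaw 0, pitch 0 is the image center, yaw increases to the right, and pitch increases upward. Verify this against your own panoramas before relying on it; the function names are hypothetical.

```python
def sphere_to_pixel(yaw: float, pitch: float, image_width: int, image_height: int):
    """Map yaw/pitch in degrees to (x, y) pixels on an equirectangular image.

    Assumed convention: center of the image is yaw=0, pitch=0;
    yaw grows to the right, pitch grows upward.
    """
    x = (yaw + 180.0) / 360.0 * image_width
    y = (90.0 - pitch) / 180.0 * image_height
    return x, y


def pixel_to_sphere(x: float, y: float, image_width: int, image_height: int):
    """Inverse of sphere_to_pixel under the same assumed convention."""
    yaw = x / image_width * 360.0 - 180.0
    pitch = 90.0 - y / image_height * 180.0
    return yaw, pitch


print(sphere_to_pixel(45.0, 0.0, 4096, 2048))  # (2560.0, 1024.0)
```

Because the mapping is linear in both axes, an angular `width` of 10.5° spans roughly `10.5 / 360 * image_width` pixels near the horizon; closer to the poles, equirectangular stretching makes that estimate less accurate.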
License
MIT License - see LICENSE for details.