Library-first OCR and layout detection for historical documents

churro-ocr

churro-ocr is a Python toolkit for OCR and page detection on historical documents.

Full documentation and project overview live at https://stanford-oval.github.io/Churro/.

Install

Install only the pieces you need:

pip install churro-ocr
pip install "churro-ocr[llm]"
pip install "churro-ocr[local]"
pip install "churro-ocr[hf]"
pip install "churro-ocr[vllm]"
pip install "churro-ocr[azure]"
pip install "churro-ocr[mistral]"
pip install "churro-ocr[pdf]"
pip install "churro-ocr[all]"
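
After installing, it can be useful to confirm that the distribution is actually visible to the interpreter you plan to run. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of churro-ocr. It reports only whether the base distribution is present, not which extras were installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("churro-ocr")
    if found is None:
        print("churro-ocr is not installed; try: pip install churro-ocr")
    else:
        print(f"churro-ocr {found} is installed")
```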

Which API Should You Use?

Goal                                          API
OCR one page or one image                     OCRClient
Detect page crops only                        DocumentPageDetector
Run an end-to-end image or PDF OCR workflow   DocumentOCRPipeline
Tune backend/provider options directly        build_ocr_backend(...) + OCRBackendSpec

Quick Start

from churro_ocr.ocr import OCRClient
from churro_ocr.providers import OCRBackendSpec, build_ocr_backend

# Describe the backend: the "litellm" provider with a Gemini model on Vertex AI.
backend = build_ocr_backend(
    OCRBackendSpec(
        provider="litellm",
        model="vertex_ai/gemini-2.5-flash",
    )
)

# OCR a single image file.
page = OCRClient(backend).ocr_image(image_path="scan.png")

# Print the recognized text.
print(page.text)
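
The Quick Start handles one image. For a folder of scans, a thin loop around the same call is usually enough. The sketch below is an illustration, not part of churro-ocr: `ocr_directory` and its parameters are hypothetical names, and it accepts any callable that maps an image path to text, so it does not depend on a particular backend.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def ocr_directory(
    ocr_one: Callable[[str], str],
    directory: str,
    suffixes: Iterable[str] = (".png", ".jpg", ".jpeg", ".tif", ".tiff"),
) -> Dict[str, str]:
    """Run `ocr_one` on every matching image in `directory`.

    Writes each transcription next to its image as `<name>.txt` and
    returns a mapping from image file name to recognized text.
    """
    wanted = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    # sorted() materializes the listing first, so the .txt files written
    # below do not affect the iteration.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        text = ocr_one(str(path))
        path.with_suffix(".txt").write_text(text, encoding="utf-8")
        results[path.name] = text
    return results


# Hypothetical wiring, reusing `backend` from the Quick Start:
#   client = OCRClient(backend)
#   ocr_directory(lambda p: client.ocr_image(image_path=p).text, "scans/")
```

Passing the OCR step in as a callable keeps the loop easy to test with a stub and leaves retry or rate-limit handling to the caller.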

Download files

Source Distribution

churro_ocr-0.2.0.tar.gz (51.2 kB)

Uploaded Source

Built Distribution

churro_ocr-0.2.0-py3-none-any.whl (60.3 kB)

Uploaded Python 3

File details

Details for the file churro_ocr-0.2.0.tar.gz.

File metadata

  • Download URL: churro_ocr-0.2.0.tar.gz
  • Size: 51.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for churro_ocr-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        f82ee1c84fee908dbd6e34103f6072ca9d97afaedff9b9de4279becaf0464cc2
MD5           5f000a755f9b833703d09b2231ed8820
BLAKE2b-256   53daa7979da826ae30bbfa0bb6d8ffff6cfa4ae3bb40efd418e73ae84c88781c
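
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below is one way to do it: the function names are hypothetical, the digests are copied from the "File hashes" tables on this page for release 0.2.0, and the check assumes the file keeps its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for release 0.2.0.
EXPECTED_SHA256 = {
    "churro_ocr-0.2.0.tar.gz": (
        "f82ee1c84fee908dbd6e34103f6072ca9d97afaedff9b9de4279becaf0464cc2"
    ),
    "churro_ocr-0.2.0-py3-none-any.whl": (
        "27e2017132266e49318b476f0cb67b70522b63e5c12ed34590ac1c642d08a0ad"
    ),
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str) -> bool:
    """True if the file's SHA256 equals the published digest for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, `matches_published_hash("churro_ocr-0.2.0.tar.gz")` should return True only for an unmodified copy of the source distribution; a file whose name is not in the table is reported as not matching.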

Provenance

The following attestation bundles were made for churro_ocr-0.2.0.tar.gz:

Publisher: publish.yml on stanford-oval/Churro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file churro_ocr-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: churro_ocr-0.2.0-py3-none-any.whl
  • Size: 60.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for churro_ocr-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        27e2017132266e49318b476f0cb67b70522b63e5c12ed34590ac1c642d08a0ad
MD5           8cd3d2e93588a125645e44da95905775
BLAKE2b-256   d1f433b1ac2d5e63d4baa2013740f28500730b8e8e2677108b44520bfada0da2

Provenance

The following attestation bundles were made for churro_ocr-0.2.0-py3-none-any.whl:

Publisher: publish.yml on stanford-oval/Churro
