OCR and page detection for historical documents

churro-ocr

churro-ocr is a Python toolkit for OCR and page detection on historical documents.

Full documentation and project overview live at https://stanford-oval.github.io/Churro/.

Install

For the CLI-first workflow used in the docs, install Churro as a tool with uv:

uv tool install churro-ocr

If you are adding churro-ocr to a project instead, use uv add churro-ocr and prefix CLI commands with uv run.

Runtime setup and provider-specific install commands are covered in the Getting Started and Providers And Configuration pages of the documentation.

Which API Should You Use?

Goal                                          API
OCR one page or one image                     OCRClient
Detect page crops only                        DocumentPageDetector
Run an end-to-end image or PDF OCR workflow   DocumentOCRPipeline
Tune backend/provider options directly        build_ocr_backend(...) + OCRBackendSpec

Quick Start

This example assumes you already installed the runtime for the provider you want to use.

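from churro_ocr.ocr import OCRClient
from churro_ocr.providers import OCRBackendSpec, build_ocr_backend

# Describe the backend: the litellm provider with a Gemini model on Vertex AI.
backend = build_ocr_backend(
    OCRBackendSpec(
        provider="litellm",
        model="vertex_ai/gemini-2.5-flash",
    )
)

# OCR a single image file.
page = OCRClient(backend).ocr_image(image_path="scan.png")

# Print the recognized text for that page.
print(page.text)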
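
To run the same call over a folder of scans, the loop can be kept separate from the backend. The sketch below is not part of churro-ocr: `ocr_directory` and its `ocr_fn` parameter are hypothetical names, and `ocr_fn` is assumed to be any callable that takes an image path and returns the recognized text.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def ocr_directory(
    image_dir: str,
    ocr_fn: Callable[[str], str],
    suffixes: Iterable[str] = (".png", ".jpg", ".jpeg", ".tif", ".tiff"),
) -> Dict[str, str]:
    """Apply ocr_fn to every image in image_dir and return {file name: text}.

    Hypothetical helper, not a churro-ocr API. Files are visited in sorted
    order, and subdirectories are not searched.
    """
    wanted = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    for path in sorted(Path(image_dir).iterdir()):
        # Skip directories and files whose extension is not an image suffix.
        if path.is_file() and path.suffix.lower() in wanted:
            results[path.name] = ocr_fn(str(path))
    return results
```

With the `backend` from the Quick Start, the callable could be `lambda p: OCRClient(backend).ocr_image(image_path=p).text`, which relies only on the call and attribute already shown there.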

Download files

Source Distribution

churro_ocr-0.3.0.tar.gz (89.6 kB)

Uploaded Source

Built Distribution

churro_ocr-0.3.0-py3-none-any.whl (103.3 kB)

Uploaded Python 3

File details

Details for the file churro_ocr-0.3.0.tar.gz.

File metadata

  • Download URL: churro_ocr-0.3.0.tar.gz
  • Size: 89.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for churro_ocr-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       40cc89bb6de22d8cca1d9989c9f9542807fc0be5998089f14bc7b1907144231f
MD5          af9f265a914d39803b486813cd7c85d4
BLAKE2b-256  5fb5a5ee3c5612328c53f354c4dbc52c4f3d18e0a7a844e95b53449995277827
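
A downloaded file can be checked against the SHA256 digest listed here before it is installed. The following is a minimal sketch using only Python's standard `hashlib` and `hmac` modules; the function names `file_sha256` and `matches_digest` are illustrative and not part of churro-ocr or any packaging tool.

```python
import hashlib
import hmac


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string, ignoring case."""
    return hmac.compare_digest(file_sha256(path), expected_hex.strip().lower())
```

Assuming the archive was saved to the current directory, `matches_digest("churro_ocr-0.3.0.tar.gz", "40cc89bb6de22d8cca1d9989c9f9542807fc0be5998089f14bc7b1907144231f")` would be expected to return `True` for an unmodified download. The same check applies to the wheel with its own SHA256 value.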

Provenance

The following attestation bundles were made for churro_ocr-0.3.0.tar.gz:

Publisher: publish.yml on stanford-oval/Churro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file churro_ocr-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: churro_ocr-0.3.0-py3-none-any.whl
  • Size: 103.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for churro_ocr-0.3.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       57c8d0546cab3b2366540e2d688fbafeee354d30b5b9933c396f6d006c88a7b6
MD5          a91ff1e115ff974b9d2b25f6948b427a
BLAKE2b-256  281f326fe152b0bb54c463e21672baa6bfb48609c88738b3975870271cfdb81e

Provenance

The following attestation bundles were made for churro_ocr-0.3.0-py3-none-any.whl:

Publisher: publish.yml on stanford-oval/Churro
