OCR support for Stache AI document loaders

stache-ai-ocr

Provides a high-priority PDF loader that falls back to OCR for scanned documents.

Installation

pip install stache-ai-ocr
apt install ocrmypdf  # System dependency required

Usage

Once installed, the OCR loader automatically registers and takes priority over the basic PDF loader for all PDF files.

The loader will:

  1. First attempt normal text extraction with pdfplumber
  2. If no text is found (scanned PDF), fall back to OCR using ocrmypdf
  3. If ocrmypdf is not installed, log a warning and return empty text instead of failing
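
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual implementation: the function names `extract_text_layer`, `run_ocrmypdf`, and `load_pdf_text` are hypothetical, and the injectable `extract`, `ocr`, and `which` parameters exist only so the flow can be exercised without real PDFs.

```python
import logging
import shutil
import subprocess
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def extract_text_layer(pdf_path):
    """Step 1: read the embedded text layer with pdfplumber."""
    import pdfplumber  # imported lazily so the sketch loads without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may yield nothing for image-only pages
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def run_ocrmypdf(src, dst):
    """Step 2 helper: have the ocrmypdf CLI write a searchable copy to dst."""
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=True)


def load_pdf_text(pdf_path, extract=extract_text_layer, ocr=run_ocrmypdf,
                  which=shutil.which):
    """Hypothetical loader: text extraction first, OCR only if nothing is found."""
    text = extract(pdf_path)
    if text.strip():
        return text  # a normal PDF with a text layer, no OCR needed

    # Step 3: without the binary, warn and return empty text
    if which("ocrmypdf") is None:
        logger.warning("ocrmypdf not found; returning empty text for %s", pdf_path)
        return ""

    # Step 2: OCR into a temporary file, then extract from the result
    with tempfile.TemporaryDirectory() as tmp:
        ocr_pdf = Path(tmp) / "ocr.pdf"
        ocr(pdf_path, ocr_pdf)
        return extract(ocr_pdf)
```

Calling `load_pdf_text("scan.pdf")` with the defaults would run the real pdfplumber and ocrmypdf calls; passing stub callables keeps the control flow testable in isolation.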

System Requirements

  • ocrmypdf system binary must be installed
    • Ubuntu/Debian: apt install ocrmypdf
    • macOS: brew install ocrmypdf
    • These packages also install the Tesseract OCR engine, which ocrmypdf depends on
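
To confirm the system requirements before processing scanned documents, a small preflight check along these lines may help. It is a sketch only: `missing_binaries` is a hypothetical helper, not part of stache-ai-ocr, and it assumes the executables are named `ocrmypdf` and `tesseract` on the PATH.

```python
import shutil


def missing_binaries(names=("ocrmypdf", "tesseract"), which=shutil.which):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All OCR system dependencies were found.")
```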

Priority Override

This loader registers with priority 10, overriding the basic PDF loader (priority 0). As a result, the OCR-capable loader handles PDFs whenever this package is installed, while systems without it keep using the basic loader unchanged.
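
The priority rule can be pictured with a toy registry. Everything below is hypothetical and only illustrates "highest priority wins"; the real Stache AI registration API is not shown in this README, so `LoaderRegistry`, `register`, and `resolve` are assumed names.

```python
from dataclasses import dataclass, field


@dataclass
class LoaderRegistry:
    """Toy registry: maps a file extension to (priority, loader name) pairs."""

    _loaders: dict = field(default_factory=dict)

    def register(self, extension, name, priority=0):
        self._loaders.setdefault(extension, []).append((priority, name))

    def resolve(self, extension):
        candidates = self._loaders.get(extension)
        if not candidates:
            raise LookupError(f"no loader registered for {extension!r}")
        # The loader with the highest priority wins
        return max(candidates, key=lambda entry: entry[0])[1]


registry = LoaderRegistry()
registry.register(".pdf", "basic-pdf", priority=0)
registry.register(".pdf", "ocr-pdf", priority=10)  # as if stache-ai-ocr were installed
selected = registry.resolve(".pdf")
```

With both loaders registered, `selected` is `"ocr-pdf"`; if the second `register` call never happens, the basic loader is still resolved, which mirrors the behaviour described above.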
