Local Tesseract OCR adapter for Swarmauri with sync, async, and batch image-to-text workflows.

Swarmauri OCR Pytesseract

swarmauri_ocr_pytesseract is the Swarmauri OCR adapter for running local Tesseract-powered image-to-text extraction through a consistent Swarmauri OCRBase interface. It accepts file paths, raw bytes, or in-memory PIL images and supports synchronous, asynchronous, and batch OCR workflows.

Why Use Swarmauri OCR Pytesseract

Use the same OCR component shape across local and pipeline-based Swarmauri workflows.
Run OCR on local infrastructure without routing images through a hosted API.
Tune Tesseract language selection and engine flags for receipts, forms, scanned PDFs, screenshots, and other document images.
Reuse the same component in parsing, ingestion, indexing, and agent workflows.

FAQ

What does this package do?
It wraps PyTesseract and the local Tesseract binary behind Swarmauri's OCR component interface.

Does it require a hosted API key?
No. It runs locally, but the host must have the tesseract executable and any required language packs installed.

What image inputs are supported?
File path strings, raw image bytes, and PIL.Image.Image instances.

Can it process multiple images concurrently?
Yes. Use batch() to process a list of images synchronously, or abatch() for async execution capped by max_concurrent.

Features

Local OCR backed by PyTesseract and Tesseract OCR.
Supports a configurable language, a Tesseract config string, and explicit tesseract_cmd resolution.
Accepts paths, bytes, and PIL image objects.
Includes extract_text, aextract_text, batch, and abatch methods.
Can report installed OCR languages through get_supported_languages().
Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Installation

uv add swarmauri_ocr_pytesseract

pip install swarmauri_ocr_pytesseract

System requirement:

Install the native tesseract binary and ensure it is available on PATH, or set TESSERACT_CMD to the executable location.
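
A preflight check along these lines can confirm that the binary is reachable before any OCR call. This is a minimal sketch using only the standard library. The helper name resolve_tesseract_cmd is hypothetical, and the lookup order (TESSERACT_CMD first, then PATH) is an assumption for illustration, not the package's documented resolution logic.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def resolve_tesseract_cmd(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a candidate path to the tesseract executable, or None.

    Hypothetical helper: checks TESSERACT_CMD first, then searches PATH.
    """
    env = os.environ if env is None else env
    override = env.get("TESSERACT_CMD", "").strip()
    if override:
        # An explicit override wins; its existence is not verified here.
        return override
    # shutil.which returns None when the binary is not on PATH.
    return which("tesseract")
```

Calling resolve_tesseract_cmd() with no arguments inspects the real environment; a None result suggests that Tesseract still needs to be installed or that TESSERACT_CMD should be set.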

Usage

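from swarmauri_ocr_pytesseract import PytesseractOCR

# language is a Tesseract language code; config passes raw Tesseract flags.
# --psm 6 tells Tesseract to assume a single uniform block of text.
ocr = PytesseractOCR(language="eng", config="--psm 6")
text = ocr.extract_text("docs/invoice.png")
print(text)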

Examples

OCR from image bytes

from pathlib import Path
from swarmauri_ocr_pytesseract import PytesseractOCR

ocr = PytesseractOCR(language="eng")
image_bytes = Path("receipts/ticket.png").read_bytes()
text = ocr.extract_text(image_bytes)
print(text)
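
When bytes arrive from an upload or a queue, it can help to confirm they look like an image before handing them to OCR. The sketch below checks well-known leading byte signatures using only the standard library. The helper name sniff_image_format is hypothetical and is not part of the package; which formats the local Tesseract build can actually decode is a separate question.

```python
from typing import Optional

# Leading byte signatures for a few common raster image formats.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
)


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known signature, else None."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```

A None result for data that starts with %PDF is a hint that the input is a PDF and may be better served by a PDF parser or by rasterizing pages first.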

OCR from a PIL image

from PIL import Image
from swarmauri_ocr_pytesseract import PytesseractOCR

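image = Image.open("scans/form.png")
# --oem 3 selects the default engine mode; --psm 4 assumes a single column of text of variable sizes.
ocr = PytesseractOCR(language="eng", config="--oem 3 --psm 4")
print(ocr.extract_text(image))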

Async batch OCR

import asyncio
from swarmauri_ocr_pytesseract import PytesseractOCR

ocr = PytesseractOCR(language="eng")

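async def run():
    # max_concurrent caps how many images are processed at the same time.
    results = await ocr.abatch(
        ["scans/page1.png", "scans/page2.png", "scans/page3.png"],
        max_concurrent=2,
    )
    for index, text in enumerate(results, start=1):
        # Print a 120-character preview of each page's text.
        print(index, text[:120])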

asyncio.run(run())
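
For intuition about what a concurrency limit does, the general pattern can be sketched with asyncio.Semaphore and asyncio.to_thread. This is an illustration of the technique only: how abatch() is implemented internally is not stated here, and bounded_batch and fake_extract are hypothetical names, with fake_extract standing in for a real blocking OCR call.

```python
import asyncio
from typing import Callable, List, Sequence


async def bounded_batch(
    extract: Callable[[str], str],
    paths: Sequence[str],
    max_concurrent: int = 2,
) -> List[str]:
    """Run a blocking extract callable over paths, at most max_concurrent at a time."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> str:
        async with semaphore:
            # Run the blocking call in a worker thread so the event loop stays free.
            return await asyncio.to_thread(extract, path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(worker(path) for path in paths))


def fake_extract(path: str) -> str:
    """Stand-in for a real OCR call; returns a label instead of recognized text."""
    return f"text from {path}"
```

Running asyncio.run(bounded_batch(fake_extract, ["a.png", "b.png", "c.png"])) exercises the pattern without needing Tesseract installed.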

List installed OCR languages

from swarmauri_ocr_pytesseract import PytesseractOCR

ocr = PytesseractOCR()
print(ocr.get_supported_languages())
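
The language list can be used to fail early when a required language pack is missing. The sketch below assumes get_supported_languages() returns an iterable of Tesseract language codes such as "eng" (the return type is not documented above, so treat that as an assumption). The helper name missing_languages is hypothetical; it relies on Tesseract's convention of joining multiple languages with "+".

```python
from typing import Iterable, List


def missing_languages(requested: str, installed: Iterable[str]) -> List[str]:
    """Return requested language codes that are absent from installed.

    requested uses Tesseract's "+"-joined form, for example "eng+fra".
    """
    available = set(installed)
    codes = [code.strip() for code in requested.split("+") if code.strip()]
    # Preserve the order in which the languages were requested.
    return [code for code in codes if code not in available]
```

A call such as missing_languages("eng+fra", ocr.get_supported_languages()) would then list any packs that still need to be installed.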

Related Packages

Swarmauri Foundations

Best Practices

Use --psm and --oem options through config to match the page layout you expect.
Install the correct .traineddata language packs for multilingual OCR.
Pre-process noisy or skewed scans before OCR to improve extraction quality.
Use PDF parsers when a PDF already contains embedded text; use OCR when the PDF or image is scan-only.
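
To keep --psm and --oem values consistent across a codebase, the config string can be assembled by a small helper rather than typed by hand. This is a minimal sketch: build_tesseract_config is a hypothetical name, not part of the package, and it validates only the ranges Tesseract accepts (page segmentation modes 0 to 13, OCR engine modes 0 to 3).

```python
from typing import List, Optional


def build_tesseract_config(psm: Optional[int] = None, oem: Optional[int] = None) -> str:
    """Build a Tesseract flag string such as "--oem 3 --psm 6".

    Hypothetical helper; omitted options are left out of the result.
    """
    parts: List[str] = []
    if oem is not None:
        if not 0 <= oem <= 3:
            raise ValueError(f"oem must be between 0 and 3, got {oem}")
        parts.append(f"--oem {oem}")
    if psm is not None:
        if not 0 <= psm <= 13:
            raise ValueError(f"psm must be between 0 and 13, got {psm}")
        parts.append(f"--psm {psm}")
    return " ".join(parts)
```

The result can be passed as the config argument shown in the usage examples, for instance PytesseractOCR(language="eng", config=build_tesseract_config(psm=6)).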

License

This project is licensed under the Apache-2.0 License.

Project details

Release history

Version	Release date
0.11.0.dev1 pre-release (this version)	Jun 30, 2026
0.9.4.dev3 pre-release	May 20, 2026
0.9.4.dev2 pre-release	May 20, 2026
0.9.3	Mar 24, 2026
0.9.3.dev24 pre-release	Mar 23, 2026
0.9.3.dev22 pre-release	Mar 20, 2026
0.9.3.dev21 pre-release	Mar 20, 2026
0.9.3.dev20 pre-release	Mar 20, 2026
0.9.3.dev19 pre-release	Mar 20, 2026
0.9.3.dev18 pre-release	Mar 20, 2026
0.9.3.dev17 pre-release	Mar 20, 2026
0.9.3.dev10 pre-release	Feb 23, 2026
0.9.3.dev5 pre-release	Feb 18, 2026
0.9.3.dev4 pre-release	Feb 17, 2026
0.9.3.dev3 pre-release	Feb 17, 2026
0.9.2	Feb 17, 2026
0.9.2.dev7 pre-release	Feb 17, 2026
0.9.2.dev6 pre-release	Feb 12, 2026
0.9.0	Jan 28, 2026
0.9.0.dev21 pre-release	Jan 27, 2026
0.9.0.dev4 pre-release	Sep 11, 2025
0.9.0.dev3 pre-release	Sep 10, 2025
0.9.0.dev2 pre-release	Sep 10, 2025
0.7.5	May 23, 2025
0.7.5.dev1 pre-release	May 23, 2025
0.7.4	May 23, 2025
0.7.4.dev20 pre-release	May 23, 2025
0.7.3	Mar 31, 2025
0.7.3.dev2 pre-release	Mar 31, 2025
0.7.2	Mar 6, 2025
0.7.2.dev3 pre-release	Mar 6, 2025
0.7.2.dev2 pre-release	Mar 6, 2025
0.7.2.dev1 pre-release	Mar 6, 2025
0.7.1	Mar 6, 2025
0.7.1.dev1 pre-release	Mar 5, 2025
0.7.0	Mar 4, 2025
0.7.0.dev12 pre-release	Mar 4, 2025
0.7.0.dev11 pre-release	Mar 4, 2025
0.7.0.dev10 pre-release	Mar 4, 2025
0.7.0.dev9 pre-release	Mar 4, 2025
0.7.0.dev8 pre-release	Mar 4, 2025
0.7.0.dev7 pre-release	Mar 4, 2025
0.7.0.dev6 pre-release	Mar 4, 2025
0.7.0.dev5 pre-release	Mar 4, 2025
0.7.0.dev4 pre-release	Mar 4, 2025
0.7.0.dev3 pre-release	Mar 4, 2025
0.7.0.dev2 pre-release	Mar 3, 2025
0.6.1	Feb 19, 2025
0.6.1.dev16 pre-release	Feb 19, 2025
0.6.1.dev15 pre-release	Feb 19, 2025
0.6.1.dev14 pre-release	Feb 19, 2025
0.6.1.dev6 pre-release	Feb 17, 2025
0.6.0	Feb 17, 2025

Download files

Source Distribution

swarmauri_ocr_pytesseract-0.11.0.dev1.tar.gz (9.4 kB)

Uploaded Jun 30, 2026 Source

Built Distribution

swarmauri_ocr_pytesseract-0.11.0.dev1-py3-none-any.whl (10.4 kB)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file swarmauri_ocr_pytesseract-0.11.0.dev1.tar.gz.

File metadata

Download URL: swarmauri_ocr_pytesseract-0.11.0.dev1.tar.gz
Upload date: Jun 30, 2026
Size: 9.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_ocr_pytesseract-0.11.0.dev1.tar.gz
Algorithm	Hash digest
SHA256	`621b7129b04c433e80a72d1f6a7909e4d81d4b01cf4de502c9c7266116d154e9`
MD5	`855de7e524b5117ab48c81200782944d`
BLAKE2b-256	`550693528fc3ee7e5309edbee33261ce9ab895886a275b021c14cab33c30f23f`

File details

Details for the file swarmauri_ocr_pytesseract-0.11.0.dev1-py3-none-any.whl.

File metadata

Download URL: swarmauri_ocr_pytesseract-0.11.0.dev1-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 10.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_ocr_pytesseract-0.11.0.dev1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`40c53aa09b82dc933f358dbe83be163a7e5070b61bc93051789f3526c72c1b79`
MD5	`0b0d652f47005aa35f0341eda6aa20d9`
BLAKE2b-256	`b0fe3ed317eb0c26b15855dd3cb8725fbc41d30be841323e979cfb1aa35e6cbf`
