Agentic digitization for images, PDFs, and documents. A router for all your OCR engines, VLMs, and text extractors.

OCRAgent

OCR-first, agent-guided.

OCRAgent is a command-line document parsing workflow. It uses an agent to select OCR, VLM, PDF, office-document, or user-defined tools, then reviews the extracted text before writing output.

The goal is practical routing: use inexpensive extraction when it is enough, and spend model/API cost only on files that need it.

Grade, Route, Parse, Review

OCRAgent works best for mixed folders: PDFs with text layers, scanned PDFs, images, office files, handwritten pages, tables, forms, and other files that should not all use the same parser.

Step	What OCRAgent does	Main artifact
Grade	Inspects file names, metadata, preview signals, and sample pages to estimate parsing difficulty.	`.ocragent_memory.txt`
Route	Chooses a parser from builtin tools and user-defined tools according to cost, scope, and prior folder notes.	tool call
Parse	Runs the selected tool and writes UTF-8 text while preserving source-relative paths.	`ocragent_results/`
Review	Checks whether extracted text is usable; retries with another tool or route when review fails.	accepted output or retry

The four steps keep the system understandable:

Grade before spending model/API cost.
Route through one tool registry.
Parse through deterministic command boundaries.
Review before writing final output.

Runtime Flow

documents
  -> init docs
  -> folder memory
  -> parser agent
  -> parser tool
  -> reviewer agent
  -> output text
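
The flow above can be read as a cost-ordered retry loop. The following is a minimal Python sketch of that idea, not OCRAgent's actual implementation: `Tool`, `review`, and `parse_with_review` are hypothetical names introduced for illustration, and the toy reviewer only checks for non-blank text where the real one would ask a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types for illustration; OCRAgent's real internals may differ.
COST_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Tool:
    name: str
    cost: str                  # "low", "medium", or "high"
    run: Callable[[str], str]  # takes a file path, returns extracted text


def review(text: str) -> bool:
    """Toy reviewer: accept any non-blank text."""
    return bool(text.strip())


def parse_with_review(path: str, tools: list[Tool]) -> Optional[tuple[str, str]]:
    """Try tools from cheapest to most expensive until the review passes."""
    for tool in sorted(tools, key=lambda t: COST_ORDER[t.cost]):
        text = tool.run(path)
        if review(text):
            return tool.name, text
    return None  # every route failed review


tools = [
    Tool("vlm_ocr", "high", lambda p: f"text recovered from {p}"),
    Tool("text_layer", "low", lambda p: ""),  # simulates a scan with no text layer
]
result = parse_with_review("scans/page-01.jpg", tools)
print(result)
```

In this sketch the cheap tool runs first and returns nothing, so the loop falls through to the expensive one. That is the behavior the routing goal describes: model/API cost is spent only when cheaper extraction fails review.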

Install

Install with common document backends, using either pip or uv:

python -m pip install "ocragent[full]"
ocragent --help

uv tool install "ocragent[full]"
ocragent --help

Configure a chat-completions API through environment variables:

export OCRAGENT_CHAT_BASE=http://localhost:8080/v1
export OCRAGENT_CHAT_MODEL=your-model
export OCRAGENT_CHAT_AUTHKEY=your-key

OPENAI_API_KEY is also accepted as the auth key. A vision-capable model is strongly recommended, because OCRAgent uses model judgment during grading and review. The same values can be configured in ~/.ocragent/ocragent.settings.toml, ./ocragent.settings.toml, or .env. Use src/ocragent/ocragent.settings.default.toml as the reference.
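
As a rough illustration of the fallback described above, the sketch below resolves the three chat settings from an environment-style mapping. `resolve_chat_settings` is a hypothetical helper, not part of OCRAgent, and the precedence between `OCRAGENT_CHAT_AUTHKEY` and `OPENAI_API_KEY` when both are set is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_chat_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Collect chat API settings from an environment-style mapping."""
    return {
        "base": env.get("OCRAGENT_CHAT_BASE"),
        "model": env.get("OCRAGENT_CHAT_MODEL"),
        # Assumption: the OCRAgent-specific key wins when both are set.
        "authkey": env.get("OCRAGENT_CHAT_AUTHKEY") or env.get("OPENAI_API_KEY"),
    }


example_env = {
    "OCRAGENT_CHAT_BASE": "http://localhost:8080/v1",
    "OCRAGENT_CHAT_MODEL": "your-model",
    "OPENAI_API_KEY": "your-key",
}
print(resolve_chat_settings(example_env))
```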

Text-only LLM vs multimodal VLM

Stage	Text-only LLM	Multimodal VLM
Grade	Uses file names, metadata, text-layer probes, and OCR samples. It can estimate readability from extracted text, but cannot inspect page images directly.	Uses thumbnails or rendered pages to judge scan quality, handwriting, diagrams, tables, layout density, and whether OCR is likely to fail.
Review	Checks whether extracted text reads coherently, whether tables look damaged in text form, and whether obvious OCR artifacts appear.	Can compare extracted text against visual page evidence when available, which is better for missing regions, layout loss, handwriting, formulas, and image-heavy pages.

Quick Start

List available tools:

ocragent tool --list
ocragent tool --list --scope=parser

To let OCRAgent call your own OCR, VLM, shell command, or API, generate user tools. First describe the tools in a plain-text file:

$HOME/ocragent.toolbox_user.txt

The format can follow src/ocragent/ocragent.toolbox_user.example.txt. Include tool name, scope, cost, flags, limits, call shape, and required environment variables.

Generate the runtime:

ocragent init tools

OCRAgent writes executable Python to $HOME/.ocragent/user_toolbox.py. Review this file before running it with credentials.

Initialize and parse a document folder:

cd /path/to/documents
ocragent init docs
ocragent run --out-dir ocragent_results

CLI Example

$ ocragent tool --list --scope=parser
pdf2txt	scope: parser cost: low	Extract PDF text with PyMuPDF.
	--path /path/to/file.pdf

pdf_pages_to_images	scope: parser cost: medium	Render each PDF page to a PNG image with PyMuPDF.
	--path /path/to/file.pdf
	--out-dir /path/to/page-images

pandoc2txt	scope: parser cost: low	Convert office documents to plain text with Pandoc.
	--path /path/to/file

$ cd ~/cases/mixed_docs
$ ocragent init tools --from ./ocragent.toolbox_user.txt
# writes /home/me/.ocragent/user_toolbox.py
# reports valid and failed user tools

$ ocragent init docs
# writes .ocragent_memory.txt
# reports detected groups, file_count, and unmatched_count

$ ocragent run invoice.pdf scans/ --out-dir ocragent_results
# writes parsed files under ocragent_results/
# reports parsed_count, failed_count, skipped_count, and output_stats

In normal use the commands return JSON. The example above abbreviates that output as comments and notes only the important fields.
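
Because the commands return JSON, a script can drive the CLI and read the counts directly. The sketch below is one possible wrapper, with assumptions stated plainly: the helper names are hypothetical, and it assumes stdout holds a single JSON object with `parsed_count`, `failed_count`, and `skipped_count` at the top level. The actual report layout may differ.

```python
import json
import subprocess


def build_run_command(inputs: list[str], out_dir: str) -> list[str]:
    """Build the argv for `ocragent run`, using only the flags shown above."""
    return ["ocragent", "run", *inputs, "--out-dir", out_dir]


def summarize(stdout: str) -> dict[str, int]:
    """Pick the count fields out of the JSON report.

    Assumption: stdout is one JSON object with these fields at the top level.
    """
    report = json.loads(stdout)
    keys = ("parsed_count", "failed_count", "skipped_count")
    return {key: report.get(key, 0) for key in keys}


def run_ocragent(inputs: list[str], out_dir: str = "ocragent_results") -> dict[str, int]:
    """Run the CLI and return the counts; raises if the command exits non-zero."""
    completed = subprocess.run(
        build_run_command(inputs, out_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize(completed.stdout)


# Offline demonstration with a made-up report, so no OCRAgent install is needed.
sample_stdout = '{"parsed_count": 3, "failed_count": 1, "skipped_count": 0}'
command = build_run_command(["invoice.pdf", "scans/"], "ocragent_results")
counts = summarize(sample_stdout)
print(command)
print(counts)
```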

Output

OCRAgent preserves relative paths:

docs/report.pdf -> ocragent_results/docs/report.pdf.txt
scans/page-01.jpg -> ocragent_results/scans/page-01.jpg.md
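
The mapping above amounts to keeping the source-relative path and appending an extension to the full file name. A small sketch of that rule follows; `output_path` is a hypothetical helper, and how OCRAgent picks `.txt` versus `.md` under `ext = "auto"` is not specified here, so the extension is passed in explicitly.

```python
from pathlib import PurePosixPath


def output_path(source: str, out_dir: str = "ocragent_results", ext: str = ".txt") -> PurePosixPath:
    """Mirror the source-relative path under out_dir and append ext to the file name."""
    rel = PurePosixPath(source)
    # Append rather than replace the suffix, so report.pdf becomes report.pdf.txt.
    return PurePosixPath(out_dir) / rel.parent / (rel.name + ext)


print(output_path("docs/report.pdf"))               # ocragent_results/docs/report.pdf.txt
print(output_path("scans/page-01.jpg", ext=".md"))  # ocragent_results/scans/page-01.jpg.md
```

Appending the extension, rather than replacing it, keeps `report.pdf` and a hypothetical `report.docx` in the same folder from colliding on one output name.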

It also writes a folder memory file:

.ocragent_memory.txt

The memory file is prose. It records file groups, difficulty estimates, tool choices, and run summaries. Later parser runs use it as context.

Architecture

CLI  (ocragent init / run / tool)
 |
AI Agents  (init_tools / parser / reviewer)
 |
Tool chain  (builtin tools + user_toolbox.py)

Plane	Responsibility	Examples
CLI and commands	Stable command behavior	config, paths, logging, stdout, stderr
Tool registry	Parser capability boundary	PDF text, image thumbnails, Pandoc, user OCR, VLM APIs
Agent loops	Runtime decisions	file grouping, tool selection, review, retry

The parser agent does not call vendor APIs directly. It reads the available parser tools, chooses one, runs it through the tool boundary, and sends extracted text to the reviewer. If review fails, the parser can retry with another tool or a higher-cost route.
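
One way to picture the tool boundary is a registry that the agent can only query, never bypass. The sketch below is illustrative only: `ToolSpec`, `list_tools`, and `next_route` are hypothetical names, the three builtin entries are copied from the CLI example, and `my_vlm_ocr` stands in for a user-defined tool.

```python
from dataclasses import dataclass
from typing import Optional

COST_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class ToolSpec:
    name: str
    scope: str
    cost: str
    description: str


REGISTRY = [
    ToolSpec("pdf2txt", "parser", "low", "Extract PDF text with PyMuPDF."),
    ToolSpec("pdf_pages_to_images", "parser", "medium", "Render each PDF page to a PNG image with PyMuPDF."),
    ToolSpec("pandoc2txt", "parser", "low", "Convert office documents to plain text with Pandoc."),
    ToolSpec("my_vlm_ocr", "parser", "high", "Hypothetical user-defined VLM OCR tool."),
]


def list_tools(registry: list[ToolSpec], scope: Optional[str] = None) -> list[ToolSpec]:
    """Return tools, optionally filtered by scope."""
    return [t for t in registry if scope is None or t.scope == scope]


def next_route(registry: list[ToolSpec], tried: set[str]) -> Optional[ToolSpec]:
    """Pick the cheapest parser tool that has not been tried yet."""
    candidates = [t for t in list_tools(registry, "parser") if t.name not in tried]
    return min(candidates, key=lambda t: COST_ORDER[t.cost], default=None)


first = next_route(REGISTRY, tried=set())
after_cheap_failures = next_route(REGISTRY, tried={"pdf2txt", "pandoc2txt"})
print(first.name, after_cheap_failures.name)
```

A real router would also match tools to file types; this sketch only shows the cost ordering and the retry escalation.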

Configuration

Configuration priority, from highest to lowest:

Environment variables.
./ocragent.settings.toml.
~/.ocragent/ocragent.settings.toml.
Package defaults.
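
Assuming the list above runs from highest to lowest priority, the layering behaves like a chain of lookups where the first source that defines a key wins. The sketch below shows the idea with `collections.ChainMap`. The flattened dotted keys and the override values are made up for illustration, and this is not how `src/ocragent/config.py` is necessarily written.

```python
from collections import ChainMap

# Hypothetical flattened settings; the values in the override layers are invented.
package_defaults = {
    "output.dir": "ocragent_results",
    "output.ext": "auto",
    "reviewer.max_length": 1000,
}
home_settings = {"reviewer.max_length": 2000}   # ~/.ocragent/ocragent.settings.toml
project_settings = {"output.dir": "parsed"}     # ./ocragent.settings.toml
env_settings = {}                               # nothing set in the environment

# ChainMap searches its mappings left to right, so earlier layers take priority.
settings = ChainMap(env_settings, project_settings, home_settings, package_defaults)

print(settings["output.dir"])
print(settings["reviewer.max_length"])
print(settings["output.ext"])
```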

Common settings:

[aigc.api.chatcomp]
base = "http://localhost:8080/v1"
authkey = ""
model = ""
model_hasVision = true

[output]
dir = "ocragent_results"
ext = "auto"
parser_summary_batch = 5

[reviewer]
max_length = 1000

The complete default file is src/ocragent/ocragent.settings.default.toml.

Contributing

OCRAgent is in beta. Breaking changes are still possible.

Useful contributions:

Add or improve builtin parser tools.
Add demo assets for real document cases.
Improve reviewer prompts and failure cases.
Strengthen tests around CLI behavior, tool discovery, and generated user tools.
Write adapters for common OCR, VLM, and document conversion backends.
Improve documentation for tested workflows.

Run tests:

uv run python -m unittest discover -s tests
uv run --extra pdf python -m unittest discover -s tests

Important paths:

src/ocragent/cli.py: command boundary.
src/ocragent/cmd/: command implementations.
src/ocragent/cmd/tool.py: builtin and user tool contract.
src/ocragent/agent/: model-facing loops.
src/ocragent/config.py: layered settings.
tests/: test suite and CLI flow checks.

Project details

Development Status: 4 - Beta

Release history

0.1.2 (this version): May 20, 2026
0.1.1: May 20, 2026

Download files

Source Distribution

ocragent-0.1.2.tar.gz (44.2 kB)

Uploaded May 20, 2026 Source

Built Distribution

ocragent-0.1.2-py3-none-any.whl (54.1 kB)

Uploaded May 20, 2026 Python 3

File details

Details for the file ocragent-0.1.2.tar.gz.

File metadata

Download URL: ocragent-0.1.2.tar.gz
Upload date: May 20, 2026
Size: 44.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.15

File hashes

Hashes for ocragent-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`8518204c1b80d5b4fba9718015a197664509a504a2b2809f48b6dab45ce6d278`
MD5	`05de6858721f1ac4e5d8f8de238920d9`
BLAKE2b-256	`284cef3720803b7ec8309ed890efe461a29fca2fc5f8f3b18bc99a7a35955a34`

File details

Details for the file ocragent-0.1.2-py3-none-any.whl.

File metadata

Download URL: ocragent-0.1.2-py3-none-any.whl
Upload date: May 20, 2026
Size: 54.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.15

File hashes

Hashes for ocragent-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8a7f9cf5804b9bb89595c5283726dde4eb614f752d5fd53fa55182da0193c2d6`
MD5	`0905229044dc4bfaeebc2d461d673029`
BLAKE2b-256	`af47ffc258d25e2b84115fca9d19834974d23a617b1a868cb437cb926eea4c9c`
