
Local OCR extraction + LLM text correction. PaddleOCR for recognition, Ollama for correction, fully offline.

Project description

RenZi 认字 ("recognize characters")

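Local OCR extraction + LLM text correction. PaddleOCR recognizes text in images, a local Ollama model corrects OCR mistakes, and an optional Flask web UI ties it together. Everything runs locally with no cloud APIs; after the one-time model downloads (PaddleOCR on first run, the Ollama model via ollama pull), no network access is needed.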

What It Does

RenZi takes images containing Chinese/English text and produces clean electronic text:

  1. OCR — PaddleOCR (PP-OCRv5) extracts text plus per-region bounding boxes and confidence scores.
  2. Visualization — renders an annotated image (green boxes) and a reconstructed text bitmap that preserves the original layout on a white canvas.
  3. Correction — sends the raw OCR text to a local Ollama model (gemma3:1b by default) that fixes typos and misrecognized characters while keeping the original language and line structure.
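Keeping the original line structure (step 3) and preserving layout (step 2) both depend on ordering per-region OCR boxes into lines. The following is a minimal sketch of one common approach, not RenZi's code: it assumes each item is a dict with `text`, `confidence`, and an axis-aligned `bbox` of `[x_min, y_min, x_max, y_max]` (PaddleOCR can also emit four-point polygons), and `group_into_lines` / `to_text` are hypothetical helper names.

```python
from typing import List, Sequence

# Assumed item shape (not confirmed by RenZi's docs):
#   {"text": str, "confidence": float, "bbox": [x_min, y_min, x_max, y_max]}


def _center_y(item: dict) -> float:
    _, y0, _, y1 = item["bbox"]
    return (y0 + y1) / 2


def _height(item: dict) -> float:
    _, y0, _, y1 = item["bbox"]
    return y1 - y0


def group_into_lines(items: Sequence[dict], tolerance: float = 0.5) -> List[List[dict]]:
    """Group boxes whose vertical centres are close into the same text line."""
    lines: List[List[dict]] = []
    for item in sorted(items, key=_center_y):  # top to bottom
        if lines:
            current = lines[-1]
            line_center = sum(map(_center_y, current)) / len(current)
            line_height = sum(map(_height, current)) / len(current)
            # Same line if the centre lies within tolerance * average box height.
            if abs(_center_y(item) - line_center) <= tolerance * line_height:
                current.append(item)
                continue
        lines.append([item])
    # Within each line, read left to right by x_min.
    return [sorted(line, key=lambda it: it["bbox"][0]) for line in lines]


def to_text(items: Sequence[dict], sep: str = " ") -> str:
    """One output line per detected text line; pass sep="" for Chinese."""
    return "\n".join(sep.join(it["text"] for it in line) for line in group_into_lines(items))
```

A centre-distance rule like this covers ordinary horizontal text; skewed scans, vertical CJK text, and multi-column pages need more than this.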

Supported formats: .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp.

What it does not do: streaming OCR, mobile capture, cloud OCR, translation between languages.

Features

  • Single image or whole-directory batch OCR
  • PaddleOCR 3.6 + PP-OCRv5 server detection/recognition models
  • Optional annotated image and reconstructed text bitmap PNGs
  • Local Ollama LLM correction (offline, no API keys)
  • Flask web UI with drag-and-drop upload and side-by-side raw/corrected text
  • Unified Python API with ToolResult dataclass
  • OpenAI function-calling tools schema for agent integration
  • CLI with unified flags (-V -v -q --json -o)

Requirements

  • Python >= 3.9
  • PaddleOCR 3.6.0 + PaddlePaddle 3.0.0 (see install notes below)
  • Ollama running locally with a model pulled (e.g. ollama pull gemma3:1b)
  • A CJK TrueType font for bitmap rendering (auto-detected on Windows/macOS/Linux)

PaddlePaddle version pin

The known-good combination is PaddlePaddle 3.0.0 + PaddleOCR 3.6.0. Newer PaddlePaddle releases (3.1+) fail to load the older models through the PIR executor. Install the pinned version from the Baidu mirror:

pip install paddlepaddle==3.0.0 -i https://mirror.baidu.com/pypi/simple
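Since the pin matters, checking the installed versions up front can save a confusing model-loading error later. A small sketch using the standard library's `importlib.metadata`; `check_pins` is a hypothetical helper, not part of RenZi, and the distribution names are assumptions (a GPU install is published as `paddlepaddle-gpu`, so adjust the key if you use it).

```python
from importlib import metadata
from typing import Callable, Dict, List

# Known-good pins from the note above.
PINS: Dict[str, str] = {"paddlepaddle": "3.0.0", "paddleocr": "3.6.0"}


def check_pins(
    pins: Dict[str, str] = PINS,
    lookup: Callable[[str], str] = metadata.version,  # injectable for testing
) -> List[str]:
    """Return human-readable problems; an empty list means every pin matches."""
    problems: List[str] = []
    for dist, wanted in pins.items():
        try:
            found = lookup(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{dist} {found} is installed (want {wanted})")
    return problems
```

Usage: `print(check_pins() or "pins OK")`.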

Installation

From a source checkout (editable install):

pip install -e .

For development dependencies:

pip install -e ".[dev]"

Ollama setup (for correction)

ollama pull gemma3:1b
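Correction goes through the local Ollama server. For checking connectivity or experimenting with prompts outside RenZi, here is a minimal sketch that calls Ollama's documented `/api/generate` endpoint with streaming disabled. The prompt wording, the `http://localhost:11434` base URL (Ollama's usual default), and the fall-back-to-raw-text behavior are assumptions; RenZi's own prompt and error handling may differ. The `used_llm` key only mirrors the field shown in the Python API section.

```python
import json
import urllib.request

# Assumptions: default Ollama address and an illustrative prompt, not RenZi's own.
OLLAMA_BASE = "http://localhost:11434"
PROMPT = (
    "Fix OCR errors (typos, misrecognized characters) in the text below. "
    "Keep the original language and line breaks. Reply with the corrected text only.\n\n"
)


def build_request(text: str, model: str = "gemma3:1b", base: str = OLLAMA_BASE) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = {"model": model, "prompt": PROMPT + text, "stream": False}
    return urllib.request.Request(
        base.rstrip("/") + "/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def correct_text(text: str, model: str = "gemma3:1b", base: str = OLLAMA_BASE, timeout: float = 60.0) -> dict:
    """Return {"text": ..., "used_llm": bool}, keeping the raw text on any failure."""
    try:
        with urllib.request.urlopen(build_request(text, model, base), timeout=timeout) as resp:
            reply = json.loads(resp.read().decode("utf-8"))
        return {"text": reply["response"].strip(), "used_llm": True}
    except (OSError, KeyError, ValueError):
        # Server unreachable, timed out, or returned an unexpected body.
        return {"text": text, "used_llm": False}
```

Falling back to the uncorrected text keeps OCR output usable when Ollama is not running, at the cost of silently skipping correction; checking `used_llm` makes that visible.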

Quick Start

Recognize one image and print text:

renzi photo.jpg

Batch a directory and write .txt transcripts:

renzi images/ -l ch -o transcripts/
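Batch mode writes one transcript per image. A sketch of how that mapping can be planned, assuming a non-recursive scan, case-insensitive matching against the supported-formats list, and `<stem>.txt` output names; RenZi's actual traversal and naming rules are not documented here, and `discover` / `plan_batch` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Extensions from the "Supported formats" list above.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp"}


def discover(src_dir: str) -> List[Path]:
    """Regular files directly inside src_dir (non-recursive), sorted by name."""
    return sorted(p for p in Path(src_dir).iterdir() if p.is_file())


def plan_batch(files: Iterable[Path], out_dir: str) -> Dict[Path, Path]:
    """Map each supported image to out_dir/<stem>.txt, skipping other files."""
    out = Path(out_dir)
    return {f: out / (f.stem + ".txt") for f in files if f.suffix.lower() in SUPPORTED}
```

Usage: `plan_batch(discover("images/"), "transcripts/")`. One consequence of stem-based naming is that `page.png` and `page.jpg` would both map to `page.txt`, so a real implementation needs a collision rule.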

Full pipeline with annotated image + text bitmap + corrected text file:

renzi scan.png --pipeline --bbox annotated.png --bitmap bitmap.png -o fixed.txt

Correct OCR text piped on stdin:

echo "我在吉林省、长春市" | renzi --correct -

Launch the web UI:

python -m renzi.web
# open http://127.0.0.1:8765

CLI Usage

renzi [-V] [-v] [-q] [--json] [-l LANG] [--confidence N] [-o PATH]
     [--bbox PATH] [--bitmap PATH] [--ollama-base URL] [--ollama-model NAME]
     [--timeout S] [--correct] [--pipeline] [PATH]

Flag                   Meaning
-V, --version          Print version and exit
-v, --verbose          Show per-item confidence and geometry
-q, --quiet            Suppress non-essential output
--json                 Output results as JSON
-l, --lang LANG        OCR language hint (default: ch)
--confidence N         Minimum OCR confidence (default: 0.5)
-o, --output PATH      Output file (single image) or directory (batch)
--bbox PATH            Write annotated image PNG to this path
--bitmap PATH          Write reconstructed text bitmap PNG to this path
--ollama-base URL      Ollama API base URL
--ollama-model NAME    Ollama model name
--timeout S            LLM request timeout in seconds
--correct              Correct text from stdin instead of running OCR
--pipeline             Run OCR + render + LLM correction
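The --confidence cutoff can be pictured as a filter over per-region results. A sketch assuming items shaped like `result.data["items"]` in the Python API (text, confidence, bbox); whether RenZi compares with `>=` or `>` is not stated, so the inclusive comparison is an assumption, and `apply_threshold` is a hypothetical name.

```python
from typing import List, Tuple


def apply_threshold(items: List[dict], min_confidence: float = 0.5) -> Tuple[List[dict], List[dict]]:
    """Split OCR items into (kept, dropped) by their confidence score."""
    kept: List[dict] = []
    dropped: List[dict] = []
    for it in items:
        # Inclusive comparison is an assumption; RenZi may use > instead.
        (kept if float(it["confidence"]) >= min_confidence else dropped).append(it)
    return kept, dropped
```

Returning the dropped items too makes it easy to review what a threshold discards; -v shows the per-item confidences needed to pick one.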

Python API

from renzi import extract, extract_dir, correct, pipeline, ToolResult

# Single image
result = extract(image_path="photo.jpg", lang="ch")
print(result.success)        # True
print(result.data["text"])   # raw OCR text
for it in result.data["items"]:
    print(it["text"], it["confidence"], it["bbox"])

# Directory batch with text files
result = extract_dir(directory="images/", output="transcripts/")
for filename, info in result.data.items():
    print(filename, info["text"][:50])

# Correct raw OCR text
result = correct(text="我在吉林省、长春市")
print(result.data["text"])      # corrected
print(result.data["used_llm"])  # True/False

# Full pipeline with visualizations
result = pipeline(
    image_path="scan.png",
    bbox_image="annotated.png",
    bitmap_image="bitmap.png",
    output="fixed.txt",
)

Agent Integration

from renzi.tools import TOOLS, dispatch

# Register TOOLS in your OpenAI function-calling client, then:
result = dispatch("renzi_pipeline", {
    "image_path": "scan.png",
    "bbox_image": "annotated.png",
    "bitmap_image": "bitmap.png",
})
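When the model answers with a tool call, the OpenAI Chat Completions API delivers the arguments as a JSON string and expects the result back in a `role: "tool"` message carrying the same `tool_call_id`. A minimal sketch of that glue, assuming the tool call is available as a plain dict (for example via the SDK's `model_dump()`); `tool_message` is a hypothetical helper, and `dispatch` is passed in so it can be stubbed in tests.

```python
import dataclasses
import json
from typing import Any, Callable, Dict


def tool_message(tool_call: Dict[str, Any], dispatch: Callable[[str, dict], Any]) -> Dict[str, str]:
    """Execute one OpenAI-style tool call and build the matching role="tool" reply."""
    fn = tool_call["function"]
    # The API delivers arguments as a JSON-encoded string, not a dict.
    args = json.loads(fn.get("arguments") or "{}")
    result = dispatch(fn["name"], args)
    # ToolResult is a dataclass, so asdict() turns it into JSON-ready data;
    # anything else (e.g. a plain dict from a stub) is serialized as-is.
    if dataclasses.is_dataclass(result) and not isinstance(result, type):
        result = dataclasses.asdict(result)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        # ensure_ascii=False keeps Chinese text readable; default=str covers Paths etc.
        "content": json.dumps(result, ensure_ascii=False, default=str),
    }
```

Usage inside a chat loop: `messages.append(tool_message(call, dispatch))` with `dispatch` imported from `renzi.tools`.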

Web UI

from renzi.web import run
run(port=8765)

Drag-and-drop upload, three-up image comparison (original / annotated / text bitmap), and side-by-side raw vs corrected text with one-click copy and re-correct.

Model Cache

PaddleOCR downloads models from ModelScope on first run and caches them in ~/.paddlex/official_models/. To skip that download (for example, on an offline machine with bundled models), place the model folders in a directory and point PADDLEX_HOME (or RENZI_MODEL_DIR) at it:

export RENZI_MODEL_DIR=/path/to/official_models
renzi photo.jpg
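The precedence between the two variables is not spelled out above. A minimal sketch of one plausible lookup order, assuming RENZI_MODEL_DIR wins over PADDLEX_HOME and that both name the directory holding the model folders, as the sentence above suggests; `resolve_model_dir` and `list_models` are hypothetical helpers, useful for confirming models are present before going offline.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

DEFAULT_DIR = Path.home() / ".paddlex" / "official_models"


def resolve_model_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer RENZI_MODEL_DIR, then PADDLEX_HOME (assumed order), else the default cache."""
    env = os.environ if env is None else env
    for var in ("RENZI_MODEL_DIR", "PADDLEX_HOME"):
        if env.get(var):
            return Path(env[var]).expanduser()
    return DEFAULT_DIR


def list_models(model_dir: Path) -> List[str]:
    """Names of model sub-directories; empty if the directory does not exist."""
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.iterdir() if p.is_dir())
```

Usage: `print(list_models(resolve_model_dir()))` lists what is cached before you disconnect.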

Development

pip install -e ".[dev]"
ruff format . && ruff check . && pytest

License

GPL-3.0


Download files


Source Distribution

renzi-0.1.0.tar.gz (24.9 kB)

Uploaded Source

Built Distribution


renzi-0.1.0-py3-none-any.whl (23.2 kB)

Uploaded Python 3

File details

Details for the file renzi-0.1.0.tar.gz.

File metadata

  • Download URL: renzi-0.1.0.tar.gz
  • Size: 24.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for renzi-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a8f80262926efe77c0032ed4948dc7b38cebddc7a8e8f3827e6c9326067437b5
MD5 18fcad8a18002e97b7abc09711812aa1
BLAKE2b-256 14a6c435d38cc7c548cb04f75e09bb9ba91097991f350a4d48c9621c809705cd


File details

Details for the file renzi-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: renzi-0.1.0-py3-none-any.whl
  • Size: 23.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for renzi-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 eaa54a68f5277107580178158faad147fd67b1dcdd9aee6a37a311d304deb15f
MD5 74bbe8279279aaf689a85a9a703e231d
BLAKE2b-256 8873c87f0feddb59caf144e5793f94cb72a18944aaa7305218ded0af61f466d9
