Local OCR extraction + LLM text correction. PaddleOCR for recognition, Ollama for correction, fully offline.
RenZi 认字
Local OCR extraction + LLM text correction. PaddleOCR recognizes text in images, a local Ollama model corrects OCR mistakes, and an optional Flask web UI ties it together. Everything runs offline — no cloud APIs.
What It Does
RenZi takes images containing Chinese/English text and produces clean electronic text:
- OCR: PaddleOCR (PP-OCRv5) extracts text plus per-region bounding boxes and confidence scores.
- Visualization: renders an annotated image (green boxes) and a reconstructed text bitmap that preserves the original layout on a white canvas.
- Correction: sends the raw OCR text to a local Ollama model (gemma3:1b by default) that fixes typos and misrecognized characters while keeping the original language and line structure.
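Keeping the line structure, and laying the bitmap out like the original, both depend on putting OCR regions back into reading order. The sketch below shows one common heuristic: regions whose vertical centers are close are grouped into a line. It is not RenZi's implementation. group_lines is hypothetical, and it assumes each item is a dict with "text" and an axis-aligned "bbox" of [x1, y1, x2, y2]. If boxes are four-point polygons, take the min/max corner coordinates first.

```python
from typing import Dict, List


def group_lines(items: List[Dict], sep: str = " ", tolerance: float = 0.5) -> List[str]:
    """Group OCR items into text lines, top-to-bottom, then left-to-right.

    Assumes each item has "text" and an axis-aligned "bbox" [x1, y1, x2, y2].
    An item joins the current line when its vertical center lies within
    `tolerance` * (height of the line's first item) of that item's center.
    """
    lines: List[List[Dict]] = []
    for item in sorted(items, key=lambda it: it["bbox"][1]):
        _, y1, _, y2 = item["bbox"]
        center = (y1 + y2) / 2
        if lines:
            _, ry1, _, ry2 = lines[-1][0]["bbox"]
            if abs(center - (ry1 + ry2) / 2) < tolerance * (ry2 - ry1):
                lines[-1].append(item)
                continue
        lines.append([item])
    # Order each line's regions by their left edge before joining.
    return [
        sep.join(it["text"] for it in sorted(line, key=lambda it: it["bbox"][0]))
        for line in lines
    ]
```

Pass sep="" for Chinese, where regions on the same line are usually not separated by spaces.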
Supported formats: .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp.
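A minimal sketch of how a batch run could select files by these extensions. The case-insensitive match and the non-recursive directory scan are assumptions, and is_supported and find_images are hypothetical helpers rather than part of the renzi API.

```python
from pathlib import Path
from typing import List

# Extensions listed as supported in this README.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp"}


def is_supported(name: str) -> bool:
    # Case-insensitive match on the final suffix only (an assumption).
    return Path(name).suffix.lower() in SUPPORTED


def find_images(directory: str) -> List[Path]:
    """Supported image files directly inside `directory` (not recursive), sorted."""
    return sorted(
        p for p in Path(directory).iterdir() if p.is_file() and is_supported(p.name)
    )
```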
What it does not do: streaming OCR, mobile capture, cloud OCR, translation between languages.
Features
- Single image or whole-directory batch OCR
- PaddleOCR 3.6 + PP-OCRv5 server detection/recognition models
- Optional annotated image and reconstructed text bitmap PNGs
- Local Ollama LLM correction (offline, no API keys)
- Flask web UI with drag-and-drop upload and side-by-side raw/corrected text
- Unified Python API with a ToolResult dataclass
- OpenAI function-calling tools schema for agent integration
- CLI with unified flags (-V, -v, -q, --json, -o)
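The Python API examples in this README read result.success and result.data. A minimal sketch of a dataclass with that shape; the class name ToolResultSketch, the error field, and the to_dict helper are assumptions, not RenZi's actual ToolResult definition.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToolResultSketch:
    """Hypothetical stand-in for renzi.ToolResult: a success flag plus a payload."""

    success: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None  # assumed field for failure messages

    def to_dict(self) -> Dict[str, Any]:
        # Plain dict, e.g. for JSON output or for returning to an agent.
        return asdict(self)
```

A single uniform result type like this is convenient because the same object can be printed as JSON or handed back to an agent unchanged.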
Requirements
- Python >= 3.9
- PaddleOCR 3.6.0 + PaddlePaddle 3.0.0 (see install notes below)
- Ollama running locally with a model pulled (e.g. ollama pull gemma3:1b)
- A CJK TrueType font for bitmap rendering (auto-detected on Windows/macOS/Linux)
PaddlePaddle version pin
The known-good combination is PaddlePaddle 3.0.0 with PaddleOCR 3.6.0. Newer PaddlePaddle releases (3.1+) break loading of older models through the PIR executor. Install the pinned version from the Baidu mirror:
pip install paddlepaddle==3.0.0 -i https://mirror.baidu.com/pypi/simple
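Since later PaddlePaddle versions break model loading, it can be worth checking the installed versions before loading models. A minimal sketch using the standard library's importlib.metadata; the distribution names come from the pip commands in this README, and check_pins is a hypothetical helper. GPU builds ship under a different distribution name (paddlepaddle-gpu), which this sketch does not handle.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List

# Known-good versions from this README.
PINS = {"paddlepaddle": "3.0.0", "paddleocr": "3.6.0"}


def check_pins(pins: Dict[str, str] = PINS) -> List[str]:
    """Return human-readable problems; an empty list means every pin matches."""
    problems = []
    for dist, wanted in pins.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{dist} {installed} installed, want {wanted}")
    return problems
```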
Installation
pip install -e .
For development dependencies:
pip install -e ".[dev]"
Ollama setup (for correction)
ollama pull gemma3:1b
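A minimal sketch for confirming that the server is up and the model has been pulled, using Ollama's GET /api/tags endpoint, which lists locally available models. http://localhost:11434 is Ollama's default address; has_model and ollama_ready are hypothetical helpers, and whether RenZi runs a check like this is not stated here.

```python
import json
import urllib.request
from typing import Any, Dict


def has_model(tags: Dict[str, Any], model: str) -> bool:
    """True if `model` is listed in a GET /api/tags response body."""
    # Ollama treats an untagged name like "gemma3" as "gemma3:latest".
    wanted = model if ":" in model else model + ":latest"
    return wanted in {m.get("name") for m in tags.get("models", [])}


def ollama_ready(model: str = "gemma3:1b", base: str = "http://localhost:11434",
                 timeout: float = 3.0) -> bool:
    """False if the server is unreachable, returns bad JSON, or lacks the model."""
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (OSError, ValueError):  # URLError and timeouts are OSError subclasses
        return False
    return has_model(tags, model)
```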
Quick Start
Recognize one image and print text:
renzi photo.jpg
Batch a directory and write .txt transcripts:
renzi images/ -l ch -o transcripts/
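The README does not spell out how transcript files are named. A minimal sketch of one plausible scheme, one UTF-8 .txt per image named after the image's stem; transcript_path and write_transcripts are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def transcript_path(image: str, out_dir: str) -> Path:
    # Hypothetical naming: images/page1.jpg -> <out_dir>/page1.txt
    return Path(out_dir) / (Path(image).stem + ".txt")


def write_transcripts(texts: Dict[str, str], out_dir: str) -> List[Path]:
    """Write one .txt per image; `texts` maps image path -> OCR text."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for image, text in texts.items():
        target = transcript_path(image, out_dir)
        target.write_text(text, encoding="utf-8")  # UTF-8 so CJK text round-trips
        written.append(target)
    return written
```

Under this scheme page1.jpg and page1.png would collide on page1.txt, so a real implementation would need to keep the extension or deduplicate.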
Full pipeline with annotated image + text bitmap + corrected text file:
renzi scan.png --pipeline --bbox annotated.png --bitmap bitmap.png -o fixed.txt
Correct OCR text piped on stdin:
echo "我在吉林省、长春市" | renzi --correct -
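A minimal sketch of what a correction request can look like against Ollama's POST /api/generate endpoint with streaming disabled, falling back to the raw text when the server cannot be reached (the same idea as the used_llm flag returned by correct()). The prompt wording, build_request, and correct_text are hypothetical, not RenZi's actual prompt or code.

```python
import json
import urllib.request
from typing import Dict, Tuple

# Hypothetical prompt; RenZi's real prompt is not shown in this README.
PROMPT = (
    "Fix OCR errors (typos, misrecognized characters) in the text below. "
    "Keep the original language and line breaks. Output only the corrected text.\n\n{text}"
)


def build_request(text: str, model: str = "gemma3:1b") -> Dict:
    # stream=False makes Ollama return one JSON object with a "response" field.
    return {"model": model, "prompt": PROMPT.format(text=text), "stream": False}


def correct_text(text: str, base: str = "http://localhost:11434",
                 model: str = "gemma3:1b", timeout: float = 60.0) -> Tuple[str, bool]:
    """Return (text, used_llm); on connection or decode errors, return the raw text."""
    body = json.dumps(build_request(text, model)).encode("utf-8")
    req = urllib.request.Request(f"{base}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            reply = json.load(resp)
    except (OSError, ValueError):
        return text, False
    corrected = reply.get("response", "").strip()
    return (corrected, True) if corrected else (text, False)
```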
Launch the web UI:
python -m renzi.web
# open http://127.0.0.1:8765
CLI Usage
renzi [-V] [-v] [-q] [--json] [-l LANG] [--confidence N] [-o PATH]
[--bbox PATH] [--bitmap PATH] [--ollama-base URL] [--ollama-model NAME]
[--timeout S] [--correct] [--pipeline] [PATH]
| Flag | Meaning |
|---|---|
| -V, --version | Print version and exit |
| -v, --verbose | Show per-item confidence and geometry |
| -q, --quiet | Suppress non-essential output |
| --json | Output results as JSON |
| -l, --lang | OCR language hint (default ch) |
| --confidence | Minimum OCR confidence (default 0.5) |
| -o, --output | Output file (single) or directory (batch) |
| --bbox | Write annotated image PNG to this path |
| --bitmap | Write reconstructed text bitmap PNG to this path |
| --ollama-base | Ollama API base URL |
| --ollama-model | Ollama model name |
| --timeout | LLM request timeout in seconds |
| --correct | Correct text from stdin instead of running OCR |
| --pipeline | Run OCR + render + LLM correction |
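The synopsis and flag table map directly onto argparse. A minimal sketch for scripts that wrap or imitate this CLI: only the ch and 0.5 defaults come from this README, undocumented defaults are left as None, the version string is hard-coded from the 0.1.0 release files, and build_parser is hypothetical rather than RenZi's own entry point.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="renzi")
    p.add_argument("-V", "--version", action="version", version="%(prog)s 0.1.0")
    p.add_argument("-v", "--verbose", action="store_true")
    p.add_argument("-q", "--quiet", action="store_true")
    p.add_argument("--json", action="store_true")
    p.add_argument("-l", "--lang", default="ch")  # documented default
    p.add_argument("--confidence", type=float, default=0.5)  # documented default
    p.add_argument("-o", "--output")
    p.add_argument("--bbox")
    p.add_argument("--bitmap")
    p.add_argument("--ollama-base")
    p.add_argument("--ollama-model")
    p.add_argument("--timeout", type=float)
    p.add_argument("--correct", action="store_true")
    p.add_argument("--pipeline", action="store_true")
    # Image, directory, or "-" for stdin (as in `renzi --correct -`).
    p.add_argument("path", nargs="?", metavar="PATH")
    return p
```

argparse treats a lone - as a positional value, so renzi --correct - parses with PATH set to "-".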
Python API
from renzi import extract, extract_dir, correct, pipeline, ToolResult
# Single image
result = extract(image_path="photo.jpg", lang="ch")
print(result.success) # True
print(result.data["text"]) # raw OCR text
for it in result.data["items"]:
    print(it["text"], it["confidence"], it["bbox"])
# Directory batch with text files
result = extract_dir(directory="images/", output="transcripts/")
for filename, info in result.data.items():
    print(filename, info["text"][:50])
# Correct raw OCR text
result = correct(text="我在吉林省、长春市")
print(result.data["text"]) # corrected
print(result.data["used_llm"]) # True/False
# Full pipeline with visualizations
result = pipeline(
image_path="scan.png",
bbox_image="annotated.png",
bitmap_image="bitmap.png",
output="fixed.txt",
)
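A minimal sketch of applying a confidence threshold to result.data["items"] yourself, for example to compare cutoffs without re-running OCR. It relies only on the text and confidence keys shown above; filter_items is hypothetical, and the inclusive >= comparison and newline join are assumptions.

```python
from typing import Dict, List, Tuple


def filter_items(items: List[Dict], min_conf: float = 0.5) -> Tuple[str, List[Dict]]:
    """Keep items with confidence >= min_conf; return (joined text, kept items)."""
    kept = [it for it in items if it["confidence"] >= min_conf]
    return "\n".join(it["text"] for it in kept), kept
```

If extract() has already applied its own cutoff, raising the threshold here works, but lowering it cannot bring back items that were already dropped.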
Agent Integration
from renzi.tools import TOOLS, dispatch
# Register TOOLS in your OpenAI function-calling client, then:
result = dispatch("renzi_pipeline", {
"image_path": "scan.png",
"bbox_image": "annotated.png",
"bitmap_image": "bitmap.png",
})
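In an OpenAI-style tool loop, each tool call carries its arguments as a JSON-encoded string that has to be decoded before dispatch. A minimal, self-contained sketch of that hand-off: REGISTRY, fake_pipeline, TOOLS_SKETCH, and handle_tool_call are hypothetical stand-ins for renzi.tools, and the parameter schema is inferred from the renzi_pipeline call above rather than copied from RenZi.

```python
import json
from typing import Any, Callable, Dict


def fake_pipeline(image_path: str, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical stand-in tool; a real registry would point at renzi's functions.
    return {"success": True, "data": {"image_path": image_path, **kwargs}}


REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {"renzi_pipeline": fake_pipeline}

# One schema entry in the OpenAI function-calling format.
TOOLS_SKETCH = [{
    "type": "function",
    "function": {
        "name": "renzi_pipeline",
        "description": "OCR an image, render visualizations, and correct the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_path": {"type": "string"},
                "bbox_image": {"type": "string"},
                "bitmap_image": {"type": "string"},
            },
            "required": ["image_path"],
        },
    },
}]


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Decode the model's JSON arguments, run the tool, and return a JSON reply."""
    func = REGISTRY.get(name)
    if func is None:
        return json.dumps({"success": False, "error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    # ensure_ascii=False keeps CJK text readable in the reply sent back to the model.
    return json.dumps(func(**args), ensure_ascii=False)
```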
Web UI
from renzi.web import run
run(port=8765)
Drag-and-drop upload, three-up image comparison (original / annotated / text bitmap), and side-by-side raw vs corrected text with one-click copy and re-correct.
Model Cache
PaddleOCR downloads its models from ModelScope on first run and caches them in ~/.paddlex/official_models/. To run fully offline with pre-downloaded models, place them in a directory and point PADDLEX_HOME (or RENZI_MODEL_DIR) at it:
export RENZI_MODEL_DIR=/path/to/official_models
renzi photo.jpg
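A minimal sketch of how a model directory could be resolved from these settings. The precedence (RENZI_MODEL_DIR, then PADDLEX_HOME, then the default cache) and resolve_model_dir itself are assumptions; the README only says that either variable can point at the models.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick RENZI_MODEL_DIR, then PADDLEX_HOME, then the default cache (assumed order)."""
    env = os.environ if env is None else env
    for var in ("RENZI_MODEL_DIR", "PADDLEX_HOME"):
        value = env.get(var)
        if value:  # empty strings are treated as unset
            return Path(value).expanduser()
    return Path.home() / ".paddlex" / "official_models"
```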
Development
pip install -e ".[dev]"
ruff format . && ruff check . && pytest
License
GPL-3.0
Download files
Source Distribution: renzi-0.1.0.tar.gz
Built Distribution: renzi-0.1.0-py3-none-any.whl
File details
Details for the file renzi-0.1.0.tar.gz.
File metadata
- Download URL: renzi-0.1.0.tar.gz
- Upload date:
- Size: 24.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8f80262926efe77c0032ed4948dc7b38cebddc7a8e8f3827e6c9326067437b5 |
| MD5 | 18fcad8a18002e97b7abc09711812aa1 |
| BLAKE2b-256 | 14a6c435d38cc7c548cb04f75e09bb9ba91097991f350a4d48c9621c809705cd |
File details
Details for the file renzi-0.1.0-py3-none-any.whl.
File metadata
- Download URL: renzi-0.1.0-py3-none-any.whl
- Upload date:
- Size: 23.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eaa54a68f5277107580178158faad147fd67b1dcdd9aee6a37a311d304deb15f |
| MD5 | 74bbe8279279aaf689a85a9a703e231d |
| BLAKE2b-256 | 8873c87f0feddb59caf144e5793f94cb72a18944aaa7305218ded0af61f466d9 |