Fast CPU OCR — PaddleOCR PP-OCRv6 tiny (lightweight), reimplemented in Rust + ONNX Runtime. ~7x faster than PaddlePaddle, self-contained wheels with models bundled.
faster-paddle
Fast, CPU-only OCR in Rust with Python bindings — a self-contained reimplementation of PaddleOCR's lightweight pipeline (PP-OCRv6 tiny detection + recognition) powered by ONNX Runtime.
- ⚡ ~7× faster than paddleocr on CPU for the same models and output.
- 📦 Self-contained: the ONNX models (~6 MB) are bundled inside the wheel. No paddlepaddle, no model downloads, no network at runtime.
- 🦀 Pure-Rust pre/post-processing (detection DB decode, minAreaRect, perspective crop, CTC decode, reading-order text reconstruction). No OpenCV.
- 🖥️ Prebuilt wheels for Linux, Windows, macOS (x86-64 + arm64).
paddleocr (PaddlePaddle, CPU) 22.7 s / image
faster-paddle (Rust + ONNXRuntime) 3.0 s / image → ~7.7× faster
(test image 3157×4464, AMD Ryzen 7 5800X3D; both after warm-up, same weights.)
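The benchmark script itself is not part of this description. As a rough sketch of what "after warm-up" timing can look like, the helper below runs a callable a few times untimed and then averages the timed runs. `seconds_per_call` and `fake_ocr` are hypothetical names used only for illustration; in a real measurement the callable would wrap an OCR call on the test image.

```python
import time
from typing import Callable


def seconds_per_call(fn: Callable[[], object], warmup: int = 1, runs: int = 3) -> float:
    """Mean wall-clock seconds per call, measured after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: lets lazy initialization and caches settle
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Hypothetical stand-in workload so the sketch runs without the wheel installed.
calls = []


def fake_ocr() -> None:
    calls.append(1)
    sum(range(10_000))


mean_s = seconds_per_call(fake_ocr, warmup=2, runs=5)
print(f"{mean_s:.6f} s / call")
```

Dividing the two per-image means then gives the speed-up factor.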
Install
pip install faster-paddle
Usage
import faster_paddle

# One-shot, using a shared default engine (lazily initialized):
with open("document.jpg", "rb") as f:
    result = faster_paddle.ocr(f.read())

print(result["text"])  # reading-order reconstructed text
for idx, b in result["bounds"].items():
    print(idx, b["text"], b["confidence"], b["topLeftCoord"], b["bottomRightCoord"])
Reuse an explicit engine (recommended for servers — load the models once):
from faster_paddle import OcrEngine
engine = OcrEngine(threads=None, rec_batch=6) # threads default = physical cores
result = engine.ocr(image_bytes) # raw jpeg/png/webp/bmp/tiff/gif bytes
result = engine.ocr_base64(b64_string) # base64-encoded image
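To produce a string suitable for ocr_base64, the standard library is enough. This is a minimal sketch that assumes the method expects plain standard base64 (no `data:` URI prefix), which the description above does not spell out; `to_base64_str` is a hypothetical helper name.

```python
import base64


def to_base64_str(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


# Stand-in payload; in practice this would be the bytes of a JPEG/PNG file.
image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
b64_string = to_base64_str(image_bytes)

# b64_string could then be passed as engine.ocr_base64(b64_string).
print(b64_string)
```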
Result shape
{
"text": "full reconstructed text...",
"bounds": {
0: {
"topLeftCoord": (x1, y1),
"bottomRightCoord": (x2, y2),
"text": "line text",
"confidence": 0.97,
},
1: { ... },
}
}
This matches the JSON contract of the original paddle-ocr-api service, so it is
a drop-in replacement.
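As an illustration of consuming this shape, the sketch below filters lines by confidence and derives a box size. It runs on a hand-written `sample` dict that follows the documented layout; `confident_lines` and `box_size` are hypothetical helpers, not part of the package.

```python
from typing import Any, Dict, List, Tuple


def confident_lines(result: Dict[str, Any], min_confidence: float = 0.9) -> List[str]:
    """Texts of all boxes with confidence >= min_confidence, in index order."""
    lines = []
    for idx in sorted(result["bounds"]):
        box = result["bounds"][idx]
        if box["confidence"] >= min_confidence:
            lines.append(box["text"])
    return lines


def box_size(box: Dict[str, Any]) -> Tuple[int, int]:
    """(width, height) of a box from its top-left and bottom-right corners."""
    x1, y1 = box["topLeftCoord"]
    x2, y2 = box["bottomRightCoord"]
    return x2 - x1, y2 - y1


# Hand-written sample in the documented result shape (not real OCR output).
sample = {
    "text": "Invoice 42\nTotal due",
    "bounds": {
        0: {"topLeftCoord": (10, 10), "bottomRightCoord": (210, 40),
            "text": "Invoice 42", "confidence": 0.97},
        1: {"topLeftCoord": (10, 50), "bottomRightCoord": (150, 80),
            "text": "Total due", "confidence": 0.62},
    },
}

kept = confident_lines(sample, min_confidence=0.9)
print(kept, box_size(sample["bounds"][1]))
```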
API
| Call | Description |
|---|---|
| faster_paddle.ocr(image: bytes) -> dict | OCR encoded image bytes (shared default engine). |
| faster_paddle.ocr_base64(image_base64: str) -> dict | OCR a base64 image string. |
| OcrEngine(threads=None, rec_batch=None) | Construct a reusable engine. |
| OcrEngine.ocr(image: bytes) -> dict | OCR encoded image bytes. |
| OcrEngine.ocr_base64(image_base64: str) -> dict | OCR a base64 image string. |
- threads: ONNX Runtime intra-op threads. Defaults to the number of physical CPU cores (SMT/logical threads tend to slow compute-bound inference down).
- rec_batch: recognition batch size (default 6).
Calls are thread-safe (serialized internally) and release the GIL during inference.
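A sketch of sharing one engine across worker threads follows. `StubEngine` is a hypothetical stand-in that imitates the "serialized internally" behaviour with a lock so the example runs without the wheel; with the real package, an OcrEngine instance would take its place.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubEngine:
    """Hypothetical stand-in for OcrEngine; serializes calls with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.calls = 0

    def ocr(self, image: bytes) -> dict:
        with self._lock:  # only one call runs at a time
            self.calls += 1
            return {"text": f"{len(image)} bytes", "bounds": {}}


engine = StubEngine()  # build once, share across all workers
images = [b"a" * n for n in range(1, 9)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order in its results
    results = list(pool.map(engine.ocr, images))

print(engine.calls, results[0]["text"])
```

Since calls on one engine are serialized, extra worker threads should not be expected to multiply OCR throughput by themselves; the practical benefit is that other Python threads keep running while inference holds no GIL.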
How it works
The pipeline faithfully mirrors PaddleOCR's lightweight path:
- Detection: resize (min-side 736, clamp max-side 4000, round to ×32), normalize (BGR mean/std), run the DB detector.
- DB post-process: threshold 0.2, connected components, minAreaRect, box score ≥ 0.4, unclip ratio 1.4, rescale to source coordinates.
- Sort boxes top-to-bottom / left-to-right; crop each via perspective warp.
- Recognition: resize each crop to H=48, normalize, batch, run the CTC recognizer ([N, T, 6906]), greedy CTC decode.
- Reconstruct reading-order text with dynamic column/line detection.
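Two of the steps above are small enough to sketch in plain Python. This is an illustration under stated assumptions, not the crate's actual Rust code: `det_input_size` assumes the image is only upscaled when its shorter side is below 736 (the behaviour of PaddleOCR's min-side resize), and `greedy_ctc_decode` assumes class index 0 is the CTC blank. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def det_input_size(h: int, w: int, min_side: int = 736,
                   max_side: int = 4000, multiple: int = 32) -> Tuple[int, int]:
    """Detector input (height, width) for a source image of size h x w."""
    scale = 1.0
    if min(h, w) < min_side:            # assumption: upscale small images only
        scale = min_side / min(h, w)
    if max(h, w) * scale > max_side:    # clamp the longer side
        scale = max_side / max(h, w)
    new_h = max(int(round(h * scale / multiple)) * multiple, multiple)
    new_w = max(int(round(w * scale / multiple)) * multiple, multiple)
    return new_h, new_w


def greedy_ctc_decode(logits: Sequence[Sequence[float]],
                      charset: Sequence[str], blank: int = 0) -> str:
    """Pick the best class per time step, collapse repeats, drop blanks."""
    out: List[str] = []
    prev = blank
    for step in logits:
        best = max(range(len(step)), key=step.__getitem__)  # argmax
        if best != blank and best != prev:
            out.append(charset[best])
        prev = best
    return "".join(out)


# Toy 3-class example: index 0 = blank, 1 = "a", 2 = "b".
charset = ["", "a", "b"]
path = [1, 1, 0, 1, 2, 2]  # per-step argmax the toy logits should produce
logits = [[1.0 if c == k else 0.0 for c in range(3)] for k in path]

decoded = greedy_ctc_decode(logits, charset)
size = det_input_size(4464, 3157)
print(decoded, size)
```

For one sequence, the real recognizer output has shape [T, 6906] rather than the toy [6, 3] used here; the decode loop is the same.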
Detection agrees with PaddlePaddle on 96 % of boxes at IoU > 0.5, and the recognized text has 0.93 character-level similarity; the residual difference comes from ONNX Runtime vs. PaddlePaddle floating-point numerics, not from the algorithm.
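How the agreement figure was computed is not stated here. One plausible reading is the share of reference boxes that have a counterpart with IoU above 0.5, which can be sketched as below. Axis-aligned (x1, y1, x2, y2) boxes are an assumption made for simplicity, and `iou` and `match_rate` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_rate(reference: Sequence[Box], candidate: Sequence[Box],
               threshold: float = 0.5) -> float:
    """Fraction of reference boxes overlapped by some candidate at IoU > threshold."""
    if not reference:
        return 0.0
    hits = sum(1 for r in reference if any(iou(r, c) > threshold for c in candidate))
    return hits / len(reference)


reference: List[Box] = [(0, 0, 10, 10), (20, 20, 30, 30)]
candidate: List[Box] = [(1, 0, 11, 10)]
rate = match_rate(reference, candidate)
print(rate)
```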
The bundled models are PP-OCRv6_tiny_det and PP-OCRv6_tiny_rec exported with
paddle2onnx.
Building from source
pip install maturin
maturin develop --release # build + install into the current environment
# or
maturin build --release # produce a wheel in target/wheels/
Requires a Rust toolchain. ONNX Runtime is fetched automatically by the ort
crate at build time and linked into the extension.
License
MIT
Download files
File details
Details for the file faster_paddle-0.0.1.tar.gz.
File metadata
- Download URL: faster_paddle-0.0.1.tar.gz
- Upload date:
- Size: 5.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bdb40e071919bf1be107774d4cf68120841fd609612fb7db0ebcee60a2d62fe2 |
| MD5 | fa538e6b02db2c3d1d090e0fafa9a702 |
| BLAKE2b-256 | 902f9558547ef8901269efe2846010d014db14aa3bd1502f18bfd9a39e950818 |
File details
Details for the file faster_paddle-0.0.1-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: faster_paddle-0.0.1-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 18.8 MB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a6d39d2eedcf7c66f1bfa5a47826415cbe1b7263e8769ade91f0b465b83a9d2 |
| MD5 | 17ba69e3eee6563418eac2c9d7838832 |
| BLAKE2b-256 | 02b61f47c087dd976067d55bae71c509b175517ccd4a22a48f91f91b350f83e6 |
File details
Details for the file faster_paddle-0.0.1-cp38-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: faster_paddle-0.0.1-cp38-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 19.6 MB
- Tags: CPython 3.8+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 148ce60dc9bc87c8ee693436317a12747954c9954ab76747982f3e1052c0e483 |
| MD5 | 1642a753b1c9af92b016ff5d7d15c599 |
| BLAKE2b-256 | d0ba8d964259a1e0e63d76dfecb74d90a72d0f2793c5b9555ea28b136ae49cc3 |
File details
Details for the file faster_paddle-0.0.1-cp38-abi3-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: faster_paddle-0.0.1-cp38-abi3-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 20.6 MB
- Tags: CPython 3.8+, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5e6b8a51d29a74d7af410b466fba115e495a6b073cdff46d0b7c145754c3425 |
| MD5 | de880252f0625af4edb91eda50ba183e |
| BLAKE2b-256 | 3907829c6d37507c57806493248b3dd66bafa1121f4c2a6b8834a4f98f4444ea |
File details
Details for the file faster_paddle-0.0.1-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: faster_paddle-0.0.1-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 18.6 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca5ac1e8f4907f1251f2b7b248611b84d6b5b67dd4b6affb92287f9167dda275 |
| MD5 | 955a8020bfd4f94edab99450c79202c9 |
| BLAKE2b-256 | db16878b1562b0199cf7e8c21c19934d2a7d77b02fb0f9cc72058ad511a1a25b |