
Fast CPU OCR — PaddleOCR PP-OCRv6 tiny (lightweight), reimplemented in Rust + ONNX Runtime. ~7x faster than PaddlePaddle, self-contained wheels with models bundled.

faster-paddle

Fast, CPU-only OCR in Rust with Python bindings — a self-contained reimplementation of PaddleOCR's lightweight pipeline (PP-OCRv6 tiny detection + recognition) powered by ONNX Runtime.

  • ~7× faster than paddleocr on CPU, using the same model weights and producing closely matching output.
  • 📦 Self-contained — the ONNX models (~6 MB) are bundled inside the wheel. No paddlepaddle, no model downloads, no network at runtime.
  • 🦀 Pure-Rust pre/post-processing (detection DB decode, minAreaRect, perspective crop, CTC decode, reading-order text reconstruction). No OpenCV.
  • 🖥️ Prebuilt abi3 wheels (CPython 3.8+) for Linux (x86-64, arm64), Windows (x86-64), and macOS (arm64).

paddleocr (PaddlePaddle, CPU)        22.7 s / image
faster-paddle (Rust + ONNXRuntime)    3.0 s / image     →  ~7.7× faster

(test image 3157×4464, AMD Ryzen 7 5800X3D; both after warm-up, same weights.)
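
The benchmark script itself is not shown here. As a rough sketch of what "after warm-up" timing can look like, the helper below runs a few untimed calls and then reports the median of the timed ones. The name `seconds_per_call` and its `warmup` and `runs` parameters are illustrative assumptions, not part of faster-paddle.

```python
import statistics
import time
from typing import Callable


def seconds_per_call(fn: Callable[[], object], warmup: int = 2, runs: int = 5) -> float:
    """Median wall-clock seconds per call, measured after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: lets lazy model loading and caches settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Possible use with the package installed and an image on disk:
#   import faster_paddle
#   data = open("document.jpg", "rb").read()
#   print(seconds_per_call(lambda: faster_paddle.ocr(data)))
```

Timings depend heavily on image size and thread count, so numbers from a different machine or image are not directly comparable with the figures above.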


Install

pip install faster-paddle

Usage

import faster_paddle

# One-shot, using a shared default engine (lazily initialized):
with open("document.jpg", "rb") as f:
    result = faster_paddle.ocr(f.read())

print(result["text"])              # reading-order reconstructed text
for idx, b in result["bounds"].items():
    print(idx, b["text"], b["confidence"], b["topLeftCoord"], b["bottomRightCoord"])

Reuse an explicit engine (recommended for servers — load the models once):

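import base64
from faster_paddle import OcrEngine

engine = OcrEngine(threads=None, rec_batch=6)   # threads=None: use the number of physical cores

with open("document.jpg", "rb") as f:
    image_bytes = f.read()                       # raw jpeg/png/webp/bmp/tiff/gif bytes

result = engine.ocr(image_bytes)                 # OCR the encoded bytes directly
b64_string = base64.b64encode(image_bytes).decode("ascii")
result = engine.ocr_base64(b64_string)           # same image, passed as a base64 string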

Result shape

{
  "text": "full reconstructed text...",
  "bounds": {
     0: {
        "topLeftCoord":     (x1, y1),
        "bottomRightCoord": (x2, y2),
        "text":             "line text",
        "confidence":       0.97,
     },
     1: { ... },
  }
}

This matches the JSON contract of the original paddle-ocr-api service, so it is a drop-in replacement.
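
To make the result shape concrete, here is a small sketch that works on a hand-written dictionary in that shape, so it runs without the package. The helpers `confident_lines` and `box_size` and the sample values are hypothetical and only illustrate how the fields fit together.

```python
import json
from typing import Any, Dict, List, Tuple

# Hand-written sample in the documented shape (values are made up).
sample: Dict[str, Any] = {
    "text": "Invoice\nNo. 42",
    "bounds": {
        0: {"topLeftCoord": (10, 12), "bottomRightCoord": (200, 40),
            "text": "Invoice", "confidence": 0.97},
        1: {"topLeftCoord": (10, 50), "bottomRightCoord": (90, 70),
            "text": "No. 42", "confidence": 0.61},
    },
}


def confident_lines(result: Dict[str, Any], min_confidence: float = 0.5) -> List[str]:
    """Line texts in index order, keeping only lines at or above the threshold."""
    bounds = result["bounds"]
    return [bounds[i]["text"] for i in sorted(bounds)
            if bounds[i]["confidence"] >= min_confidence]


def box_size(bound: Dict[str, Any]) -> Tuple[int, int]:
    """(width, height) of the axis-aligned box of one line."""
    (x1, y1), (x2, y2) = bound["topLeftCoord"], bound["bottomRightCoord"]
    return (x2 - x1, y2 - y1)


print(confident_lines(sample, min_confidence=0.9))
print(box_size(sample["bounds"][1]))
# json.dumps turns the integer keys into strings and the tuples into arrays.
print(json.dumps(sample["bounds"][0]))
```

Note that after a round trip through `json.dumps` and `json.loads`, the `bounds` keys are strings (`"0"`, `"1"`) and the coordinates are lists, which matters if the result is compared against a JSON response.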

API

faster_paddle.ocr(image: bytes) -> dict                  OCR encoded image bytes (shared default engine).
faster_paddle.ocr_base64(image_base64: str) -> dict      OCR a base64 image string.
OcrEngine(threads=None, rec_batch=None)                  Construct a reusable engine.
OcrEngine.ocr(image: bytes) -> dict                      OCR encoded image bytes.
OcrEngine.ocr_base64(image_base64: str) -> dict          OCR a base64 image string.

  • threads: ONNX Runtime intra-op threads. Defaults to the number of physical CPU cores (SMT/logical threads tend to slow compute-bound inference down).
  • rec_batch: recognition batch size (default 6).
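
How the engine groups crops internally is not documented here. As one plausible illustration of what a recognition batch size controls, the sketch below scales each crop to the recognizer height of 48, sorts by width, and cuts the list into groups of `rec_batch`, each padded to its widest member. The function `plan_rec_batches` and the sort-by-width strategy are assumptions, not the library's confirmed behaviour.

```python
import math
from typing import Dict, List, Tuple

REC_HEIGHT = 48  # recognizer input height


def plan_rec_batches(crop_sizes: List[Tuple[int, int]], rec_batch: int = 6) -> List[Dict]:
    """Group crops, given as (width, height), into batches of at most `rec_batch`.

    Returns one dict per batch with the original crop indices and the padded
    width shared by every crop in that batch.
    """
    # Width each crop would have once scaled to REC_HEIGHT, keeping aspect ratio.
    widths = [max(1, math.ceil(w * REC_HEIGHT / h)) for w, h in crop_sizes]
    # Assumption: sorting by width keeps padding inside each batch small.
    order = sorted(range(len(widths)), key=widths.__getitem__)
    batches = []
    for start in range(0, len(order), rec_batch):
        indices = order[start:start + rec_batch]
        batches.append({"indices": indices, "width": max(widths[i] for i in indices)})
    return batches


crops = [(96, 48), (480, 48), (200, 100), (48, 48), (300, 30), (100, 50), (1000, 50)]
for batch in plan_rec_batches(crops, rec_batch=6):
    print(batch)
```

Under this scheme a larger batch amortizes per-call overhead but pads short lines up to the widest crop in the group, which is one reason a moderate default can be a reasonable trade-off.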

Calls are thread-safe (serialized internally) and release the GIL during inference.
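
Because calls are thread-safe, one engine can be shared by a pool of worker threads. The sketch below assumes only that the engine object has an `ocr(bytes) -> dict` method as listed above; the helper `ocr_many` and its `workers` parameter are illustrative names. Since calls are serialized internally, extra threads should not be expected to speed up inference itself; they mainly let other work (file reads, request handling) overlap with it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Sequence


def ocr_many(engine: Any, images: Sequence[bytes], workers: int = 4) -> List[Dict]:
    """Run engine.ocr on every image from a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `images`.
        return list(pool.map(engine.ocr, images))


# Possible use with the package installed:
#   from faster_paddle import OcrEngine
#   engine = OcrEngine()
#   results = ocr_many(engine, [open(p, "rb").read() for p in paths])
```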


How it works

The pipeline faithfully mirrors PaddleOCR's lightweight path:

  1. Detection — resize (min-side 736, clamp max-side 4000, round to ×32), normalize (BGR mean/std), run the DB detector.
  2. DB post-process — threshold 0.2, connected components, minAreaRect, box score ≥ 0.4, unclip ratio 1.4, rescale to source coordinates.
  3. Sort boxes top-to-bottom / left-to-right; crop each via perspective warp.
  4. Recognition — resize each crop to H=48, normalize, batch, run the CTC recognizer ([N, T, 6906]), greedy CTC decode.
  5. Reconstruct reading-order text with dynamic column/line detection.
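
Three of the numeric steps above can be sketched in plain Python. This is a rough illustration, not the Rust implementation. It assumes the resize follows PaddleOCR's "min" limit type (only images whose shorter side is below 736 are scaled up), that the unclip offset uses the usual DB formula of area times ratio divided by perimeter, and that the CTC blank is class 0 with confidence taken as the mean of the kept per-step probabilities. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def det_resize_shape(h: int, w: int, min_side: int = 736,
                     max_side: int = 4000, multiple: int = 32) -> Tuple[int, int]:
    """Detector input size (height, width) for a source image of h x w pixels."""
    ratio = 1.0
    if min(h, w) < min_side:              # assumption: small images are scaled up only
        ratio = min_side / min(h, w)
    new_h, new_w = int(h * ratio), int(w * ratio)
    if max(new_h, new_w) > max_side:      # clamp the longer side to max_side
        clamp = max_side / max(new_h, new_w)
        new_h, new_w = int(new_h * clamp), int(new_w * clamp)
    # Round each side to the nearest multiple of 32, never below one multiple.
    new_h = max(int(round(new_h / multiple)) * multiple, multiple)
    new_w = max(int(round(new_w / multiple)) * multiple, multiple)
    return new_h, new_w


def unclip_distance(box_w: float, box_h: float, unclip_ratio: float = 1.4) -> float:
    """Offset by which a detected rectangle is expanded: area * ratio / perimeter."""
    return (box_w * box_h) * unclip_ratio / (2.0 * (box_w + box_h))


def ctc_greedy_decode(probs: Sequence[Sequence[float]], charset: Sequence[str],
                      blank: int = 0) -> Tuple[str, float]:
    """Greedy CTC decode of a [T, num_classes] matrix into (text, confidence)."""
    chars: List[str] = []
    confs: List[float] = []
    prev = blank
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)  # argmax for this time step
        if idx != blank and idx != prev:                    # drop blanks and repeats
            chars.append(charset[idx - 1])                  # class 0 is the blank
            confs.append(step[idx])
        prev = idx
    confidence = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), confidence


print(det_resize_shape(4464, 3157))   # the benchmark image, as height x width
print(unclip_distance(100, 20))
print(ctc_greedy_decode([[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05],
                         [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]], charset="ab"))
```

In the decode example the blank step between the two "a" frames is what lets the repeated character survive, giving "aab" rather than "ab".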

Compared with PaddlePaddle, 96% of detected boxes match at IoU > 0.5, and the recognized text has a character-level similarity of 0.93; the residual difference comes from floating-point numerics (ONNX Runtime vs. PaddlePaddle), not from the algorithm.
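
The exact evaluation procedure is not described here. A comparison of this kind could be computed along the following lines, assuming axis-aligned boxes given as (x1, y1, x2, y2) and using Python's `difflib.SequenceMatcher` ratio as a stand-in for character-level similarity. Both choices, and the helper names, are assumptions.

```python
from difflib import SequenceMatcher
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_rate(reference: Sequence[Box], candidate: Sequence[Box],
               threshold: float = 0.5) -> float:
    """Fraction of reference boxes overlapped by some candidate box above the threshold."""
    if not reference:
        return 1.0
    hits = sum(1 for r in reference if any(iou(r, c) > threshold for c in candidate))
    return hits / len(reference)


def char_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


print(match_rate([(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 10, 9)]))
print(char_similarity("hello", "hallo"))
```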

The bundled models are PP-OCRv6_tiny_det and PP-OCRv6_tiny_rec exported with paddle2onnx.

Building from source

pip install maturin
maturin develop --release      # build + install into the current environment
# or
maturin build --release        # produce a wheel in target/wheels/

Requires a Rust toolchain. ONNX Runtime is fetched automatically by the ort crate at build time and linked into the extension.

License

MIT
