PaddleOCR engine plugin for OCRmyPDF

OCRmyPDF PaddleOCR

A plugin that makes PaddleOCR the OCR engine for OCRmyPDF in place of Tesseract. PaddleOCR generally provides better accuracy than Tesseract, especially for CJK languages (Chinese, Japanese, Korean).

Unlike the EasyOCR plugin, this plugin does not require Tesseract for any operations.

Installation

pip install ocrmypdf-paddleocr

Usage

ocrmypdf --plugin ocrmypdf_paddleocr input.pdf output.pdf
ocrmypdf --plugin ocrmypdf_paddleocr -l kor input.pdf output.pdf

Or from Python:

import ocrmypdf

ocrmypdf.ocr('input.pdf', 'output.pdf', plugins=['ocrmypdf_paddleocr'])
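
The command-line example above selects a language with -l kor. A minimal sketch of the same call from Python follows. It assumes that the language argument of ocrmypdf.ocr is passed through to the plugin the same way -l is; the helper names build_ocr_kwargs and ocr_file are hypothetical and exist only for this illustration.

```python
PLUGIN = "ocrmypdf_paddleocr"


def build_ocr_kwargs(language=None):
    """Build keyword arguments for ocrmypdf.ocr (hypothetical helper)."""
    kwargs = {"plugins": [PLUGIN]}
    if language:
        # Assumption: the plugin accepts the same codes here as with -l.
        kwargs["language"] = [language]
    return kwargs


def ocr_file(src, dst, language=None):
    """Run OCR on one file with the PaddleOCR plugin."""
    # Imported lazily so the helper above can be used without ocrmypdf.
    import ocrmypdf

    return ocrmypdf.ocr(src, dst, **build_ocr_kwargs(language))


if __name__ == "__main__":
    ocr_file("input.pdf", "output.pdf", language="kor")
```

Keeping the argument construction in its own function makes it easy to check what will be passed before any OCR work starts.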

Known limitations

  • The plugin forces jobs=1. PaddlePaddle's inference engine already uses all CPU cores internally, so parallel page processing would only cause contention.
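
Because pages are processed one at a time, a batch of files can be handled the same way. The sketch below runs one ocrmypdf command per PDF, sequentially, using only the flags shown under Usage. Treating whole files sequentially is an assumption that extends the page-level reasoning above; plan_batch and run_batch are hypothetical helper names, not part of the plugin.

```python
import subprocess
from pathlib import Path


def plan_batch(src_dir, dst_dir, language=None):
    """Return one ocrmypdf command per PDF in src_dir, in sorted order."""
    commands = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        cmd = ["ocrmypdf", "--plugin", "ocrmypdf_paddleocr"]
        if language:
            cmd += ["-l", language]
        cmd += [str(pdf), str(Path(dst_dir) / pdf.name)]
        commands.append(cmd)
    return commands


def run_batch(src_dir, dst_dir, language=None):
    """Run the planned commands one after another (no parallelism)."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(src_dir, dst_dir, language):
        # check=True stops the batch at the first file that fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_batch("scans", "ocr_output", language="kor")
```

The directory names scans and ocr_output are placeholders.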

License

MPL-2.0
