Onnx Text Recognition (OnnxTR) plugin for Docling
The docling-OCR-OnnxTR repository provides a plugin that integrates the OnnxTR OCR engine into the Docling framework, enhancing document processing capabilities with efficient and accurate text recognition.
Key Features:
- Seamless Integration: Easily add OnnxTR's OCR functionality to your Docling workflows for improved document parsing and analysis.
- Optimized Performance: Leverages OnnxTR's lightweight architecture to deliver faster inference and lower resource consumption than traditional OCR engines.
- Flexible Deployment: Supports several hardware setups, including CPU, Nvidia GPU, and Intel hardware via OpenVINO, so you can choose the best configuration for your needs.
Installation
To install the plugin, use one of the following commands, depending on your hardware:
# The quotes keep shells such as zsh from expanding the square brackets
# For CPU
pip install "docling-ocr-onnxtr[cpu]"
# For Nvidia GPU
pip install "docling-ocr-onnxtr[gpu]"
# For Intel GPU / integrated graphics (OpenVINO)
pip install "docling-ocr-onnxtr[openvino]"
# Headless variants (no GUI dependencies, e.g. for servers and containers)
# For CPU
pip install "docling-ocr-onnxtr[cpu-headless]"
# For Nvidia GPU
pip install "docling-ocr-onnxtr[gpu-headless]"
# For Intel GPU / integrated graphics (OpenVINO)
pip install "docling-ocr-onnxtr[openvino-headless]"
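The extras above differ mainly in which ONNX Runtime build they pull in. Assuming the plugin's extras mirror OnnxTR's own (onnxruntime, onnxruntime-gpu, or onnxruntime-openvino, plus either opencv-python or opencv-python-headless), the sketch below uses importlib.metadata to report which of those distributions are present in the current environment. The distribution list and the helper installed_versions are illustrative assumptions, not part of the plugin.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumption: the plugin's extras forward to OnnxTR's extras, which install
# one ONNX Runtime build and one OpenCV build. Adjust if the extras differ.
CANDIDATE_DISTRIBUTIONS = (
    "onnxruntime",             # [cpu], [cpu-headless]
    "onnxruntime-gpu",         # [gpu], [gpu-headless]
    "onnxruntime-openvino",    # [openvino], [openvino-headless]
    "opencv-python",           # non-headless extras
    "opencv-python-headless",  # *-headless extras
)


def installed_versions(names: Iterable[str] = CANDIDATE_DISTRIBUTIONS) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:24} {version or 'not installed'}")
```

Seeing more than one ONNX Runtime build listed is worth investigating, since onnxruntime and onnxruntime-gpu installed side by side are a common source of conflicts.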
By integrating OnnxTR with Docling, users can achieve more efficient and accurate OCR results, enhancing the overall document processing experience.
Usage
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import (
    ConversionResult,
    DocumentConverter,
    InputFormat,
    PdfFormatOption,
)
from docling_ocr_onnxtr import OnnxtrOcrOptions


def main():
    # Source document to convert
    source = "https://arxiv.org/pdf/2408.09869v4"

    # Available detection & recognition models can be found at
    # https://github.com/felixdittrich92/OnnxTR
    # Alternatively, choose a model from the Hugging Face Hub collection:
    # https://huggingface.co/collections/Felix92/onnxtr-66bf213a9f88f7346c90e842
    ocr_options = OnnxtrOcrOptions(
        # Text detection model
        det_arch="db_mobilenet_v3_large",
        # Text recognition model, loaded from the Hugging Face Hub
        reco_arch="Felix92/onnxtr-parseq-multilingual-v1",
        # Set to `True` to auto-correct the orientation of the pages
        auto_correct_orientation=False,
    )

    pipeline_options = PdfPipelineOptions(
        ocr_options=ocr_options,
    )
    # Enable external plugins so Docling can load the OnnxTR OCR engine
    pipeline_options.allow_external_plugins = True

    # Convert the document
    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
            ),
        },
    )
    conversion_result: ConversionResult = converter.convert(source=source)
    doc = conversion_result.document
    md = doc.export_to_markdown()
    print(md)


if __name__ == "__main__":
    main()
Configuration
The OCR engine is configured through the OnnxtrOcrOptions class. The following options are available:
- lang: List of languages to use for OCR. Default is ["en", "fr"].
- confidence_score: Word confidence threshold for the recognition model. Default is 0.5.
- objectness_score: Objectness score threshold for the detection model. Default is 0.3.
- det_arch: Detection model architecture. Default is "fast_base".
- reco_arch: Recognition model architecture. Default is "crnn_vgg16_bn".
- reco_bs: Batch size for the recognition model. Default is 512.
- auto_correct_orientation: Whether to auto-correct the orientation of the pages. Default is False.
- preserve_aspect_ratio: Whether to preserve the aspect ratio of the images. Default is True.
- symmetric_pad: Whether to use symmetric padding. Default is True.
- paragraph_break: Paragraph break threshold. Default is 0.035.
- load_in_8_bit: Whether to load the model in 8-bit. Default is False. (Not yet supported for models loaded from Hugging Face.)
- providers: List of ONNX Runtime execution providers to use. Default is None, which means auto-select.
- session_options: ONNX Runtime session options. Default is None, which means the default OnnxTR session options are used.
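The providers option accepts ONNX Runtime execution providers, which ONNX Runtime tries in list order. As a minimal sketch, the hypothetical helper choose_providers below builds such a list from the providers an installed ONNX Runtime reports, preferring CUDA, then OpenVINO, then CPU. The provider names are real ONNX Runtime identifiers; the preference order is an assumption and is not the plugin's own auto-selection logic.

```python
from typing import Iterable, List, Sequence

# Real ONNX Runtime execution-provider names; this priority order is an assumption.
DEFAULT_PREFERENCE = (
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
)


def choose_providers(
    available: Iterable[str],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> List[str]:
    """Return the preferred providers that are available, highest priority first.

    CPUExecutionProvider is always appended as the final fallback, since
    standard ONNX Runtime builds ship with it.
    """
    available_set = set(available)
    chosen = [p for p in preference if p in available_set]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

With ONNX Runtime installed, the result could be passed as OnnxtrOcrOptions(providers=choose_providers(onnxruntime.get_available_providers())). Leaving providers at None keeps the plugin's automatic selection.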
Available models on the Hugging Face Hub are listed in the OnnxTR collection: https://huggingface.co/collections/Felix92/onnxtr-66bf213a9f88f7346c90e842
Further information:
For more details, see the OnnxTR repository: https://github.com/felixdittrich92/OnnxTR
Contributing
Contributions are welcome!
Before opening a pull request, please ensure that your code passes the tests and adheres to the project's coding standards.
You can run the tests and checks using:
make style
make quality
make test
License
Distributed under the Apache 2.0 License. See LICENSE for more information.
File details
Details for the file docling_ocr_onnxtr-0.1.1.tar.gz.
File metadata
- Download URL: docling_ocr_onnxtr-0.1.1.tar.gz
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d753efcfcd4a965a70363f9d18b5eb9839431a96e395206e006d71bfc05f1d9a |
| MD5 | 8b419c2e41d3f0fcd8ca36e6c5de5532 |
| BLAKE2b-256 | 3fdcc0c08aa085551f428a003a63f610bbbcf736e8bee329b81a637ad3cafa2d |
File details
Details for the file docling_ocr_onnxtr-0.1.1-py3-none-any.whl.
File metadata
- Download URL: docling_ocr_onnxtr-0.1.1-py3-none-any.whl
- Size: 17.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07718e48c60d53b7104827396402044c4b84eedc32e4a8a5a55af1b5d5a83ac5 |
| MD5 | e9309568748642dfc623d07b6d52e262 |
| BLAKE2b-256 | 527c5a376ef3fb60c28279f6334f3f3dfb465c38f3b4e6c2d2c48023c458258a |
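To check that a downloaded 0.1.1 release file matches the published SHA256 digests, a minimal sketch using hashlib follows. EXPECTED_SHA256, sha256_of, and verify are illustrative names, not part of the plugin; the digests are the SHA256 values published for the two release files.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the docling-ocr-onnxtr 0.1.1 release files.
EXPECTED_SHA256 = {
    "docling_ocr_onnxtr-0.1.1.tar.gz": "d753efcfcd4a965a70363f9d18b5eb9839431a96e395206e006d71bfc05f1d9a",
    "docling_ocr_onnxtr-0.1.1-py3-none-any.whl": "07718e48c60d53b7104827396402044c4b84eedc32e4a8a5a55af1b5d5a83ac5",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected[name]
```

pip can also enforce digests itself through its hash-checking mode (pip install --require-hashes -r requirements.txt, with entries such as docling-ocr-onnxtr==0.1.1 --hash=sha256:...); in that mode every requirement, including dependencies, must be pinned with a hash.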