A Python package for OCR using Vision LLMs

bOCR: OCR Framework with Vision LLMs

bOCR is an Optical Character Recognition (OCR) framework that uses Vision Large Language Models (VLLMs) for text extraction and document processing.

Features

  • Minimal Setup: Requires just a single backbone file (e.g., qwen.py or ollamas.py) for OCR execution, making it lightweight and easy to use.
  • Broad Vision LLM Support: Integrates with vision LLMs like Qwen, Llama, Phi, and various VLLMs included in the Ollama package.
  • Customizable Prompts: Fine-tune OCR output using either a custom or default prompt.
  • Automated Preprocessing: Image denoising, resizing, and PDF-to-image conversion.
  • Postprocessing & Export: Supports merging pages and multiple export formats (plain, markdown, docx, pdf).
  • Configurable Pipeline: A single Config object centralizes OCR settings.
  • Detailed Logging: Integrated verbose logging for insights and debugging.

Installation

Install from PyPI (Recommended)

pip install bocr

Install from Source (Development Version)

git clone https://github.com/adrianphoulady/bocr.git
cd bocr
pip install .

Required Dependencies

For PDF and document processing, poppler, pandoc, and LaTeX are also required. You can install them as follows:

Linux (Debian/Ubuntu)

sudo apt install poppler-utils pandoc texlive-xetex texlive-fonts-recommended lmodern

macOS (using Homebrew)

brew install poppler pandoc
brew install --cask mactex-no-gui

Windows (using Chocolatey)

choco install poppler pandoc miktex

Quick Start

Simple Example (Single File OCR)

Any backbone file in the backbones module, like qwen.py, is all you need to run OCR on an image:

from bocr.backbones.qwen import extract_text

result = extract_text("sample1.png")
print(result)

Advanced Usage

from bocr import Config, ocr

config = Config(model_id="Qwen/Qwen2-VL-7B-Instruct", export_results=True, export_format="pdf", verbose=True)
files = ["sample2.pdf"]
results = ocr(files, config)
print(results)

Command Line Example

bocr sample1.jpg --export-results --export-format docx --verbose

Configuration

The Config class centralizes OCR settings. Key parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| prompt | str/None | Custom OCR prompt, or None for the default. | None |
| model_id | str | Vision LLM model identifier. | Qwen/Qwen2.5-VL-3B-Instruct |
| max_new_tokens | int | Maximum tokens generated by the model. | 1024 |
| preprocess | bool | Enable preprocessing of input files. | False |
| resolution | int | DPI for PDF-to-image conversion. | 150 |
| max_image_size | int/None | Resize images to a maximum size; no resizing if None. | 1920 |
| result_format | str | Output format (plain, markdown). | md |
| merge_text | bool | Merge extracted text across pages. | False |
| export_results | bool | Save results to files. | False |
| export_format | str | File output format (txt, md, docx, pdf). | md |
| export_dir | str/None | Directory for output files; ./ocr_exports if None. | None |
| verbose | bool | Enable detailed logging for debugging. | False |
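To see how these parameters fit together, the table can be mirrored as a small dataclass sketch. This is illustrative only (the names and defaults come from the table above; the real Config class in bOCR may differ internally):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConfigSketch:
    # Illustrative mirror of the Config parameters and defaults listed above.
    prompt: Optional[str] = None
    model_id: str = "Qwen/Qwen2.5-VL-3B-Instruct"
    max_new_tokens: int = 1024
    preprocess: bool = False
    resolution: int = 150
    max_image_size: Optional[int] = 1920
    result_format: str = "md"
    merge_text: bool = False
    export_results: bool = False
    export_format: str = "md"
    export_dir: Optional[str] = None
    verbose: bool = False

# Override only what you need; everything else keeps its default.
config = ConfigSketch(model_id="Qwen/Qwen2-VL-7B-Instruct", export_results=True)
```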

OCR Pipeline

1. Preprocessing

  • URL Handling: Downloads remote files if input is a URL.
  • PDF Conversion: Converts PDFs into image format (requires poppler installed and in PATH).
  • Image Enhancement: Applies denoising and contrast adjustment.
  • Resizing: Optimizes images for Vision LLMs.

2. Text Extraction

  • Extracts text using Vision LLMs, with support for custom prompts for tailored OCR instructions.

3. Postprocessing

  • Formats the extracted text in the specified result format and, if enabled, merges pages.
  • Converts the result into the specified export format (e.g., Markdown, PDF).
  • Saves the results to disk if export is enabled.
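The merge step amounts to joining per-page results into one document; a minimal sketch (function name and separator are assumptions, not bOCR's API):

```python
def merge_pages(pages: list[str], separator: str = "\n\n") -> str:
    # Join per-page OCR results into a single document, skipping empty pages.
    return separator.join(p.strip() for p in pages if p.strip())
```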

Logging

Enable logging by setting verbose=True in the Config object. Logs provide insights into preprocessing, extraction, and postprocessing steps.


Supported Models

bOCR supports Vision LLMs such as:

  • Qwen/Qwen2.5-VL-3B-Instruct
  • Qwen/Qwen2.5-VL-7B-Instruct
  • Qwen/Qwen2.5-VL-72B-Instruct
  • Qwen/Qwen2-VL-2B-Instruct
  • Qwen/Qwen2-VL-7B-Instruct
  • Qwen/Qwen2-VL-72B-Instruct
  • Qwen/QVQ-72B-Preview
  • meta-llama/Llama-3.2-11B-Vision-Instruct
  • meta-llama/Llama-3.2-90B-Vision-Instruct
  • microsoft/Phi-3.5-vision-instruct
  • llama3.2-vision:11b from Ollama
  • llama3.2-vision:90b from Ollama

Additional models can be supported by implementing a new backbone in bocr/backbones/ and updating mappings.yaml.


License

This project is licensed under the MIT License.
