OCR using LLMs
vllmocr
vllmocr is a command-line tool that performs Optical Character Recognition (OCR) on images and PDFs using Large Language Models (LLMs). It supports multiple LLM providers, including OpenAI, Anthropic, Google, and local models via Ollama.
Features
- Image and PDF OCR: Extracts text from both images (PNG, JPG, JPEG) and PDF files.
- Multiple LLM Providers: Supports a variety of LLMs:
  - OpenAI: GPT-4o
  - Anthropic: Claude 3 Haiku, Claude 3 Sonnet
  - Google: Gemini 1.5 Pro
  - Ollama (local models): Llama3, MiniCPM, and other models supported by Ollama.
- Configurable: Settings, including the LLM provider and model, can be adjusted via a configuration file or environment variables.
- Image Preprocessing: Includes optional image rotation for improved OCR accuracy.
Installation
It is recommended to install vllmocr using uv:
uv pip install vllmocr
If you don't have uv installed, you can install it with:
pipx install uv
You may need to restart your shell session for uv to be available.
Alternatively, you can use pip:
pip install vllmocr
Usage
The vllmocr command-line tool has two main subcommands: image and pdf.
1. Process a Single Image:
vllmocr image <image_path> [options]
<image_path>: The path to the image file (PNG, JPG, JPEG).
Options:
- --provider: The LLM provider to use (openai, anthropic, google, ollama). Defaults to openai.
- --model: The specific model to use (e.g., gpt-4o, haiku, gemini-1.5-pro-002, llama3). Defaults to the provider's default model.
- --api-key: The API key for the LLM provider. Overrides API keys from the config file or environment variables.
- --config: Path to a TOML configuration file.
- --help: Show the help message and exit.
Example:
vllmocr image my_image.jpg --provider anthropic --model haiku
2. Process a PDF:
vllmocr pdf <pdf_path> [options]
<pdf_path>: The path to the PDF file.
Options: Same as for the image subcommand, including --api-key.
Example:
vllmocr pdf my_document.pdf --provider openai --model gpt-4o
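Because the subcommand depends on the file type, a small Python wrapper can send a whole folder of scans to the right one. This is a minimal sketch, not part of vllmocr: `build_command` and `run_folder` are hypothetical helper names, and it assumes the recognized text is written to standard output, which this README does not state.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Supported extensions, taken from the formats listed in this README.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def build_command(path: Path, provider: Optional[str] = None,
                  model: Optional[str] = None) -> list[str]:
    """Build a vllmocr argv list, picking the subcommand from the file extension."""
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        subcommand = "image"
    elif suffix == ".pdf":
        subcommand = "pdf"
    else:
        raise ValueError(f"unsupported file type: {path}")
    cmd = ["vllmocr", subcommand, str(path)]
    if provider:
        cmd += ["--provider", provider]
    if model:
        cmd += ["--model", model]
    return cmd


def run_folder(folder: Path, provider: Optional[str] = None,
               model: Optional[str] = None) -> None:
    """Run vllmocr on every supported file in `folder` (hypothetical helper)."""
    if shutil.which("vllmocr") is None:
        raise RuntimeError("vllmocr is not on PATH")
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        try:
            cmd = build_command(path, provider, model)
        except ValueError:
            continue  # skip files vllmocr does not handle
        # Assumption: the extracted text is printed to stdout.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        path.with_suffix(".txt").write_text(result.stdout, encoding="utf-8")
```

For example, `run_folder(Path("scans"), provider="anthropic", model="haiku")` would write `scans/page1.txt` next to `scans/page1.png`, provided the stdout assumption holds.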
Configuration
vllmocr can be configured using a TOML file or environment variables. The configuration file is searched for in the following locations (in order of precedence):
- A path specified with the --config command-line option.
- ./config.toml (current working directory)
- ~/.config/vllmocr/config.toml (user's home directory)
- /etc/vllmocr/config.toml (system-wide)
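The search order above amounts to "the first existing file wins". The sketch below mirrors that order in Python; `find_config` and its parameters are hypothetical, and it treats a missing --config path like any other missing candidate, whereas the real tool may report an error instead.

```python
from pathlib import Path
from typing import Optional


def config_candidates(cli_path: Optional[str] = None,
                      cwd: Optional[Path] = None,
                      home: Optional[Path] = None,
                      system_dir: Path = Path("/etc/vllmocr")) -> list[Path]:
    """List config locations in the documented order of precedence."""
    cwd = cwd if cwd is not None else Path.cwd()
    home = home if home is not None else Path.home()
    candidates = []
    if cli_path is not None:
        candidates.append(Path(cli_path))  # explicit --config path first
    candidates += [
        cwd / "config.toml",                           # current working directory
        home / ".config" / "vllmocr" / "config.toml",  # per-user config
        system_dir / "config.toml",                    # system-wide config
    ]
    return candidates


def find_config(cli_path: Optional[str] = None,
                cwd: Optional[Path] = None,
                home: Optional[Path] = None,
                system_dir: Path = Path("/etc/vllmocr")) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    for candidate in config_candidates(cli_path, cwd, home, system_dir):
        if candidate.is_file():
            return candidate
    return None
```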
Example config.toml:
[llm]
provider = "anthropic" # Default provider
model = "haiku" # Default model for the provider
[image_processing]
rotation = 0 # Image rotation in degrees (optional)
[api_keys]
openai = "YOUR_OPENAI_API_KEY"
anthropic = "YOUR_ANTHROPIC_API_KEY"
google = "YOUR_GOOGLE_API_KEY"
# Ollama doesn't require an API key
Environment Variables:
You can also set API keys using environment variables:
- VLLM_OCR_OPENAI_API_KEY
- VLLM_OCR_ANTHROPIC_API_KEY
- VLLM_OCR_GOOGLE_API_KEY
Environment variables override settings in the configuration file. Setting API keys through environment variables is the recommended approach, since it keeps secrets out of configuration files that may be shared or committed. You can also pass the API key directly with the --api-key command-line option, which takes the highest precedence.
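Taken together, the documented precedence for API keys is: --api-key, then the VLLM_OCR_*_API_KEY environment variable, then the [api_keys] table in the configuration file. A minimal sketch of that lookup order follows; `resolve_api_key` is a hypothetical name, not vllmocr's internal function, and the environment is passed in explicitly so the logic can be tested without touching the real process environment.

```python
import os
from typing import Mapping, Optional

# Environment variable names as listed in this README.
ENV_VAR_FOR_PROVIDER = {
    "openai": "VLLM_OCR_OPENAI_API_KEY",
    "anthropic": "VLLM_OCR_ANTHROPIC_API_KEY",
    "google": "VLLM_OCR_GOOGLE_API_KEY",
}


def resolve_api_key(
    provider: str,
    cli_key: Optional[str] = None,
    config_keys: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the API key for `provider`: --api-key, then env var, then config file."""
    if provider == "ollama":
        return None  # local Ollama models need no API key
    if cli_key:
        return cli_key  # highest precedence
    env = os.environ if environ is None else environ
    env_name = ENV_VAR_FOR_PROVIDER.get(provider)
    if env_name and env.get(env_name):
        return env[env_name]  # overrides the config file
    return (config_keys or {}).get(provider)  # lowest precedence; None if absent
```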
Development
To set up a development environment:
1. Clone the repository:
git clone https://github.com/<your-username>/vllmocr.git
cd vllmocr
2. Create and activate a virtual environment (using uv), then install the package:
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
This installs the package in editable mode (-e) along with development dependencies (like pytest and pytest-mock).
3. Run tests:
uv pip install pytest pytest-mock  # if not already installed as dev dependencies
pytest
License
This project is licensed under the MIT License (see pyproject.toml for details).
Download files
File details
Details for the file vllmocr-0.5.0.tar.gz.
File metadata
- Download URL: vllmocr-0.5.0.tar.gz
- Upload date:
- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41d622f31878882a1742bfaa4ae05c8f8dd4e1984532bec93c7ffa4109d4c4ac |
| MD5 | 40ffa6865b55dda7790fffeb97eda86c |
| BLAKE2b-256 | fb60e2040cc227e5addef8c59b863e819017a51bd8a04469075a235c5424f8e6 |
File details
Details for the file vllmocr-0.5.0-py3-none-any.whl.
File metadata
- Download URL: vllmocr-0.5.0-py3-none-any.whl
- Upload date:
- Size: 16.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03270ee9686a7f91777148380e6f63939b10c282e9e272171e44316e8b9a059d |
| MD5 | 2f5e47409ee6fab470ffc6e8f616fe0a |
| BLAKE2b-256 | 19ee76af77606be3e815a6476edacb36c677276c96c064a44d27af8aa8b104db |
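The published digests can be used to check a downloaded file before installing it. The sketch below uses only `hashlib`; `sha256_of`, `verify`, and `EXPECTED_SHA256` are hypothetical names, and the expected values are copied from the SHA256 rows above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "vllmocr-0.5.0.tar.gz": "41d622f31878882a1742bfaa4ae05c8f8dd4e1984532bec93c7ffa4109d4c4ac",
    "vllmocr-0.5.0-py3-none-any.whl": "03270ee9686a7f91777148380e6f63939b10c282e9e272171e44316e8b9a059d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring hex letter case."""
    return sha256_of(path) == expected_hex.lower()
```

For example, after `pip download vllmocr==0.5.0 --no-deps`, calling `verify(Path("vllmocr-0.5.0-py3-none-any.whl"), EXPECTED_SHA256["vllmocr-0.5.0-py3-none-any.whl"])` should return True if the file is intact. pip can also enforce this itself through `--hash=sha256:...` entries in a requirements file combined with `--require-hashes`.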