OCR project using LLMs

vllmocr

vllmocr is a command-line tool that performs Optical Character Recognition (OCR) on images and PDFs using Large Language Models (LLMs). It supports multiple LLM providers, including OpenAI, Anthropic, Google, and local models via Ollama.

Features

  • Image and PDF OCR: Extracts text from both images (PNG, JPG, JPEG) and PDF files.
  • Multiple LLM Providers: Supports a variety of LLMs:
    • OpenAI: GPT-4o
    • Anthropic: Claude 3 Haiku, Claude 3 Sonnet
    • Google: Gemini 1.5 Pro
    • Ollama (local models): Llama3, MiniCPM, and other models supported by Ollama.
  • Configurable: Settings, including the LLM provider and model, can be adjusted via a configuration file or environment variables.
  • Image Preprocessing: Includes optional image rotation for improved OCR accuracy.

Installation

It is recommended to install vllmocr using uv:

uv pip install vllmocr

If you don't have uv installed, you can install it with:

pipx install uv

You may need to restart your shell session for uv to be available.

Alternatively, you can use pip:

pip install vllmocr

Usage

The vllmocr command-line tool has two main subcommands: image and pdf.

1. Process a Single Image:

vllmocr image <image_path> [options]
  • <image_path>: The path to the image file (PNG, JPG, JPEG).

Options:

  • --provider: The LLM provider to use (openai, anthropic, google, ollama). Defaults to openai.
  • --model: The specific model to use (e.g., gpt-4o, haiku, gemini-1.5-pro-002, llama3). Defaults to the provider's default model.
  • --config: Path to a TOML configuration file.
  • --help: Show the help message and exit.

Example:

vllmocr image my_image.jpg --provider anthropic --model haiku

2. Process a PDF:

vllmocr pdf <pdf_path> [options]
  • <pdf_path>: The path to the PDF file.

Options: (Same as image subcommand)

Example:

vllmocr pdf my_document.pdf --provider openai --model gpt-4o
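
For batch jobs it can help to drive the command-line tool from a script. The following is a minimal Python sketch and not part of vllmocr itself: build_command and run_ocr are hypothetical helper names, and the sketch assumes that the vllmocr executable is on your PATH. It uses only the subcommands and options documented above. What the tool writes to standard output is not specified here, so run_ocr simply returns it unchanged.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# File extensions this README lists for the `image` subcommand.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_command(path: str,
                  provider: Optional[str] = None,
                  model: Optional[str] = None,
                  config: Optional[str] = None) -> List[str]:
    """Build the vllmocr argument list for an image or PDF path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        subcommand = "pdf"
    elif suffix in IMAGE_SUFFIXES:
        subcommand = "image"
    else:
        raise ValueError(f"unsupported file type: {path}")

    cmd = ["vllmocr", subcommand, path]
    # Pass only the options that were set, so the tool's own defaults apply otherwise.
    if provider:
        cmd += ["--provider", provider]
    if model:
        cmd += ["--model", model]
    if config:
        cmd += ["--config", config]
    return cmd


def run_ocr(path: str, **options: Optional[str]) -> str:
    """Run vllmocr and return its standard output (assumes vllmocr is on PATH)."""
    result = subprocess.run(build_command(path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling build_command("my_image.jpg", provider="anthropic", model="haiku") produces the same argument list as the image example above.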

Configuration

vllmocr can be configured using a TOML file or environment variables. The configuration file is searched for in the following locations (in order of precedence):

  1. A path specified with the --config command-line option.
  2. ./config.toml (current working directory)
  3. ~/.config/vllmocr/config.toml (user's home directory)
  4. /etc/vllmocr/config.toml (system-wide)
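
The search order above can be sketched in Python as follows. This is an illustration of the documented precedence, not the tool's actual implementation: find_config and its parameters are hypothetical names, and the sketch assumes that a missing file at one location simply means the next location is tried.

```python
from pathlib import Path
from typing import List, Optional


def find_config(cli_path: Optional[str] = None,
                cwd: Optional[Path] = None,
                home: Optional[Path] = None,
                system_path: Path = Path("/etc/vllmocr/config.toml")) -> Optional[Path]:
    """Return the first existing config file in the documented order, or None."""
    cwd = cwd or Path.cwd()
    home = home or Path.home()

    candidates: List[Path] = []
    if cli_path:
        # 1. Path given with --config
        candidates.append(Path(cli_path))
    candidates += [
        cwd / "config.toml",                           # 2. current working directory
        home / ".config" / "vllmocr" / "config.toml",  # 3. per-user location
        system_path,                                   # 4. system-wide location
    ]

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```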

config.toml (Example):

[llm]
provider = "anthropic"  # Default provider
model = "haiku"        # Default model for the provider

[image_processing]
rotation = 0           # Image rotation in degrees (optional)

[api_keys]
openai = "YOUR_OPENAI_API_KEY"
anthropic = "YOUR_ANTHROPIC_API_KEY"
google = "YOUR_GOOGLE_API_KEY"
# Ollama doesn't require an API key

Environment Variables:

You can also set API keys using environment variables:

  • VLLM_OCR_OPENAI_API_KEY
  • VLLM_OCR_ANTHROPIC_API_KEY
  • VLLM_OCR_GOOGLE_API_KEY

Environment variables override settings in the configuration file. Setting API keys through environment variables is the recommended approach, because it keeps secrets out of configuration files.
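
The override rule can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than vllmocr's own code: resolve_api_key is a hypothetical helper, and config stands for the already-parsed contents of config.toml as a plain dictionary.

```python
import os
from typing import Any, Mapping, Optional

# Environment variable names documented above; Ollama needs no API key.
ENV_VARS = {
    "openai": "VLLM_OCR_OPENAI_API_KEY",
    "anthropic": "VLLM_OCR_ANTHROPIC_API_KEY",
    "google": "VLLM_OCR_GOOGLE_API_KEY",
}


def resolve_api_key(provider: str,
                    config: Mapping[str, Any],
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API key for a provider, preferring the environment over the file."""
    env = os.environ if env is None else env

    var_name = ENV_VARS.get(provider)
    if var_name is None:
        # No key is defined for this provider (for example, ollama).
        return None

    from_env = env.get(var_name)
    if from_env:
        # The environment variable wins over the [api_keys] table.
        return from_env
    return config.get("api_keys", {}).get(provider)
```

With config = {"api_keys": {"openai": "file-key"}}, the function returns "file-key" when VLLM_OCR_OPENAI_API_KEY is unset and the variable's value when it is set.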

Development

To set up a development environment:

  1. Clone the repository:

    git clone https://github.com/<your-username>/vllmocr.git
    cd vllmocr
    
  2. Create and activate a virtual environment (using uv), then install the package:

    uv venv
    source .venv/bin/activate  # on Windows: .venv\Scripts\activate
    uv pip install -e ".[dev]"
    

    This installs the package in editable mode (-e) along with development dependencies (like pytest and pytest-mock).

  3. Run tests:

    uv pip install pytest pytest-mock  # if not already installed as dev dependencies
    pytest
    

License

This project is licensed under the MIT License (see pyproject.toml for details).
