
LLM OCR


Convert PDFs to markdown using Large Language Models (LLMs) with vision capabilities.

Features

  • 🔍 High-quality OCR using vision-capable LLMs
  • 📄 Batch processing of multiple PDF pages
  • 🔌 Multiple provider support (Gemini, OpenAI)
  • ⚙️ Configurable processing settings
  • 🔄 Automatic retry logic for transient errors
  • 📝 Clean markdown output

Installation

pip install ocr-llm

System Dependencies

You also need to install poppler (required for PDF processing):

# macOS
brew install poppler

# Ubuntu/Debian
sudo apt-get install poppler-utils

# Fedora/RHEL
sudo dnf install poppler-utils

Dependencies

The library requires:

  • System: poppler-utils for PDF processing
  • Python:
    • google-genai for Gemini provider
    • openai for OpenAI provider
    • pdf2image and Pillow for PDF processing
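Before running conversions, it can help to verify that poppler is actually on your PATH. This sketch is not part of the library; it simply checks for the `pdftoppm` and `pdfinfo` binaries that pdf2image invokes under the hood:

```python
import shutil

def poppler_available() -> bool:
    """Return True if the poppler CLI tools pdf2image relies on are on PATH."""
    return all(shutil.which(tool) is not None for tool in ("pdftoppm", "pdfinfo"))

if not poppler_available():
    print("poppler not found -- install it with your system package manager")
```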

Quick Start

Using OpenAI

import asyncio
from llm_ocr import LLMOCR, OpenAI

async def main():
    # Initialize OpenAI provider
    provider = OpenAI(
        api_key="your-api-key",  # Or set OPENAI_API_KEY env var
        model=OpenAI.GPT_4O_MINI
    )

    # Create OCR processor
    async with LLMOCR(provider) as ocr:
        # Convert PDF to markdown
        markdown = await ocr.convert(
            "document.pdf",
            output_path="output.md"
        )
        print(markdown)

asyncio.run(main())

Using Gemini

import asyncio
from llm_ocr import LLMOCR, Gemini

async def main():
    # Initialize Gemini provider
    provider = Gemini(
        api_key="your-api-key",  # Or set GEMINI_API_KEY env var
        model=Gemini.FLASH_2_5  # Or Gemini.PRO_2_5 for best quality
    )

    # Create OCR processor
    async with LLMOCR(provider) as ocr:
        # Convert PDF to markdown
        markdown = await ocr.convert(
            "document.pdf",
            output_path="output.md"
        )
        print(markdown)

asyncio.run(main())

Available Models

OpenAI

  • OpenAI.GPT_4O
  • OpenAI.GPT_4O_MINI (default)

Additional models: O1, O3, O4_MINI, GPT_5, GPT_5_MINI, GPT_4_1, and more.

See llm_ocr/providers/openai.py for the complete list.

Gemini

  • Gemini.PRO_2_5
  • Gemini.FLASH_2_5 (default)

Additional models: PRO_2_0, FLASH_2_0.

See llm_ocr/providers/gemini.py for the complete list.

Configuration

Customize the OCR processing with OCRConfig:

from llm_ocr import LLMOCR, OpenAI, OCRConfig

config = OCRConfig(
    dpi=300,                    # Higher DPI for better quality
    max_pages=10,               # Limit number of pages to process
    llm_batch_size=2,           # Send 2 pages to LLM at once
    convert_to_grayscale=True,  # Convert images to grayscale
    max_retries=3,              # Retry failed requests
    retry_delay=1.0,            # Wait 1 second between retries
    include_page_markers=True,  # Add page markers in output
)

provider = OpenAI()
ocr = LLMOCR(provider, config=config)

Configuration Options

| Option | Default | Description |
|---|---|---|
| `dpi` | `200` | DPI for PDF-to-image conversion (72-600) |
| `max_pages` | `None` | Maximum number of pages to process |
| `batch_size` | `5` | PDF-to-image conversion batch size |
| `llm_batch_size` | `1` | Number of pages sent to the LLM at once |
| `thread_count` | `4` | Number of threads for PDF conversion |
| `convert_to_grayscale` | `False` | Convert images to grayscale |
| `optimize_png` | `True` | Optimize PNG compression |
| `use_cropbox` | `True` | Use the PDF cropbox for conversion |
| `max_retries` | `3` | Maximum retry attempts for failed requests |
| `retry_delay` | `1.0` | Delay between retries, in seconds |
| `include_page_markers` | `False` | Add page markers to the markdown output |
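When choosing a `dpi`, keep in mind that rendered image size (and therefore upload size and token cost for vision models) scales linearly with it. A quick back-of-the-envelope calculation, using a US Letter page purely as an illustration:

```python
def rendered_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

# A US Letter page (8.5 x 11 in) at the default 200 DPI vs. 300 DPI:
print(rendered_size(8.5, 11, 200))  # (1700, 2200)
print(rendered_size(8.5, 11, 300))  # (2550, 3300)
```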

Advanced Usage

Custom Provider Parameters

Pass additional parameters to the LLM provider:

# OpenAI with custom parameters
provider = OpenAI(
    model=OpenAI.GPT_4O,
    max_tokens=4000,
    temperature=0.0,
)

# Gemini with custom parameters
provider = Gemini(
    model=Gemini.PRO_2_5,
    temperature=0.0,
)

Processing Multiple Documents

import asyncio
from pathlib import Path
from llm_ocr import LLMOCR, OpenAI

async def process_documents():
    provider = OpenAI()

    async with LLMOCR(provider) as ocr:
        pdf_files = Path("pdfs").glob("*.pdf")

        for pdf_file in pdf_files:
            output_file = pdf_file.with_suffix(".md")
            await ocr.convert(pdf_file, output_path=output_file)
            print(f"Converted {pdf_file.name} -> {output_file.name}")

asyncio.run(process_documents())
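The loop above converts documents one at a time. If your provider's rate limits allow concurrent requests, a semaphore-bounded `asyncio.gather` pattern can speed things up. This is a generic sketch; `convert_one` is a placeholder standing in for a call to `ocr.convert`:

```python
import asyncio

async def gather_limited(coros, limit: int = 3):
    """Run coroutines concurrently, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros))

async def convert_one(name: str) -> str:
    # Placeholder for `await ocr.convert(name, output_path=...)`
    await asyncio.sleep(0.01)
    return f"{name}: done"

results = asyncio.run(gather_limited(convert_one(f"doc{i}.pdf") for i in range(5)))
print(results)
```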

Without Context Manager

If you prefer not to use the context manager:

import asyncio
from llm_ocr import LLMOCR, OpenAI

async def main():
    provider = OpenAI()
    ocr = LLMOCR(provider)

    try:
        markdown = await ocr.convert("document.pdf")
        print(markdown)
    finally:
        await ocr.aclose()  # Don't forget to close!

asyncio.run(main())

Environment Variables

Set API keys via environment variables:

# For OpenAI
export OPENAI_API_KEY="your-openai-api-key"

# For Gemini
export GEMINI_API_KEY="your-gemini-api-key"

Then use providers without passing API keys:

# API key read from environment variable
provider = OpenAI()  # Uses OPENAI_API_KEY
# or
provider = Gemini()  # Uses GEMINI_API_KEY

Error Handling

The library fails fast: transient errors are retried automatically, and persistent failures raise an exception for you to handle:

import asyncio
from llm_ocr import LLMOCR, OpenAI, OCRConfig

async def main():
    provider = OpenAI()
    config = OCRConfig(
        max_retries=5,      # Retry up to 5 times
        retry_delay=2.0,    # Wait 2 seconds between retries
    )

    async with LLMOCR(provider, config) as ocr:
        try:
            markdown = await ocr.convert("document.pdf")
            print(markdown)
        except Exception as e:
            print(f"Failed to process document: {e}")

asyncio.run(main())
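The behavior governed by `max_retries` and `retry_delay` presumably resembles the following generic sketch (the library's internals may differ): each failed attempt waits `retry_delay` seconds, and once the retry budget is exhausted the last exception propagates.

```python
import asyncio

async def with_retries(func, *, max_retries: int = 3, retry_delay: float = 1.0):
    """Call an async function, retrying on failure with a fixed delay."""
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except Exception:
            if attempt == max_retries:
                raise  # retry budget exhausted: fail fast
            await asyncio.sleep(retry_delay)

# Demo: a flaky call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

result = asyncio.run(with_retries(flaky, max_retries=5, retry_delay=0.01))
print(result)  # ok
```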

License

See LICENSE file for details.
