An SDK for intelligent document processing using state-of-the-art vision LLMs

Docuglean OCR - Python SDK

A unified Python SDK for intelligent document processing using state-of-the-art AI models.

Features

  • 🚀 Easy to Use: Simple, intuitive API with detailed documentation
  • 🔍 OCR Capabilities: Extract text from images and scanned documents
  • 📊 Structured Data Extraction: Use Pydantic models for type-safe data extraction
  • 📄 Multimodal Support: Process PDFs and images with ease
  • 🤖 Multiple AI Providers: Support for OpenAI, Mistral, Google Gemini, and Hugging Face
  • 🔒 Type Safety: Full Python type hints with Pydantic validation

Installation

pip install docuglean-ocr

Quick Start

OCR Processing

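from docuglean import ocr

# The calls below use `await`, so run them inside an async function
# (for example via asyncio.run) or in a REPL that supports top-level await.

# Mistral OCR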
result = await ocr(
    file_path="./document.pdf",
    provider="mistral",
    model="mistral-ocr-latest",
    api_key="your-api-key"
)

# Google Gemini OCR
result = await ocr(
    file_path="./document.pdf",
    provider="gemini",
    model="gemini-2.5-flash",
    api_key="your-gemini-api-key",
    prompt="Extract all text from this document"
)

# Hugging Face OCR (no API key needed)
result = await ocr(
    file_path="https://example.com/image.jpg",  # Supports URLs, local files, base64
    provider="huggingface",
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    prompt="Extract all text from this image"
)
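Because `ocr` is awaited in the examples above, several documents can in principle be processed concurrently. The following is a minimal sketch and not part of the SDK: `ocr_many` and `max_concurrent` are hypothetical names, and the coroutine is passed in as `ocr_func` so the helper assumes nothing about the SDK beyond the keyword arguments already shown.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def ocr_many(
    ocr_func: Callable[..., Awaitable[Any]],
    file_paths: Sequence[str],
    max_concurrent: int = 4,
    **ocr_kwargs: Any,
) -> list[Any]:
    """Run ocr_func over file_paths, at most max_concurrent at a time.

    Results come back in the same order as file_paths.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(path: str) -> Any:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await ocr_func(file_path=path, **ocr_kwargs)

    return await asyncio.gather(*(run_one(path) for path in file_paths))


# Hypothetical usage with the SDK (assumes MISTRAL_API_KEY is set):
#
#     import os
#     from docuglean import ocr
#
#     results = asyncio.run(
#         ocr_many(
#             ocr,
#             ["./a.pdf", "./b.pdf"],
#             provider="mistral",
#             model="mistral-ocr-latest",
#             api_key=os.environ["MISTRAL_API_KEY"],
#         )
#     )
```

Capping concurrency is a cautious default for hosted providers, which commonly apply rate limits. A suitable value for `max_concurrent` depends on the provider and account.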

Structured Data Extraction

from docuglean import extract
from pydantic import BaseModel
from typing import List

class ReceiptItem(BaseModel):
    name: str
    price: float

class Receipt(BaseModel):
    date: str
    total: float
    items: List[ReceiptItem]

# Extract structured data with OpenAI
# (like ocr, extract must be awaited inside an async function)
receipt = await extract(
    file_path="./receipt.pdf",
    provider="openai",
    api_key="your-api-key",
    response_format=Receipt,
    prompt="Extract receipt information"
)

# Extract structured data with Gemini
receipt = await extract(
    file_path="./receipt.pdf",
    provider="gemini",
    api_key="your-gemini-api-key",
    response_format=Receipt,
    prompt="Extract receipt information including date, total, and all items"
)
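A typed result makes simple consistency checks easy to write. The sketch below assumes that `extract` returns an object shaped like the `Receipt` model above (the README does not state the return type explicitly). `items_total` and `total_matches_items` are hypothetical helpers, and a `SimpleNamespace` stands in for a real result so the snippet runs without an API key.

```python
import math
from types import SimpleNamespace


def items_total(receipt) -> float:
    """Sum the prices of all line items on a Receipt-shaped object."""
    return math.fsum(item.price for item in receipt.items)


def total_matches_items(receipt, tolerance: float = 0.01) -> bool:
    """Return True if the stated total agrees with the summed item prices."""
    return abs(receipt.total - items_total(receipt)) <= tolerance


# Stand-in for an extracted receipt; the values are made up for illustration.
sample = SimpleNamespace(
    date="2024-01-15",
    total=12.50,
    items=[
        SimpleNamespace(name="Coffee", price=4.50),
        SimpleNamespace(name="Sandwich", price=8.00),
    ],
)

print(total_matches_items(sample))
```

A mismatch does not necessarily mean the extraction failed, since real receipts often include tax, tips, or discounts that are not listed as items. It is best treated as a prompt for manual review.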

Development

Setup

# Install dependencies with uv
uv sync

Testing

# Run all tests
uv run pytest tests/ -v

# Run specific test files
uv run pytest tests/test_basic.py -v                    # Basic tests only
uv run pytest tests/test_ocr.py tests/test_extract.py -v  # Mistral tests (requires MISTRAL_API_KEY)
uv run pytest tests/test_openai.py -v                   # OpenAI tests (requires OPENAI_API_KEY)

# Run with output (shows print statements)
uv run pytest tests/ -v -s

# Run specific test function
uv run pytest tests/test_openai.py::test_openai_extract_unstructured_pdf -v -s

# Set API keys for real testing
export MISTRAL_API_KEY=your_mistral_key_here
export OPENAI_API_KEY=your_openai_key_here
export GEMINI_API_KEY=your_gemini_key_here
uv run pytest tests/ -v -s
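The same environment variables can keep keys out of application code. The sketch below is a convenience built on assumptions, not SDK behaviour: `api_key_for` is a hypothetical helper, the variable names are the ones exported above, and the README does not say whether the SDK reads them on its own, so the key is looked up explicitly and would be passed as `api_key`.

```python
import os

# Environment variable names taken from the test setup above.
# Hugging Face is omitted because the README says it needs no API key.
ENV_VAR_BY_PROVIDER = {
    "mistral": "MISTRAL_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def api_key_for(provider: str) -> str:
    """Look up the API key for a provider, failing with a clear message."""
    try:
        env_var = ENV_VAR_BY_PROVIDER[provider]
    except KeyError:
        raise ValueError(
            f"No API key variable known for provider {provider!r}"
        ) from None
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before using the {provider} provider")
    return key
```

A call such as `api_key_for("mistral")` could then replace the literal `"your-api-key"` strings in the Quick Start examples.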

Code Quality

# Run linting
uv run ruff check src/ tests/

# Fix linting issues automatically
uv run ruff check src/ tests/ --fix

# Format code
uv run ruff format src/ tests/

License

Apache 2.0 - see the LICENSE file for details.

Download files

Source Distribution

docuglean_ocr-1.0.0.tar.gz (93.9 kB)

Uploaded Source

Built Distribution

docuglean_ocr-1.0.0-py3-none-any.whl (17.8 kB)

Uploaded Python 3

File details

Details for the file docuglean_ocr-1.0.0.tar.gz.

File metadata

  • Download URL: docuglean_ocr-1.0.0.tar.gz
  • Upload date:
  • Size: 93.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.2

File hashes

Hashes for docuglean_ocr-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b65d961c69b3e734151df7747448fe8344ed1dab1eee3288c75b52f3b47b8685
MD5 9980e20590c331c08a7c6b30e324692d
BLAKE2b-256 26b1c74ec358a51ab4da502672b784325050fba76846540fb325c6c2beb13672
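
A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a generic sketch: `sha256_of` is a hypothetical helper, and the local path in the commented usage is an assumption about where the file was saved.

```python
import hashlib

# SHA256 digest published above for docuglean_ocr-1.0.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "b65d961c69b3e734151df7747448fe8344ed1dab1eee3288c75b52f3b47b8685"
)


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage, assuming the sdist was saved to the current directory:
#
#     assert sha256_of("docuglean_ocr-1.0.0.tar.gz") == EXPECTED_SDIST_SHA256
```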

File details

Details for the file docuglean_ocr-1.0.0-py3-none-any.whl.

File hashes

Hashes for docuglean_ocr-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3f37e8b4b608406a04cbf7d6fbb62aa5870770a62263caf60775977fe006033a
MD5 0267f6759a7d97c2b7581a894f6e647a
BLAKE2b-256 d50b5f73d512e2a52c674aaf1f964bbebc8b62c0a5500374fbdeae7d87480524
