Official Python SDK for LeapOCR - Transform documents into structured data using AI-powered OCR

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

valium3

These details have not been verified by PyPI

Project links

Project description

LeapOCR Python SDK

Official Python SDK for LeapOCR - Transform documents into structured data using AI-powered OCR.

Overview

LeapOCR provides enterprise-grade document processing with AI-powered data extraction. This SDK offers a Python-native, async-first interface for seamless integration into your applications.

Installation

pip install leapocr

Or using uv:

uv add leapocr

Quick Start

Prerequisites

Python 3.9 or higher
LeapOCR API key (sign up here)

Basic Example

import asyncio
import os
from leapocr import LeapOCR, ProcessOptions, Format

async def main():
    # Initialize the SDK with your API key
    async with LeapOCR(os.getenv("LEAPOCR_API_KEY")) as client:
        # Submit document for processing
        job = await client.ocr.process_url(
            "https://example.com/document.pdf",
            options=ProcessOptions(
                format=Format.STRUCTURED,
            ),
        )

        print(f"Job created: {job.job_id}")

        # Wait for processing to complete
        result = await client.ocr.wait_until_done(job.job_id)

        print(f"Credits used: {result.credits_used}")
        print(f"Pages processed: {len(result.pages)}")

        # Access extracted data (string for markdown, dict for structured)
        first_page = result.pages[0].result
        if isinstance(first_page, str):
            print(f"Extracted text: {first_page[:200]}...")
        else:
            print(f"Extracted data: {first_page}")

        # Optional: Delete job after processing (auto-deleted after 7 days)
        await client.ocr.delete_job(job.job_id)

asyncio.run(main())

Key Features

Async-First Design - Built on asyncio with httpx for high-performance concurrent processing
Type-Safe API - Full type hints and mypy strict mode support
Multiple Processing Formats - Structured data extraction (dict), markdown output (string), or per-page processing
Flexible Result Types - Results are automatically typed: str for markdown, dict for structured data
Flexible Model Selection - Choose from standard, pro, or custom AI models
Custom Schema Support - Define extraction schemas for your specific use case
Built-in Retry Logic - Automatic exponential backoff for transient failures
Context Manager Support - Proper resource cleanup with async context managers
Direct File Upload - Efficient multipart uploads for local files
Progress Tracking - Real-time callbacks for long-running operations
Automatic Cleanup - Jobs and files are automatically deleted after 7 days, or delete immediately with delete_job()

Processing Models

Use the model parameter in ProcessOptions to specify a model. Defaults to Model.STANDARD_V1.

You can also use custom model strings for organization-specific models:

from leapocr import ProcessOptions

options = ProcessOptions(
    model="my-custom-model-v1",  # Custom string model
)

Usage Examples

Processing from URL

import asyncio
from leapocr import LeapOCR, ProcessOptions, Format, Model

async def process_url():
    async with LeapOCR("your-api-key") as client:
        # Submit job
        job = await client.ocr.process_url(
            "https://example.com/invoice.pdf",
            options=ProcessOptions(
                format=Format.STRUCTURED,
                model=Model.PRO_V1,  # Use premium model for better accuracy
                instructions="Extract invoice number, date, and total amount",
            ),
        )

        # Wait for completion
        result = await client.ocr.wait_until_done(job.job_id)
        print(f"Credits used: {result.credits_used}")
        print(f"Pages: {len(result.pages)}")

        await client.ocr.delete_job(job.job_id)

asyncio.run(process_url())

Processing Local Files

from pathlib import Path

async def process_file():
    async with LeapOCR("your-api-key") as client:
        # Submit file
        job = await client.ocr.process_file(
            Path("invoice.pdf"),
            options=ProcessOptions(
                format=Format.STRUCTURED,
                model=Model.ENGLISH_PRO_V1,  # High-accuracy for English documents
            ),
        )

        # Wait for completion
        result = await client.ocr.wait_until_done(job.job_id)

        for page in result.pages:
            # result can be string (markdown) or dict (structured)
            result_length = len(page.result) if isinstance(page.result, str) else len(str(page.result))
            print(f"Page {page.page_number}: {result_length} characters")

        await client.ocr.delete_job(job.job_id)

asyncio.run(process_file())

Custom Schema Extraction

async def extract_with_schema():
    schema = {
        "type": "object",
        "properties": {
            "patient_name": {"type": "string"},
            "date_of_birth": {"type": "string"},
            "medications": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "dosage": {"type": "string"},
                    },
                },
            },
        },
    }

    async with LeapOCR("your-api-key") as client:
        # Submit with schema
        job = await client.ocr.process_file(
            "medical-record.pdf",
            options=ProcessOptions(
                format=Format.STRUCTURED,
                schema=schema,
                instructions="Extract patient information and medications",
            ),
        )

        # Wait for completion
        result = await client.ocr.wait_until_done(job.job_id)

        # Access the structured data
        import json
        page_result = result.pages[0].result

        # Handle both dict (already parsed) and string (JSON)
        if isinstance(page_result, dict):
            data = page_result
        else:
            data = json.loads(page_result)

        print(f"Patient: {data['patient_name']}")
        print(f"Medications: {len(data['medications'])}")

        await client.ocr.delete_job(job.job_id)

asyncio.run(extract_with_schema())

Output Formats

Format	Description	Use Case	Result Type
`Format.STRUCTURED`	Single JSON object	Extract specific fields across entire document	`dict`
`Format.MARKDOWN`	Text per page	Convert document to readable text	`str`
`Format.PER_PAGE_STRUCTURED`	JSON per page	Extract fields from multi-section documents	`dict`

Accessing Results:

The result field in PageResult can be either:

String (str) - for markdown format, contains the extracted text
Dictionary (dict) - for structured formats, contains the parsed JSON data

# Handle both result types
for page in result.pages:
    if isinstance(page.result, dict):
        # Structured data - already parsed
        print(f"Invoice #: {page.result.get('invoice_number')}")
    else:
        # Markdown text
        print(f"Text: {page.result[:100]}...")

Manual Job Management

async def manual_polling():
    async with LeapOCR("your-api-key") as client:
        # Start processing
        job = await client.ocr.process_url(
            "https://example.com/document.pdf",
            options=ProcessOptions(format=Format.MARKDOWN),
        )

        print(f"Job created: {job.job_id}")

        # Poll for status
        import asyncio
        while True:
            status = await client.ocr.get_job_status(job.job_id)
            print(f"Status: {status.status.value} - {status.progress:.1f}%")

            if status.status.value == "completed":
                break

            await asyncio.sleep(2)

        # Get results
        result = await client.ocr.get_results(job.job_id)
        print(f"Processing complete: {len(result.pages)} pages")

asyncio.run(manual_polling())

Progress Tracking

from leapocr import PollOptions

async def track_progress():
    def progress_callback(status):
        print(f"Progress: {status.progress:.1f}% "
              f"({status.processed_pages}/{status.total_pages} pages)")

    async with LeapOCR("your-api-key") as client:
        # Submit job
        job = await client.ocr.process_file("large-document.pdf")

        # Wait with progress tracking
        result = await client.ocr.wait_until_done(
            job.job_id,
            poll_options=PollOptions(
                poll_interval=2.0,
                max_wait=300.0,
                on_progress=progress_callback,
            ),
        )

asyncio.run(track_progress())

Using Templates

Use pre-configured templates for common document types. Templates include predefined schemas, instructions, and model settings:

async def use_template():
    async with LeapOCR("your-api-key") as client:
        # Use a pre-configured template by its slug
        job = await client.ocr.process_file(
            "invoice.pdf",
            options=ProcessOptions(
                template_slug="invoice-extraction",  # Reference existing template
            ),
        )

        result = await client.ocr.wait_until_done(job.job_id)

        # Access result (dict for structured templates, string for markdown)
        print(f"Extracted data: {result.pages[0].result}")

        await client.ocr.delete_job(job.job_id)

asyncio.run(use_template())

Common Template Use Cases:

Invoice extraction
Receipt processing
Medical records
Identity documents
Custom organizational templates

Deleting Jobs

Automatic Deletion: All jobs and their associated files are automatically deleted after 7 days for security and storage management.

Manual Deletion: Delete jobs immediately after processing to remove sensitive data or free up resources sooner:

async def delete_after_processing():
    async with LeapOCR("your-api-key") as client:
        # Submit and process document
        job = await client.ocr.process_file("sensitive-doc.pdf")
        result = await client.ocr.wait_until_done(job.job_id)

        # Use the extracted data
        print(f"Extracted data: {result.pages[0].result}")

        # Delete job immediately (redacts content and marks as deleted)
        await client.ocr.delete_job(result.job_id)
        print("Job deleted successfully")

asyncio.run(delete_after_processing())

Best Practices:

Sensitive Data: Delete jobs containing PII/PHI immediately after use
Compliance: Manual deletion helps meet data retention requirements
Auto-Cleanup: All jobs are automatically purged after 7 days regardless
Irreversible: Deleted jobs cannot be recovered

Concurrent Batch Processing

async def batch_process():
    urls = [
        "https://example.com/doc1.pdf",
        "https://example.com/doc2.pdf",
        "https://example.com/doc3.pdf",
    ]

    async with LeapOCR("your-api-key") as client:
        # Submit all documents concurrently
        jobs = await asyncio.gather(*[
            client.ocr.process_url(url)
            for url in urls
        ])

        # Wait for all to complete
        results = await asyncio.gather(*[
            client.ocr.wait_until_done(job.job_id)
            for job in jobs
        ])

        total_credits = sum(r.credits_used for r in results)
        total_pages = sum(len(r.pages) for r in results)

        print(f"Processed {len(results)} documents")
        print(f"Total credits: {total_credits}")
        print(f"Total pages: {total_pages}")

        await asyncio.gather(*[
            client.ocr.delete_job(job.job_id)
            for job in jobs
        ])

asyncio.run(batch_process())

For more examples, see the examples/ directory.

Configuration

Custom Configuration

from leapocr import LeapOCR, ClientConfig

config = ClientConfig(
    base_url="https://api.leapocr.com/api/v1",
    timeout=120.0,
    max_retries=5,
    retry_delay=2.0,
    retry_multiplier=2.0,
)

async with LeapOCR("your-api-key", config) as client:
    # Use client with custom configuration
    pass

Environment Variables

export LEAPOCR_API_KEY="your-api-key"
export OCR_BASE_URL="https://api.leapocr.com/api/v1"  # optional

Error Handling

The SDK provides a comprehensive error hierarchy for robust error handling:

from leapocr import (
    LeapOCRError,
    AuthenticationError,
    ValidationError,
    APIError,
    JobError,
    NetworkError,
)

async def handle_errors():
    try:
        async with LeapOCR("your-api-key") as client:
            result = await client.ocr.process_and_wait("document.pdf")

    except AuthenticationError as e:
        print(f"Authentication failed: {e.message}")

    except ValidationError as e:
        print(f"Invalid input: {e.message}")
        if e.field:
            print(f"  Field: {e.field}")

    except JobError as e:
        print(f"Job failed: {e.message}")
        print(f"  Job ID: {e.job_id}")

    except NetworkError as e:
        print(f"Network error: {e.message}")
        # Implement retry logic

    except APIError as e:
        print(f"API error: {e.message}")
        print(f"  Status code: {e.status_code}")

    except LeapOCRError as e:
        print(f"SDK error: {e.message}")

asyncio.run(handle_errors())

Error Types

LeapOCRError - Base class for all SDK errors
AuthenticationError - Invalid or missing API key
RateLimitError - Rate limit exceeded
ValidationError - Invalid input parameters
FileError - File-related errors (not found, too large, etc.)
JobError - Job processing errors
JobFailedError - Job processing failed
JobTimeoutError - Job processing timed out
NetworkError - Network connectivity issues
APIError - API returned an error response
InsufficientCreditsError - Not enough credits

API Reference

Core Classes

`LeapOCR`

Main client class for interacting with the LeapOCR API.

class LeapOCR:
    def __init__(self, api_key: str, config: ClientConfig | None = None)
    async def close(self) -> None
    async def health(self) -> bool

    # Use as async context manager
    async with LeapOCR(api_key) as client:
        ...

`OCRService`

OCR operations service accessible via client.ocr.

# Process file or URL (submit job)
async def process_file(
    file: str | Path | BinaryIO,
    options: ProcessOptions | None = None,
) -> ProcessResult

async def process_url(
    url: str,
    options: ProcessOptions | None = None,
) -> ProcessResult

# Wait for job completion
async def wait_until_done(
    job_id: str,
    poll_options: PollOptions | None = None,
) -> JobResult

# Job management
async def get_job_status(job_id: str) -> JobStatus
async def get_results(job_id: str, page: int = 1, limit: int = 100) -> JobResult
async def delete_job(job_id: str) -> dict[str, Any]

Data Models

`ProcessOptions`

@dataclass
class ProcessOptions:
    format: Format = Format.STRUCTURED
    model: Model | None = None
    schema: dict[str, Any] | None = None
    instructions: str | None = None
    template_slug: str | None = None  # Use existing template by slug
    metadata: dict[str, str] = field(default_factory=dict)

`PollOptions`

@dataclass
class PollOptions:
    poll_interval: float = 2.0  # seconds
    max_wait: float = 300.0  # seconds (5 minutes)
    on_progress: Callable[[JobStatus], None] | None = None

`ClientConfig`

@dataclass
class ClientConfig:
    base_url: str = "https://api.leapocr.com/api/v1"
    timeout: float = 30.0
    max_retries: int = 3
    retry_delay: float = 1.0
    retry_multiplier: float = 2.0

`JobResult`

@dataclass
class JobResult:
    job_id: str
    status: JobStatusType
    pages: list[PageResult]
    file_name: str
    total_pages: int
    processed_pages: int
    credits_used: int
    model: str
    result_format: str
    completed_at: datetime
    pagination: PaginationInfo | None = None

`PageResult`

@dataclass
class PageResult:
    page_number: int
    result: str | dict[str, Any]  # String for markdown, dict for structured
    id: str | None = None

Note: The result field type depends on the format:

Format.MARKDOWN → str (plain text)
Format.STRUCTURED → dict (parsed JSON object)
Format.PER_PAGE_STRUCTURED → dict (parsed JSON object)

Development

Prerequisites

Python 3.9+
uv (recommended) or pip
OpenAPI Generator (for code generation)
Java 21+ (for OpenAPI Generator)

Setup

# Clone the repository
git clone https://github.com/leapocr/leapocr-python.git
cd leapocr-python

# Install dependencies
make dev-install

# Or using uv directly
uv sync

Common Tasks

make test               # Run unit tests
make test-cov           # Run tests with coverage
make test-integration   # Run integration tests (requires API key)
make lint               # Run linters
make format             # Format code
make type-check         # Run type checking
make check              # Run all checks (format, lint, type-check)

Code Generation

The SDK includes generated client code from the OpenAPI specification:

make fetch-spec         # Download OpenAPI spec
make generate           # Generate client code
make regenerate         # Fetch + generate (full refresh)

Running Tests

# Unit tests only
pytest tests/unit/

# Integration tests (requires API key)
export LEAPOCR_API_KEY="your-api-key"
pytest tests/integration/

# All tests with coverage
pytest tests/ --cov=leapocr --cov-report=html

Project Structure

leapocr-python/
├── leapocr/               # Main package
│   ├── __init__.py        # Public API
│   ├── client.py          # Main client class
│   ├── ocr.py             # OCR service
│   ├── models.py          # Data models
│   ├── errors.py          # Error classes
│   ├── config.py          # Configuration
│   ├── _internal/         # Internal utilities
│   │   ├── retry.py       # Retry logic
│   │   ├── upload.py      # File upload
│   │   ├── polling.py     # Status polling
│   │   ├── validation.py  # Input validation
│   │   └── utils.py       # Common utilities
│   └── generated/         # Generated OpenAPI client
├── tests/
│   ├── unit/              # Unit tests
│   └── integration/       # Integration tests
├── examples/              # Usage examples
├── scripts/               # Development scripts
└── Makefile               # Common development tasks

Contributing

We welcome contributions! Please follow these guidelines:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Standards

Follow PEP 8 style guide
Add type hints for all functions
Write docstrings for public APIs
Add tests for new features
Update documentation as needed

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support & Resources

Documentation: docs.leapocr.com
API Reference: API Documentation
Issues: GitHub Issues
Examples: examples/
Website: leapocr.com
Support: support@leapocr.com

Version: 0.0.4

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

valium3

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.0.1

Mar 22, 2026

2.0.0

Mar 11, 2026

This version

0.0.4

Nov 23, 2025

0.0.3

Nov 11, 2025

0.0.2

Nov 11, 2025

0.0.1

Nov 7, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

leapocr-0.0.4.tar.gz (870.8 kB view details)

Uploaded Nov 23, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

leapocr-0.0.4-py3-none-any.whl (174.0 kB view details)

Uploaded Nov 23, 2025 Python 3

File details

Details for the file leapocr-0.0.4.tar.gz.

File metadata

Download URL: leapocr-0.0.4.tar.gz
Upload date: Nov 23, 2025
Size: 870.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for leapocr-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`bc0f9d07acc86b54034413438ad82567ba7d33e18b65f11bece530faf09b1b82`
MD5	`9f0919fa388fc70af3b4846f5491beb5`
BLAKE2b-256	`bc650c3847545df1d56e3dd7e7bf46dd6106bd4127dcccabd84b937ea67b7e46`

See more details on using hashes here.

Provenance

The following attestation bundles were made for leapocr-0.0.4.tar.gz:

Publisher: release.yml on LeapOCR/leapocr-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: leapocr-0.0.4.tar.gz
- Subject digest: bc0f9d07acc86b54034413438ad82567ba7d33e18b65f11bece530faf09b1b82
- Sigstore transparency entry: 718183081
- Sigstore integration time: Nov 23, 2025
Source repository:
- Permalink: LeapOCR/leapocr-python@532d34e576d151923e622af72bba39c145ee9e5e
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/LeapOCR
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@532d34e576d151923e622af72bba39c145ee9e5e
- Trigger Event: push

File details

Details for the file leapocr-0.0.4-py3-none-any.whl.

File metadata

Download URL: leapocr-0.0.4-py3-none-any.whl
Upload date: Nov 23, 2025
Size: 174.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for leapocr-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fa83832fc4c95fae82a798c447d6e8e1158c4088a40d0adf75ce70d31b245402`
MD5	`42f5d08ac926a88ea997d28ad04903f4`
BLAKE2b-256	`c071fafef03f976324acb041f2b9305b9261d1ca182f5de94297ea646af27a23`

See more details on using hashes here.

Provenance

The following attestation bundles were made for leapocr-0.0.4-py3-none-any.whl:

Publisher: release.yml on LeapOCR/leapocr-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: leapocr-0.0.4-py3-none-any.whl
- Subject digest: fa83832fc4c95fae82a798c447d6e8e1158c4088a40d0adf75ce70d31b245402
- Sigstore transparency entry: 718183123
- Sigstore integration time: Nov 23, 2025
Source repository:
- Permalink: LeapOCR/leapocr-python@532d34e576d151923e622af72bba39c145ee9e5e
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/LeapOCR
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@532d34e576d151923e622af72bba39c145ee9e5e
- Trigger Event: push

leapocr 0.0.4

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

LeapOCR Python SDK

Overview

Installation

Quick Start

Prerequisites

Basic Example

Key Features

Processing Models

Usage Examples

Processing from URL

Processing Local Files

Custom Schema Extraction

Output Formats

Manual Job Management

Progress Tracking

Using Templates

Deleting Jobs

Concurrent Batch Processing

Configuration

Custom Configuration

Environment Variables

Error Handling

Error Types

API Reference

Core Classes

LeapOCR

OCRService

Data Models

ProcessOptions

PollOptions

ClientConfig

JobResult

PageResult

Development

Prerequisites

Setup

Common Tasks

Code Generation

Running Tests

Project Structure

Contributing

Code Standards

License

Support & Resources

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`LeapOCR`

`OCRService`

`ProcessOptions`

`PollOptions`

`ClientConfig`

`JobResult`

`PageResult`