
Synchronous OCR using Gemini Vision API - A rewrite of pyzerox without async/litellm


Zerox Sync

A synchronous Python library for OCR and document extraction using Google's Gemini Vision API. This is a rewrite of pyzerox that removes async wrappers and replaces litellm with direct Gemini API integration.

Features

  • Synchronous API: No async/await complexity, simple function calls
  • Direct Gemini Integration: Uses Google's Gemini API directly without litellm dependency
  • PDF to Markdown: Convert PDFs to structured markdown using vision models
  • Concurrent Processing: Process multiple pages in parallel using ThreadPoolExecutor
  • Selective Page Processing: Extract specific pages from PDFs
  • Format Consistency: Maintain formatting across pages
  • Simple Setup: Just set GEMINI_API_KEY and go

Installation

Using uv (Recommended)

uv is a fast Python package and project manager:

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Add zerox-sync to your project
uv add zerox-sync

Using pip

pip install zerox-sync

System Dependencies

You'll need poppler installed for PDF processing:

macOS:

brew install poppler

Ubuntu/Debian:

sudo apt-get install poppler-utils

Windows: Download and install from poppler releases

Quick Start

from zerox_sync import zerox
import os

# Set your Gemini API key
os.environ["GEMINI_API_KEY"] = "your-api-key-here"

# Process a PDF
result = zerox(
    file_path="path/to/document.pdf",
    model="gemini-2.5-flash",
)

# Access the results
for page in result.pages:
    print(f"Page {page.page}:")
    print(page.content)
    print(f"Length: {page.content_length} chars\n")

print(f"Total time: {result.completion_time}ms")
print(f"Input tokens: {result.input_tokens}")
print(f"Output tokens: {result.output_tokens}")

API Reference

zerox()

Main function to perform OCR on a PDF document.

def zerox(
    cleanup: bool = True,
    concurrency: int = 10,
    file_path: str = "",
    image_density: int = 300,
    image_height: tuple = (None, 1056),
    maintain_format: bool = False,
    model: str = "gemini-3-pro",
    output_dir: Optional[str] = None,
    temp_dir: Optional[str] = None,
    custom_system_prompt: Optional[str] = None,
    select_pages: Optional[Union[int, List[int]]] = None,
    **kwargs
) -> ZeroxOutput:

Parameters:

  • cleanup (bool): Whether to cleanup temporary files after processing (default: True)
  • concurrency (int): Number of concurrent threads for page processing (default: 10)
  • file_path (str): Path or URL to the PDF file
  • image_density (int): DPI for PDF to image conversion (default: 300)
  • image_height (tuple): Image dimensions as (width, height) (default: (None, 1056))
  • maintain_format (bool): Maintain consistent formatting across pages (default: False)
  • model (str): Gemini model to use (default: "gemini-3-pro")
  • output_dir (Optional[str]): Directory to save markdown output (default: None)
  • temp_dir (Optional[str]): Directory for temporary files (default: system temp)
  • custom_system_prompt (Optional[str]): Override default system prompt (default: None)
  • select_pages (Optional[Union[int, List[int]]]): Specific pages to process (default: None)
  • **kwargs: Additional arguments passed to Gemini API
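
To make the interplay of image_density and image_height concrete, the sketch below computes pixel sizes for a rasterized page. It assumes (this README does not confirm it) that a None entry in the (width, height) tuple means "derive this side from the aspect ratio". rendered_size and fit_size are hypothetical helpers, not part of zerox_sync.

```python
from typing import Optional, Tuple


def rendered_size(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel size of a page rasterized at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def fit_size(
    size: Tuple[int, int],
    target: Tuple[Optional[int], Optional[int]],
) -> Tuple[int, int]:
    """Resize to target; a None entry is derived from the aspect ratio (assumed semantics)."""
    width, height = size
    target_w, target_h = target
    if target_w is None and target_h is None:
        return size  # nothing requested, keep the rendered size
    if target_w is None:
        target_w = round(width * target_h / height)
    elif target_h is None:
        target_h = round(height * target_w / width)
    return target_w, target_h


# US Letter (8.5 x 11 in) at the default 300 DPI, then fitted to a height of 1056 px
letter = rendered_size(8.5, 11, 300)
fitted = fit_size(letter, (None, 1056))
```

Under these assumptions a Letter page renders at 2550 x 3300 px and is scaled down to 816 x 1056 px, which suggests why a high DPI combined with a modest target height can still give crisp but compact images.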

Returns:

ZeroxOutput object with:

  • completion_time (float): Processing time in milliseconds
  • file_name (str): Processed file name
  • input_tokens (int): Number of input tokens used
  • output_tokens (int): Number of output tokens generated
  • pages (List[Page]): List of Page objects containing:
    • content (str): Markdown content
    • page (int): Page number
    • content_length (int): Content length in characters
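
As an illustration of working with the returned fields, the sketch below defines stand-in dataclasses that mirror the documented attributes and two hypothetical helpers, to_markdown and summarize. The stand-ins exist only so the example runs without an API key; the real classes come from zerox_sync and may differ in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:  # stand-in mirroring the documented Page fields
    content: str
    page: int
    content_length: int


@dataclass
class ZeroxOutput:  # stand-in mirroring the documented ZeroxOutput fields
    completion_time: float
    file_name: str
    input_tokens: int
    output_tokens: int
    pages: List[Page]


def to_markdown(output: ZeroxOutput) -> str:
    """Join page contents in page order, separated by blank lines."""
    ordered = sorted(output.pages, key=lambda p: p.page)
    return "\n\n".join(p.content for p in ordered)


def summarize(output: ZeroxOutput) -> str:
    """One-line summary of timing and token usage."""
    seconds = output.completion_time / 1000  # documented unit is milliseconds
    total = output.input_tokens + output.output_tokens
    return f"{output.file_name}: {len(output.pages)} pages, {seconds:.1f}s, {total} tokens"
```

Because both helpers rely only on the documented attribute names, they should also accept the object returned by zerox(), assuming those attributes behave as described above.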

Advanced Usage

Process Specific Pages

result = zerox(
    file_path="document.pdf",
    select_pages=[1, 3, 5],  # Only process pages 1, 3, and 5
)
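
Since select_pages accepts either a single int or a list, it can help to normalize the value before calling zerox(). The helper below is a hypothetical sketch, not the library's own validation. It assumes 1-based page numbers, as in the example above, and raises ValueError where zerox_sync documents PageNumberOutOfBoundError.

```python
from typing import List, Optional, Union


def normalize_pages(
    select_pages: Optional[Union[int, List[int]]],
    total_pages: int,
) -> List[int]:
    """Return a sorted, de-duplicated list of 1-based page numbers."""
    if select_pages is None:
        return list(range(1, total_pages + 1))  # None is taken to mean every page
    pages = [select_pages] if isinstance(select_pages, int) else list(select_pages)
    out_of_range = [p for p in pages if p < 1 or p > total_pages]
    if out_of_range:
        # ValueError stands in for the library's PageNumberOutOfBoundError
        raise ValueError(f"page numbers out of range: {out_of_range}")
    return sorted(set(pages))
```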

Maintain Format Consistency

result = zerox(
    file_path="document.pdf",
    maintain_format=True,  # Process pages sequentially to maintain formatting
)
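
The README does not spell out how sequential processing keeps formatting consistent. One plausible mechanism, shown here purely as an assumption, is that each page is processed with the previous page's output as context. process_sequentially and fake_ocr are hypothetical names; fake_ocr is a placeholder for a model call.

```python
from typing import Callable, List, Optional


def process_sequentially(
    page_images: List[str],
    ocr_page: Callable[[str, Optional[str]], str],
) -> List[str]:
    """Process pages one at a time, passing the previous result as context."""
    results: List[str] = []
    previous: Optional[str] = None
    for image in page_images:
        markdown = ocr_page(image, previous)
        results.append(markdown)
        previous = markdown  # the next page sees this page's output
    return results


def fake_ocr(image: str, previous: Optional[str]) -> str:
    # Placeholder for a model call; records whether prior context was supplied
    return f"{image} (context: {'yes' if previous else 'no'})"
```

The trade-off in such a design is that pages can no longer run in parallel, so maintain_format=True is likely slower on long documents.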

Save to File

result = zerox(
    file_path="document.pdf",
    output_dir="./output",  # Markdown saved to ./output/{filename}.md
)
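
If you prefer to write the output yourself, for instance to control the file name, a sketch along these lines would work. save_markdown is a hypothetical helper, and taking {filename} to be the source file's stem is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path
from typing import List


def save_markdown(file_path: str, pages: List[str], output_dir: str) -> Path:
    """Write pages to <output_dir>/<source file stem>.md and return the path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    target = out_dir / f"{Path(file_path).stem}.md"
    target.write_text("\n\n".join(pages), encoding="utf-8")
    return target


# Usage against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_markdown("reports/document.pdf", ["# Page 1", "# Page 2"], tmp)
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
```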

Custom System Prompt

result = zerox(
    file_path="document.pdf",
    custom_system_prompt="Extract only tables from this document in markdown format.",
)

Process from URL

result = zerox(
    file_path="https://example.com/document.pdf",
)
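
Because file_path accepts both local paths and URLs, callers sometimes want to know in advance which branch applies. The check below is a minimal sketch using the standard library. is_url is a hypothetical helper and may not match how zerox_sync itself distinguishes the two cases.

```python
from urllib.parse import urlparse


def is_url(file_path: str) -> bool:
    """True for http(s) URLs, False for anything treated as a local path."""
    parsed = urlparse(file_path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Restricting the scheme to http and https avoids misclassifying Windows paths such as C:\docs\file.pdf, whose drive letter would otherwise parse as a URL scheme.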

Adjust Concurrency

result = zerox(
    file_path="document.pdf",
    concurrency=5,  # Process 5 pages concurrently (default: 10)
)
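
The concurrency setting maps naturally onto a thread pool's worker count. The sketch below shows the general pattern with ThreadPoolExecutor. It illustrates the technique rather than the library's actual implementation, and process_concurrently and the ocr_page callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def process_concurrently(
    page_images: List[str],
    ocr_page: Callable[[str], str],
    concurrency: int = 10,
) -> List[str]:
    """Run ocr_page over all pages with at most `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map yields results in input order, so pages stay in sequence
        return list(pool.map(ocr_page, page_images))
```

Threads suit this workload because each page mostly waits on a network call. Lowering concurrency may help if you run into API rate limits.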

Available Models

Zerox Sync supports various Gemini models:

  • gemini-2.5-flash: Production-ready, price-performance optimized
  • gemini-2.0-flash: Stable model with 1M token context window
  • gemini-3-pro-preview: Latest preview, best multimodal understanding
  • gemini-3-flash-preview: Latest preview, intelligent and fast
  • gemini-2.0-flash-lite: Most cost-efficient option

See Gemini API Models for the complete list.

Environment Variables

  • GEMINI_API_KEY (required): API key used to authenticate with the Gemini API
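
A small pre-flight check can surface a missing GEMINI_API_KEY before any PDF is converted. require_api_key below is a hypothetical helper, and RuntimeError stands in for the library's own MissingEnvironmentVariables exception.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Gemini API key or raise a clear error if it is missing."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        # RuntimeError is a stand-in; zerox_sync documents MissingEnvironmentVariables
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.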

Differences from pyzerox

  1. Synchronous: No async/await - uses standard function calls
  2. Gemini Direct: Direct Gemini API integration instead of litellm
  3. Simple Dependencies: Fewer dependencies, no aiofiles/aiohttp/aioshutil
  4. ThreadPoolExecutor: Uses standard library threading instead of asyncio
  5. Requests: Uses requests library for HTTP instead of aiohttp

Error Handling

from zerox_sync import zerox
from zerox_sync.errors import (
    FileUnavailable,
    MissingEnvironmentVariables,
    ResourceUnreachableException,
    PageNumberOutOfBoundError,
)

try:
    result = zerox(file_path="document.pdf")
except MissingEnvironmentVariables:
    print("Please set GEMINI_API_KEY environment variable")
except FileUnavailable:
    print("File not found or invalid path")
except ResourceUnreachableException:
    print("Could not download file from URL")
except PageNumberOutOfBoundError:
    print("Invalid page numbers specified")
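
Some of these failures, such as an unreachable URL, may be transient. A generic retry wrapper with exponential backoff is one way to handle them. with_retries is a hypothetical helper that is not provided by zerox_sync, and the delays are illustrative.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
    raise ValueError("attempts must be at least 1")
```

With the imports from the example above, a call might look like with_retries(lambda: zerox(file_path=url), (ResourceUnreachableException,)). Errors such as FileUnavailable are unlikely to resolve on their own and are better left out of retry_on.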

Development

Setup

# Clone the repository
git clone https://github.com/yourusername/zerox-sync.git
cd zerox-sync

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync all dependencies (including dev)
uv sync

Running Tests

# Run all unit tests (no API key required)
uv run pytest

# Run with coverage
uv run pytest --cov=zerox_sync --cov-report=term-missing

# Run live integration tests (requires GEMINI_API_KEY)
export GEMINI_API_KEY="your-api-key-here"
uv run pytest tests/test_live_integration.py -v

# Run all tests including live integration
export GEMINI_API_KEY="your-api-key-here"
uv run pytest -v

Note: The live integration tests in test_live_integration.py make actual API calls to Gemini and are automatically skipped if GEMINI_API_KEY is not set. They validate that the library works correctly with real API responses.
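
The skip-when-unset behavior can be sketched as follows. To stay dependency-free the example uses the standard library's unittest instead of pytest, where pytest.mark.skipif plays the same role. The test class and run_live_tests are hypothetical and do not reproduce the project's actual test file.

```python
import os
import unittest


class LiveIntegrationTest(unittest.TestCase):
    @unittest.skipUnless(os.environ.get("GEMINI_API_KEY"), "GEMINI_API_KEY is not set")
    def test_key_is_available(self):
        # A real live test would call the API here; this only exercises the guard
        self.assertTrue(os.environ["GEMINI_API_KEY"])


def run_live_tests() -> unittest.TestResult:
    """Run the live tests and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LiveIntegrationTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Without the key, the test is reported as skipped and not as failed, so the suite stays green on machines that have no credentials.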

Code Formatting

# Format code
black zerox_sync tests

# Lint
ruff check zerox_sync tests

Building for PyPI Distribution

Prerequisites

Install build tools:

uv add --dev build twine

Complete Release Workflow

1. Update version in pyproject.toml:

[project]
version = "0.1.1"  # Bump this version

2. Commit changes:

git add pyproject.toml
git commit -m "Bump version to 0.1.1"
git tag v0.1.1

3. Clean and build:

# Clean old builds
rm -rf dist/ build/ *.egg-info

# Build distribution files
python -m build

This creates:

  • dist/zerox_sync-0.1.1-py3-none-any.whl (wheel)
  • dist/zerox_sync-0.1.1.tar.gz (source)

4. Validate:

python -m twine check dist/*

5. Upload to PyPI:

# Test on TestPyPI first (optional)
python -m twine upload --repository testpypi dist/*

# Upload to production PyPI
python -m twine upload dist/*

PyPI Credentials Setup

Create ~/.pypirc:

[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = pypi-YOUR_TESTPYPI_TOKEN_HERE

Get API tokens from the account settings pages on PyPI and TestPyPI.

Semantic Versioning

  • Major (1.0.0): Breaking changes
  • Minor (0.1.0): New features, backward compatible
  • Patch (0.0.1): Bug fixes, backward compatible
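
The bump rules above can be expressed in a few lines. bump_version is a hypothetical helper for illustration. It handles plain MAJOR.MINOR.PATCH strings only and ignores pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change resets minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature resets patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, a bug-fix release after 0.1.0 would be bump_version("0.1.0", "patch"), giving "0.1.1" as in the release workflow above.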

Quick Reference

# Full release workflow
rm -rf dist/ build/ *.egg-info   # Clean
python -m build                   # Build
python -m twine check dist/*      # Validate
python -m twine upload dist/*     # Upload to PyPI
git push --tags                   # Push version tag

License

MIT License - see LICENSE file for details

Credits

This project is a synchronous rewrite of pyzerox by the getomni-ai team. The original project is an excellent async implementation with litellm support.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
