Synchronous OCR using Gemini Vision API - A rewrite of pyzerox without async/litellm
Zerox Sync
A synchronous Python library for OCR and document extraction using Google's Gemini Vision API. This is a rewrite of pyzerox that removes async wrappers and replaces litellm with direct Gemini API integration.
Features
- Synchronous API: No async/await complexity, simple function calls
- Direct Gemini Integration: Uses Google's Gemini API directly without litellm dependency
- PDF to Markdown: Convert PDFs to structured markdown using vision models
- Concurrent Processing: Process multiple pages in parallel using ThreadPoolExecutor
- Selective Page Processing: Extract specific pages from PDFs
- Format Consistency: Maintain formatting across pages
- Simple Setup: Just set GOOGLE_API_KEY and go
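The concurrent processing feature can be pictured with a short sketch. This is not the library's internal code: `ocr_page` and `process_pages` are hypothetical names, and `ocr_page` only stands in for a per-page model call. The only parts taken from this README are the use of ThreadPoolExecutor and a configurable concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def ocr_page(image_path: str) -> str:
    """Hypothetical stand-in for a per-page vision-model call."""
    return f"markdown for {image_path}"


def process_pages(
    image_paths: List[str],
    worker: Callable[[str], str] = ocr_page,
    concurrency: int = 10,
) -> List[str]:
    """Run `worker` over every page in a thread pool, keeping page order."""
    if not image_paths:
        return []
    # max_workers must be at least 1; never start more threads than pages.
    max_workers = max(1, min(concurrency, len(image_paths)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if a later
        # page finishes before an earlier one.
        return list(pool.map(worker, image_paths))


pages = process_pages(["page_1.png", "page_2.png", "page_3.png"], concurrency=2)
```

Threads fit this workload because each page spends most of its time waiting on a network response rather than on CPU work.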
Installation
Using uv (Recommended)
uv is a fast Python package installer:
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install zerox-sync
uv pip install zerox-sync
Using pip
pip install zerox-sync
System Dependencies
You'll need poppler installed for PDF processing:
macOS:
brew install poppler
Ubuntu/Debian:
sudo apt-get install poppler-utils
Windows: Download and install from poppler releases
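To confirm that poppler is reachable before processing a PDF, a quick check like the one below may help. It looks for `pdftoppm`, one of poppler's command-line tools, on PATH. That zerox-sync relies on this particular binary is an assumption, and `find_pdftoppm` is a hypothetical helper, not part of the library.

```python
import shutil
from typing import Optional


def find_pdftoppm() -> Optional[str]:
    """Return the path to poppler's pdftoppm binary, or None if it is not on PATH."""
    return shutil.which("pdftoppm")


path = find_pdftoppm()
if path is None:
    print("poppler not found on PATH; install it before processing PDFs")
else:
    print(f"poppler found: {path}")
```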
Quick Start
from zerox_sync import zerox
import os
# Set your Gemini API key
os.environ["GOOGLE_API_KEY"] = "your-api-key-here"
# Process a PDF
result = zerox(
    file_path="path/to/document.pdf",
    model="gemini-3-pro",
)
# Access the results
for page in result.pages:
    print(f"Page {page.page}:")
    print(page.content)
    print(f"Length: {page.content_length} chars\n")
print(f"Total time: {result.completion_time}ms")
print(f"Input tokens: {result.input_tokens}")
print(f"Output tokens: {result.output_tokens}")
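If a single markdown string is needed instead of per-page output, the pages can be joined by hand. The sketch below assumes only the `page` and `content` attributes documented for Page objects. `join_pages` is a hypothetical helper, and the demo uses stand-in objects so that it runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable


def join_pages(pages: Iterable, separator: str = "\n\n") -> str:
    """Concatenate page content in page-number order."""
    ordered = sorted(pages, key=lambda p: p.page)
    return separator.join(p.content for p in ordered)


# Stand-in objects with the attribute names this README documents for Page.
demo_pages = [
    SimpleNamespace(page=2, content="## Section 2"),
    SimpleNamespace(page=1, content="# Title"),
]
combined = join_pages(demo_pages)
```

With a real result, `join_pages(result.pages)` would be the equivalent call.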
API Reference
zerox()
Main function to perform OCR on a PDF document.
def zerox(
    cleanup: bool = True,
    concurrency: int = 10,
    file_path: str = "",
    image_density: int = 300,
    image_height: tuple = (None, 1056),
    maintain_format: bool = False,
    model: str = "gemini-3-pro",
    output_dir: Optional[str] = None,
    temp_dir: Optional[str] = None,
    custom_system_prompt: Optional[str] = None,
    select_pages: Optional[Union[int, List[int]]] = None,
    **kwargs
) -> ZeroxOutput:
Parameters:
- cleanup (bool): Whether to cleanup temporary files after processing (default: True)
- concurrency (int): Number of concurrent threads for page processing (default: 10)
- file_path (str): Path or URL to the PDF file
- image_density (int): DPI for PDF to image conversion (default: 300)
- image_height (tuple): Image dimensions as (width, height) (default: (None, 1056))
- maintain_format (bool): Maintain consistent formatting across pages (default: False)
- model (str): Gemini model to use (default: "gemini-3-pro")
- output_dir (Optional[str]): Directory to save markdown output (default: None)
- temp_dir (Optional[str]): Directory for temporary files (default: system temp)
- custom_system_prompt (Optional[str]): Override default system prompt (default: None)
- select_pages (Optional[Union[int, List[int]]]): Specific pages to process (default: None)
- **kwargs: Additional arguments passed to Gemini API
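Because select_pages accepts either a single int or a list, callers who build the value dynamically may want to normalize it first. The helper below is a hypothetical sketch, not the library's own validation. It assumes 1-based page numbers, which the "pages 1, 3, and 5" example in this README suggests but does not state outright.

```python
from typing import List, Optional, Union


def normalize_select_pages(
    select_pages: Optional[Union[int, List[int]]],
    total_pages: int,
) -> List[int]:
    """Return a sorted, de-duplicated list of 1-based page numbers."""
    if select_pages is None:
        # No selection means every page.
        return list(range(1, total_pages + 1))
    pages = [select_pages] if isinstance(select_pages, int) else list(select_pages)
    out_of_range = [p for p in pages if p < 1 or p > total_pages]
    if out_of_range:
        raise ValueError(f"page numbers out of range: {out_of_range}")
    return sorted(set(pages))


selected = normalize_select_pages([5, 1, 5], total_pages=5)
```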
Returns:
ZeroxOutput object with:
- completion_time (float): Processing time in milliseconds
- file_name (str): Processed file name
- input_tokens (int): Number of input tokens used
- output_tokens (int): Number of output tokens generated
- pages (List[Page]): List of Page objects containing:
  - content (str): Markdown content
  - page (int): Page number
  - content_length (int): Content length in characters
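The documented return shape can be mirrored with plain dataclasses, which is handy for type hints or for building test fixtures. The classes below are a sketch based only on the field names and types listed above, and the real classes in zerox_sync may be defined differently. `make_page` is a hypothetical helper and the example values are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    content: str
    page: int
    content_length: int


@dataclass
class ZeroxOutput:
    completion_time: float  # milliseconds
    file_name: str
    input_tokens: int
    output_tokens: int
    pages: List[Page] = field(default_factory=list)


def make_page(content: str, page: int) -> Page:
    """Build a Page whose content_length matches its content."""
    return Page(content=content, page=page, content_length=len(content))


# Illustrative values only, not real measurements.
example = ZeroxOutput(
    completion_time=1234.5,
    file_name="document",
    input_tokens=100,
    output_tokens=50,
    pages=[make_page("# Title", 1)],
)
```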
Advanced Usage
Process Specific Pages
result = zerox(
    file_path="document.pdf",
    select_pages=[1, 3, 5],  # Only process pages 1, 3, and 5
)
Maintain Format Consistency
result = zerox(
    file_path="document.pdf",
    maintain_format=True,  # Process pages sequentially to maintain formatting
)
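Sequential processing matters here because each page can only build on the previous one once that page is finished. One plausible way to picture it, offered as an assumption and not a description of the library's internals, is a loop that hands each call the markdown from the page before it. `ocr_with_context` and `process_sequentially` are hypothetical names.

```python
from typing import Callable, List, Optional


def ocr_with_context(image_path: str, prior_page: Optional[str]) -> str:
    """Hypothetical stand-in for a model call that also sees the previous page's markdown."""
    prefix = "first" if prior_page is None else "continued"
    return f"{prefix}: {image_path}"


def process_sequentially(
    image_paths: List[str],
    worker: Callable[[str, Optional[str]], str] = ocr_with_context,
) -> List[str]:
    """Process pages one at a time, passing each result on to the next call."""
    results: List[str] = []
    prior: Optional[str] = None
    for path in image_paths:
        markdown = worker(path, prior)
        results.append(markdown)
        prior = markdown  # The next page gets this page's output as context.
    return results


ordered = process_sequentially(["a.png", "b.png"])
```

The trade-off is speed: pages cannot run in parallel when each one depends on the output of the one before.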
Save to File
result = zerox(
    file_path="document.pdf",
    output_dir="./output",  # Markdown saved to ./output/{filename}.md
)
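To locate the saved file afterwards, the output path can be derived from the input path. The sketch below assumes `{filename}` means the input file name without its extension, which this README does not spell out. `markdown_output_path` is a hypothetical helper.

```python
from pathlib import Path


def markdown_output_path(file_path: str, output_dir: str) -> Path:
    """Guess where the markdown for `file_path` would be written."""
    # Assumption: the .pdf extension is dropped and replaced with .md.
    return Path(output_dir) / f"{Path(file_path).stem}.md"


expected = markdown_output_path("docs/report.pdf", "./output")
```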
Custom System Prompt
result = zerox(
    file_path="document.pdf",
    custom_system_prompt="Extract only tables from this document in markdown format.",
)
Process from URL
result = zerox(
    file_path="https://example.com/document.pdf",
)
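Since file_path accepts both local paths and URLs, calling code sometimes needs to know which one it holds, for example to skip a local existence check. The sketch below uses only the standard library. `is_remote` is a hypothetical helper and does not claim to match how zerox-sync decides internally.

```python
from urllib.parse import urlparse


def is_remote(file_path: str) -> bool:
    """Return True when file_path looks like an http(s) URL."""
    parsed = urlparse(file_path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


remote = is_remote("https://example.com/document.pdf")
local = is_remote("path/to/document.pdf")
```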
Adjust Concurrency
result = zerox(
    file_path="document.pdf",
    concurrency=5,  # Process 5 pages concurrently (default: 10)
)
Available Models
Zerox Sync supports various Gemini models:
- gemini-3-pro (default): Most intelligent model
- gemini-3-flash-preview: Fast with frontier-class performance
- gemini-2.5-pro: Powerful reasoning model
- gemini-2.5-flash: Balanced model with 1M token context
- gemini-2.5-flash-lite: Fastest and most cost-efficient
Environment Variables
- GOOGLE_API_KEY: Your Google AI Studio API key (required)
  - Get your key from: https://aistudio.google.com/apikey
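A script can fail early with a clear message when the key is missing instead of waiting for the first API call. The check below is a minimal sketch. `require_api_key` is a hypothetical helper, and it raises a plain RuntimeError in place of the library's own exception types.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY from the environment, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return key


# Passing a dict makes the check easy to exercise without touching os.environ.
key = require_api_key({"GOOGLE_API_KEY": "your-api-key-here"})
```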
Differences from pyzerox
- Synchronous: No async/await - uses standard function calls
- Gemini Direct: Direct Gemini API integration instead of litellm
- Simple Dependencies: Fewer dependencies, no aiofiles/aiohttp/aioshutil
- ThreadPoolExecutor: Uses standard library threading instead of asyncio
- Requests: Uses requests library for HTTP instead of aiohttp
Error Handling
from zerox_sync import zerox
from zerox_sync.errors import (
    FileUnavailable,
    MissingEnvironmentVariables,
    ResourceUnreachableException,
    PageNumberOutOfBoundError,
)
try:
    result = zerox(file_path="document.pdf")
except MissingEnvironmentVariables:
    print("Please set GOOGLE_API_KEY environment variable")
except FileUnavailable:
    print("File not found or invalid path")
except ResourceUnreachableException:
    print("Could not download file from URL")
except PageNumberOutOfBoundError:
    print("Invalid page numbers specified")
Development
Setup
# Clone the repository
git clone https://github.com/yourusername/zerox-sync.git
cd zerox-sync
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install with dev dependencies
uv pip install -e ".[dev]"
Running Tests
pytest
Code Formatting
# Format code
black zerox_sync tests
# Lint
ruff check zerox_sync tests
License
MIT License - see LICENSE file for details
Credits
This project is a synchronous rewrite of pyzerox by the getomni-ai team. The original project is an excellent async implementation with litellm support.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.