Gemini OCR CLI
A command-line tool for OCR processing using Google Gemini's vision capabilities. Process PDFs and images to extract text, tables, equations, and figures.
Installation
Requires Python 3.11+ and a Google Gemini API key.
```bash
pip install gemini-ocr-cli
```

Or from source:

```bash
git clone https://github.com/r-uben/gemini-ocr-cli.git
cd gemini-ocr-cli
uv sync
```
Quick start
```bash
# Set your API key
export GEMINI_API_KEY="your_key_here"

# Process a single file
gemini-ocr document.pdf

# Process a directory
gemini-ocr ./documents -o ./results

# Preview what would be processed (no API calls)
gemini-ocr ./documents --dry-run

# Process 4 files concurrently
gemini-ocr ./documents -w 4
```
Options
```text
Usage: gemini-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH                 Output directory (default: <input_dir>/gemini_ocr_output/)
  --api-key TEXT                        Gemini API key (or set GEMINI_API_KEY env var)
  --model TEXT                          Model to use (default: gemini-3.1-flash-lite-preview)
  --task [convert|extract|table|describe_figure]
                                        OCR task type (default: convert)
  --prompt TEXT                         Custom prompt for OCR processing
  --include-images/--no-images          Extract embedded images (default: True)
  --save-originals/--no-save-originals  Copy original images to output (default: True)
  -w, --workers N                       Concurrent workers for batch processing (default: 1)
  --reprocess                           Reprocess already-processed files
  --dry-run                             List files without calling the API
  -q, --quiet                           Suppress all output except errors
  -v, --verbose                         Enable verbose/debug output
  --info                                Show configuration and system info
  --env-file PATH                       Path to .env file
  --version                             Show version
  --help                                Show this message
```
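The `-w/--workers` option limits how many files are processed at the same time. Below is a minimal sketch of that pattern. It assumes a thread pool, and the tool's own implementation may differ. `process_file` is a hypothetical stand-in for one OCR request, and `process_batch` is a hypothetical driver.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_file(path: Path) -> str:
    """Hypothetical stand-in for one OCR API call; returns placeholder markdown."""
    return f"# OCR output for {path.name}\n"


def process_batch(paths: list[Path], workers: int = 1) -> dict[Path, str]:
    """Run process_file over paths with at most `workers` calls in flight."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    results: dict[Path, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every file up front; the pool caps how many run at once.
        futures = {pool.submit(process_file, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                print(f"failed: {path}: {exc}")
    return results


if __name__ == "__main__":
    files = [Path("a.pdf"), Path("b.png"), Path("c.jpg")]
    outputs = process_batch(files, workers=4)
    for f in files:
        print(f, "->", outputs[f].strip())
```

Threads suit this workload because each worker spends most of its time waiting on the network, not the CPU. Setting `workers=1` reproduces the default one-file-at-a-time behavior.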
Output structure
```text
gemini_ocr_output/
├── document_name/
│   ├── document_name.md      # OCR markdown (clean text only)
│   └── figures/              # extracted embedded images
│       ├── page1_img1.png
│       └── page2_img1.png
├── another_document/
│   └── ...
└── metadata.json             # processing stats, checksums, file list
```
API key resolution
Priority order:
1. `--api-key` CLI argument
2. `GEMINI_API_KEY` environment variable
3. `GOOGLE_API_KEY` environment variable (fallback)
4. `.env` file in the current directory
Configuration
The settings below can also be configured through environment variables or a .env file. Settings with no CLI flag can only be set that way:
| CLI flag | Environment variable | Default |
|---|---|---|
| `--api-key` | `GEMINI_API_KEY` | (required) |
| `--model` | `GEMINI_MODEL` | `gemini-3.1-flash-lite-preview` |
| `--include-images` | `GEMINI_INCLUDE_IMAGES` | `true` |
| `--save-originals` | `GEMINI_SAVE_ORIGINAL_IMAGES` | `true` |
| `--workers` | `GEMINI_MAX_WORKERS` | `1` |
| `--verbose` | `GEMINI_VERBOSE` | `false` |
| | `GEMINI_MAX_FILE_SIZE_MB` | `50` |
| | `GEMINI_MAX_RETRIES` | `3` |
| | `GEMINI_RETRY_BASE_DELAY` | `1.0` |
CLI flags override environment variables when explicitly passed.
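The precedence rule (explicit CLI flag, then environment variable, then built-in default) can be sketched as a small layered loader. This is an illustration under stated assumptions, not the tool's code:

- `Settings`, `ENV_VARS`, `load_settings`, and `parse_bool` are hypothetical names.
- The accepted boolean spellings are guessed.
- A parsed `.env` file is treated as just another mapping of variables.

The defaults are the ones in the table.

```python
import os
from collections.abc import Callable, Mapping
from dataclasses import dataclass
from typing import Any


def parse_bool(text: str) -> bool:
    """Accept common spellings; the tool's exact rules are an assumption."""
    lowered = text.strip().lower()
    if lowered in {"1", "true", "yes", "on"}:
        return True
    if lowered in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {text!r}")


@dataclass
class Settings:
    # Defaults copied from the configuration table.
    model: str = "gemini-3.1-flash-lite-preview"
    include_images: bool = True
    save_original_images: bool = True
    max_workers: int = 1
    verbose: bool = False
    max_file_size_mb: int = 50
    max_retries: int = 3
    retry_base_delay: float = 1.0


# Hypothetical mapping: Settings field -> (environment variable, parser).
ENV_VARS: dict[str, tuple[str, Callable[[str], Any]]] = {
    "model": ("GEMINI_MODEL", str),
    "include_images": ("GEMINI_INCLUDE_IMAGES", parse_bool),
    "save_original_images": ("GEMINI_SAVE_ORIGINAL_IMAGES", parse_bool),
    "max_workers": ("GEMINI_MAX_WORKERS", int),
    "verbose": ("GEMINI_VERBOSE", parse_bool),
    "max_file_size_mb": ("GEMINI_MAX_FILE_SIZE_MB", int),
    "max_retries": ("GEMINI_MAX_RETRIES", int),
    "retry_base_delay": ("GEMINI_RETRY_BASE_DELAY", float),
}


def load_settings(
    cli: Mapping[str, Any], environ: Mapping[str, str] = os.environ
) -> Settings:
    """Layer the sources: defaults, then environment, then explicit CLI flags."""
    settings = Settings()
    for field_name, (env_name, parse) in ENV_VARS.items():
        if env_name in environ:
            setattr(settings, field_name, parse(environ[env_name]))
    for field_name, value in cli.items():
        if not hasattr(settings, field_name):
            raise KeyError(f"unknown setting: {field_name}")
        if value is not None:  # None stands for "flag not passed"
            setattr(settings, field_name, value)
    return settings
```

Using `None` for "not passed" is what makes "when explicitly passed" work: a flag the user never typed cannot mask an environment variable.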
Development
```bash
# Install dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint
uv run ruff check .

# Format
uv run ruff format .

# Type check
uv run mypy gemini_ocr/ --ignore-missing-imports
```
Limitations
- Maximum file size: 50 MB (configurable via `GEMINI_MAX_FILE_SIZE_MB`)
- Supported formats: PDF, JPG, JPEG, PNG, WEBP, GIF, BMP, TIFF
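These two limits suggest a simple pre-filter over the input files. Below is a minimal sketch of how a batch run might select inputs, roughly the list `--dry-run` prints. It rests on several assumptions:

- Unsupported or oversized files are skipped silently rather than reported as errors.
- A directory is scanned only at its top level.
- "MB" means 1024 * 1024 bytes.

All helper names are hypothetical.

```python
from pathlib import Path

# Formats from the Limitations list, matched case-insensitively by extension.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp", ".tiff"}
BYTES_PER_MB = 1024 * 1024  # assumption: the tool may use 1_000_000 instead


def has_supported_suffix(path: Path) -> bool:
    return path.suffix.lower() in SUPPORTED_SUFFIXES


def within_size_limit(size_bytes: int, max_file_size_mb: float = 50) -> bool:
    return size_bytes <= max_file_size_mb * BYTES_PER_MB


def find_inputs(input_path: Path, max_file_size_mb: float = 50) -> list[Path]:
    """List the files that pass both limits."""
    if input_path.is_file():
        candidates = [input_path]
    else:
        # Top level only, so figures already written under gemini_ocr_output/
        # are not picked up as new inputs.
        candidates = sorted(p for p in input_path.iterdir() if p.is_file())
    return [
        p
        for p in candidates
        if has_supported_suffix(p) and within_size_limit(p.stat().st_size, max_file_size_mb)
    ]
```

Scanning only the top level matters here because the default output directory sits inside the input directory. A recursive scan would feed extracted PNG figures back in as new inputs.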
License
MIT License - see LICENSE for details.
Download files
File details
Details for the file gemini_ocr_cli-0.3.0.tar.gz.
File metadata
- Download URL: gemini_ocr_cli-0.3.0.tar.gz
- Upload date:
- Size: 81.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0e666e3ae3c079c8de427bc2e291097a7468e1c4b5bc63e81de0e0ff2f1d2c9 |
| MD5 | cf6852bee6ff053d77c9f750f16a5021 |
| BLAKE2b-256 | 05b28013d68e3233035f4e503fb296dacd78cb122ee830f45c15f9b48f9e9e11 |
File details
Details for the file gemini_ocr_cli-0.3.0-py3-none-any.whl.
File metadata
- Download URL: gemini_ocr_cli-0.3.0-py3-none-any.whl
- Upload date:
- Size: 16.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7950e7a25273f23f949375df069983c93e29571d0e30a4927ef4a72984e6ab7c |
| MD5 | 8c4cebd6f1b81203c9ea6eadbae90c89 |
| BLAKE2b-256 | 9694dfe84c575a565460b56f8a6cb3b19b5cec2d9f1a3e9a7d6fb49f0603fca6 |
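A downloaded file can be checked against the published SHA256 digests before installing it. Below is a minimal sketch using Python's standard `hashlib`. `sha256_of` and `verify` are hypothetical helper names, and the digests are the ones listed above for version 0.3.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables for version 0.3.0.
EXPECTED_SHA256 = {
    "gemini_ocr_cli-0.3.0.tar.gz": "d0e666e3ae3c079c8de427bc2e291097a7468e1c4b5bc63e81de0e0ff2f1d2c9",
    "gemini_ocr_cli-0.3.0-py3-none-any.whl": "7950e7a25273f23f949375df069983c93e29571d0e30a4927ef4a72984e6ab7c",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file against the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("gemini_ocr_cli-0.3.0.tar.gz"))` returns `True` only if the downloaded archive matches the published digest byte for byte.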