Gemini OCR CLI
Command-line tool for OCR processing using Google Gemini's vision capabilities. Extract text, tables, equations, and figures from PDFs and images with high accuracy.
Features
- Native PDF upload: Direct PDF processing via Gemini Files API (fast, single API call)
- Multi-format support: PDF and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
- High-quality OCR: Leverages Gemini's advanced vision models
- Structure preservation: Maintains headings, tables, lists, equations
- Figure analysis: Generate detailed descriptions of charts and diagrams
- Batch processing: Process entire directories with progress tracking
- Incremental processing: Skip already-processed files
- Automatic retry: Exponential backoff for API rate limits
- Markdown output: Clean, structured output format
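The README does not show how the automatic retry is implemented. As a rough illustration of the general technique, here is a minimal Python sketch of exponential backoff. The names `retry_with_backoff` and `RateLimitError`, the attempt limit, and the delay values are all assumptions made for this example and are not taken from the tool's source.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises when rate limited."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay grows as base_delay * 2**attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; production code would normally leave it at the default.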
Installation
From PyPI (recommended)

```bash
pip install gemini-ocr-cli
```

Using pipx

```bash
pipx install gemini-ocr-cli
```

From source

```bash
git clone https://github.com/r-uben/gemini-ocr-cli.git
cd gemini-ocr-cli
uv pip install -e .
```
Quick Start
API Key Resolution
The CLI automatically picks up your API key from environment variables (no configuration needed if already set):
Priority order:
1. `--api-key` CLI argument (highest priority)
2. `GEMINI_API_KEY` environment variable
3. `GOOGLE_API_KEY` environment variable (fallback)
4. `.env` file in the current directory
```bash
# Option 1: Set environment variable (recommended)
export GEMINI_API_KEY="your-api-key"

# Option 2: Use existing GOOGLE_API_KEY (auto-detected)
export GOOGLE_API_KEY="your-api-key"

# Option 3: Create a .env file
echo "GEMINI_API_KEY=your-api-key" > .env

# Option 4: Pass directly (not recommended for security)
gemini-ocr paper.pdf --api-key "your-api-key"
```
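The priority order above can be sketched in Python roughly as follows. This is an illustration of the lookup logic only, not the tool's actual code: the function names `resolve_api_key` and `read_env_file` are hypothetical, the `.env` parser is deliberately simplistic, and checking both variable names inside the `.env` file is an assumption.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_api_key(cli_api_key=None, environ=None, env_file=".env"):
    """Return the first key found in the documented priority order, else None."""
    environ = os.environ if environ is None else environ
    names = ("GEMINI_API_KEY", "GOOGLE_API_KEY")
    if cli_api_key:                       # 1. --api-key argument
        return cli_api_key
    for name in names:                    # 2. and 3. environment variables
        if environ.get(name):
            return environ[name]
    file_values = read_env_file(env_file)  # 4. .env file
    for name in names:                    # assumption: same names, same order
        if file_values.get(name):
            return file_values[name]
    return None
```

For example, `resolve_api_key(None, {"GOOGLE_API_KEY": "abc"})` returns `"abc"` because no CLI argument or `GEMINI_API_KEY` is present.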
Process documents
```bash
# Single file
gemini-ocr paper.pdf

# Directory
gemini-ocr ./documents/ -o ./results/

# With custom model
gemini-ocr paper.pdf --model gemini-1.5-pro
```
Describe figures
```bash
# Analyze a chart/diagram
gemini-ocr describe chart.png

# Save to file
gemini-ocr describe figure.jpg -o description.md
```
CLI Reference
gemini-ocr process
Process documents and images with OCR.
```
Usage: gemini-ocr process [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH           Output directory for results
  --api-key TEXT                  Gemini API key
  --model TEXT                    Model to use (default: gemini-3.0-flash)
  --task [convert|extract|table]  OCR task type (default: convert)
  --prompt TEXT                   Custom prompt for OCR
  --include-images/--no-images    Extract embedded images (default: True)
  --save-originals/--no-save-originals
                                  Save original input images (default: True)
  --add-timestamp/--no-timestamp  Add timestamp to output folder
  --reprocess                     Reprocess existing files
  --env-file PATH                 Path to .env file
  -v, --verbose                   Enable verbose output
```
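When calling the CLI from a script, it can help to assemble the argument list programmatically. The sketch below builds a `gemini-ocr process` command from a few of the options listed above. The helper name `build_process_command` is hypothetical, and only flags that appear in the reference are used.

```python
VALID_TASKS = ("convert", "extract", "table")  # choices listed for --task


def build_process_command(input_path, output_dir=None, model=None, task=None,
                          reprocess=False, verbose=False):
    """Return an argv list for `gemini-ocr process` using documented flags only."""
    if task is not None and task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    cmd = ["gemini-ocr", "process"]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if model is not None:
        cmd += ["--model", str(model)]
    if task is not None:
        cmd += ["--task", task]
    if reprocess:
        cmd.append("--reprocess")
    if verbose:
        cmd.append("--verbose")
    # Options go before INPUT_PATH, matching the usage line.
    cmd.append(str(input_path))
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.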
gemini-ocr describe
Generate detailed descriptions of figures, charts, and diagrams.
```
Usage: gemini-ocr describe [OPTIONS] IMAGE_PATH

Options:
  --api-key TEXT     Gemini API key
  --model TEXT       Model to use
  -o, --output PATH  Output file (default: stdout)
```
gemini-ocr info
Show configuration and system information.
Output Format
Results are saved as Markdown files with:
- File metadata (original path, processing time)
- Extracted text (full document)
- Embedded image references (if enabled)
A `metadata.json` file tracks all processed files.
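The README does not document the layout of `metadata.json`, so the following is only a sketch of how incremental processing could use such a file to skip finished inputs. The schema assumed here, a top-level `"processed"` object keyed by file name, is hypothetical, as are the function names `load_processed` and `pending_files`. The extension set mirrors the formats listed under Features.

```python
import json
from pathlib import Path

# Formats named in the feature list: PDF plus JPG, PNG, WEBP, GIF, BMP, TIFF.
SUPPORTED = {".pdf", ".jpg", ".png", ".webp", ".gif", ".bmp", ".tiff"}


def load_processed(metadata_path):
    """Return the set of file names already recorded (assumed schema)."""
    path = Path(metadata_path)
    if not path.is_file():
        return set()
    data = json.loads(path.read_text())
    # Assumption: {"processed": {"paper.pdf": {...}, ...}}
    return set(data.get("processed", {}))


def pending_files(input_dir, metadata_path, reprocess=False):
    """List supported files in input_dir that still need processing."""
    done = set() if reprocess else load_processed(metadata_path)
    candidates = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
    return [p for p in candidates if p.name not in done]
```

Setting `reprocess=True` ignores the recorded entries, which is the behaviour the `--reprocess` flag describes.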
Models
| Model | Speed | Quality | Cost | Recommended For |
|---|---|---|---|---|
| `gemini-3.0-flash` | Fast | Good | Low | Default, most documents |
| `gemini-1.5-flash` | Fast | Good | Low | Simple documents |
| `gemini-1.5-pro` | Slower | Best | Higher | Complex layouts, equations |
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `GOOGLE_API_KEY` | Fallback API key | - |
| `GEMINI_MODEL` | Default model | `gemini-3.0-flash` |
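As an illustration of how the `GEMINI_MODEL` default might interact with the `--model` option, here is a small Python sketch. The precedence shown (CLI value, then environment variable, then built-in default) is an assumption modelled on the API key priority order, and `resolve_model` is a hypothetical name.

```python
import os

DEFAULT_MODEL = "gemini-3.0-flash"  # default listed in the table above


def resolve_model(cli_model=None, environ=None):
    """Pick the model: --model value, then GEMINI_MODEL, then the default."""
    environ = os.environ if environ is None else environ
    return cli_model or environ.get("GEMINI_MODEL") or DEFAULT_MODEL
```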
License
MIT