CLI tool for OCR processing using Google Gemini's vision capabilities

Gemini OCR CLI

Command-line tool for OCR processing using Google Gemini's vision capabilities. Extract text, tables, equations, and figures from PDFs and images with high accuracy.

Features

Native PDF upload: Direct PDF processing via Gemini Files API (fast, single API call)
Multi-format support: PDF and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
High-quality OCR: Leverages Gemini's advanced vision models
Structure preservation: Maintains headings, tables, lists, equations
Figure analysis: Generate detailed descriptions of charts and diagrams
Batch processing: Process entire directories with progress tracking
Incremental processing: Skip already-processed files (override with --reprocess)
Automatic retry: Exponential backoff for API rate limits
Markdown output: Clean, structured output format
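
The automatic retry feature can be pictured with the sketch below. It is not the tool's actual implementation: `RateLimitError`, `retry_with_backoff`, and all parameter names and defaults are hypothetical, chosen only to show how exponential backoff with a capped delay and a little jitter typically works.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an API rate-limit error (for example HTTP 429)."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Run call() and retry on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the behaviour can be exercised in tests without actually waiting.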

Installation

From PyPI (recommended)

pip install gemini-ocr-cli

Using pipx

pipx install gemini-ocr-cli

From source

git clone https://github.com/r-uben/gemini-ocr-cli.git
cd gemini-ocr-cli
uv pip install -e .

Quick Start

API Key Resolution

The CLI resolves your API key automatically, so no extra configuration is needed if one of the sources below is already set.

Priority order:

1. --api-key CLI argument (highest priority)
2. GEMINI_API_KEY environment variable
3. GOOGLE_API_KEY environment variable (fallback)
4. .env file in current directory

# Option 1: Set environment variable (recommended)
export GEMINI_API_KEY="your-api-key"

# Option 2: Use existing GOOGLE_API_KEY (auto-detected)
export GOOGLE_API_KEY="your-api-key"

# Option 3: Create a .env file
echo "GEMINI_API_KEY=your-api-key" > .env

# Option 4: Pass directly (not recommended: the key ends up in shell history and process listings)
gemini-ocr paper.pdf --api-key "your-api-key"
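
The priority order above can be sketched in a few lines of Python. This is an illustration, not the package's code: `read_env_file` and `resolve_api_key` are hypothetical names, the `.env` parser handles only plain `KEY=VALUE` lines, and checking `GEMINI_API_KEY` before `GOOGLE_API_KEY` inside the `.env` file is an assumption.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(cli_api_key=None, environ=None, env_file=".env"):
    """Return the first key found, following the documented priority order."""
    environ = os.environ if environ is None else environ
    if cli_api_key:  # 1. --api-key argument
        return cli_api_key
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):  # 2. and 3. environment
        if environ.get(name):
            return environ[name]
    file_values = read_env_file(env_file)  # 4. .env file
    # Assumption: the same variable order applies inside the .env file.
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        if file_values.get(name):
            return file_values[name]
    return None
```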

Process documents

# Single file
gemini-ocr paper.pdf

# Directory
gemini-ocr ./documents/ -o ./results/

# With custom model
gemini-ocr paper.pdf --model gemini-1.5-pro
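
When given a directory, the tool has to decide which files to process. The sketch below shows one plausible way to do that, based on the formats named in the Features list. `find_documents` is a hypothetical helper; the `.jpeg` and `.tif` aliases and the recursive search are assumptions, not documented behaviour.

```python
from pathlib import Path

# Formats from the Features list; ".jpeg" and ".tif" are assumed aliases.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp", ".tiff", ".tif",
}


def find_documents(input_path):
    """Return supported files under input_path (a file or a directory), sorted."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in SUPPORTED_EXTENSIONS else []
    # Assumption: subdirectories are searched as well.
    return sorted(
        p for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```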

Describe figures

# Analyze a chart/diagram
gemini-ocr describe chart.png

# Save to file
gemini-ocr describe figure.jpg -o description.md

CLI Reference

`gemini-ocr process`

Process documents and images with OCR.

Usage: gemini-ocr process [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH           Output directory for results
  --api-key TEXT                  Gemini API key
  --model TEXT                    Model to use (default: gemini-3.0-flash)
  --task [convert|extract|table]  OCR task type (default: convert)
  --prompt TEXT                   Custom prompt for OCR
  --include-images/--no-images    Extract embedded images (default: True)
  --save-originals/--no-save-originals
                                  Save original input images (default: True)
  --add-timestamp/--no-timestamp  Add timestamp to output folder
  --reprocess                     Reprocess existing files
  --env-file PATH                 Path to .env file
  -v, --verbose                   Enable verbose output
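
For scripting, the options above can be assembled into an argument list from Python. The helper below is a sketch: `build_process_command` is a hypothetical name, and it covers only a subset of the flags listed in the usage text. The resulting list can be passed to `subprocess.run(cmd, check=True)`.

```python
import shlex


def build_process_command(input_path, output_dir=None, model=None, task=None,
                          reprocess=False, verbose=False):
    """Assemble argv for `gemini-ocr process` from the documented options."""
    if task is not None and task not in ("convert", "extract", "table"):
        raise ValueError(f"unsupported task: {task!r}")
    cmd = ["gemini-ocr", "process"]
    if output_dir:
        cmd += ["--output-dir", str(output_dir)]
    if model:
        cmd += ["--model", model]
    if task:
        cmd += ["--task", task]
    if reprocess:
        cmd.append("--reprocess")
    if verbose:
        cmd.append("--verbose")
    cmd.append(str(input_path))
    return cmd


# Build a table-extraction command and show it as a copy-pasteable shell line.
cmd = build_process_command("./documents/", output_dir="./results/", task="table")
print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.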

`gemini-ocr describe`

Generate detailed descriptions of figures, charts, and diagrams.

Usage: gemini-ocr describe [OPTIONS] IMAGE_PATH

Options:
  --api-key TEXT     Gemini API key
  --model TEXT       Model to use
  -o, --output PATH  Output file (default: stdout)

`gemini-ocr info`

Show configuration and system information.

Output Format

Each run writes Markdown results to the output directory. The output includes:

File metadata (original path, processing time)
Extracted text (full document)
Embedded image references (if enabled)
metadata.json tracking all processed files
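
A tracking file like metadata.json is what makes incremental processing possible. The sketch below shows the idea only: the schema of metadata.json is not documented here, so the `{"files": [{"source": ...}]}` layout and the helper names `load_processed` and `pending_files` are hypothetical.

```python
import json
from pathlib import Path


def load_processed(metadata_path):
    """Return the set of source paths recorded in the metadata file."""
    path = Path(metadata_path)
    if not path.is_file():
        return set()
    data = json.loads(path.read_text())
    # Assumed layout: {"files": [{"source": "<original path>", ...}, ...]}
    return {entry["source"] for entry in data.get("files", [])}


def pending_files(candidates, metadata_path, reprocess=False):
    """Drop candidates that were already processed, unless reprocess is set."""
    if reprocess:
        return list(candidates)
    done = load_processed(metadata_path)
    return [c for c in candidates if str(c) not in done]
```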

Models

| Model | Speed | Quality | Cost | Recommended For |
|---|---|---|---|---|
| `gemini-3.0-flash` | Fast | Good | Low | Default, most documents |
| `gemini-1.5-flash` | Fast | Good | Low | Simple documents |
| `gemini-1.5-pro` | Slower | Best | Higher | Complex layouts, equations |

Environment Variables

| Variable | Description | Default |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key | Required (unless `GOOGLE_API_KEY` or `--api-key` is set) |
| `GOOGLE_API_KEY` | Fallback API key | - |
| `GEMINI_MODEL` | Default model | `gemini-3.0-flash` |
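
A minimal sketch of how the model setting might be resolved. The precedence shown (an explicit `--model` value, then `GEMINI_MODEL`, then the built-in default) is an assumption, since the page does not state it, and `resolve_model` is a hypothetical name.

```python
import os

DEFAULT_MODEL = "gemini-3.0-flash"  # default listed for GEMINI_MODEL above


def resolve_model(cli_model=None, environ=None):
    """Pick the model: CLI value first, then GEMINI_MODEL, then the default."""
    environ = os.environ if environ is None else environ
    # Assumed precedence; empty strings are treated as unset.
    return cli_model or environ.get("GEMINI_MODEL") or DEFAULT_MODEL
```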

License

MIT

Release history

0.3.2 (Apr 13, 2026)
0.3.1 (Mar 14, 2026)
0.3.0 (Mar 12, 2026)
0.2.1 (Dec 30, 2025), this version
0.2.0 (Dec 23, 2025)

Download files

Source Distribution

gemini_ocr_cli-0.2.1.tar.gz (93.1 kB)

Uploaded Dec 30, 2025 Source

Built Distribution

gemini_ocr_cli-0.2.1-py3-none-any.whl (17.7 kB)

Uploaded Dec 30, 2025 Python 3

File details

Details for the file gemini_ocr_cli-0.2.1.tar.gz.

File metadata

Download URL: gemini_ocr_cli-0.2.1.tar.gz
Upload date: Dec 30, 2025
Size: 93.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for gemini_ocr_cli-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`9d785f95ba7dec39c5795d48ab87bbd2010ff90ef39d8a9a9c90e294b5e7e703`
MD5	`3f91fb6e24df7f2b82eed81b5fa66da1`
BLAKE2b-256	`d1832cd3edc388388e38456b5325ce40232aa2e0b7725b1a7e9bab43519bb76f`
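
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The helper names `sha256_of` and `verify_download` are illustrative, and the commented example assumes the sdist was saved in the current directory under its original name.

```python
import hashlib

# SHA256 digest published above for gemini_ocr_cli-0.2.1.tar.gz
EXPECTED_SDIST_SHA256 = (
    "9d785f95ba7dec39c5795d48ab87bbd2010ff90ef39d8a9a9c90e294b5e7e703"
)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example (assumes the file exists locally):
# verify_download("gemini_ocr_cli-0.2.1.tar.gz", EXPECTED_SDIST_SHA256)
```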

File details

Details for the file gemini_ocr_cli-0.2.1-py3-none-any.whl.

File metadata

Download URL: gemini_ocr_cli-0.2.1-py3-none-any.whl
Upload date: Dec 30, 2025
Size: 17.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for gemini_ocr_cli-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aaa9d5cddad13f46df5aeba03e305d77db9eb1ecb848362798ab408d5188d2ba`
MD5	`4664b9984f826dc45c36924014859c50`
BLAKE2b-256	`af6d24e7454b2780ed16f8b5343170de09a154d97f40259c1596ce7287885964`
