CLI tool for OCR processing using Google Gemini's vision capabilities

Gemini OCR CLI

A command-line tool for OCR processing using Google Gemini's vision capabilities. Process PDFs and images to extract text, tables, equations, and figures.

Choosing an OCR tool

This is one of five OCR CLI tools with a shared design: clean Markdown output, batch processing, and figure extraction. Pick based on your constraints:

Tool	Engine	Runs on	Cost	Best for
deepseek-ocr-cli	DeepSeek vision	Local (Ollama / vLLM)	Free	General-purpose local OCR with multi-backend flexibility
gemini-ocr-cli (this repo)	Google Gemini	Cloud API	Free tier / Pay-per-use	Fast cloud OCR with concurrent processing
marker-ocr-cli	Marker (Surya + Texify)	Local	Free	Academic papers with equations, tables, complex layouts
mistral-ocr-cli	Mistral OCR API	Cloud API	~$1/1k pages	Structured extraction (tables, headers, footers)
nougat-ocr-cli	Meta Nougat	Local (GPU)	Free	Academic papers, GPU-accelerated batch processing

Installation

Requires Python 3.11+ and a Google Gemini API key.

pip install gemini-ocr-cli

Or from source:

git clone https://github.com/r-uben/gemini-ocr-cli.git
cd gemini-ocr-cli
uv sync

Quick start

# Set your API key
export GEMINI_API_KEY="your_key_here"

# Process a single file
gemini-ocr document.pdf

# Process a directory
gemini-ocr ./documents -o ./results

# Preview what would be processed (no API calls)
gemini-ocr ./documents --dry-run

# Process 4 files concurrently
gemini-ocr ./documents -w 4
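
When the tool is driven from a Python script rather than a shell, the same commands can be assembled as argument lists. The sketch below uses only the flags shown above; `build_command` and `run_ocr` are hypothetical helper names, not part of the package.

```python
import shutil
import subprocess


def build_command(input_path, output_dir=None, workers=1, dry_run=False):
    """Assemble a gemini-ocr argument list from flags documented in this README."""
    cmd = ["gemini-ocr", str(input_path)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if workers != 1:
        cmd += ["-w", str(workers)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_ocr(input_path, **kwargs):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    if shutil.which("gemini-ocr") is None:
        raise FileNotFoundError("gemini-ocr is not on PATH; install it first")
    return subprocess.run(build_command(input_path, **kwargs), check=True)
```

For example, `run_ocr("./documents", output_dir="./results", workers=4)` would combine the directory and concurrency examples above into one call. `GEMINI_API_KEY` still has to be set in the environment the script runs in.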

Options

Usage: gemini-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH           Output directory (default: <input_dir>/gemini_ocr_output/)
  --api-key TEXT                  Gemini API key (or set GEMINI_API_KEY env var)
  --model TEXT                    Model to use (default: gemini-3-flash-preview)
  --task [convert|extract|table|describe_figure]
                                  OCR task type (default: convert)
  --prompt TEXT                   Custom prompt for OCR processing

  --include-images/--no-images    Extract embedded images (default: True)
  --save-originals/--no-save-originals  Copy original images to output (default: True)

  -w, --workers N                 Concurrent workers for batch processing (default: 1)
  --reprocess                     Reprocess already-processed files
  --dry-run                       List files without calling the API
  -q, --quiet                     Suppress all output except errors
  -v, --verbose                   Enable verbose/debug output
  --info                          Show configuration and system info
  --env-file PATH                 Path to .env file
  --version                       Show version
  --help                          Show this message

Output structure

gemini_ocr_output/
├── document_name/
│   ├── document_name.md        # OCR markdown (clean text only)
│   └── figures/                # extracted embedded images
│       ├── page1_img1.png
│       └── page2_img1.png
├── another_document/
│   └── ...
└── metadata.json               # processing stats, checksums, file list
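
Downstream scripts can walk this layout to find each document's Markdown file and figures. The following is a minimal sketch that relies only on the folder structure shown above; `collect_outputs` is a hypothetical helper, and it does not read `metadata.json` because its schema is not described here.

```python
from pathlib import Path


def collect_outputs(output_root):
    """Map each document folder to its Markdown file and sorted figure files."""
    results = {}
    for doc_dir in sorted(Path(output_root).iterdir()):
        if not doc_dir.is_dir():
            continue  # skips metadata.json and any other top-level files
        markdown = doc_dir / f"{doc_dir.name}.md"
        figures_dir = doc_dir / "figures"
        figures = sorted(figures_dir.iterdir()) if figures_dir.is_dir() else []
        results[doc_dir.name] = {
            # None when the expected <name>/<name>.md file is missing
            "markdown": markdown if markdown.is_file() else None,
            "figures": figures,
        }
    return results
```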

API key resolution

Priority order:

1. --api-key CLI argument
2. GEMINI_API_KEY environment variable
3. GOOGLE_API_KEY environment variable (fallback)
4. .env file in current directory
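
The lookup order can be illustrated with a short sketch. This is not the package's actual implementation: `resolve_api_key` and `read_dotenv` are hypothetical names, the `.env` parser handles only plain `KEY=VALUE` lines, and it is an assumption that the `.env` file uses the same two variable names.

```python
import os
from pathlib import Path


def read_dotenv(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    p = Path(path)
    if not p.is_file():
        return values
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(cli_key=None, environ=None, dotenv_path=".env"):
    """Return the first key found, following the documented priority order."""
    environ = os.environ if environ is None else environ
    if cli_key:  # 1. explicit --api-key value
        return cli_key
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):  # 2 and 3. environment
        if environ.get(name):
            return environ[name]
    dotenv = read_dotenv(dotenv_path)  # 4. .env file (variable names assumed)
    return dotenv.get("GEMINI_API_KEY") or dotenv.get("GOOGLE_API_KEY")
```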

Configuration

The following settings can be supplied through environment variables or a .env file. Rows without a CLI flag are configured only this way:

CLI flag	Environment variable	Default
`--api-key`	`GEMINI_API_KEY`	(required)
`--model`	`GEMINI_MODEL`	`gemini-3-flash-preview`
`--include-images`	`GEMINI_INCLUDE_IMAGES`	`true`
`--save-originals`	`GEMINI_SAVE_ORIGINAL_IMAGES`	`true`
`--workers`	`GEMINI_MAX_WORKERS`	`1`
`--verbose`	`GEMINI_VERBOSE`	`false`
(none)	`GEMINI_MAX_FILE_SIZE_MB`	`50`
(none)	`GEMINI_MAX_RETRIES`	`3`
(none)	`GEMINI_RETRY_BASE_DELAY`	`1.0`

CLI flags override environment variables when explicitly passed.
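
A rough sketch of that precedence (defaults, then environment variables, then explicitly passed flags) is shown below. The `Settings` class, `load_settings`, and the set of strings accepted as boolean true are illustrative assumptions, not the package's real code; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    # Defaults copied from the configuration table
    model: str = "gemini-3-flash-preview"
    include_images: bool = True
    max_workers: int = 1
    max_retries: int = 3
    retry_base_delay: float = 1.0


def _as_bool(text):
    # Assumption: these spellings count as true, anything else as false
    return text.strip().lower() in {"1", "true", "yes", "on"}


ENV_FIELDS = {
    "GEMINI_MODEL": ("model", str),
    "GEMINI_INCLUDE_IMAGES": ("include_images", _as_bool),
    "GEMINI_MAX_WORKERS": ("max_workers", int),
    "GEMINI_MAX_RETRIES": ("max_retries", int),
    "GEMINI_RETRY_BASE_DELAY": ("retry_base_delay", float),
}


def load_settings(environ=None, **cli_overrides):
    """Layer environment values over defaults, then CLI values over both."""
    environ = os.environ if environ is None else environ
    settings = Settings()
    for env_name, (field, parse) in ENV_FIELDS.items():
        if env_name in environ:
            setattr(settings, field, parse(environ[env_name]))
    for field, value in cli_overrides.items():
        if value is not None:  # None means the flag was not passed
            setattr(settings, field, value)
    return settings
```

With `GEMINI_MAX_WORKERS=4` in the environment, `load_settings(max_workers=2)` yields 2 workers, while `load_settings()` yields 4.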

Development

# Install dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint
uv run ruff check .

# Format
uv run ruff format .

# Type check
uv run mypy gemini_ocr/ --ignore-missing-imports

Limitations

Maximum file size: 50 MB (configurable via GEMINI_MAX_FILE_SIZE_MB)
Supported formats: PDF, JPG, JPEG, PNG, WEBP, GIF, BMP, TIFF
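
Before a large batch run, it can help to check which files fall within these limits. The sketch below is a standalone pre-filter, not something the package exposes: `eligible_files` is a hypothetical name, it treats 1 MB as 1024 × 1024 bytes, it searches subdirectories recursively, and it matches only the extensions listed above (so `.tif` is not included). All of these are assumptions.

```python
from pathlib import Path

# Extensions taken from the supported formats list
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp", ".tiff"}


def eligible_files(directory, max_size_mb=50):
    """Split files under `directory` into accepted paths and (path, reason) rejects."""
    limit = max_size_mb * 1024 * 1024
    accepted, rejected = [], []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            rejected.append((path, "unsupported format"))
        elif path.stat().st_size > limit:
            rejected.append((path, "too large"))
        else:
            accepted.append(path)
    return accepted, rejected
```

The `--dry-run` flag remains the authoritative way to see what the tool itself would process.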

License

MIT License - see LICENSE for details.

Release history

Version	Release date
0.3.2	Apr 13, 2026
0.3.1	Mar 14, 2026
0.3.0	Mar 12, 2026
0.2.1	Dec 30, 2025
0.2.0	Dec 23, 2025

Download files

Source Distribution

gemini_ocr_cli-0.3.1.tar.gz (82.6 kB view details)

Uploaded Mar 14, 2026 Source

Built Distribution

gemini_ocr_cli-0.3.1-py3-none-any.whl (17.7 kB view details)

Uploaded Mar 14, 2026 Python 3

File details

Details for the file gemini_ocr_cli-0.3.1.tar.gz.

File metadata

Download URL: gemini_ocr_cli-0.3.1.tar.gz
Upload date: Mar 14, 2026
Size: 82.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for gemini_ocr_cli-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`80ca1e8ea746761a2d3b04fe3ee064522eef49bbc87d264b3fdf605d1a8a805e`
MD5	`0f413ece80848585ff1185a569b98769`
BLAKE2b-256	`e0cd7003c8d317245297716ec5e1a8c62878b66dbdf67269a1622e5d000d6129`

File details

Details for the file gemini_ocr_cli-0.3.1-py3-none-any.whl.

File metadata

Download URL: gemini_ocr_cli-0.3.1-py3-none-any.whl
Upload date: Mar 14, 2026
Size: 17.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for gemini_ocr_cli-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a541cec17edb78eae35fa517ce2015d884c1b7d4ab4de014df3b360c44ee6ee`
MD5	`06dce9a59b9a4ea76d1ebe7307a02d7c`
BLAKE2b-256	`a1a268caadb1cb46ed27ebf1638f4c759df8352311decfbd47f2b35c80af70b7`
