
CLI tool for OCR using DeepSeek-OCR model via Ollama


DeepSeek OCR CLI


Command-line tool for OCR using DeepSeek vision models. Supports Ollama (local) and vLLM (GPU server) backends.

Features

  • Multi-backend: Ollama (local, free) and vLLM (OpenAI-compatible API)
  • Supports PDFs and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
  • Per-document output folders, with embedded figures extracted when --analyze-figures is set
  • Batch processing with incremental resume (skips already-processed files)
  • Retry with exponential backoff for transient failures
  • Parallel page processing for faster PDF OCR
  • --dry-run to preview files before processing
  • Clean markdown output with HTML tables converted to markdown
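The last feature, turning HTML tables in the model's output into Markdown, can be illustrated with the standard library alone. This is a minimal sketch for simple tables (no rowspan/colspan, first row treated as the header); it is not the tool's actual converter, and `html_table_to_markdown` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            # Collapse internal whitespace/newlines so each cell fits on one line.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_markdown(html: str) -> str:
    """Convert a simple HTML table to a Markdown pipe table (hypothetical helper)."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    rows = parser.rows
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Escape pipes and pad short rows so every row has the same column count.
    rows = [[c.replace("|", "\\|") for c in r] + [""] * (width - len(r)) for r in rows]
    lines = ["| " + " | ".join(rows[0]) + " |", "|" + " --- |" * width]
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)
```

For example, `html_table_to_markdown("<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td>3</td></tr></table>")` yields a three-line pipe table with a `| --- | --- |` separator.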

Choosing an OCR tool

This is one of five OCR CLI tools with a shared design: clean Markdown output, batch processing, and figure extraction. Pick based on your constraints:

| Tool | Engine | Runs | Cost | Best for |
|---|---|---|---|---|
| deepseek-ocr-cli (this repo) | DeepSeek vision | Local (Ollama / vLLM) | Free | General-purpose local OCR with multi-backend flexibility |
| gemini-ocr-cli | Google Gemini | Cloud API | Free tier / Pay-per-use | Fast cloud OCR with concurrent processing |
| marker-ocr-cli | Marker (Surya + Texify) | Local | Free | Academic papers with equations, tables, complex layouts |
| mistral-ocr-cli | Mistral OCR API | Cloud API | ~$1/1k pages | Structured extraction (tables, headers, footers) |
| nougat-ocr-cli | Meta Nougat | Local (GPU) | Free | Academic papers, GPU-accelerated batch processing |

Requirements

  • Python 3.10+
  • Ollama installed and running (for the Ollama backend)
  • The deepseek-ocr model pulled in Ollama (for the Ollama backend)

Installation

1. Install Ollama

# macOS/Linux
brew install ollama

# Or download from https://ollama.ai

2. Pull the DeepSeek-OCR model

ollama pull deepseek-ocr

3. Install the CLI

pip install deepseek-ocr-cli

Quick Start

# Process a single image
deepseek-ocr document.jpg

# Process a PDF
deepseek-ocr paper.pdf

# Process all files in a directory
deepseek-ocr ./documents/ --recursive

# Preview files without processing
deepseek-ocr ./documents/ --dry-run

# Custom output directory
deepseek-ocr doc.pdf -o ./results/

# Use vLLM backend
deepseek-ocr paper.pdf --backend vllm --vllm-url http://gpu-server:8000/v1

# Parallel processing for faster PDF OCR
deepseek-ocr large-document.pdf -w 2

# Extract and analyze embedded figures
deepseek-ocr paper.pdf --analyze-figures

# Quiet mode (paths only, for scripting)
deepseek-ocr paper.pdf -q
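Quiet mode is intended for scripting. A minimal Python wrapper might look like the following, assuming `-q` prints one output path per line (the exact quiet-mode output format is not specified here). `run_deepseek_ocr` and `parse_quiet_output` are hypothetical helpers, not part of the package.

```python
import subprocess
from pathlib import Path


def parse_quiet_output(stdout: str) -> list[Path]:
    """Turn quiet-mode stdout into Paths, one per non-empty line (assumed format)."""
    return [Path(line.strip()) for line in stdout.splitlines() if line.strip()]


def run_deepseek_ocr(input_path: str, *extra_args: str) -> list[Path]:
    """Run the CLI in quiet mode and return the paths it prints."""
    proc = subprocess.run(
        ["deepseek-ocr", input_path, "-q", *extra_args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_quiet_output(proc.stdout)
```

Usage: `run_deepseek_ocr("paper.pdf", "-o", "./results/")`. Passing extra flags through `*extra_args` keeps the wrapper in step with the CLI options rather than re-declaring them.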

CLI Options

deepseek-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH           Output directory for results
  -r, --recursive                 Recursively process directories
  --model TEXT                    Model name (default: deepseek-ocr)
  --prompt TEXT                   Custom prompt for OCR
  --task [convert|ocr|layout|extract|parse]
                                  OCR task type
  --extract-images                Extract and save page images from PDFs
  --no-metadata                   Exclude metadata from output
  --dpi INTEGER                   PDF rendering DPI (default: 200)
  -w, --workers INTEGER           Parallel workers for PDF pages (default: 1)
  --analyze-figures               Extract and analyze embedded figures with AI
  --max-dim INTEGER               Max image dimension (default: 1920, 0 to disable)
  --backend [ollama|vllm]         Backend to use (default: ollama)
  --vllm-url TEXT                 vLLM API URL (default: http://localhost:8000/v1)
  --reprocess                     Force reprocessing of already-done files
  --dry-run                       Preview files without processing
  -q, --quiet                     Suppress output, print paths only
  --verbose                       Enable verbose output
  --help                          Show this message and exit.
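--dpi and --max-dim interact: a PDF page is rasterized at the chosen DPI and may then be shrunk to the maximum dimension. The arithmetic below is a sketch that assumes the resize preserves aspect ratio and never upscales; PDF page sizes are measured in points (1/72 inch). The helper names are hypothetical.

```python
def rendered_size(width_pt: float, height_pt: float, dpi: int = 200) -> tuple[int, int]:
    """Pixel size of a PDF page rendered at `dpi` (PDF points are 1/72 inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def cap_dimension(width: int, height: int, max_dim: int = 1920) -> tuple[int, int]:
    """Shrink (never enlarge) so the longer side is at most max_dim.

    max_dim == 0 disables resizing, mirroring the --max-dim help text.
    """
    longest = max(width, height)
    if max_dim == 0 or longest <= max_dim:
        return width, height
    factor = max_dim / longest
    return max(1, round(width * factor)), max(1, round(height * factor))
```

For a US Letter page (612 × 792 pt) at the default 200 DPI, `rendered_size` gives 1700 × 2200 px, which `cap_dimension` reduces to 1484 × 1920 under the default limit.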

Commands

process (default)

Process documents and images with OCR. The process subcommand is optional:

deepseek-ocr document.pdf
# equivalent to
deepseek-ocr process document.pdf

info

Show system and configuration information.

deepseek-ocr info

Output Format

Each document gets its own folder:

output/
└── document/
    ├── document.md          # OCR markdown
    └── figures/             # Extracted figures (if --analyze-figures)
        └── page1_fig1.png

The markdown starts with a metadata header (omit it with --no-metadata):

---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---

## Page 1

[Extracted content...]
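When post-processing results, you may want the metadata separately from the page text. A minimal sketch, assuming the header is a flat block of `key: value` lines between `---` delimiters as in the example above (use a real YAML parser for anything richer); `split_front_matter` is a hypothetical helper.

```python
def split_front_matter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' metadata block from the Markdown body.

    Handles only flat `key: value` lines; returns ({}, markdown) when no
    complete block is present.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the OCR body.
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        # Split on the first colon only, so timestamps like 15:30:00 survive.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, markdown  # no closing delimiter: treat the whole text as body
```

Values come back as strings (for example `"3"` for `pages`), so convert them as needed.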

Batch Resume

Batch processing saves metadata.json in the output directory. On re-run, already-processed files are skipped automatically. Use --reprocess to force reprocessing.
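Incremental resume comes down to keeping a set of finished inputs, consulting it before a run, and updating it after each file. The sketch below illustrates that technique with a hypothetical `{"processed": [...]}` JSON layout; it does not describe the actual contents of the tool's metadata.json, and all function names are hypothetical.

```python
import json
from pathlib import Path


def load_done(state_file: Path) -> set[str]:
    """Read already-processed inputs (HYPOTHETICAL schema {"processed": [...]})."""
    if not state_file.exists():
        return set()
    data = json.loads(state_file.read_text(encoding="utf-8"))
    return set(data.get("processed", []))


def pending(inputs: list[Path], done: set[str], reprocess: bool = False) -> list[Path]:
    """Files still to process; reprocess=True mimics --reprocess."""
    if reprocess:
        return list(inputs)
    return [p for p in inputs if str(p.resolve()) not in done]


def mark_done(state_file: Path, done: set[str], path: Path) -> None:
    """Record one finished file immediately, so an interrupted batch can resume."""
    done.add(str(path.resolve()))
    state_file.write_text(json.dumps({"processed": sorted(done)}, indent=2), encoding="utf-8")
```

Writing the state after every file, rather than once at the end, is what lets a crashed or interrupted batch pick up where it stopped.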

Configuration

Create a .env file or set environment variables with the DEEPSEEK_OCR_ prefix:

DEEPSEEK_OCR_BACKEND=ollama
DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
DEEPSEEK_OCR_OUTPUT_DIR=output
DEEPSEEK_OCR_OLLAMA_URL=http://localhost:11434
DEEPSEEK_OCR_VLLM_BASE_URL=http://localhost:8000/v1
DEEPSEEK_OCR_MAX_DIMENSION=1920
DEEPSEEK_OCR_MAX_RETRIES=3
DEEPSEEK_OCR_RETRY_DELAY=1.0
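The two retry variables drive the "retry with exponential backoff" feature. A minimal sketch of that technique, assuming the delay doubles on each attempt starting from DEEPSEEK_OCR_RETRY_DELAY and that DEEPSEEK_OCR_MAX_RETRIES counts retries after the first attempt; the tool's exact schedule and the set of errors it treats as transient may differ. `retry_settings` and `with_backoff` are hypothetical names.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_settings() -> tuple[int, float]:
    """Read the retry variables listed above, using the same default values."""
    max_retries = int(os.environ.get("DEEPSEEK_OCR_MAX_RETRIES", "3"))
    retry_delay = float(os.environ.get("DEEPSEEK_OCR_RETRY_DELAY", "1.0"))
    return max_retries, retry_delay


def with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on a transient error wait base_delay * 2**attempt and try again.

    fn runs at most max_retries + 1 times (assumed meaning of MAX_RETRIES).
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: re-raise the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
    raise ValueError("max_retries must be >= 0")
```

With the defaults above, a request that keeps failing is retried after waits of 1 s, 2 s and 4 s before the error is surfaced. The injectable `sleep` parameter exists only to make the schedule easy to test.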

Programmatic Usage

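from pathlib import Path
from deepseek_ocr import create_backend, OCRProcessor

# Create the OCR backend and load the model
backend = create_backend(backend_type="ollama", model_name="deepseek-ocr")
backend.load_model()

try:
    processor = OCRProcessor(
        backend=backend,
        output_dir=Path("./results"),
        workers=2,  # parallel workers for PDF pages
    )

    # OCR one document and print the extracted markdown
    result = processor.process_file(Path("document.pdf"))
    print(result.output_text)

    # Write the result to the output directory
    processor.save_result(result)
finally:
    # Release the model even if processing fails
    backend.unload_model()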
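The `workers` setting (`-w` on the command line) parallelizes pages within a PDF. One plausible way to structure this, assuming pages are independent and each OCR call mostly waits on the Ollama or vLLM server, is a thread pool whose results are joined back in page order. `ocr_page` is a hypothetical stand-in for a backend call, and the "## Page N" headings mirror the output format shown earlier; this is not the package's internal implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def ocr_pages_in_parallel(
    pages: Sequence[bytes],
    ocr_page: Callable[[bytes], str],
    workers: int = 1,
) -> str:
    """Run ocr_page on every page with `workers` threads and stitch the results.

    Executor.map yields results in input order, so page sections stay in
    order even when pages finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        texts = list(pool.map(ocr_page, pages))
    return "\n\n".join(f"## Page {i}\n\n{text}" for i, text in enumerate(texts, start=1))
```

Threads rather than processes fit here because the heavy work happens in the model server, so the Python side spends most of its time blocked on I/O.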

Troubleshooting

Ollama not running

ollama serve

Model not found

ollama pull deepseek-ocr

Check status

deepseek-ocr info
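You can also query Ollama directly. Its `GET /api/tags` endpoint lists locally pulled models as `name:tag` entries; the sketch below checks that deepseek-ocr is among them, using the default DEEPSEEK_OCR_OLLAMA_URL value. `model_available` and `check_ollama` are hypothetical helpers.

```python
import json
import urllib.request


def model_available(tags: dict, model: str = "deepseek-ocr") -> bool:
    """True if `model` appears in an Ollama /api/tags response.

    Names are listed as name:tag (e.g. "deepseek-ocr:latest"), so a bare
    name is also matched against the part before the colon.
    """
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


def check_ollama(base_url: str = "http://localhost:11434", model: str = "deepseek-ocr") -> bool:
    """Query the Ollama server; raises urllib.error.URLError if it is not running."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_available(json.load(resp), model)
```

A connection error points to "Ollama not running"; a `False` result points to "Model not found".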

License

MIT License - see LICENSE for details.



Download files


Source Distribution

deepseek_ocr_cli-0.4.3.tar.gz (22.1 kB)

Uploaded Source

Built Distribution


deepseek_ocr_cli-0.4.3-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file deepseek_ocr_cli-0.4.3.tar.gz.

File metadata

  • Download URL: deepseek_ocr_cli-0.4.3.tar.gz
  • Upload date:
  • Size: 22.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for deepseek_ocr_cli-0.4.3.tar.gz
Algorithm Hash digest
SHA256 5832698382724cca0b3102ecd0b1f09479755b56201a70389b8f47968eb74227
MD5 10406e68a0dabd685ff6c35f04a96f04
BLAKE2b-256 27f77838088422d5f5c3356297e486bce4fa74df6fc73377cdf5af153a2f0ec8


File details

Details for the file deepseek_ocr_cli-0.4.3-py3-none-any.whl.

File metadata

File hashes

Hashes for deepseek_ocr_cli-0.4.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e7eff5fdcef3b0653727f48239b181aff27fe127204c0800beb55c6d669274fa
MD5 7da7b37effe7c29725a3543c3b493c8f
BLAKE2b-256 84e547d5e091815a5721aaae78291f56e31bc963bf50b375eadf44e218489b5f

