# DeepSeek OCR CLI

Command-line tool for OCR using DeepSeek vision models. Supports Ollama (local) and vLLM (GPU server) backends.
## Features
- Multi-backend: Ollama (local, free) and vLLM (OpenAI-compatible API)
- Supports PDFs and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
- Per-document output folders with figures
- Batch processing with incremental resume (skips already-processed files)
- Retry with exponential backoff for transient failures
- Parallel page processing for faster PDF OCR
- `--dry-run` to preview files before processing
- Clean markdown output with HTML tables converted to markdown
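The retry-with-backoff behavior can be sketched as below. This is an illustrative helper, not the tool's actual code; the defaults mirror the `DEEPSEEK_OCR_MAX_RETRIES` and `DEEPSEEK_OCR_RETRY_DELAY` settings documented later.

```python
import time

def call_with_retries(fn, max_retries=3, retry_delay=1.0):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Double the wait after each failed attempt: 1s, 2s, 4s, ...
            time.sleep(retry_delay * (2 ** attempt))
```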
## Requirements

- Python 3.10+
- Ollama installed and running (for the Ollama backend)
- `deepseek-ocr` model pulled in Ollama
## Installation

1. Install Ollama:

   ```bash
   # macOS/Linux
   brew install ollama
   # Or download from https://ollama.ai
   ```

2. Pull the DeepSeek-OCR model:

   ```bash
   ollama pull deepseek-ocr
   ```

3. Install the CLI:

   ```bash
   pip install deepseek-ocr-cli
   ```
## Quick Start

```bash
# Process a single image
deepseek-ocr document.jpg

# Process a PDF
deepseek-ocr paper.pdf

# Process all files in a directory
deepseek-ocr ./documents/ --recursive

# Preview files without processing
deepseek-ocr ./documents/ --dry-run

# Custom output directory
deepseek-ocr doc.pdf -o ./results/

# Use vLLM backend
deepseek-ocr paper.pdf --backend vllm --vllm-url http://gpu-server:8000/v1

# Parallel processing for faster PDF OCR
deepseek-ocr large-document.pdf -w 2

# Extract and analyze embedded figures
deepseek-ocr paper.pdf --analyze-figures

# Quiet mode (paths only, for scripting)
deepseek-ocr paper.pdf -q
```
## CLI Options

```text
deepseek-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH     Output directory for results
  -r, --recursive           Recursively process directories
  --model TEXT              Model name (default: deepseek-ocr)
  --prompt TEXT             Custom prompt for OCR
  --task [convert|ocr|layout|extract|parse]
                            OCR task type
  --extract-images          Extract and save page images from PDFs
  --no-metadata             Exclude metadata from output
  --dpi INTEGER             PDF rendering DPI (default: 200)
  -w, --workers INTEGER     Parallel workers for PDF pages (default: 1)
  --analyze-figures         Extract and analyze embedded figures with AI
  --max-dim INTEGER         Max image dimension (default: 1920, 0 to disable)
  --backend [ollama|vllm]   Backend to use (default: ollama)
  --vllm-url TEXT           vLLM API URL (default: http://localhost:8000/v1)
  --reprocess               Force reprocessing of already-done files
  --dry-run                 Preview files without processing
  -q, --quiet               Suppress output, print paths only
  --verbose                 Enable verbose output
  --help                    Show this message and exit.
```
## Commands

### `process` (default)

Process documents and images with OCR. The `process` subcommand is optional:

```bash
deepseek-ocr document.pdf
# equivalent to
deepseek-ocr process document.pdf
```

### `info`

Show system and configuration information.

```bash
deepseek-ocr info
```
## Output Format

Each document gets its own folder:

```text
output/
└── document/
    ├── document.md       # OCR markdown
    └── figures/          # Extracted figures (if --analyze-figures)
        └── page1_fig1.png
```
The markdown includes metadata:
```markdown
---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---

## Page 1

[Extracted content...]
```
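Because the metadata header is plain `---`-delimited front matter, it is easy to strip or read in downstream scripts. A minimal parser sketch, assuming only the simple `key: value` header shown above (this helper is not part of the package):

```python
def split_front_matter(md_text):
    """Split a '---'-delimited metadata header from the markdown body."""
    if not md_text.startswith("---"):
        return {}, md_text
    # Split on the first two '---' delimiters: prefix, header, body.
    _, header, body = md_text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        # partition() splits at the first colon, so timestamps survive intact
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.lstrip()
```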
## Batch Resume

Batch processing saves `metadata.json` in the output directory. On re-run, already-processed files are skipped automatically. Use `--reprocess` to force reprocessing.
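The resume behavior amounts to a set lookup against the saved manifest. An illustrative sketch follows; the actual `metadata.json` schema is internal to the tool, and a flat `"processed"` list of paths is assumed here for demonstration:

```python
import json
from pathlib import Path

def files_to_process(input_files, output_dir, reprocess=False):
    """Filter out files already recorded in the batch manifest."""
    manifest = Path(output_dir) / "metadata.json"
    done = set()
    if manifest.exists() and not reprocess:
        # Assumed manifest shape: {"processed": ["a.pdf", ...]}
        done = set(json.loads(manifest.read_text()).get("processed", []))
    return [f for f in input_files if str(f) not in done]
```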
## Configuration

Create a `.env` file or set environment variables with the `DEEPSEEK_OCR_` prefix:

```bash
DEEPSEEK_OCR_BACKEND=ollama
DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
DEEPSEEK_OCR_OUTPUT_DIR=output
DEEPSEEK_OCR_OLLAMA_URL=http://localhost:11434
DEEPSEEK_OCR_VLLM_BASE_URL=http://localhost:8000/v1
DEEPSEEK_OCR_MAX_DIMENSION=1920
DEEPSEEK_OCR_MAX_RETRIES=3
DEEPSEEK_OCR_RETRY_DELAY=1.0
## Programmatic Usage

```python
from pathlib import Path

from deepseek_ocr import create_backend, OCRProcessor

backend = create_backend(backend_type="ollama", model_name="deepseek-ocr")
backend.load_model()

processor = OCRProcessor(
    backend=backend,
    output_dir=Path("./results"),
    workers=2,
)

result = processor.process_file(Path("document.pdf"))
print(result.output_text)
processor.save_result(result)

backend.unload_model()
```
## Troubleshooting

**Ollama not running:**

```bash
ollama serve
```

**Model not found:**

```bash
ollama pull deepseek-ocr
```

**Check status:**

```bash
deepseek-ocr info
```
## License

MIT License. See `LICENSE` for details.