CLI tool for OCR using DeepSeek-OCR model via Ollama

DeepSeek OCR CLI

Python 3.10+ | License: MIT

Command-line tool for OCR using DeepSeek-OCR via Ollama. Runs locally with no API keys or cloud dependencies.

Features

  • Local processing with no API keys or usage costs
  • Powered by Ollama for efficient local inference
  • Supports PDFs and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
  • Batch processing for multiple files and directories
  • Clean markdown output with HTML tables converted to markdown
  • Progress tracking for multi-page PDFs
  • Terminal interface with progress bars and summary tables

Requirements

  • Python 3.10+
  • Ollama installed and running
  • deepseek-ocr model pulled in Ollama

Installation

1. Install Ollama

# macOS/Linux (via Homebrew)
brew install ollama

# Or download the installer from https://ollama.ai

2. Pull the DeepSeek-OCR model

ollama pull deepseek-ocr

3. Install the CLI

pip install deepseek-ocr-cli

Alternative: Install from source

git clone https://github.com/r-uben/deepseek-ocr-cli.git
cd deepseek-ocr-cli
pip install -e .

Quick Start

# Process a single image
deepseek-ocr document.jpg

# Process a PDF
deepseek-ocr paper.pdf

# Process all files in a directory
deepseek-ocr ./documents/ --recursive

# Custom output directory
deepseek-ocr doc.pdf -o ./results/

# Custom prompt
deepseek-ocr form.jpg --prompt "Extract table data in markdown format"

# Extract page images from PDF
deepseek-ocr paper.pdf --extract-images

CLI Options

deepseek-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH           Output directory for results
  -r, --recursive                 Recursively process directories
  --model TEXT                    Ollama model name (default: deepseek-ocr)
  --prompt TEXT                   Custom prompt for OCR
  --task [convert|ocr|layout|extract|parse]
                                  OCR task type
  --extract-images                Extract and save page images from PDFs
  --no-metadata                   Exclude metadata from output
  --verbose                       Enable verbose output
  --help                          Show this message and exit.
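
When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only flags from the options listing above. The helper names `build_command` and `run_ocr` are hypothetical and not part of the package, and the argument order follows the Quick Start examples (input path first).

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_command(
    input_path: Path,
    output_dir: Optional[Path] = None,
    recursive: bool = False,
    prompt: Optional[str] = None,
    extract_images: bool = False,
) -> list[str]:
    """Assemble an argv list from flags shown in the options listing."""
    cmd = ["deepseek-ocr", str(input_path)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if recursive:
        cmd.append("--recursive")
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if extract_images:
        cmd.append("--extract-images")
    return cmd


def run_ocr(input_path: Path, **options) -> int:
    """Run the CLI and return its exit code (needs deepseek-ocr on PATH)."""
    completed = subprocess.run(build_command(input_path, **options), check=False)
    return completed.returncode


# Show the command without executing it
print(build_command(Path("paper.pdf"), extract_images=True))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces.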

Commands

process (default)

Process documents and images with OCR.

deepseek-ocr process document.pdf
# or simply
deepseek-ocr document.pdf

info

Show system and configuration information.

deepseek-ocr info

Output Format

The CLI writes markdown files. Each file starts with a metadata block (omitted with --no-metadata), followed by one section per page:

---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---

## Page 1

[Extracted content from page 1...]

## Page 2

[Extracted content from page 2...]
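
For downstream scripts, the metadata block can be separated from the page content with a few lines of Python. This is a sketch that assumes the output looks exactly like the sample above (a `---` delimited block of `key: value` lines). The function name `split_front_matter` is hypothetical, and the values are kept as plain strings rather than parsed.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---' delimited metadata block from the markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no metadata block, e.g. --no-metadata was used
    # Find the closing delimiter after the opening one
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so timestamps stay intact
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


sample = """---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
---

## Page 1

[Extracted content from page 1...]
"""

meta, body = split_front_matter(sample)
print(meta["pages"], meta["processed"])
```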

Output Processing

The following post-processing is applied automatically to all OCR results:

  • HTML tables converted to markdown tables
  • Bounding box annotations removed
  • HTML entities decoded
  • LaTeX math expressions preserved
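
To illustrate the first and third steps, here is a minimal sketch of converting a simple HTML table to a markdown table with the standard library. It is not the tool's actual implementation: the names `_TableParser` and `html_table_to_markdown` are hypothetical, the first row is assumed to be the header, and nested tables and `colspan`/`rowspan` are not handled.

```python
import html
from html.parser import HTMLParser
from typing import Optional


class _TableParser(HTMLParser):
    """Collect the cell text of one non-nested HTML table, row by row."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes entities in cell text
        self.rows: list[list[str]] = []
        self._cell: Optional[list[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self.rows[-1].append(text.replace("|", "\\|"))  # escape pipes
            self._cell = None


def html_table_to_markdown(table_html: str) -> str:
    """Render the table as markdown, treating the first row as the header."""
    parser = _TableParser()
    parser.feed(table_html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


print(html_table_to_markdown(
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Tea &amp; milk</td><td>3</td></tr></table>"
))
# Entities outside tables can be decoded with the standard library as well
print(html.unescape("R&amp;D"))
```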

Performance

Typical performance on Apple Silicon M3 Pro Max:

  • Simple pages: 3-8 seconds per page
  • Dense tables/charts: 15-50 seconds per page
  • Very complex pages: Up to 7 minutes (rare)
  • Average (mixed content): ~20 seconds per page
  • 24-page PDF: ~8-20 minutes

Processing time varies significantly based on content density. Sparse pages process quickly, while dense tables or complex layouts take longer. The tool includes a 10-minute timeout per page to handle extreme cases.
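
The figures above can be turned into a rough estimate for a whole document. The sketch below is plain arithmetic on the numbers quoted in this section. The function names are hypothetical, and actual times depend on hardware and content.

```python
def estimate_minutes(pages: int, seconds_per_page: float = 20.0) -> float:
    """Estimated total time in minutes at a given per-page rate."""
    return pages * seconds_per_page / 60


def worst_case_minutes(pages: int, timeout_minutes: float = 10.0) -> float:
    """Upper bound if every page ran into the per-page timeout."""
    return pages * timeout_minutes


# 24 pages at the ~20 s average, and at 50 s per page for dense content
print(estimate_minutes(24))        # 8.0
print(estimate_minutes(24, 50.0))  # 20.0
print(worst_case_minutes(24))      # 240.0
```

The first two results reproduce the ~8-20 minute range quoted for a 24-page PDF.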

Configuration

Create a .env file to customize settings:

DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
DEEPSEEK_OCR_OUTPUT_DIR=output
DEEPSEEK_OCR_EXTRACT_IMAGES=false
DEEPSEEK_OCR_INCLUDE_METADATA=true
DEEPSEEK_OCR_LOG_LEVEL=INFO
OLLAMA_URL=http://localhost:11434
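
As an illustration of how such settings might be consumed, the sketch below reads the same variable names from the environment. It is not the tool's own loader: the `Settings` class and `load_settings` function are hypothetical, the fallback values simply mirror the sample file above and are assumptions, and the sketch expects the variables to be exported already rather than parsing the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_name: str
    output_dir: str
    extract_images: bool
    include_metadata: bool
    log_level: str
    ollama_url: str


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # Fallbacks below mirror the sample .env and are assumptions
    return Settings(
        model_name=env.get("DEEPSEEK_OCR_MODEL_NAME", "deepseek-ocr"),
        output_dir=env.get("DEEPSEEK_OCR_OUTPUT_DIR", "output"),
        extract_images=_as_bool(env.get("DEEPSEEK_OCR_EXTRACT_IMAGES", "false")),
        include_metadata=_as_bool(env.get("DEEPSEEK_OCR_INCLUDE_METADATA", "true")),
        log_level=env.get("DEEPSEEK_OCR_LOG_LEVEL", "INFO"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
    )


print(load_settings({"DEEPSEEK_OCR_EXTRACT_IMAGES": "true"}))
```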

Programmatic Usage

from pathlib import Path
from deepseek_ocr import ModelManager, OCRProcessor

# Load the model before processing
model_manager = ModelManager(model_name="deepseek-ocr")
model_manager.load_model()

try:
    processor = OCRProcessor(
        model_manager=model_manager,
        output_dir=Path("./results"),
    )

    # Run OCR on a single file and print the extracted text
    result = processor.process_file(Path("document.pdf"))
    print(result.output_text)

    # Save the result
    processor.save_result(result)
finally:
    # Unload the model even if processing fails
    model_manager.unload_model()

Troubleshooting

Ollama not running

# Start Ollama
ollama serve

Model not found

# Pull the model
ollama pull deepseek-ocr

Check status

deepseek-ocr info
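
The first two checks can also be scripted. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models, and reports which of the cases above applies. The function names are hypothetical, and this is independent of what `deepseek-ocr info` does internally.

```python
import json
import urllib.request


def model_available(tags_payload: dict, model: str = "deepseek-ocr") -> bool:
    """Check whether `model` appears in a parsed /api/tags response."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    # Ollama reports names with a tag suffix, e.g. "deepseek-ocr:latest"
    return any(name == model or name.split(":", 1)[0] == model for name in names)


def check_ollama(
    base_url: str = "http://localhost:11434",
    model: str = "deepseek-ocr",
    timeout: float = 5.0,
) -> str:
    """Return a short status message for the server and the model."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # URLError is an OSError; malformed JSON raises ValueError
        return "Ollama not reachable: try `ollama serve`"
    if not model_available(payload, model):
        return f"Model not found: try `ollama pull {model}`"
    return "OK"


# Offline example with a hand-written payload
print(model_available({"models": [{"name": "deepseek-ocr:latest"}]}))
```

Calling `check_ollama()` performs the live request against the URL from the configuration section.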

Development

poetry install

poetry run pytest
poetry run black .
poetry run ruff check .

License

MIT License - see LICENSE for details.

Built With

This tool is built on top of:

  • DeepSeek-OCR - Vision-language model for OCR by DeepSeek AI
  • Ollama - Local LLM runtime for running models efficiently
  • PyMuPDF - PDF processing library
  • Pillow - Image processing library
  • Click - CLI framework
  • Rich - Terminal formatting and progress bars
