CLI tool for OCR using DeepSeek-OCR model via Ollama

DeepSeek OCR CLI

Command-line tool for OCR using DeepSeek-OCR via Ollama. Runs locally with no API keys or cloud dependencies.

Features

Local processing with no API keys or usage costs
Powered by Ollama for efficient local inference
Supports PDFs and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
Batch processing for multiple files and directories
Parallel page processing for faster PDF OCR
Clean markdown output with HTML tables converted to markdown
Progress tracking for multi-page PDFs
Terminal interface with progress bars and summary tables

Requirements

Python 3.10+
Ollama installed and running
deepseek-ocr model pulled in Ollama

Installation

1. Install Ollama

# macOS/Linux
brew install ollama

# Or download from https://ollama.ai

2. Pull the DeepSeek-OCR model

ollama pull deepseek-ocr

3. Install the CLI

pip install deepseek-ocr-cli

Alternative: Install from source

git clone https://github.com/r-uben/deepseek-ocr-cli.git
cd deepseek-ocr-cli
pip install -e .

Quick Start

# Process a single image
deepseek-ocr document.jpg

# Process a PDF
deepseek-ocr paper.pdf

# Process all files in a directory
deepseek-ocr ./documents/ --recursive

# Custom output directory
deepseek-ocr doc.pdf -o ./results/

# Custom prompt
deepseek-ocr form.jpg --prompt "Extract table data in markdown format"

# Extract page images from PDF
deepseek-ocr paper.pdf --extract-images

# Parallel processing for faster PDF OCR (2-4 workers recommended)
deepseek-ocr large-document.pdf -w 2

# Extract and analyze embedded figures with AI descriptions
deepseek-ocr paper.pdf --analyze-figures
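
The commands above can also be driven from a Python script. The following is a minimal sketch, assuming `deepseek-ocr` is installed and on `PATH`. The helper names `build_command` and `run_ocr` are hypothetical and not part of the package; only flags documented in this README are used.

```python
import subprocess


def build_command(input_path, output_dir=None, workers=1, recursive=False):
    """Build an argument list for the deepseek-ocr CLI (hypothetical helper)."""
    cmd = ["deepseek-ocr", str(input_path)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if recursive:
        cmd.append("--recursive")
    if workers != 1:
        # The README recommends 2-4 workers for parallel PDF processing.
        cmd += ["-w", str(workers)]
    return cmd


def run_ocr(input_path, **kwargs):
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(input_path, **kwargs), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.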

CLI Options

deepseek-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH           Output directory for results
  -r, --recursive                 Recursively process directories
  --model TEXT                    Ollama model name (default: deepseek-ocr)
  --prompt TEXT                   Custom prompt for OCR
  --task [convert|ocr|layout|extract|parse]
                                  OCR task type
  --extract-images                Extract and save page images from PDFs
  --no-metadata                   Exclude metadata from output
  --dpi INTEGER                   PDF rendering DPI (default: 200)
  -w, --workers INTEGER           Parallel workers for PDF pages (default: 1)
  --analyze-figures               Extract and analyze embedded figures with AI
  --verbose                       Enable verbose output
  --help                          Show this message and exit.
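
To get a feel for what `--dpi` changes: PDF page sizes are measured in points (72 per inch), so rendering at a given DPI scales each dimension by `dpi / 72`. The sketch below is illustrative arithmetic only. `rendered_size` is a hypothetical helper, and the exact rounding the tool applies is an assumption.

```python
def rendered_size(width_pt, height_pt, dpi=200):
    """Return the (width, height) in pixels of a PDF page rendered at `dpi`.

    PDF user space uses 72 points per inch, so each side scales by dpi / 72.
    """
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792))       # (1700, 2200) at the default 200 DPI
print(rendered_size(612, 792, 300))  # (2550, 3300)
```

Going from 200 to 300 DPI multiplies the pixel count by 2.25, so higher values produce noticeably larger images for the model to process.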

Commands

`process` (default)

Process documents and images with OCR.

deepseek-ocr process document.pdf
# or simply
deepseek-ocr document.pdf

`info`

Show system and configuration information.

deepseek-ocr info

Output Format

The CLI writes markdown files that begin with a metadata header (omitted with --no-metadata), followed by one section per page:

---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---

## Page 1

[Extracted content from page 1...]

## Page 2

[Extracted content from page 2...]
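
For downstream scripts, output in the layout shown above can be split back into metadata and per-page text. This is a minimal sketch that assumes the header and the `## Page N` headings look exactly like the example; `parse_output` is a hypothetical helper, not part of the package.

```python
import re


def parse_output(markdown_text):
    """Split CLI output into (metadata dict, {page number: page text})."""
    metadata = {}
    body = markdown_text
    # The header, when present, sits between two '---' lines at the top.
    header = re.match(r"---\n(.*?)\n---\n", markdown_text, flags=re.DOTALL)
    if header:
        for line in header.group(1).splitlines():
            # Split on the first ':' only, so timestamps keep their colons.
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
        body = markdown_text[header.end():]

    # With a capture group, re.split yields [preamble, num, text, num, text, ...].
    parts = re.split(r"^## Page (\d+)[ \t]*$", body, flags=re.MULTILINE)
    pages = {int(num): text.strip() for num, text in zip(parts[1::2], parts[2::2])}
    return metadata, pages
```

All metadata values are returned as strings; converting `pages` to an integer or `processed` to a datetime is left to the caller.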

Output Processing

The following cleanup steps are applied automatically to all OCR results:

HTML tables converted to markdown tables
Bounding box annotations removed
HTML entities decoded
LaTeX math expressions preserved
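
To illustrate the first and third steps, here is one way an HTML table can be turned into a markdown table with entities decoded, using only the standard library. This is a sketch of the general technique and not the tool's actual implementation; `html_table_to_markdown` is a hypothetical name, and the first row is assumed to be the header.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect the cell text of an HTML table, row by row."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip().replace("|", "\\|")
            self.rows[-1].append(text)
            self._cell = None


def html_table_to_markdown(html_table):
    """Convert one HTML table to a markdown table (first row becomes the header)."""
    parser = _TableParser()
    parser.feed(html_table)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    lines = ["| " + " | ".join(rows[0]) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)
```

Merged cells (`rowspan`, `colspan`) and nested tables are not handled in this sketch.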

Performance

Typical performance on Apple Silicon M3 Max with 200 DPI, JPEG encoding:

Simple receipt/form: ~10 seconds
Standard text pages: ~15-20 seconds per page
Dense tables/charts: ~30-40 seconds per page
Very complex pages: Up to 2 minutes (rare)

Example: 1-page receipt processed in 11 seconds (tested).

Processing time varies with content density. Pages are rendered at 200 DPI and encoded as JPEG to balance speed and quality. The per-page timeout is 30 minutes, which leaves room for extremely dense documents.
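
The per-page figures above can be turned into a rough estimate for a whole document. The sketch below is plain arithmetic under the assumption of sequential processing (`-w 1`) on comparable hardware; `estimate_minutes` is a hypothetical helper and does not model the effect of parallel workers.

```python
def estimate_minutes(pages, seconds_per_page=(15, 20)):
    """Return a (low, high) estimate in minutes for sequential processing.

    The default range is the README's figure for standard text pages.
    """
    low, high = seconds_per_page
    return pages * low / 60, pages * high / 60


# 100 standard text pages: roughly 25 to 33 minutes.
print(estimate_minutes(100))
# 12 pages of dense tables at 30-40 seconds each: 6 to 8 minutes.
print(estimate_minutes(12, seconds_per_page=(30, 40)))
```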

Configuration

Create a .env file to customize settings:

DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
DEEPSEEK_OCR_OUTPUT_DIR=output
DEEPSEEK_OCR_EXTRACT_IMAGES=false
DEEPSEEK_OCR_INCLUDE_METADATA=true
DEEPSEEK_OCR_LOG_LEVEL=INFO
OLLAMA_URL=http://localhost:11434
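
When debugging configuration it can help to see what a .env file actually contains. The sketch below is a small standalone reader for `KEY=VALUE` lines. How the tool itself loads .env is not stated in this README, so `parse_env` is a hypothetical helper for inspection only.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' so values may contain '=' themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


example = """\
DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
# Where Ollama listens
OLLAMA_URL=http://localhost:11434
"""
print(parse_env(example))
```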

Programmatic Usage

from pathlib import Path
from deepseek_ocr import ModelManager, OCRProcessor

model_manager = ModelManager(model_name="deepseek-ocr")
model_manager.load_model()

processor = OCRProcessor(
    model_manager=model_manager,
    output_dir=Path("./results"),
    workers=2,  # Parallel page processing (default: 1)
)

result = processor.process_file(Path("document.pdf"))
print(result.output_text)

processor.save_result(result)

model_manager.unload_model()
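
For several files, the same calls can be wrapped in a loop that records failures instead of stopping at the first one. This is a minimal sketch that relies only on `process_file` and `output_text` as shown above. `ocr_many` is a hypothetical helper, and which exceptions `process_file` raises is an assumption, hence the broad `except`.

```python
from pathlib import Path


def ocr_many(processor, paths):
    """Run processor.process_file on each path; return (texts, errors).

    `processor` is expected to behave like the OCRProcessor shown above.
    """
    texts, errors = {}, {}
    for path in map(Path, paths):
        try:
            result = processor.process_file(path)
        except Exception as exc:  # record the failure and keep going
            errors[path] = exc
            continue
        texts[path] = result.output_text
    return texts, errors
```

Calling `ocr_many(processor, ["a.pdf", "b.pdf"])` between `load_model()` and `unload_model()`, ideally with the unload in a `finally` block, keeps the model loaded once for the whole batch.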

Troubleshooting

Ollama not running

# Start Ollama
ollama serve

Model not found

# Pull the model
ollama pull deepseek-ocr

Check status

deepseek-ocr info
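
The first two checks can also be scripted against Ollama's HTTP API, which lists locally pulled models at `/api/tags`. The sketch below is independent of `deepseek-ocr info`; the function names and the returned status strings are hypothetical, and the default URL matches the `OLLAMA_URL` value shown under Configuration.

```python
import json
import urllib.request


def has_model(tags_payload, model_name="deepseek-ocr"):
    """Return True if an /api/tags payload lists model_name under any tag."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama reports names such as "deepseek-ocr:latest"; compare the part before ':'.
    return any(name.split(":", 1)[0] == model_name for name in names)


def check_ollama(base_url="http://localhost:11434", model_name="deepseek-ocr"):
    """Return 'ok', 'model missing', or 'unreachable'."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers a non-JSON reply.
        return "unreachable"
    return "ok" if has_model(payload, model_name) else "model missing"
```

A result of `unreachable` suggests starting Ollama with `ollama serve`, and `model missing` suggests running `ollama pull deepseek-ocr`.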

Development

poetry install

poetry run pytest
poetry run black .
poetry run ruff check .

License

MIT License - see LICENSE for details.

Built With

This tool is built on top of:

DeepSeek-OCR - Vision-language model for OCR by DeepSeek AI
Ollama - Local LLM runtime for running models efficiently
PyMuPDF - PDF processing library
Pillow - Image processing library
Click - CLI framework
Rich - Terminal formatting and progress bars

Release history

0.4.3 - Mar 15, 2026
0.4.2 - Mar 11, 2026
0.4.1 - Mar 11, 2026
0.4.0 - Mar 11, 2026
0.3.2 - Jan 15, 2026
0.3.1 - Dec 17, 2025 (this version)
0.3.0 - Dec 17, 2025
0.2.5 - Dec 17, 2025
0.2.4 - Dec 17, 2025
0.2.3 - Dec 7, 2025
0.2.2 - Dec 6, 2025
0.2.1 - Dec 6, 2025
0.2.0 - Dec 6, 2025

Download files

Source Distribution

deepseek_ocr_cli-0.3.1.tar.gz (16.2 kB)

Uploaded Dec 17, 2025 Source

Built Distribution

deepseek_ocr_cli-0.3.1-py3-none-any.whl (17.2 kB)

Uploaded Dec 17, 2025 Python 3

File details

Details for the file deepseek_ocr_cli-0.3.1.tar.gz.

File metadata

Download URL: deepseek_ocr_cli-0.3.1.tar.gz
Upload date: Dec 17, 2025
Size: 16.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.12.11 Darwin/24.6.0

File hashes

Hashes for deepseek_ocr_cli-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`9cacbb9e5a6e5a6442eb3b8b18ed46e97948ca7fac3aa403464c9d7c8d40fee4`
MD5	`6294384b58983863fd432a93b59a72b9`
BLAKE2b-256	`d1b32ba243ac1bb49ef11e10e6a6f06e42380c33d7c283e1cc33332d8db83356`

File details

Details for the file deepseek_ocr_cli-0.3.1-py3-none-any.whl.

File metadata

Download URL: deepseek_ocr_cli-0.3.1-py3-none-any.whl
Upload date: Dec 17, 2025
Size: 17.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.12.11 Darwin/24.6.0

File hashes

Hashes for deepseek_ocr_cli-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f38029817b8b0fd2fa1335ebc3641605884a3d374157609fc222a9d73440510`
MD5	`315f6fcf858b0f226c831e64b3178b8e`
BLAKE2b-256	`266d3a3fe744825298a09a2b093ef2c26fc79c3822b5b134ad91f4645f05215e`
