CLI tool for OCR using DeepSeek-OCR model via Ollama
DeepSeek OCR CLI
Command-line tool for OCR using DeepSeek-OCR via Ollama. Runs locally with no API keys or cloud dependencies.
Features
- Local processing with no API keys or usage costs
- Powered by Ollama for efficient local inference
- Supports PDFs and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
- Batch processing for multiple files and directories
- Clean markdown output with HTML tables converted to markdown
- Progress tracking for multi-page PDFs
- Terminal interface with progress bars and summary tables
Requirements
- Python 3.10+
- Ollama installed and running
- deepseek-ocr model pulled in Ollama
Installation
1. Install Ollama
# macOS/Linux
brew install ollama
# Or download from https://ollama.ai
2. Pull the DeepSeek-OCR model
ollama pull deepseek-ocr
3. Install the CLI
pip install deepseek-ocr-cli
Alternative: Install from source
git clone https://github.com/r-uben/deepseek-ocr-cli.git
cd deepseek-ocr-cli
pip install -e .
Quick Start
# Process a single image
deepseek-ocr document.jpg
# Process a PDF
deepseek-ocr paper.pdf
# Process all files in a directory
deepseek-ocr ./documents/ --recursive
# Custom output directory
deepseek-ocr doc.pdf -o ./results/
# Custom prompt
deepseek-ocr form.jpg --prompt "Extract table data in markdown format"
# Extract page images from PDF
deepseek-ocr paper.pdf --extract-images
CLI Options
deepseek-ocr [OPTIONS] INPUT_PATH
Options:
  -o, --output-dir PATH           Output directory for results
  -r, --recursive                 Recursively process directories
  --model TEXT                    Ollama model name (default: deepseek-ocr)
  --prompt TEXT                   Custom prompt for OCR
  --task [convert|ocr|layout|extract|parse]
                                  OCR task type
  --extract-images                Extract and save page images from PDFs
  --no-metadata                   Exclude metadata from output
  --verbose                       Enable verbose output
  --help                          Show this message and exit.
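When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, assuming the `deepseek-ocr` executable is on `PATH`. `build_command` and `run_ocr` are hypothetical helpers, not part of the package, and they use only flags listed above, in the same order as the Quick Start examples.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_command(
    input_path: Path,
    output_dir: Optional[Path] = None,
    recursive: bool = False,
    prompt: Optional[str] = None,
    extract_images: bool = False,
) -> list:
    """Assemble a deepseek-ocr argument list (hypothetical helper)."""
    cmd = ["deepseek-ocr", str(input_path)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if recursive:
        cmd.append("--recursive")
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if extract_images:
        cmd.append("--extract-images")
    return cmd


def run_ocr(input_path: Path, **options) -> int:
    """Run the CLI with the assembled arguments and return its exit code."""
    return subprocess.run(build_command(input_path, **options), check=False).returncode
```

Passing the arguments as a list rather than a shell string avoids quoting problems with prompts that contain spaces.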
Commands
process (default)
Process documents and images with OCR.
deepseek-ocr process document.pdf
# or simply
deepseek-ocr document.pdf
info
Show system and configuration information.
deepseek-ocr info
Output Format
The CLI generates markdown files with clean, structured output:
---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---
## Page 1
[Extracted content from page 1...]
## Page 2
[Extracted content from page 2...]
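Downstream scripts may want the header fields separately from the page text. The sketch below splits an output file of the shape shown above into a metadata dictionary and a body. `parse_front_matter` is a hypothetical helper, and it assumes the header is a flat list of `key: value` lines between two `---` markers, as in the sample.

```python
def parse_front_matter(markdown: str):
    """Split an output file into (metadata dict, body text)."""
    lines = markdown.splitlines()
    # Files written with --no-metadata have no header: return the text as-is.
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        # No closing marker found: treat the whole text as body.
        return {}, markdown
    metadata = {}
    for line in lines[1:end]:
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[end + 1:])
```

All values come back as strings. A field such as `processing_time: 18.45s` can be converted with `float(meta["processing_time"].rstrip("s"))` if a number is needed.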
Output Processing
The following steps are applied automatically to all OCR results:
- HTML tables converted to markdown tables
- Bounding box annotations removed
- HTML entities decoded
- LaTeX math expressions preserved
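The steps above can be illustrated with a standard-library sketch. This is not the package's implementation: `table_to_markdown` and `clean_output` are hypothetical names, and the bounding-box pattern (`<|ref|>...<|/ref|>` and `<|det|>...<|/det|>`) is an assumption about how the model marks regions. LaTeX is preserved simply because none of the steps rewrites it.

```python
import html
import re
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect the cell text of one HTML table, row by row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities in cells get decoded
        self.rows = []
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append(text.replace("|", "\\|"))  # escape cell pipes
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_markdown(table_html: str) -> str:
    """Convert one <table> element to a markdown table; first row is the header."""
    parser = _TableParser()
    parser.feed(table_html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    lines = ["| " + " | ".join(rows[0]) + " |", "|" + "---|" * width]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)


# The capturing group makes re.split keep the tables in the result.
TABLE_RE = re.compile(r"(<table\b.*?</table>)", re.DOTALL | re.IGNORECASE)
# Assumed annotation format; adjust to what the model actually emits.
BBOX_RE = re.compile(r"<\|ref\|>.*?<\|/ref\|>|<\|det\|>.*?<\|/det\|>", re.DOTALL)


def clean_output(text: str) -> str:
    """Strip box annotations, convert tables, decode entities elsewhere."""
    text = BBOX_RE.sub("", text)
    parts = TABLE_RE.split(text)  # odd indices are tables, even ones are prose
    return "".join(
        table_to_markdown(part) if index % 2 else html.unescape(part)
        for index, part in enumerate(parts)
    )
```

Tables and prose are handled separately so that entities are decoded exactly once in each.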
Performance
Typical performance on an Apple Silicon M3 Max at 200 DPI with JPEG encoding:
- Simple receipt/form: ~10 seconds
- Standard text pages: ~15-20 seconds per page
- Dense tables/charts: ~30-40 seconds per page
- Very complex pages: Up to 2 minutes (rare)
Example: 1-page receipt processed in 11 seconds (tested).
Processing time varies with content density. The tool renders pages at 200 DPI with JPEG encoding to balance speed against quality. The per-page timeout is 30 minutes, which leaves room for extremely dense documents.
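For planning a batch, the per-page figures above can be turned into a rough time budget. The sketch below is plain arithmetic on those ranges. `estimate_seconds` and the category names are hypothetical, and the numbers apply only to the hardware and settings listed above.

```python
# Per-page ranges in seconds, copied from the figures above
# (Apple Silicon M3 Max, 200 DPI, JPEG encoding).
PAGE_SECONDS = {
    "simple": (10, 10),  # simple receipt or form
    "text": (15, 20),    # standard text page
    "dense": (30, 40),   # dense tables or charts
}


def estimate_seconds(page_counts: dict) -> tuple:
    """Return a (low, high) estimate in seconds for a mix of page types."""
    low = sum(PAGE_SECONDS[kind][0] * count for kind, count in page_counts.items())
    high = sum(PAGE_SECONDS[kind][1] * count for kind, count in page_counts.items())
    return low, high
```

For example, 40 standard text pages plus 10 dense pages gives `estimate_seconds({"text": 40, "dense": 10})`, which is `(900, 1200)`, or roughly 15 to 20 minutes.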
Configuration
Create a .env file to customize settings:
DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
DEEPSEEK_OCR_OUTPUT_DIR=output
DEEPSEEK_OCR_EXTRACT_IMAGES=false
DEEPSEEK_OCR_INCLUDE_METADATA=true
DEEPSEEK_OCR_LOG_LEVEL=INFO
OLLAMA_URL=http://localhost:11434
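To read the same settings in a script of your own, the variables can be mapped onto a small typed object. This is a sketch, not how the package loads its configuration: `Settings` and `load_settings` are hypothetical, and the fallback values simply mirror the example above, on the assumption that they are the defaults.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as false."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    model_name: str
    output_dir: str
    extract_images: bool
    include_metadata: bool
    log_level: str
    ollama_url: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        model_name=env.get("DEEPSEEK_OCR_MODEL_NAME", "deepseek-ocr"),
        output_dir=env.get("DEEPSEEK_OCR_OUTPUT_DIR", "output"),
        extract_images=_as_bool(env.get("DEEPSEEK_OCR_EXTRACT_IMAGES", "false")),
        include_metadata=_as_bool(env.get("DEEPSEEK_OCR_INCLUDE_METADATA", "true")),
        log_level=env.get("DEEPSEEK_OCR_LOG_LEVEL", "INFO"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
    )
```

Note that this reads the process environment only. It does not parse a `.env` file itself, so the variables must already be exported or loaded by another tool.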
Programmatic Usage
from pathlib import Path
from deepseek_ocr import ModelManager, OCRProcessor
model_manager = ModelManager(model_name="deepseek-ocr")
model_manager.load_model()
processor = OCRProcessor(
    model_manager=model_manager,
    output_dir=Path("./results"),
)
result = processor.process_file(Path("document.pdf"))
print(result.output_text)
processor.save_result(result)
model_manager.unload_model()
Troubleshooting
Ollama not running
# Start Ollama
ollama serve
Model not found
# Pull the model
ollama pull deepseek-ocr
Check status
deepseek-ocr info
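The first two checks can also be scripted. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally available models, and maps the result to the fixes above. `has_model` and `check_ollama` are hypothetical helpers, and the base URL assumes the default shown in the configuration section.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload: dict) -> list:
    """Extract model names from an /api/tags response."""
    return [model.get("name", "") for model in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str = "deepseek-ocr") -> bool:
    """Match on the part before the tag, e.g. "deepseek-ocr:latest"."""
    return any(name.split(":", 1)[0] == wanted for name in model_names(tags_payload))


def check_ollama(base_url: str = "http://localhost:11434", wanted: str = "deepseek-ocr") -> str:
    """Return "ok" or a hint naming the command that should fix the problem."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not JSON.
        return "Ollama is not reachable: run `ollama serve`"
    if not has_model(payload, wanted):
        return f"Model not found: run `ollama pull {wanted}`"
    return "ok"
```

Keeping the response parsing in `has_model` separate from the network call makes it possible to test the matching logic without a running server.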
Development
poetry install
poetry run pytest
poetry run black .
poetry run ruff check .
License
MIT License - see LICENSE for details.
Built With
This tool is built on top of the DeepSeek-OCR model and Ollama.