Deep-OCR

A Python wrapper for the DeepSeek-OCR model with an easy-to-use API, CPU/GPU support, and Flash Attention optimization.

Python 3.11+ | License: MIT

Features

  • Easy to use: Simple Python API for OCR tasks
  • High Performance: Optimized for NVIDIA GPUs with Flash Attention 2
  • CPU/GPU Support: Runs on both CPU and GPU (a compatibility patch is applied automatically on systems without CUDA)
  • Multiple Model Sizes: Choose from tiny, small, base, large, or gundam presets
  • Flexible Configuration: Customizable prompts, output formats, and processing options
  • Multiple Output Formats: Markdown, plain text, and structured data
  • Command Line Interface: Use from terminal or integrate into your applications
  • Batch Processing: Process multiple images with the same prompt or a different prompt per image

Installation

Using uv (Recommended)

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Basic installation
uv add deep-ocr

# With Flash Attention (Recommended for GPU users)
uv add "deep-ocr[flash-attn]"

# Development installation (installs the "dev" extra)
uv add "deep-ocr[dev]"

Using pip

# Basic installation
pip install deep-ocr

# With Flash Attention (Recommended for GPU users)
# Quote the extras so shells such as zsh do not expand the brackets
pip install "deep-ocr[flash-attn]"

# Development installation
pip install "deep-ocr[dev]"

Development Setup

# Clone the repository
git clone https://github.com/Gershonbest/deep-ocr.git
cd deep-ocr

# Install with uv
uv sync --dev

# Run tests
uv run pytest

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy deep_ocr/

Quick Start

Python API

from deep_ocr import DeepSeekOCR, OCRConfig

# Basic usage
ocr = DeepSeekOCR()
result = ocr.process("image.jpg", output_dir="output")

# Custom configuration
config = OCRConfig(
    model_size="large",
    device="cpu",  # or "cuda:0" for GPU
    crop_mode=True
)
ocr = DeepSeekOCR(config=config)
result = ocr.process("document.jpg", output_dir="results")

Command Line Interface

# Basic OCR
deep-ocr image.jpg

# Specify output directory
deep-ocr image.jpg -o output/

# Use large model
deep-ocr image.jpg --model-size large

# Custom prompt
deep-ocr image.jpg --prompt "Extract all text from this document"

Configuration Options

Flash Attention Optimization

On NVIDIA GPUs, Flash Attention 2 can speed up inference substantially. Enable it with use_flash_attention=True:

from deep_ocr import DeepSeekOCR, OCRConfig

# Enable Flash Attention for high performance
config = OCRConfig(
    model_size="large",
    device="cuda:0",
    use_flash_attention=True  # Enable Flash Attention 2
)

ocr = DeepSeekOCR(config=config)
result = ocr.process("image.jpg", output_dir="output")

Requirements for Flash Attention:

  • NVIDIA GPU with CUDA support
  • flash-attn package installed
  • Sufficient GPU memory

Performance Benefits:

  • 2-4x faster inference on compatible GPUs
  • Lower memory usage
  • Better scaling with larger models
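
The requirements above can be checked before a config is built. The sketch below is one way to do that, assuming CUDA availability is read from torch.cuda.is_available() and that the Flash Attention package is importable as flash_attn. The helper name runtime_options is hypothetical and not part of deep-ocr.

```python
import importlib.util


def runtime_options(preferred_device: str = "cuda:0") -> dict:
    """Return device / use_flash_attention values that are safe on this machine.

    Hypothetical helper, not part of deep-ocr.
    """
    cuda_ok = False
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the helper also works without PyTorch

        cuda_ok = torch.cuda.is_available()

    # Keep the requested GPU only when CUDA is actually usable.
    wants_gpu = preferred_device.startswith("cuda")
    device = preferred_device if (wants_gpu and cuda_ok) else "cpu"

    # Flash Attention needs both a CUDA device and the flash-attn package.
    flash_ok = device != "cpu" and importlib.util.find_spec("flash_attn") is not None
    return {"device": device, "use_flash_attention": flash_ok}


options = runtime_options()
print(options)
```

The returned values use the same parameter names as OCRConfig, so they could be passed on as OCRConfig(model_size="large", **options). GPU memory is not checked here.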

Model Size Presets

Size     Base Size   Image Size   Description
tiny     512         512          Fastest, lowest memory usage
small    768         768          Good balance of speed/quality
base     1024        1024         Default, good quality
large    1024        1024         Higher quality, more memory
gundam   1024        640          Specialized preset
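
Read as data, the table maps each preset name to a pair of sizes. The following is a minimal sketch of that mapping, assuming a preset does nothing more than fill in base_size and image_size (the package may set other options too, which the table does not show). PRESETS and preset_sizes are hypothetical names used only for illustration.

```python
# Values copied from the preset table: name -> (base_size, image_size).
PRESETS = {
    "tiny": (512, 512),
    "small": (768, 768),
    "base": (1024, 1024),
    "large": (1024, 1024),
    "gundam": (1024, 640),
}


def preset_sizes(model_size: str) -> dict:
    """Look up base_size / image_size for a preset name (hypothetical helper)."""
    try:
        base_size, image_size = PRESETS[model_size]
    except KeyError:
        valid = ", ".join(PRESETS)
        raise ValueError(
            f"Unknown model size {model_size!r}; expected one of: {valid}"
        ) from None
    return {"base_size": base_size, "image_size": image_size}


print(preset_sizes("gundam"))  # {'base_size': 1024, 'image_size': 640}
```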

OCRConfig Parameters

import torch

from deep_ocr import OCRConfig

config = OCRConfig(
    model_name="deepseek-ai/DeepSeek-OCR",  # Model repository
    device="cpu",                           # Device: "cpu" or "cuda:0"
    dtype=torch.float32,                    # Data type
    model_size="tiny",                      # Size preset
    base_size=512,                          # Base image size
    image_size=512,                         # Processing image size
    crop_mode=False,                        # Enable crop mode
    save_results=True,                      # Save results to files
    test_compress=False,                    # Test compression mode
    use_flash_attention=False               # Use flash attention (GPU only)
)

Usage Examples

Extract Text to Markdown

from deep_ocr import DeepSeekOCR

ocr = DeepSeekOCR()
result = ocr.ocr_to_markdown("receipt.jpg", output_dir="output")
print(result.text)

Extract Plain Text

from deep_ocr import DeepSeekOCR

ocr = DeepSeekOCR()
result = ocr.ocr_to_text("document.pdf", output_dir="output")
print(result.text)

Custom Prompt

from deep_ocr import DeepSeekOCR

ocr = DeepSeekOCR()
result = ocr.process(
    "invoice.jpg",
    prompt="<image>\n<|grounding|>Extract all items, quantities, and prices.",
    output_dir="invoices"
)
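
The custom prompt above starts with the tags <image> and <|grounding|>. A small helper can keep that prefix consistent across prompts. This is a sketch only: build_prompt is a hypothetical name, the grounding=False variant is an assumption, and this README does not state whether deep-ocr adds the tags itself when they are missing.

```python
IMAGE_TAG = "<image>\n"
GROUNDING_TAG = "<|grounding|>"


def build_prompt(instruction: str, grounding: bool = True) -> str:
    """Prefix an instruction with the tags used in the custom prompt example."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    # Dropping the grounding tag is an assumed variant, not a documented one.
    prefix = IMAGE_TAG + (GROUNDING_TAG if grounding else "")
    return prefix + instruction


prompt = build_prompt("Extract all items, quantities, and prices.")
print(repr(prompt))
```

The resulting string can be passed as the prompt argument of ocr.process, as in the example above.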

Batch Processing

Same Prompt for All Images

from deep_ocr import DeepSeekOCR

ocr = DeepSeekOCR()
images = ["doc1.jpg", "doc2.jpg", "doc3.jpg"]
results = ocr.batch_process(images, output_dir="batch_results")

for i, result in enumerate(results):
    print(f"Document {i+1}: {result.text[:100]}...")
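
In practice the image list often comes from a folder rather than being typed out. The sketch below gathers image paths with pathlib. collect_images is a hypothetical helper, and the set of file extensions is an assumption, since this README does not list the formats deep-ocr accepts.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set; adjust to your inputs


def collect_images(folder: str) -> list[str]:
    """Return the image paths directly inside `folder`, sorted by path."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# Usage (the folder name is a placeholder):
# images = collect_images("scans")
# results = ocr.batch_process(images, output_dir="batch_results")
```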

Different Prompts for Each Image

from deep_ocr import DeepSeekOCR

ocr = DeepSeekOCR()

# Using tuples
image_prompt_pairs = [
    ("receipt.jpg", "<image>\n<|grounding|>Extract all items and prices from this receipt."),
    ("invoice.jpg", "<image>\n<|grounding|>Extract company name, invoice number, and total amount."),
    ("document.jpg", "<image>\n<|grounding|>Convert this document to markdown format.")
]
results = ocr.batch_process_with_prompts(image_prompt_pairs, output_dir="batch_results")

# Using dictionaries
image_prompt_pairs = [
    {"image": "receipt.jpg", "prompt": "Extract all items and prices."},
    {"image": "invoice.jpg", "prompt": "Extract company name and total amount."},
    {"image": "document.jpg", "prompt": "Convert to markdown format."}
]
results = ocr.batch_process_with_prompts(image_prompt_pairs, output_dir="batch_results")

# Process results
for result in results:
    if result['status'] == 'success':
        print(f"✓ {result['image']}: {result['result'].text[:100]}...")
    else:
        print(f"✗ {result['image']}: {result['error']}")
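
The two input shapes and the result dictionaries shown above can be handled with a pair of small helpers. This is an illustrative sketch, not the library's internals: normalize_pairs and split_by_status are hypothetical names, and the "error" status value in the sample data is an assumption (the loop above only distinguishes 'success' from everything else).

```python
def normalize_pairs(items):
    """Turn (image, prompt) tuples or {"image", "prompt"} dicts into tuples."""
    pairs = []
    for item in items:
        if isinstance(item, dict):
            pairs.append((item["image"], item["prompt"]))
        else:
            image, prompt = item
            pairs.append((image, prompt))
    return pairs


def split_by_status(results):
    """Separate result dicts into successes and failures by their 'status' key."""
    succeeded = [r for r in results if r["status"] == "success"]
    failed = [r for r in results if r["status"] != "success"]
    return succeeded, failed


# Stand-in data shaped like the result dicts in the loop above.
# The "error" status string is assumed for illustration.
sample = [
    {"image": "receipt.jpg", "status": "success", "result": "..."},
    {"image": "invoice.jpg", "status": "error", "error": "file not found"},
]
ok, bad = split_by_status(sample)
print(len(ok), "succeeded;", "failed:", [r["image"] for r in bad])
```

The list of failed images can then be fed back into another batch_process_with_prompts call if a retry makes sense.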

Command Line Options

deep-ocr IMAGE [OPTIONS]

Arguments:
  IMAGE                 Path to the image file to process

Options:
  -o, --output DIR      Output directory for results (default: output)
  --model-size SIZE     Model size: tiny, small, base, large, gundam (default: tiny)
  --device DEVICE       Device to use: cpu, cuda:0 (default: cpu)
  --prompt TEXT         Custom prompt for OCR
  --save-results        Save results to files (default: True)
  --no-save-results     Don't save results to files
  --test-compress       Test compression mode
  --crop-mode           Enable crop mode
  -h, --help            Show help message
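
To call the CLI from another program, the documented options can be assembled into an argument list and handed to subprocess. The sketch below uses only the flags listed above. build_command and run_ocr are hypothetical helper names, and running the command assumes deep-ocr is installed and on PATH.

```python
import subprocess


def build_command(image, output="output", model_size="tiny", device="cpu", prompt=None):
    """Assemble a deep-ocr command line from the documented options."""
    command = [
        "deep-ocr", image,
        "-o", output,
        "--model-size", model_size,
        "--device", device,
    ]
    if prompt is not None:
        command += ["--prompt", prompt]
    return command


def run_ocr(image, **options):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(image, **options), check=True)


print(build_command("image.jpg", model_size="large"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a prompt contains spaces or special characters.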

Requirements

  • Python 3.11+
  • PyTorch 2.6.0
  • Transformers 4.46.3
  • Pillow (PIL)
  • Other dependencies listed in requirements.txt

CPU Compatibility

This package includes CPU compatibility patches that are applied automatically on systems without CUDA support. When no GPU is available, the model falls back to CPU processing.

Output Files

The package generates several output files in the specified directory:

  • result.md - Extracted text in Markdown format
  • result.txt - Plain text output
  • result_with_boxes.jpg - Image with bounding boxes (if available)
  • result.json - Structured data (if available)
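
Since some of these files are only written when the data is available, code that consumes them should check for each one. The following sketch reads whichever files exist. load_outputs is a hypothetical helper, and UTF-8 encoding of the text files is an assumption.

```python
import json
from pathlib import Path


def load_outputs(output_dir: str) -> dict:
    """Read whichever of the documented result files exist in `output_dir`."""
    folder = Path(output_dir)
    outputs = {}

    markdown_file = folder / "result.md"
    if markdown_file.exists():
        outputs["markdown"] = markdown_file.read_text(encoding="utf-8")

    text_file = folder / "result.txt"
    if text_file.exists():
        outputs["text"] = text_file.read_text(encoding="utf-8")

    json_file = folder / "result.json"
    if json_file.exists():
        outputs["data"] = json.loads(json_file.read_text(encoding="utf-8"))

    boxes_file = folder / "result_with_boxes.jpg"
    if boxes_file.exists():
        outputs["boxes_image"] = str(boxes_file)  # path only; open with Pillow if needed

    return outputs


# Usage (the directory name is a placeholder):
# outputs = load_outputs("output")
# print(outputs.get("markdown", "no Markdown output found"))
```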

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Note: This package is a wrapper/interface for the DeepSeek-OCR model. The actual model weights and architecture are developed by DeepSeek AI. This package only provides a convenient Python API for using their model.

Support

If you encounter any issues or have questions, please:

  1. Check the Issues page of the GitHub repository
  2. Create a new issue with detailed information
  3. Include your Python version, OS, and error messages
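
The details requested in step 3 can be gathered with the standard library. The sketch below prints the Python version, the OS, and the installed versions of a few packages. environment_report is a hypothetical helper, and the package list is an assumption based on the Requirements section.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("deep-ocr", "torch", "transformers", "pillow")) -> str:
    """Collect Python, OS, and package versions as text for an issue report."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


print(environment_report())
```

Paste the output into the issue together with the full error message.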

Changelog

v0.1.0

  • Initial release
  • Basic OCR functionality
  • CPU/GPU support
  • Command line interface
  • Multiple model size presets
