
A powerful DeepSeek-based Optical Character Recognition (OCR) implementation supporting text extraction and grounding.




Sinapsis DeepSeek OCR

DeepSeek-based Optical Character Recognition (OCR) for images

🐍 Installation | 🚀 Features | 📚 Usage example | 🌐 Webapp | 📙 Documentation | 🔍 License

Sinapsis DeepSeek OCR provides a powerful implementation for extracting text from images using DeepSeek's OCR model. It supports optional grounding for bounding box extraction.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

Example with uv:

  uv pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech

[!IMPORTANT] Templates may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:

with uv:

  uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech

[!TIP] Use CLI command sinapsis info --all-template-names to show a list with all the available Template names installed with Sinapsis OCR.

[!TIP] Use CLI command sinapsis info --example-template-config DeepSeekOCRInference to produce an example Agent config for the DeepSeekOCRInference template.

🚀 Features

Templates Supported

This module includes a template tailored for the DeepSeek OCR engine:

  • DeepSeekOCRInference: Uses DeepSeek's OCR model to extract text from images. Supports optional grounding for bounding box extraction.

DeepSeekOCRInference Attributes

  • prompt (str): The prompt to send to the model. Defaults to "OCR the image.".
  • enable_grounding (bool): Whether to enable grounding for bbox extraction. Defaults to False.
  • mode (str): The inference mode. Options: "tiny", "small", "gundam", "base", "large". Defaults to "base".
  • init_args (DeepSeekOCRInitArgs): Initialization arguments for the model including:
    • pretrained_model_name_or_path: Model identifier. Defaults to "deepseek-ai/DeepSeek-OCR".
    • torch_dtype: Model precision ("float16", "bfloat16", "auto"). Defaults to "auto".
    • attn_implementation: Attention implementation. Defaults to "flash_attention_2".
    • Note: This model requires CUDA. CPU inference is not supported.
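
Because the model requires CUDA, it can help to check for a usable NVIDIA GPU before running an agent. The snippet below is a minimal sketch, not part of this package: it assumes the standard `nvidia-smi` driver tool is the right probe on your machine, and the `GPU_STATUS` variable is a hypothetical name introduced only for this example. A positive result shows that the driver sees a GPU; it does not guarantee that PyTorch was installed with CUDA support.

```shell
# Probe for an NVIDIA GPU using nvidia-smi (assumed to be the relevant tool here).
# GPU_STATUS is a hypothetical variable name used only in this sketch.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
  GPU_STATUS=found
  # List the GPUs that the driver can see.
  nvidia-smi -L
else
  GPU_STATUS=missing
  echo "No usable NVIDIA GPU detected; DeepSeekOCRInference needs CUDA." >&2
fi
```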

📚 Usage example

Text Extraction (No Grounding)

agent:
  name: deepseek_ocr_agent
  description: Agent to run inference with DeepSeek OCR

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Perform OCR."
    enable_grounding: false
    mode: base
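
The config above relies on the default model initialization. As a sketch of how the other documented attributes might be overridden, the snippet below writes a variant that selects the "small" mode and sets the `init_args` fields explicitly. It assumes that `init_args` is written as a nested mapping under `attributes`, which is not confirmed here; `sinapsis info --example-template-config DeepSeekOCRInference` prints the authoritative layout. The file name `deepseek_ocr_custom.yml` and the agent name are hypothetical.

```shell
# Write a hypothetical config file; the nested init_args layout is an assumption.
cat > deepseek_ocr_custom.yml <<'EOF'
agent:
  name: deepseek_ocr_custom_agent
  description: Agent overriding the default DeepSeek OCR settings

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "OCR the image."
    enable_grounding: false
    mode: small
    init_args:
      pretrained_model_name_or_path: deepseek-ai/DeepSeek-OCR
      torch_dtype: bfloat16
      attn_implementation: flash_attention_2
EOF
```

The resulting file can then be passed to `sinapsis run` in the same way as the other configs in this section.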

With Grounding (Bounding Boxes)

agent:
  name: deepseek_ocr_grounding_agent
  description: Agent with grounding for bbox extraction

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Convert the document to markdown."
    enable_grounding: true
    mode: base

- template_name: BBoxDrawer
  class_name: BBoxDrawer
  template_input: DeepSeekOCRInference
  attributes:
    draw_confidence: true
    draw_extra_labels: true

- template_name: ImageSaver
  class_name: ImageSaver
  template_input: BBoxDrawer
  attributes:
    save_dir: output
    root_dir: dataset

To run, simply use:

sinapsis run name_of_the_config.yml

🌐 Webapp

The webapp provides a simple interface to extract text from images using DeepSeek OCR. Upload your image, and the app will process it and display the detected text.

[!IMPORTANT] To run the app, you first need to clone the sinapsis-ocr repository:

git clone https://github.com/Sinapsis-ai/sinapsis-ocr.git
cd sinapsis-ocr

[!NOTE] If you'd like to enable external app sharing in Gradio, export GRADIO_SHARE_APP=True

[!IMPORTANT] To use DeepSeek OCR in the webapp, set the environment variable: AGENT_CONFIG_PATH=/app/packages/sinapsis_deepseek_ocr/src/sinapsis_deepseek_ocr/configs/inference.yaml
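
The two variables mentioned in the notes above can be set in the shell session that launches the webapp. This is a minimal sketch that uses only the names and the path given in those notes. The `/app/...` path looks like a location inside the container, so for a Docker launch the variable may need to be set in the compose environment instead; that part is an assumption and is not shown here.

```shell
# Optional: enable external app sharing in Gradio.
export GRADIO_SHARE_APP=True

# Point the webapp at the DeepSeek OCR agent config (path as given in the note above).
export AGENT_CONFIG_PATH=/app/packages/sinapsis_deepseek_ocr/src/sinapsis_deepseek_ocr/configs/inference.yaml

# Confirm that both values are visible in the current shell.
echo "GRADIO_SHARE_APP=${GRADIO_SHARE_APP}"
echo "AGENT_CONFIG_PATH=${AGENT_CONFIG_PATH}"
```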

🐳 Docker

[!IMPORTANT] This Docker image depends on the sinapsis:base image. Please refer to the official sinapsis instructions to Build with Docker.

  1. Build the sinapsis-ocr image:
docker compose -f docker/compose.yaml build
  2. Start the app container:
docker compose -f docker/compose_app.yaml up
  3. Check the status:
docker logs -f sinapsis-ocr-app
  4. The logs will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL can be different; check the output of the logs.

  5. To stop the app:
docker compose -f docker/compose_app.yaml down

💻 UV

To run the webapp using the uv package manager, please:

  1. Create the virtual environment and sync the dependencies:
uv sync --frozen
  2. Install packages:
uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
  3. Run the webapp:
uv run webapps/gradio_ocr.py
  4. The terminal will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL can be different; check the output of the terminal.

  5. To stop the app, press Ctrl+C in the terminal.

📙 Documentation

Documentation for this and other sinapsis packages is available on the sinapsis website.

Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.
