A powerful DeepSeek-based Optical Character Recognition (OCR) implementation supporting text extraction and grounding.

Sinapsis DeepSeek OCR

DeepSeek-based Optical Character Recognition (OCR) for images

🐍 Installation • 🚀 Features • 📚 Usage example • 🌐 Webapp • 📙 Documentation • 🔍 License

Sinapsis DeepSeek OCR extracts text from images using DeepSeek's OCR model. Grounding can optionally be enabled to extract bounding boxes as well.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

Example with uv:

  uv pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech

[!IMPORTANT] Templates may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:

with uv:

  uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech

[!TIP] Use CLI command sinapsis info --all-template-names to show a list with all the available Template names installed with Sinapsis OCR.

[!TIP] Use CLI command sinapsis info --example-template-config DeepSeekOCRInference to produce an example Agent config for the DeepSeekOCRInference template.

🚀 Features

Templates Supported

This module includes a template tailored for the DeepSeek OCR engine:

  • DeepSeekOCRInference: Uses DeepSeek's OCR model to extract text from images. Supports optional grounding for bounding box extraction.

DeepSeekOCRInference Attributes

  • prompt (str): The prompt to send to the model. Defaults to "OCR the image.".
  • enable_grounding (bool): Whether to enable grounding for bbox extraction. Defaults to False.
  • mode (str): The inference mode. Options: "tiny", "small", "gundam", "base", "large". Defaults to "base".
  • init_args (DeepSeekOCRInitArgs): Initialization arguments for the model including:
    • pretrained_model_name_or_path: Model identifier. Defaults to "deepseek-ai/DeepSeek-OCR".
    • torch_dtype: Model precision ("float16", "bfloat16", "auto"). Defaults to "auto".
    • attn_implementation: Attention implementation. Defaults to "flash_attention_2".
    • Note: This model requires CUDA. CPU inference is not supported.
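
Since `mode` and `torch_dtype` accept only the values listed above and the model needs a CUDA device, it can help to check a configuration before launching an agent. The following is a minimal sketch and not part of the package: `check_attributes` and `cuda_available` are hypothetical helpers, the allowed values are copied from the attribute list above, and passing `init_args` as a nested mapping is an assumption. The CUDA check relies on PyTorch's `torch.cuda.is_available()` and reports `False` when PyTorch is not installed.

```python
"""Pre-flight checks for DeepSeekOCRInference attributes (illustrative only)."""

# Allowed values copied from the documented attribute list.
VALID_MODES = {"tiny", "small", "gundam", "base", "large"}
VALID_DTYPES = {"float16", "bfloat16", "auto"}


def check_attributes(attributes: dict) -> list[str]:
    """Return a list of problems found in a template `attributes` mapping."""
    problems = []
    mode = attributes.get("mode", "base")  # documented default
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    if not isinstance(attributes.get("enable_grounding", False), bool):
        problems.append("enable_grounding must be a boolean")
    # Assumption: init_args is given as a nested mapping in the config.
    init_args = attributes.get("init_args", {})
    dtype = init_args.get("torch_dtype", "auto")  # documented default
    if dtype not in VALID_DTYPES:
        problems.append(f"unknown torch_dtype: {dtype!r}")
    return problems


def cuda_available() -> bool:
    """Report whether PyTorch can see a CUDA device (False if torch is missing)."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    attrs = {"prompt": "Perform OCR.", "enable_grounding": False, "mode": "base"}
    print(check_attributes(attrs) or "attributes look valid")
    print("CUDA available:", cuda_available())
```

An empty list from `check_attributes` means none of the checked values fall outside the documented options; it does not guarantee that the template itself will accept the configuration.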

📚 Usage example

Text Extraction (No Grounding)

agent:
  name: deepseek_ocr_agent
  description: Agent to run inference with DeepSeek OCR

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Perform OCR."
    enable_grounding: false
    mode: base

With Grounding (Bounding Boxes)

agent:
  name: deepseek_ocr_grounding_agent
  description: Agent with grounding for bbox extraction

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Convert the document to markdown."
    enable_grounding: true
    mode: base

- template_name: BBoxDrawer
  class_name: BBoxDrawer
  template_input: DeepSeekOCRInference
  attributes:
    draw_confidence: True
    draw_extra_labels: True

- template_name: ImageSaver
  class_name: ImageSaver
  template_input: BBoxDrawer
  attributes:
    save_dir: output
    root_dir: dataset

To run, simply use:

sinapsis run name_of_the_config.yml
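
To run several configs one after another (for example, the text-only and grounding agents above), the documented `sinapsis run` command can be wrapped in a short script. This is only a sketch under assumptions: `build_command` and `run_configs` are hypothetical names, the config file names in the usage note are placeholders, and the `sinapsis` CLI is assumed to be on `PATH`.

```python
import subprocess


def build_command(config_path: str) -> list[str]:
    """Build the documented `sinapsis run <config>` invocation."""
    return ["sinapsis", "run", config_path]


def run_configs(config_paths: list[str]) -> int:
    """Run each config in order; stop at the first failure and return its exit code."""
    for path in config_paths:
        result = subprocess.run(build_command(path))
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_configs(["text_only.yml", "grounding.yml"])` would launch the two agents sequentially. Stopping at the first non-zero exit code is a design choice of this sketch, so a broken config does not go unnoticed in a longer batch.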

🌐 Webapp

The webapp provides a simple interface to extract text from images using DeepSeek OCR. Upload your image, and the app will process it and display the detected text.

[!IMPORTANT] To run the app you first need to clone the sinapsis-ocr repository:

git clone https://github.com/Sinapsis-ai/sinapsis-ocr.git
cd sinapsis-ocr

[!NOTE] If you'd like to enable external app sharing in Gradio, export GRADIO_SHARE_APP=True

[!IMPORTANT] To use DeepSeek OCR in the webapp, set the environment variable: AGENT_CONFIG_PATH=/app/packages/sinapsis_deepseek_ocr/src/sinapsis_deepseek_ocr/configs/inference.yaml

🐳 Docker

[!IMPORTANT] This Docker image depends on the sinapsis:base image. Please refer to the official sinapsis instructions to Build with Docker.

  1. Build the sinapsis-ocr image:
docker compose -f docker/compose.yaml build
  2. Start the app container:
docker compose -f docker/compose_app.yaml up
  3. Check the status:
docker logs -f sinapsis-ocr-app
  4. The logs will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL may differ; check the log output.

  5. To stop the app:
docker compose -f docker/compose_app.yaml down

💻 UV

To run the webapp using the uv package manager, please:

  1. Create the virtual environment and sync the dependencies:
uv sync --frozen
  2. Install packages:
uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
  3. Run the webapp:
uv run webapps/gradio_ocr.py
  4. The terminal will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL may differ; check the terminal output.

  5. To stop the app, press Control + C in the terminal.

📙 Documentation

Documentation for this and other sinapsis packages is available on the sinapsis website.

Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.
