A powerful DeepSeek-based Optical Character Recognition (OCR) implementation supporting text extraction and grounding.

Sinapsis DeepSeek OCR

DeepSeek-based Optical Character Recognition (OCR) for images

🐍 Installation • 🚀 Features • 📚 Usage example • 🌐 Webapp • 📙 Documentation • 🔍 License

Sinapsis DeepSeek OCR extracts text from images using DeepSeek's OCR model. With grounding enabled, it also extracts bounding boxes.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

Example with uv:

  uv pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech

[!IMPORTANT] Templates may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:

with uv:

  uv pip install "sinapsis-deepseek-ocr[all]" --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install "sinapsis-deepseek-ocr[all]" --extra-index-url https://pypi.sinapsis.tech

[!TIP] Use CLI command sinapsis info --all-template-names to show a list with all the available Template names installed with Sinapsis OCR.

[!TIP] Use CLI command sinapsis info --example-template-config DeepSeekOCRInference to produce an example Agent config for the DeepSeekOCRInference template.

🚀 Features

Templates Supported

This module includes a template tailored for the DeepSeek OCR engine:

  • DeepSeekOCRInference: Uses DeepSeek's OCR model to extract text from images. Supports optional grounding for bounding box extraction.

DeepSeekOCRInference Attributes

  • prompt (str): The prompt to send to the model. Defaults to "OCR the image.".
  • enable_grounding (bool): Whether to enable grounding for bbox extraction. Defaults to False.
  • mode (str): The inference mode. Options: "tiny", "small", "gundam", "base", "large". Defaults to "base".
  • init_args (DeepSeekOCRInitArgs): Initialization arguments for the model including:
    • pretrained_model_name_or_path: Model identifier. Defaults to "deepseek-ai/DeepSeek-OCR".
    • torch_dtype: Model precision ("float16", "bfloat16", "auto"). Defaults to "auto".
    • attn_implementation: Attention implementation. Defaults to "flash_attention_2".
    • Note: This model requires CUDA. CPU inference is not supported.
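
The documented defaults and allowed values can be checked before a config is handed to an agent. The following is a minimal sketch in plain Python: `InitArgsSketch`, `OCRAttributesSketch`, and `validate` are hypothetical names made up for illustration and are not part of the sinapsis-deepseek-ocr API. Only the field names, defaults, and option lists are taken from the attribute list above.

```python
from dataclasses import dataclass, field

# Allowed values copied from the attribute list above.
ALLOWED_MODES = ("tiny", "small", "gundam", "base", "large")
ALLOWED_DTYPES = ("float16", "bfloat16", "auto")


@dataclass
class InitArgsSketch:
    """Hypothetical mirror of the documented init_args (not the package's class)."""

    pretrained_model_name_or_path: str = "deepseek-ai/DeepSeek-OCR"
    torch_dtype: str = "auto"
    attn_implementation: str = "flash_attention_2"


@dataclass
class OCRAttributesSketch:
    """Hypothetical mirror of the documented template attributes."""

    prompt: str = "OCR the image."
    enable_grounding: bool = False
    mode: str = "base"
    init_args: InitArgsSketch = field(default_factory=InitArgsSketch)


def validate(attrs: OCRAttributesSketch) -> list:
    """Return a list of problems; an empty list means the values are documented ones."""
    problems = []
    if attrs.mode not in ALLOWED_MODES:
        problems.append(f"mode must be one of {ALLOWED_MODES}, got {attrs.mode!r}")
    if attrs.init_args.torch_dtype not in ALLOWED_DTYPES:
        problems.append(
            f"torch_dtype must be one of {ALLOWED_DTYPES}, "
            f"got {attrs.init_args.torch_dtype!r}"
        )
    return problems
```

Calling `validate(OCRAttributesSketch(mode="huge"))` returns a one-item list describing the invalid mode, while the defaults produce an empty list.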

📚 Usage example

Text Extraction (No Grounding)

agent:
  name: deepseek_ocr_agent
  description: Agent to run inference with DeepSeek OCR

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Perform OCR."
    enable_grounding: false
    mode: base

With Grounding (Bounding Boxes)

agent:
  name: deepseek_ocr_grounding_agent
  description: Agent with grounding for bbox extraction

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Convert the document to markdown."
    enable_grounding: true
    mode: base

- template_name: BBoxDrawer
  class_name: BBoxDrawer
  template_input: DeepSeekOCRInference
  attributes:
    draw_confidence: true
    draw_extra_labels: true

- template_name: ImageSaver
  class_name: ImageSaver
  template_input: BBoxDrawer
  attributes:
    save_dir: output
    root_dir: dataset

To run either configuration, save it to a YAML file and pass its path to sinapsis run:

sinapsis run name_of_the_config.yml
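
In both configs, each template names its upstream template through template_input, which forms a chain from InputTemplate to the last template. The sketch below checks that wiring on the grounding example, written out as Python data so it needs no YAML parser. `check_template_chain` is a hypothetical helper, and the rule it enforces (every template_input must be declared earlier in the list) is an assumption based on how the two examples are laid out; the sinapsis framework itself may be more permissive.

```python
def check_template_chain(templates):
    """Return template names in order; raise if a template_input is not declared earlier."""
    seen = []
    for template in templates:
        name = template["template_name"]
        source = template.get("template_input")  # absent on the first template
        if source is not None and source not in seen:
            raise ValueError(
                f"{name!r} reads from {source!r}, which is not declared before it"
            )
        seen.append(name)
    return seen


# The grounding example above, reduced to the fields that define the chain.
GROUNDING_TEMPLATES = [
    {"template_name": "InputTemplate"},
    {"template_name": "FolderImageDatasetCV2", "template_input": "InputTemplate"},
    {"template_name": "DeepSeekOCRInference", "template_input": "FolderImageDatasetCV2"},
    {"template_name": "BBoxDrawer", "template_input": "DeepSeekOCRInference"},
    {"template_name": "ImageSaver", "template_input": "BBoxDrawer"},
]

chain = check_template_chain(GROUNDING_TEMPLATES)
print(" -> ".join(chain))
```

A typo in any template_input value, for example `BBoxDrawe`, makes the helper raise a ValueError that names the offending template.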

🌐 Webapp

The webapp provides a simple interface to extract text from images using DeepSeek OCR. Upload your image, and the app will process it and display the detected text.

[!IMPORTANT] To run the app you first need to clone the sinapsis-ocr repository:

git clone https://github.com/Sinapsis-ai/sinapsis-ocr.git
cd sinapsis-ocr

[!NOTE] If you'd like to enable external app sharing in Gradio, export GRADIO_SHARE_APP=True

[!IMPORTANT] To use DeepSeek OCR in the webapp, set the environment variable: AGENT_CONFIG_PATH=/app/packages/sinapsis_deepseek_ocr/src/sinapsis_deepseek_ocr/configs/inference.yaml

🐳 Docker

[!IMPORTANT] This Docker image depends on the sinapsis:base image. Please refer to the official sinapsis instructions to Build with Docker.

  1. Build the sinapsis-ocr image:
docker compose -f docker/compose.yaml build
  2. Start the app container:
docker compose -f docker/compose_app.yaml up
  3. Check the status:
docker logs -f sinapsis-ocr-app
  4. The logs will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

[!NOTE] The URL can be different; check the output of the logs.

  5. To stop the app:
docker compose -f docker/compose_app.yaml down

💻 UV

To run the webapp using the uv package manager, please:

  1. Create the virtual environment and sync the dependencies:
uv sync --frozen
  2. Install packages:
uv pip install "sinapsis-deepseek-ocr[all]" --extra-index-url https://pypi.sinapsis.tech
  3. Run the webapp:
uv run webapps/gradio_ocr.py
  4. The terminal will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

[!NOTE] The URL can be different; check the output of the terminal.

  5. To stop the app, press Control + C in the terminal.
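
The two environment variables mentioned earlier (AGENT_CONFIG_PATH and GRADIO_SHARE_APP) can also be set from a small launcher script instead of being exported by hand. This is only a sketch: `build_webapp_env` and `launch_webapp` are hypothetical helpers, and the config path must be whatever location the inference.yaml file has on your machine (the /app/... path quoted above is the one given for the container).

```python
import os
import subprocess


def build_webapp_env(config_path, share=False, base_env=None):
    """Return a copy of the environment with the webapp variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["AGENT_CONFIG_PATH"] = config_path
    if share:
        # Same value the README suggests exporting for Gradio sharing.
        env["GRADIO_SHARE_APP"] = "True"
    return env


def launch_webapp(env):
    """Run the webapp command from the steps above; blocks until it is stopped."""
    completed = subprocess.run(["uv", "run", "webapps/gradio_ocr.py"], env=env)
    return completed.returncode


# Example (not executed here): launch with the DeepSeek OCR config.
# env = build_webapp_env("path/to/inference.yaml", share=False)
# launch_webapp(env)
```

Passing the environment explicitly keeps the variables scoped to the webapp process rather than to the whole shell session.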

📙 Documentation

Documentation for this and other sinapsis packages is available on the sinapsis website.

Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.
