A powerful DeepSeek-based Optical Character Recognition (OCR) implementation supporting text extraction and grounding.
Sinapsis DeepSeek OCR
DeepSeek-based Optical Character Recognition (OCR) for images
🐍 Installation • 🚀 Features • 📚 Usage example • 🌐 Webapp • 📙 Documentation • 🔍 License
Sinapsis DeepSeek OCR provides a powerful implementation for extracting text from images using DeepSeek's OCR model. It supports optional grounding for bounding box extraction.
🐍 Installation
Install using your package manager of choice. We encourage the use of uv.
Example with uv:
uv pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech
or with raw pip:
pip install sinapsis-deepseek-ocr --extra-index-url https://pypi.sinapsis.tech
[!IMPORTANT] Templates may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:
with uv:
uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
or with raw pip:
pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
[!TIP] Use the CLI command `sinapsis info --all-template-names` to show a list of all the available Template names installed with Sinapsis OCR.
[!TIP] Use the CLI command `sinapsis info --example-template-config DeepSeekOCRInference` to produce an example Agent config for the DeepSeekOCRInference template.
🚀 Features
Templates Supported
This module includes a template tailored for the DeepSeek OCR engine:
- DeepSeekOCRInference: Uses DeepSeek's OCR model to extract text from images. Supports optional grounding for bounding box extraction.
DeepSeekOCRInference Attributes
- `prompt` (str): The prompt to send to the model. Defaults to `"OCR the image."`.
- `enable_grounding` (bool): Whether to enable grounding for bbox extraction. Defaults to `False`.
- `mode` (str): The inference mode. Options: `"tiny"`, `"small"`, `"gundam"`, `"base"`, `"large"`. Defaults to `"base"`.
- `init_args` (DeepSeekOCRInitArgs): Initialization arguments for the model, including:
  - `pretrained_model_name_or_path`: Model identifier. Defaults to `"deepseek-ai/DeepSeek-OCR"`.
  - `torch_dtype`: Model precision (`"float16"`, `"bfloat16"`, `"auto"`). Defaults to `"auto"`.
  - `attn_implementation`: Attention implementation. Defaults to `"flash_attention_2"`.
- Note: This model requires CUDA. CPU inference is not supported.
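As a rough illustration of the attributes above, the following sketch mirrors them in a small Python dataclass and rejects unknown `mode` values. The class `OCRAttributes` and the table `MODE_PRESETS` are hypothetical names, not part of this package. The resolution values follow the presets listed in the upstream DeepSeek-OCR README; whether the template maps `mode` to exactly these values is an assumption.

```python
from dataclasses import dataclass

# Resolution presets as described in the upstream DeepSeek-OCR README.
# Assumption: the DeepSeekOCRInference template uses the same mapping.
MODE_PRESETS = {
    "tiny": {"base_size": 512, "image_size": 512, "crop_mode": False},
    "small": {"base_size": 640, "image_size": 640, "crop_mode": False},
    "base": {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large": {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640, "crop_mode": True},
}


@dataclass
class OCRAttributes:
    """Hypothetical mirror of the documented template attributes."""

    prompt: str = "OCR the image."
    enable_grounding: bool = False
    mode: str = "base"

    def __post_init__(self) -> None:
        # Fail early on a mode that is not one of the documented options.
        if self.mode not in MODE_PRESETS:
            options = ", ".join(sorted(MODE_PRESETS))
            raise ValueError(f"Unknown mode {self.mode!r}; expected one of: {options}")

    def resolution(self) -> dict:
        """Return the (assumed) resolution preset for the selected mode."""
        return MODE_PRESETS[self.mode]


if __name__ == "__main__":
    attrs = OCRAttributes(mode="gundam", enable_grounding=True)
    print(attrs.resolution())
```

Checking the mode up front gives a clear error before any model weights are loaded, which matters here because loading requires a CUDA device.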
📚 Usage example
Text Extraction (No Grounding)
agent:
  name: deepseek_ocr_agent
  description: Agent to run inference with DeepSeek OCR
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input
- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Perform OCR."
    enable_grounding: false
    mode: base
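Each template in the config names its upstream template through `template_input`, so a typo in one of those names breaks the chain. The sketch below checks those links on a config that has already been loaded into a Python dict (for example with a YAML parser). The helper `check_template_links` is hypothetical and is not part of the sinapsis CLI. It assumes `templates` is a top-level list, as in the example above.

```python
from typing import List, Optional, Tuple

# The text-extraction config from above, written as a plain Python dict.
CONFIG = {
    "agent": {
        "name": "deepseek_ocr_agent",
        "description": "Agent to run inference with DeepSeek OCR",
    },
    "templates": [
        {"template_name": "InputTemplate", "class_name": "InputTemplate", "attributes": {}},
        {
            "template_name": "FolderImageDatasetCV2",
            "class_name": "FolderImageDatasetCV2",
            "template_input": "InputTemplate",
            "attributes": {"data_dir": "dataset/input"},
        },
        {
            "template_name": "DeepSeekOCRInference",
            "class_name": "DeepSeekOCRInference",
            "template_input": "FolderImageDatasetCV2",
            "attributes": {"prompt": "Perform OCR.", "enable_grounding": False, "mode": "base"},
        },
    ],
}


def check_template_links(config: dict) -> List[Tuple[str, Optional[str]]]:
    """Return (template_name, template_input) pairs, raising on dangling links."""
    templates = config["templates"]
    names = {t["template_name"] for t in templates}
    links = []
    for template in templates:
        parent = template.get("template_input")
        # A template_input must refer to a template defined in the same config.
        if parent is not None and parent not in names:
            raise ValueError(
                f"{template['template_name']} refers to unknown template_input {parent!r}"
            )
        links.append((template["template_name"], parent))
    return links


if __name__ == "__main__":
    for name, parent in check_template_links(CONFIG):
        print(f"{parent or '(start)'} -> {name}")
```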
With Grounding (Bounding Boxes)
agent:
  name: deepseek_ocr_grounding_agent
  description: Agent with grounding for bbox extraction
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input
- template_name: DeepSeekOCRInference
  class_name: DeepSeekOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Convert the document to markdown."
    enable_grounding: true
    mode: base
- template_name: BBoxDrawer
  class_name: BBoxDrawer
  template_input: DeepSeekOCRInference
  attributes:
    draw_confidence: True
    draw_extra_labels: True
- template_name: ImageSaver
  class_name: ImageSaver
  template_input: BBoxDrawer
  attributes:
    save_dir: output
    root_dir: dataset
To run, simply use:
sinapsis run name_of_the_config.yml
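To give an idea of what grounding involves, the sketch below parses bounding boxes out of raw model text. It assumes the output format used by the upstream DeepSeek-OCR reference scripts: each region is tagged as `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`, with coordinates normalized to the range 0 to 999. The function `parse_grounding` is a hypothetical helper; the DeepSeekOCRInference template does its own parsing, which may differ.

```python
import ast
import re

# Assumed tag format, taken from the upstream DeepSeek-OCR reference scripts.
GROUNDING_PATTERN = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL
)


def parse_grounding(raw_output: str, image_width: int, image_height: int) -> list:
    """Extract labelled pixel-space boxes from grounded model output."""
    boxes = []
    for label, coords in GROUNDING_PATTERN.findall(raw_output):
        # `coords` is a literal such as "[[x1, y1, x2, y2], ...]".
        for x1, y1, x2, y2 in ast.literal_eval(coords):
            boxes.append(
                {
                    "label": label,
                    # Rescale from the assumed 0-999 grid to pixel coordinates.
                    "bbox": (
                        int(x1 / 999 * image_width),
                        int(y1 / 999 * image_height),
                        int(x2 / 999 * image_width),
                        int(y2 / 999 * image_height),
                    ),
                }
            )
    return boxes


if __name__ == "__main__":
    sample = "<|ref|>title<|/ref|><|det|>[[0, 0, 999, 999]]<|/det|>"
    print(parse_grounding(sample, image_width=200, image_height=100))
```

In the grounding config above, boxes like these are what BBoxDrawer would draw on the image before ImageSaver writes the result to disk.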
🌐 Webapp
The webapp provides a simple interface to extract text from images using DeepSeek OCR. Upload your image, and the app will process it and display the detected text.
[!IMPORTANT] To run the app you first need to clone the sinapsis-ocr repository:
git clone https://github.com/Sinapsis-ai/sinapsis-ocr.git
cd sinapsis-ocr
[!NOTE] If you'd like to enable external app sharing in Gradio, set the following environment variable:
export GRADIO_SHARE_APP=True
[!IMPORTANT] To use DeepSeek OCR in the webapp, set the environment variable:
AGENT_CONFIG_PATH=/app/packages/sinapsis_deepseek_ocr/src/sinapsis_deepseek_ocr/configs/inference.yaml
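The two environment variables above are the only webapp settings mentioned here. As a minimal sketch of how a script might read them, the hypothetical helper below parses `GRADIO_SHARE_APP` as a boolean and returns `AGENT_CONFIG_PATH` unchanged. The accepted truthy spellings and the default of `False` are assumptions; the actual webapp may handle these values differently.

```python
import os
from typing import Mapping, Optional


def load_webapp_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read the webapp-related environment variables into a plain dict."""
    env = os.environ if environ is None else environ
    # Assumption: sharing is off unless the variable is set to a truthy string.
    share_raw = env.get("GRADIO_SHARE_APP", "False")
    share = share_raw.strip().lower() in {"1", "true", "yes"}
    return {
        "share": share,
        # None means the variable is unset and the app's default config applies.
        "agent_config_path": env.get("AGENT_CONFIG_PATH"),
    }


if __name__ == "__main__":
    print(load_webapp_settings())
```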
🐳 Docker
[!IMPORTANT] This Docker image depends on the sinapsis:base image. Please refer to the official sinapsis instructions to Build with Docker.
- Build the sinapsis-ocr image:
docker compose -f docker/compose.yaml build
- Start the app container:
docker compose -f docker/compose_app.yaml up
- Check the status:
docker logs -f sinapsis-ocr-app
- The logs will display the URL to access the webapp, e.g.:
Running on local URL: http://127.0.0.1:7860
NOTE: The URL may be different; check the output of the logs.
- To stop the app:
docker compose -f docker/compose_app.yaml down
💻 UV
To run the webapp using the uv package manager, please:
- Create the virtual environment and sync the dependencies:
uv sync --frozen
- Install packages:
uv pip install sinapsis-deepseek-ocr[all] --extra-index-url https://pypi.sinapsis.tech
- Run the webapp:
uv run webapps/gradio_ocr.py
- The terminal will display the URL to access the webapp, e.g.:
Running on local URL: http://127.0.0.1:7860
NOTE: The URL may be different; check the output of the terminal.
- To stop the app, press Control + C in the terminal.
📙 Documentation
Documentation for this and other sinapsis packages is available on the sinapsis website.
Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.
🔍 License
This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.
For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.
File details
Details for the file sinapsis_deepseek_ocr-0.1.0.tar.gz.
File metadata
- Download URL: sinapsis_deepseek_ocr-0.1.0.tar.gz
- Upload date:
- Size: 23.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e88a136275dd125d8d9b96103d7cc1b73ab3e81044cef72f18b8ed1cd9ed78a |
| MD5 | 09e0c37bdb5ef8aeeba0b1a0575744ee |
| BLAKE2b-256 | e671297e2aece18c587497376638ef3a2ad3588ed32c293e8c0b140312ce0c39 |
File details
Details for the file sinapsis_deepseek_ocr-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sinapsis_deepseek_ocr-0.1.0-py3-none-any.whl
- Upload date:
- Size: 23.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa6c3050405b7d83cbcec0016762c97400583644b036d160894b87afe5f890ec |
| MD5 | 24898085e4497bff4997d34fb2bec561 |
| BLAKE2b-256 | e191b7542a5b7c0e9d8d7beffee72452e0b98f95a0ae6258aca0767df84c6abc |