Templates for optical character recognition using the GLM-OCR model
Sinapsis GLM OCR
GLM-OCR-based Optical Character Recognition (OCR) for images
🐍 Installation • 🚀 Features • 📚 Usage example • 📙 Documentation • 🔍 License
Sinapsis GLM OCR provides a powerful implementation for extracting text from images using Zhipu AI's GLM-OCR model. Built on the GLM-V encoder-decoder architecture, it supports document parsing (text, formula, table recognition) and structured information extraction via JSON schema prompts. It also supports batch inference for faster processing of multiple images.
🐍 Installation
Install using your package manager of choice. We encourage the use of uv.
Example with uv:
uv pip install sinapsis-glm-ocr --extra-index-url https://pypi.sinapsis.tech
or with raw pip:
pip install sinapsis-glm-ocr --extra-index-url https://pypi.sinapsis.tech
[!IMPORTANT] Templates may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:
with uv:
uv pip install sinapsis-glm-ocr[all] --extra-index-url https://pypi.sinapsis.tech
or with raw pip:
pip install sinapsis-glm-ocr[all] --extra-index-url https://pypi.sinapsis.tech
[!TIP] Use the CLI command
sinapsis info --all-template-names to show a list of all the available Template names installed with Sinapsis GLM OCR.
[!TIP] Use the CLI command
sinapsis info --example-template-config GLMOCRInference to produce an example Agent config for the GLMOCRInference template.
🚀 Features
Templates Supported
This module includes templates tailored for the GLM-OCR engine:
- GLMOCRInference: Uses GLM-OCR model to extract text from images. Supports document parsing (text, formula, table) and structured information extraction.
- GLMOCRBatchInference: Batch inference version for processing multiple images efficiently.
GLMOCRInference Attributes
- prompt (str): The prompt to send to the model. Defaults to "Text Recognition:". Other options include "Formula Recognition:" and "Table Recognition:".
- init_args (GLMOCRInitArgs): Initialization arguments for the model, including:
  - pretrained_model_name_or_path: Model identifier. Defaults to "zai-org/GLM-OCR".
  - torch_dtype: Model precision ("float16", "bfloat16", "auto"). Defaults to "auto".
  - attn_implementation: Attention implementation ("kernels-community/flash-attn2", "kernels-community/paged-attention"). Defaults to "kernels-community/flash-attn2".
  - device_map: Device mapping ("auto", "balanced", "balanced_low_0", "sequential", or a specific device like "cuda:0"). Defaults to "auto".
  - Note: This model requires CUDA. CPU inference is not supported.
- generation_config (GLMOCRGenerationConfig): Generation configuration, including:
  - max_new_tokens: Maximum tokens to generate. Defaults to 8192.
  - min_new_tokens: Minimum tokens to generate. Defaults to 1.
  - do_sample: Whether to use sampling. Defaults to False.
  - repetition_penalty: Penalty for repeating tokens. Defaults to 1.0.
  - length_penalty: Penalty for sequence length. Defaults to 1.0.
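The defaults listed above can be mirrored in plain Python to sanity-check a set of overrides before writing them into an Agent config. This is a minimal sketch, not part of the package: `DEFAULTS` and `merge_attributes` are hypothetical names introduced here for illustration, and the only values used are the ones documented in the attribute list.

```python
from copy import deepcopy

# Defaults copied from the GLMOCRInference attribute list.
DEFAULTS = {
    "prompt": "Text Recognition:",
    "init_args": {
        "pretrained_model_name_or_path": "zai-org/GLM-OCR",
        "torch_dtype": "auto",
        "attn_implementation": "kernels-community/flash-attn2",
        "device_map": "auto",
    },
    "generation_config": {
        "max_new_tokens": 8192,
        "min_new_tokens": 1,
        "do_sample": False,
        "repetition_penalty": 1.0,
        "length_penalty": 1.0,
    },
}


def merge_attributes(overrides):
    """Return a copy of DEFAULTS with overrides applied (hypothetical helper).

    Unknown top-level attributes or nested fields raise KeyError, which
    catches typos before the config reaches the template.
    """
    merged = deepcopy(DEFAULTS)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown attribute: {key}")
        if isinstance(merged[key], dict):
            unknown = set(value) - set(merged[key])
            if unknown:
                raise KeyError(f"unknown {key} fields: {sorted(unknown)}")
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


# Example: switch to table recognition and shorten the generation budget.
table_attrs = merge_attributes({
    "prompt": "Table Recognition:",
    "generation_config": {"max_new_tokens": 4096},
})
```

The resulting dictionary has the same shape as the `attributes` section of a GLMOCRInference entry in an Agent config, so it can be compared against a YAML file by eye or in a test.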
📚 Usage example
Text Recognition
agent:
  name: glm_ocr_agent
  description: Agent to run inference with GLM OCR for text recognition
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input
- template_name: GLMOCRInference
  class_name: GLMOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Text Recognition:"
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false
Table Recognition
agent:
  name: glm_ocr_table_agent
  description: Agent to run inference with GLM OCR for table recognition
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input
- template_name: GLMOCRInference
  class_name: GLMOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Table Recognition:"
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false
Information Extraction (JSON Schema)
agent:
  name: glm_ocr_json_agent
  description: Agent to extract structured information using JSON schema
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input
- template_name: GLMOCRInference
  class_name: GLMOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: |
      Extract the information from the provided image and output
      it strictly as a valid JSON object matching the following schema.
      {
        "name": "string",
        "date": "string",
        "amount": "number"
      }
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false
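Since the prompt above asks for a JSON object, the recognized text usually needs to be parsed and checked downstream. The sketch below shows one way to do that with the standard library only. It is an illustration under assumptions: `parse_extraction` and `EXPECTED_FIELDS` are hypothetical names, the sample string is made up, and the code-fence handling assumes the model may wrap its answer in a Markdown fence, which this README does not state.

```python
import json

# Field names and types taken from the schema in the prompt above.
EXPECTED_FIELDS = {"name": str, "date": str, "amount": (int, float)}

# Markdown code-fence marker, built programmatically to keep this block tidy.
FENCE = "`" * 3


def parse_extraction(raw_text):
    """Parse model output into a dict and check it against EXPECTED_FIELDS."""
    text = raw_text.strip()
    # Assumption: the answer may arrive wrapped in a Markdown code fence.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} has unexpected type")
    return record


# Made-up sample output, for illustration only.
sample_output = '{"name": "ACME Corp", "date": "2024-01-31", "amount": 129.5}'
record = parse_extraction(sample_output)
```

Raising on a missing or mistyped field keeps malformed extractions from silently flowing into later processing steps.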
Batch Inference
agent:
  name: glm_ocr_batch_agent
  description: Agent to run batch inference with GLM OCR for faster processing
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input
    batch_size: 4
- template_name: GLMOCRBatchInference
  class_name: GLMOCRBatchInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Text Recognition:"
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false
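As a rough illustration of what a `batch_size` of 4 means for a folder of images, the sketch below groups a list of paths into batches. It shows the general idea of batching only and is not the package's implementation: `chunked` is a hypothetical helper, and the file names are invented.

```python
from pathlib import Path


def chunked(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Invented file names under the data_dir used in the config above.
image_paths = [Path("dataset/input") / f"page_{i}.png" for i in range(10)]
batches = list(chunked(image_paths, 4))
batch_sizes = [len(batch) for batch in batches]
```

With ten images and a batch size of 4, the last batch is smaller than the others, so downstream code should not assume every batch is full.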
To run, simply use:
sinapsis run name_of_the_config.yml
📙 Documentation
Documentation for this and other sinapsis packages is available on the sinapsis website.
Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.
🔍 License
This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.
For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.