Templates for optical character recognition using the GLM-OCR model

Sinapsis GLM OCR

GLM-OCR-based Optical Character Recognition (OCR) for images

🐍 Installation • 🚀 Features • 📚 Usage example • 📙 Documentation • 🔍 License

Sinapsis GLM OCR provides templates for extracting text from images with Zhipu AI's GLM-OCR model. Built on the GLM-V encoder-decoder architecture, the model supports document parsing (text, formula, and table recognition) and structured information extraction via JSON schema prompts. Batch inference is also available for faster processing of multiple images.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

Example with uv:

  uv pip install sinapsis-glm-ocr --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-glm-ocr --extra-index-url https://pypi.sinapsis.tech

[!IMPORTANT] Templates may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:

with uv:

  uv pip install sinapsis-glm-ocr[all] --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-glm-ocr[all] --extra-index-url https://pypi.sinapsis.tech

[!TIP] Use the CLI command sinapsis info --all-template-names to list all the available Template names installed with Sinapsis GLM OCR.

[!TIP] Use the CLI command sinapsis info --example-template-config GLMOCRInference to produce an example Agent config for the GLMOCRInference template.

🚀 Features

Templates Supported

This module includes templates tailored for the GLM-OCR engine:

GLMOCRInference: Uses the GLM-OCR model to extract text from images. Supports document parsing (text, formula, table) and structured information extraction.
GLMOCRBatchInference: Batch inference version for processing multiple images efficiently.

GLMOCRInference Attributes

prompt (str): The prompt to send to the model. Defaults to "Text Recognition:". Other options include "Formula Recognition:" and "Table Recognition:".
init_args (GLMOCRInitArgs): Initialization arguments for the model including:
- pretrained_model_name_or_path: Model identifier. Defaults to "zai-org/GLM-OCR".
- torch_dtype: Model precision ("float16", "bfloat16", "auto"). Defaults to "auto".
- attn_implementation: Attention implementation ("kernels-community/flash-attn2", "kernels-community/paged-attention"). Defaults to "kernels-community/flash-attn2".
- device_map: Device mapping ("auto", "balanced", "balanced_low_0", "sequential", or specific device like "cuda:0"). Defaults to "auto".
- Note: This model requires CUDA. CPU inference is not supported.
generation_config (GLMOCRGenerationConfig): Generation configuration including:
- max_new_tokens: Maximum tokens to generate. Defaults to 8192.
- min_new_tokens: Minimum tokens to generate. Defaults to 1.
- do_sample: Whether to use sampling. Defaults to False.
- repetition_penalty: Penalty for repeating tokens. Defaults to 1.0.
- length_penalty: Penalty for sequence length. Defaults to 1.0.
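
The documented defaults can be gathered in one place to see how overrides combine with them. The following is a minimal sketch, not part of the sinapsis-glm-ocr API: `DEFAULT_ATTRIBUTES` and `build_attributes` are hypothetical names, and the dictionary only mirrors the values listed above rather than the package's own GLMOCRInitArgs and GLMOCRGenerationConfig classes.

```python
from copy import deepcopy

# Defaults copied from the attribute list above. This is a hypothetical
# container for illustration, not the package's own configuration classes.
DEFAULT_ATTRIBUTES = {
    "prompt": "Text Recognition:",
    "init_args": {
        "pretrained_model_name_or_path": "zai-org/GLM-OCR",
        "torch_dtype": "auto",
        "attn_implementation": "kernels-community/flash-attn2",
        "device_map": "auto",
    },
    "generation_config": {
        "max_new_tokens": 8192,
        "min_new_tokens": 1,
        "do_sample": False,
        "repetition_penalty": 1.0,
        "length_penalty": 1.0,
    },
}


def build_attributes(overrides=None):
    """Return a copy of the documented defaults with overrides applied.

    Unknown keys raise KeyError so that typos are caught early.
    """
    merged = deepcopy(DEFAULT_ATTRIBUTES)
    for key, value in (overrides or {}).items():
        if key not in merged:
            raise KeyError(f"unknown attribute: {key}")
        if isinstance(merged[key], dict):
            unknown = set(value) - set(merged[key])
            if unknown:
                raise KeyError(f"unknown {key} fields: {sorted(unknown)}")
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


attrs = build_attributes(
    {"prompt": "Table Recognition:", "generation_config": {"max_new_tokens": 2048}}
)
print(attrs["prompt"], attrs["generation_config"]["max_new_tokens"])
```

Fields that are not overridden, such as min_new_tokens, keep the documented default.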

📚 Usage example

Text Recognition

agent:
  name: glm_ocr_agent
  description: Agent to run inference with GLM OCR for text recognition

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: GLMOCRInference
  class_name: GLMOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Text Recognition:"
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false

Table Recognition

agent:
  name: glm_ocr_table_agent
  description: Agent to run inference with GLM OCR for table recognition

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: GLMOCRInference
  class_name: GLMOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Table Recognition:"
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false

Information Extraction (JSON Schema)

agent:
  name: glm_ocr_json_agent
  description: Agent to extract structured information using JSON schema

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input

- template_name: GLMOCRInference
  class_name: GLMOCRInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: |
      Extract the information from the provided image and output
      it strictly as a valid JSON object matching the following schema.
      {
        "name": "string",
        "date": "string",
        "amount": "number"
      }
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false
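
Since the prompt above asks for a JSON object, downstream code will usually want to parse and check the recognized text. The sketch below is a hypothetical post-processing helper built only on the standard library. How the recognized text is read from the template's output is not covered here, and the possibility that the answer arrives wrapped in a Markdown code fence is an assumption, not documented behavior. `SCHEMA`, `FENCE` and `parse_extraction` are illustrative names.

```python
import json

# Expected fields from the schema in the prompt above.
# "number" is mapped to int or float; booleans are rejected explicitly.
SCHEMA = {"name": str, "date": str, "amount": (int, float)}

# Three backticks, built indirectly to keep this example easy to embed.
FENCE = "`" * 3


def parse_extraction(raw_text):
    """Parse model output as JSON and check it against SCHEMA."""
    text = raw_text.strip()
    # Assumption: the answer may be wrapped in a Markdown code fence.
    if text.startswith(FENCE):
        lines = text.splitlines()
        if lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines[1:])
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
    return data


# Made-up sample standing in for the recognized text of one image.
sample = '{"name": "ACME Corp", "date": "2026-03-04", "amount": 129.5}'
print(parse_extraction(sample))
```

Malformed JSON, a missing field, or a field of the wrong type all surface as ValueError, so a caller can handle every failure in one place.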

Batch Inference

agent:
  name: glm_ocr_batch_agent
  description: Agent to run batch inference with GLM OCR for faster processing

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: FolderImageDatasetCV2
  class_name: FolderImageDatasetCV2
  template_input: InputTemplate
  attributes:
    data_dir: dataset/input
    batch_size: 4

- template_name: GLMOCRBatchInference
  class_name: GLMOCRBatchInference
  template_input: FolderImageDatasetCV2
  attributes:
    prompt: "Text Recognition:"
    init_args:
      pretrained_model_name_or_path: zai-org/GLM-OCR
      torch_dtype: auto
      attn_implementation: kernels-community/flash-attn2
      device_map: auto
    generation_config:
      max_new_tokens: 8192
      do_sample: false
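
To make the effect of batch_size: 4 concrete, the sketch below groups a list of image paths into fixed-size chunks. It assumes that batching means consecutive groups of at most batch_size images, which this README does not spell out. `chunk` and the file names are hypothetical, and no sinapsis code is involved.

```python
from itertools import islice


def chunk(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical file names standing in for the contents of dataset/input.
images = [f"dataset/input/page_{i:02d}.png" for i in range(10)]
for batch in chunk(images, batch_size=4):
    print(len(batch), batch[0])
```

With ten images and a batch size of 4, this yields batches of 4, 4, and 2 images.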

To run, simply use:

sinapsis run name_of_the_config.yml
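
When several of the configs above are saved to disk, the same command can be driven from a script. This is a minimal sketch that wraps only the sinapsis run command shown above. The config file names are hypothetical, and `build_run_command` and `run_config` are illustrative helpers rather than part of any sinapsis API.

```python
import shlex
import subprocess


def build_run_command(config_path):
    """Return the argument list for `sinapsis run <config>`."""
    return ["sinapsis", "run", str(config_path)]


def run_config(config_path):
    """Run one agent config; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_run_command(config_path), check=True)


# Hypothetical file names for the example configs in this section.
configs = ["glm_ocr_text.yml", "glm_ocr_table.yml", "glm_ocr_json.yml"]
for config in configs:
    # Print the command instead of running it; call run_config(config)
    # once the package is installed and the files exist.
    print(shlex.join(build_run_command(config)))
```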

📙 Documentation

Documentation for this and other sinapsis packages is available on the sinapsis website.

Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.

Release history

0.1.1: Apr 23, 2026
0.1.0 (this version): Mar 4, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sinapsis_glm_ocr-0.1.0.tar.gz (22.3 kB view details)

Uploaded Mar 4, 2026 Source

Built Distribution

sinapsis_glm_ocr-0.1.0-py3-none-any.whl (22.7 kB view details)

Uploaded Mar 4, 2026 Python 3

File details

Details for the file sinapsis_glm_ocr-0.1.0.tar.gz.

File metadata

Download URL: sinapsis_glm_ocr-0.1.0.tar.gz
Upload date: Mar 4, 2026
Size: 22.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.17

File hashes

Hashes for sinapsis_glm_ocr-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`abd38c1825cb9211f95301adf4a4c34411c6f05fe84b1e3303457945370d58b8`
MD5	`3415f4116615ff8b4f4c3b26e8e916bf`
BLAKE2b-256	`24dd7197b881bf697b1e9a66801cd0b579e6f42f45d5b347e7e1863784e66d02`
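
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library's hashlib. `sha256_of` and `verify` are hypothetical helper names, and the file path is assumed to be wherever the archive was saved.

```python
import hashlib

# SHA256 digest published above for sinapsis_glm_ocr-0.1.0.tar.gz.
EXPECTED = "abd38c1825cb9211f95301adf4a4c34411c6f05fe84b1e3303457945370d58b8"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True when the file's digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

After downloading, `verify("sinapsis_glm_ocr-0.1.0.tar.gz")` should return True for an intact copy of the source distribution.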

File details

Details for the file sinapsis_glm_ocr-0.1.0-py3-none-any.whl.

File metadata

Download URL: sinapsis_glm_ocr-0.1.0-py3-none-any.whl
Upload date: Mar 4, 2026
Size: 22.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.17

File hashes

Hashes for sinapsis_glm_ocr-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`73e90d8e86e5e54708878a871fd55dd6bbfc83f7d70f52c76a8c1aac71bdeea7`
MD5	`f3a9ace2492df7d017a4e79ee6365859`
BLAKE2b-256	`47dd8c3564639583d89cb81a4a4418ea84aec43a5fc6ba6d55344d9cda4e9510`
