A docling OCR plugin for GLM-OCR

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

DCC-BS

These details have not been verified by PyPI

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3 :: Only
Typing
- Typed

Project description

docling-glm-ocr

A docling OCR plugin that delegates text recognition to a remote GLM-OCR model served by vLLM.

GitHub | PyPI

Overview

docling-glm-ocr is a docling plugin that replaces the built-in OCR stage with a call to a remote GLM-OCR model hosted on a vLLM server.

Each page crop is sent to the vLLM OpenAI-compatible chat completion endpoint as a base64-encoded image. The model returns Markdown-formatted text which docling merges back into the document structure.

The plugin registers itself under the "glm-ocr-remote" OCR engine key so it can be selected per-request through docling or docling-serve without changing application code.

Requirements

Python 3.13+
A running vLLM server hosting zai-org/GLM-OCR (or any compatible model)

Installation

# with uv (recommended)
uv add docling-glm-ocr

# with pip
pip install docling-glm-ocr

Usage

Python SDK

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

from docling_glm_ocr import GlmOcrRemoteOptions

pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    ocr_options=GlmOcrRemoteOptions(
        api_url="http://localhost:8001/v1/chat/completions",
        model_name="zai-org/GLM-OCR",
    ),
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
result = converter.convert("document.pdf")
print(result.document.export_to_markdown())

docling-serve

Select the engine per-request via the standard API:

curl -X POST http://localhost:5001/v1/convert/source \
  -H 'Content-Type: application/json' \
  -d '{
    "options": {
      "ocr_engine": "glm-ocr-remote"
    },
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
  }'

The server must have DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS=true set so the plugin is loaded automatically.

Configuration

All options can be set via environment variables (useful for Docker / Compose deployments) or programmatically via GlmOcrRemoteOptions. Explicit constructor arguments always take precedence over environment variables.

Environment variables

Variable	Description	Default
`GLMOCR_REMOTE_OCR_API_URL`	vLLM chat completion URL	`http://localhost:8001/v1/chat/completions`
`GLMOCR_REMOTE_OCR_MODEL_NAME`	Model name sent to vLLM	`zai-org/GLM-OCR`
`GLMOCR_REMOTE_OCR_PROMPT`	Text prompt sent with each image crop	see below
`GLMOCR_REMOTE_OCR_TIMEOUT`	HTTP timeout per crop (seconds)	`120`
`GLMOCR_REMOTE_OCR_MAX_TOKENS`	Max tokens per completion	`16384`
`GLMOCR_REMOTE_OCR_SCALE`	Image crop rendering scale	`3.0`
`GLMOCR_REMOTE_OCR_MAX_IMAGE_PIXELS`	Pixel budget per crop	`4500000`
`GLMOCR_REMOTE_OCR_MAX_CONCURRENT_REQUESTS`	Max concurrent API requests	`10`
`GLMOCR_REMOTE_OCR_MAX_RETRIES`	Max retry attempts for HTTP errors	`3`
`GLMOCR_REMOTE_OCR_RETRY_BACKOFF_FACTOR`	Exponential backoff factor for retries	`2.0`
`GLMOCR_REMOTE_OCR_LANG`	Comma-separated language hint(s)	`en`

`GlmOcrRemoteOptions`

All options can also be set programmatically via GlmOcrRemoteOptions:

Option	Type	Description	Default
`api_url`	`str`	OpenAI-compatible chat completion URL	`GLMOCR_REMOTE_OCR_API_URL` env or `http://localhost:8001/v1/chat/completions`
`model_name`	`str`	Model name sent to vLLM	`GLMOCR_REMOTE_OCR_MODEL_NAME` env or `zai-org/GLM-OCR`
`prompt`	`str`	Text prompt for each image crop	`GLMOCR_REMOTE_OCR_PROMPT` env or default prompt
`timeout`	`float`	HTTP timeout per crop (seconds)	`GLMOCR_REMOTE_OCR_TIMEOUT` env or `120`
`max_tokens`	`int`	Max tokens per completion	`GLMOCR_REMOTE_OCR_MAX_TOKENS` env or `16384`
`scale`	`float`	Image crop rendering scale	`GLMOCR_REMOTE_OCR_SCALE` env or `3.0`
`max_image_pixels`	`int`	Pixel budget per crop	`GLMOCR_REMOTE_OCR_MAX_IMAGE_PIXELS` env or `4500000`
`max_concurrent_requests`	`int`	Max concurrent API requests	`GLMOCR_REMOTE_OCR_MAX_CONCURRENT_REQUESTS` env or `10`
`max_retries`	`int`	Max retry attempts for HTTP errors	`GLMOCR_REMOTE_OCR_MAX_RETRIES` env or `3`
`retry_backoff_factor`	`float`	Exponential backoff factor for retries	`GLMOCR_REMOTE_OCR_RETRY_BACKOFF_FACTOR` env or `2.0`
`lang`	`list[str]`	Language hint (passed to docling)	`GLMOCR_REMOTE_OCR_LANG` env (comma-separated) or `["en"]`

Default prompt:

Recognize the text in the image and output in Markdown format.
Preserve the original layout (headings/paragraphs/tables/formulas).
Do not fabricate content that does not exist in the image.

Architecture

flowchart LR
    subgraph docling
        Pipeline --> GlmOcrRemoteModel
    end

    subgraph vLLM
        GLMOCR["zai-org/GLM-OCR"]
    end

    GlmOcrRemoteModel -- "POST /v1/chat/completions\n(base64 image)" --> GLMOCR
    GLMOCR -- "Markdown text" --> GlmOcrRemoteModel

For each page the model:

Collects OCR regions from the docling layout analysis
Renders each region using the page backend (scale configurable, default 3×)
Encodes the crop as a base64 PNG data URI
POSTs concurrent chat completion requests to the vLLM endpoint (with retry logic)
Returns the recognised text as TextCell objects for docling to merge

Starting a GLM-OCR vLLM server

docker run -d \
  --rm --name ocr-glm \
  --gpus device=0 \
  --ipc=host \
  -p 8001:8000 \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  -e "HF_TOKEN=${HF_TOKEN}" \
  --entrypoint /bin/bash \
  vllm/vllm-openai:latest \
  -c "uv pip install --system --upgrade transformers && \
      exec vllm serve zai-org/GLM-OCR \
        --served-model-name zai-org/GLM-OCR \
        --port 8000 \
        --trust-remote-code"

The plugin will connect to http://localhost:8001/v1/chat/completions by default.

Development

Setup

git clone https://github.com/DCC-BS/docling-glm-ocr.git
cd docling-glm-ocr
make install

Available commands

make install     Install dependencies and pre-commit hooks
make check       Run all quality checks (ruff lint, format, ty type check)
make test        Run tests with coverage report
make build       Build distribution packages
make publish     Publish to PyPI

Running tests

make test

Tests are in tests/ and use pytest. Coverage reports are generated at coverage.xml and printed to the terminal.

End-to-end tests

The e2e tests hit a real vLLM server and are skipped by default. To run them, set the server URL and use the e2e marker:

GLMOCR_REMOTE_OCR_API_URL=http://localhost:8001/v1/chat/completions pytest -m e2e

Code quality

This project uses:

ruff – linting and formatting
ty – type checking
pre-commit – pre-commit hooks

Run all checks:

make check

Releasing

Releases are published to PyPI automatically. Update the version in pyproject.toml, then trigger the Publish workflow from GitHub Actions:

GitHub → Actions → Publish to PyPI → Run workflow

The workflow tags the commit, builds the package, and publishes to PyPI via trusted publishing.

License

MIT © Data Competence Center Basel-Stadt

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

DCC-BS

These details have not been verified by PyPI

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3 :: Only
Typing
- Typed

Release history Release notifications | RSS feed

0.5.0

Mar 6, 2026

0.4.0

Feb 27, 2026

This version

0.3.2

Feb 25, 2026

0.3.1

Feb 24, 2026

0.3.0

Feb 20, 2026

0.2.0

Feb 20, 2026

0.1.0

Feb 19, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

docling_glm_ocr-0.3.2.tar.gz (152.7 kB view details)

Uploaded Feb 25, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

docling_glm_ocr-0.3.2-py3-none-any.whl (12.0 kB view details)

Uploaded Feb 25, 2026 Python 3

File details

Details for the file docling_glm_ocr-0.3.2.tar.gz.

File metadata

Download URL: docling_glm_ocr-0.3.2.tar.gz
Upload date: Feb 25, 2026
Size: 152.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for docling_glm_ocr-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`bb7547ddf28b294c79a09ffafec3abc6111afcc31d70ce08685d7ad9e3cde761`
MD5	`5d356e8b747fd8084885e28ab8b12f3a`
BLAKE2b-256	`c99bdebaa8bff150189739c68ca4a77b0d88b494ad20b707c7d55ba190cdafaf`

See more details on using hashes here.

File details

Details for the file docling_glm_ocr-0.3.2-py3-none-any.whl.

File metadata

Download URL: docling_glm_ocr-0.3.2-py3-none-any.whl
Upload date: Feb 25, 2026
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for docling_glm_ocr-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d6872a2f8b33fc8165d7aa7e4af2aa1ee8d06b237a1ca980bbbaa8240385dd5`
MD5	`75dfb8487813cbc89a658869a3c17d27`
BLAKE2b-256	`827fd3f490c2690ce49abbfa11bc66c3b4f32b1f0f399fb8f1882cd5ee4927bd`

See more details on using hashes here.

docling-glm-ocr 0.3.2

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

docling-glm-ocr

Overview

Requirements

Installation

Usage

Python SDK

docling-serve

Configuration

Environment variables

GlmOcrRemoteOptions

Architecture

Starting a GLM-OCR vLLM server

Development

Setup

Available commands

Running tests

End-to-end tests

Code quality

Releasing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`GlmOcrRemoteOptions`