
docling-glm-ocr

A docling OCR plugin that delegates text recognition to a remote GLM-OCR model served by vLLM.


GitHub  |  PyPI



Overview

docling-glm-ocr is a docling plugin that replaces the built-in OCR stage with a call to a remote GLM-OCR model hosted on a vLLM server.

Each page crop is sent to the vLLM OpenAI-compatible chat completion endpoint as a base64-encoded image. The model returns Markdown-formatted text which docling merges back into the document structure.

The plugin registers itself under the "glm-ocr-remote" OCR engine key so it can be selected per-request through docling or docling-serve without changing application code.

Requirements

  • Python 3.13+
  • A running vLLM server hosting zai-org/GLM-OCR (or any compatible model)

Installation

# with uv (recommended)
uv add docling-glm-ocr

# with pip
pip install docling-glm-ocr

Usage

Python SDK

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

from docling_glm_ocr import GlmOcrRemoteOptions

pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    ocr_options=GlmOcrRemoteOptions(
        api_url="http://localhost:8001/v1/chat/completions",
        model_name="zai-org/GLM-OCR",
    ),
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
result = converter.convert("document.pdf")
print(result.document.export_to_markdown())

docling-serve

Select the engine per-request via the standard API:

curl -X POST http://localhost:5001/v1/convert/source \
  -H 'Content-Type: application/json' \
  -d '{
    "options": {
      "ocr_engine": "glm-ocr-remote"
    },
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
  }'

The server must have DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS=true set so the plugin is loaded automatically.
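Putting the pieces together, a local docling-serve setup with the plugin enabled might look like the following sketch (the `docling-serve run` entry point is the upstream CLI command and may differ between versions of docling-serve):

```shell
# Install the server and the plugin into the same environment.
pip install docling-serve docling-glm-ocr

# Allow external OCR plugins and point the plugin at the vLLM endpoint.
export DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS=true
export GLMOCR_REMOTE_OCR_API_URL=http://localhost:8001/v1/chat/completions

docling-serve run
```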

Configuration

All options can be set via environment variables (useful for Docker / Compose deployments) or programmatically via GlmOcrRemoteOptions. Explicit constructor arguments always take precedence over environment variables.
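The precedence rule reduces to a simple fallback chain. The helper below is an illustrative sketch of that behavior, not the plugin's actual resolution code:

```python
import os


def resolve_option(explicit, env_var, default):
    """Resolve a setting: explicit argument > environment variable > default."""
    if explicit is not None:
        return explicit
    return os.environ.get(env_var, default)


# The environment variable is used only when no explicit value is given.
os.environ["GLMOCR_REMOTE_OCR_MODEL_NAME"] = "my-org/my-model"
print(resolve_option(None, "GLMOCR_REMOTE_OCR_MODEL_NAME", "zai-org/GLM-OCR"))
# prints my-org/my-model

# An explicit constructor argument always wins over the environment.
print(resolve_option("zai-org/GLM-OCR", "GLMOCR_REMOTE_OCR_MODEL_NAME", "x"))
# prints zai-org/GLM-OCR
```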

Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| GLMOCR_REMOTE_OCR_API_URL | vLLM chat completion URL | http://localhost:8001/v1/chat/completions |
| GLMOCR_REMOTE_OCR_MODEL_NAME | Model name sent to vLLM | zai-org/GLM-OCR |
| GLMOCR_REMOTE_OCR_PROMPT | Text prompt sent with each image crop | see below |
| GLMOCR_REMOTE_OCR_TIMEOUT | HTTP timeout per crop (seconds) | 120 |
| GLMOCR_REMOTE_OCR_MAX_TOKENS | Max tokens per completion | 16384 |
| GLMOCR_REMOTE_OCR_SCALE | Image crop rendering scale | 3.0 |
| GLMOCR_REMOTE_OCR_MAX_IMAGE_PIXELS | Pixel budget per crop | 4500000 |
| GLMOCR_REMOTE_OCR_MAX_CONCURRENT_REQUESTS | Max concurrent API requests | 10 |
| GLMOCR_REMOTE_OCR_MAX_RETRIES | Max retry attempts for HTTP errors | 3 |
| GLMOCR_REMOTE_OCR_RETRY_BACKOFF_FACTOR | Exponential backoff factor for retries | 2.0 |
| GLMOCR_REMOTE_OCR_LANG | Comma-separated language hint(s) | en |
| GLMOCR_REMOTE_OCR_API_KEY | Bearer token for Authorization header | unset (no header sent) |

GlmOcrRemoteOptions

All options can also be set programmatically via GlmOcrRemoteOptions:

| Option | Type | Description | Default |
| --- | --- | --- | --- |
| api_url | str | OpenAI-compatible chat completion URL | GLMOCR_REMOTE_OCR_API_URL env or http://localhost:8001/v1/chat/completions |
| model_name | str | Model name sent to vLLM | GLMOCR_REMOTE_OCR_MODEL_NAME env or zai-org/GLM-OCR |
| prompt | str | Text prompt for each image crop | GLMOCR_REMOTE_OCR_PROMPT env or default prompt |
| timeout | float | HTTP timeout per crop (seconds) | GLMOCR_REMOTE_OCR_TIMEOUT env or 120 |
| max_tokens | int | Max tokens per completion | GLMOCR_REMOTE_OCR_MAX_TOKENS env or 16384 |
| scale | float | Image crop rendering scale | GLMOCR_REMOTE_OCR_SCALE env or 3.0 |
| max_image_pixels | int | Pixel budget per crop | GLMOCR_REMOTE_OCR_MAX_IMAGE_PIXELS env or 4500000 |
| max_concurrent_requests | int | Max concurrent API requests | GLMOCR_REMOTE_OCR_MAX_CONCURRENT_REQUESTS env or 10 |
| max_retries | int | Max retry attempts for HTTP errors | GLMOCR_REMOTE_OCR_MAX_RETRIES env or 3 |
| retry_backoff_factor | float | Exponential backoff factor for retries | GLMOCR_REMOTE_OCR_RETRY_BACKOFF_FACTOR env or 2.0 |
| lang | list[str] | Language hint (passed to docling) | GLMOCR_REMOTE_OCR_LANG env (comma-separated) or ["en"] |
| api_key | str \| None | Bearer token sent in Authorization header | GLMOCR_REMOTE_OCR_API_KEY env or None (no header) |

Default prompt:

Recognize the text in the image and output in Markdown format.
Preserve the original layout (headings/paragraphs/tables/formulas).
Do not fabricate content that does not exist in the image.
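The retry and concurrency options above interact in the usual way: each crop's request is retried with exponentially growing delays, and a semaphore caps how many requests are in flight at once. The sketch below illustrates that pattern under the assumption that the delay is backoff_factor ** attempt seconds; the plugin's internal scheduling may differ:

```python
import asyncio


async def call_with_retry(send, max_retries=3, backoff_factor=2.0):
    """Retry an async call on failure with exponential backoff.

    Assumed delay schedule: backoff_factor ** attempt seconds
    (1s, 2s, 4s with the defaults); the plugin's actual schedule may differ.
    """
    for attempt in range(max_retries + 1):
        try:
            return await send()
        except Exception:
            if attempt == max_retries:
                raise
            await asyncio.sleep(backoff_factor ** attempt)


async def ocr_all(crops, worker, max_concurrent_requests=10):
    """Process crops concurrently, but never more than the configured limit."""
    sem = asyncio.Semaphore(max_concurrent_requests)

    async def bounded(crop):
        async with sem:
            return await call_with_retry(lambda: worker(crop))

    return await asyncio.gather(*(bounded(c) for c in crops))
```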

Architecture

flowchart LR
    subgraph docling
        Pipeline --> GlmOcrRemoteModel
    end

    subgraph vLLM
        GLMOCR["zai-org/GLM-OCR"]
    end

    GlmOcrRemoteModel -- "POST /v1/chat/completions\n(base64 image)" --> GLMOCR
    GLMOCR -- "Markdown text" --> GlmOcrRemoteModel

For each page the model:

  1. Collects OCR regions from the docling layout analysis
  2. Renders each region using the page backend (scale configurable, default 3×)
  3. Encodes the crop as a base64 PNG data URI
  4. POSTs concurrent chat completion requests to the vLLM endpoint (with retry logic)
  5. Returns the recognized text as TextCell objects for docling to merge
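Steps 3 and 4 amount to building a standard OpenAI-style chat body carrying the crop as a data URI. The sketch below shows the assumed request shape; the field layout follows the OpenAI chat completions format, and the ordering of the content parts is illustrative:

```python
import base64


def build_payload(png_bytes: bytes, prompt: str, model_name: str, max_tokens: int) -> dict:
    """Build an OpenAI-compatible chat completion body with one image crop."""
    data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model_name,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_uri}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }
```

The resulting dict is what gets POSTed as JSON to the vLLM chat completion endpoint.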

Starting a GLM-OCR vLLM server

docker run -d \
  --rm --name ocr-glm \
  --gpus device=1 \
  --ipc=host \
  -p 8001:8000 \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  -e "HF_TOKEN=${HF_TOKEN:-}" \
  -e "LD_LIBRARY_PATH=/lib/x86_64-linux-gnu" \
  vllm/vllm-openai:v0.16.0-cu130 \
  zai-org/GLM-OCR \
  --port 8000 \
  --trust-remote-code \
  --max-num-batched-tokens 8192

The plugin will connect to http://localhost:8001/v1/chat/completions by default.

Required: --max-num-batched-tokens 8192

Without this flag, vLLM will reject any high-resolution image with HTTP 400.

In vLLM 0.16.0+ (v1 engine), the encoder cache size is derived from max_num_batched_tokens (default 2048 when chunked prefill is enabled):

encoder_cache_size = max(max_num_batched_tokens, model_max_tokens_per_image)
                   = max(2048, 4800)  ←  4800 is GLM-OCR's model floor
                   = 4800 tokens      ←  too small for real documents

The Glm46VImageProcessor encodes images at approximately 784 pixels per token (patch_size=14 × merge_size=2, squared). A typical A4 page rendered at scale 3× (1785 × 2526 px) produces 5760 tokens; a phone-photo crop at scale 3× can reach 6120 tokens — both exceed the default 4800-token cache and are rejected.
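The 784-pixels-per-token figure corresponds to a grid of 28x28-pixel patches (patch_size 14 x merge_size 2). A back-of-the-envelope estimate that reproduces the numbers above; this is a rough model using simple rounding, not the processor's exact resizing logic:

```python
def estimate_image_tokens(width_px: int, height_px: int, patch_edge: int = 28) -> int:
    """Rough visual-token estimate: one token per 28x28 patch (784 px/token).

    Rounds each dimension to the patch grid; the real Glm46VImageProcessor
    resizing rules may differ at the margins.
    """
    return round(width_px / patch_edge) * round(height_px / patch_edge)


# A4 page rendered at scale 3x, as in the example above.
print(estimate_image_tokens(1785, 2526))  # prints 5760 -- over the 4800 default
```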

Setting --max-num-batched-tokens 8192 raises the encoder cache to max(8192, 4800) = 8192 tokens, which covers all real-world inputs with comfortable headroom.

Note: --limit-mm-per-prompt does not control the encoder cache size in vLLM 0.16.0. That flag only limits the count of images per request.

Development

Setup

git clone https://github.com/DCC-BS/docling-glm-ocr.git
cd docling-glm-ocr
make install

Available commands

make install     Install dependencies and pre-commit hooks
make check       Run all quality checks (ruff lint, format, ty type check)
make test        Run tests with coverage report
make build       Build distribution packages
make publish     Publish to PyPI

Running tests

make test

Tests are in tests/ and use pytest. Coverage reports are generated at coverage.xml and printed to the terminal.

End-to-end tests

The e2e tests hit a real vLLM server and are skipped by default. To run them, set the server URL and use the e2e marker:

GLMOCR_REMOTE_OCR_API_URL=http://localhost:8001/v1/chat/completions pytest -m e2e

Code quality

This project uses:

  • ruff – linting and formatting
  • ty – type checking
  • pre-commit – pre-commit hooks

Run all checks:

make check

Releasing

Releases are published to PyPI automatically. Update the version in pyproject.toml, then trigger the Publish workflow from GitHub Actions:

GitHub → Actions → Publish to PyPI → Run workflow

The workflow tags the commit, builds the package, and publishes to PyPI via trusted publishing.

License

MIT © Data Competence Center Basel-Stadt
