docling-glm-ocr
A docling OCR plugin that delegates text recognition to a remote GLM-OCR model served by vLLM.
Overview
docling-glm-ocr is a docling plugin that replaces the built-in OCR stage with a call to a remote GLM-OCR model hosted on a vLLM server.
Each page crop is sent to the vLLM OpenAI-compatible chat completion endpoint as a base64-encoded image. The model returns Markdown-formatted text which docling merges back into the document structure.
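The shape of that per-crop request can be sketched as below. The field layout follows the OpenAI-compatible chat completion schema that vLLM serves; the helper name and prompt text are illustrative, not the plugin's actual internals.

```python
import base64
import json

# Illustrative sketch of the per-crop request body. The field layout is the
# OpenAI-compatible chat completion schema; build_payload is a hypothetical helper.
def build_payload(png_bytes: bytes, prompt: str, model: str) -> dict:
    data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_uri}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

payload = build_payload(b"\x89PNG", "Recognize the text in the image.", "zai-org/GLM-OCR")
print(json.dumps(payload)[:40])
```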
The plugin registers itself under the "glm-ocr-remote" OCR engine key so it
can be selected per-request through docling or docling-serve without changing
application code.
Requirements
- Python 3.13+
- A running vLLM server hosting zai-org/GLM-OCR (or any compatible model)
Installation
# with uv (recommended)
uv add docling-glm-ocr
# with pip
pip install docling-glm-ocr
Usage
Python SDK
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling_glm_ocr import GlmOcrRemoteOptions
pipeline_options = PdfPipelineOptions(
allow_external_plugins=True,
ocr_options=GlmOcrRemoteOptions(
api_url="http://localhost:8001/v1/chat/completions",
model_name="zai-org/GLM-OCR",
),
)
converter = DocumentConverter(
format_options={
InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
}
)
result = converter.convert("document.pdf")
print(result.document.export_to_markdown())
docling-serve
Select the engine per-request via the standard API:
curl -X POST http://localhost:5001/v1/convert/source \
-H 'Content-Type: application/json' \
-d '{
"options": {
"ocr_engine": "glm-ocr-remote"
},
"sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
}'
The server must have DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS=true set so the
plugin is loaded automatically.
Configuration
All options can be set via environment variables (useful for Docker / Compose
deployments) or programmatically via GlmOcrRemoteOptions. Explicit
constructor arguments always take precedence over environment variables.
Environment variables
| Variable | Description | Default |
|---|---|---|
| GLMOCR_REMOTE_OCR_API_URL | vLLM chat completion URL | http://localhost:8001/v1/chat/completions |
| GLMOCR_REMOTE_OCR_MODEL_NAME | Model name sent to vLLM | zai-org/GLM-OCR |
| GLMOCR_REMOTE_OCR_PROMPT | Text prompt sent with each image crop | see below |
| GLMOCR_REMOTE_OCR_TIMEOUT | HTTP timeout per crop (seconds) | 120 |
| GLMOCR_REMOTE_OCR_MAX_TOKENS | Max tokens per completion | 16384 |
| GLMOCR_REMOTE_OCR_SCALE | Image crop rendering scale | 3.0 |
| GLMOCR_REMOTE_OCR_MAX_IMAGE_PIXELS | Pixel budget per crop | 4500000 |
| GLMOCR_REMOTE_OCR_MAX_CONCURRENT_REQUESTS | Max concurrent API requests | 10 |
| GLMOCR_REMOTE_OCR_MAX_RETRIES | Max retry attempts for HTTP errors | 3 |
| GLMOCR_REMOTE_OCR_RETRY_BACKOFF_FACTOR | Exponential backoff factor for retries | 2.0 |
| GLMOCR_REMOTE_OCR_LANG | Comma-separated language hint(s) | en |
| GLMOCR_REMOTE_OCR_API_KEY | Bearer token for Authorization header | unset (no header sent) |
GlmOcrRemoteOptions
All options can also be set programmatically via GlmOcrRemoteOptions:
| Option | Type | Description | Default |
|---|---|---|---|
| api_url | str | OpenAI-compatible chat completion URL | GLMOCR_REMOTE_OCR_API_URL env or http://localhost:8001/v1/chat/completions |
| model_name | str | Model name sent to vLLM | GLMOCR_REMOTE_OCR_MODEL_NAME env or zai-org/GLM-OCR |
| prompt | str | Text prompt for each image crop | GLMOCR_REMOTE_OCR_PROMPT env or default prompt |
| timeout | float | HTTP timeout per crop (seconds) | GLMOCR_REMOTE_OCR_TIMEOUT env or 120 |
| max_tokens | int | Max tokens per completion | GLMOCR_REMOTE_OCR_MAX_TOKENS env or 16384 |
| scale | float | Image crop rendering scale | GLMOCR_REMOTE_OCR_SCALE env or 3.0 |
| max_image_pixels | int | Pixel budget per crop | GLMOCR_REMOTE_OCR_MAX_IMAGE_PIXELS env or 4500000 |
| max_concurrent_requests | int | Max concurrent API requests | GLMOCR_REMOTE_OCR_MAX_CONCURRENT_REQUESTS env or 10 |
| max_retries | int | Max retry attempts for HTTP errors | GLMOCR_REMOTE_OCR_MAX_RETRIES env or 3 |
| retry_backoff_factor | float | Exponential backoff factor for retries | GLMOCR_REMOTE_OCR_RETRY_BACKOFF_FACTOR env or 2.0 |
| lang | list[str] | Language hint (passed to docling) | GLMOCR_REMOTE_OCR_LANG env (comma-separated) or ["en"] |
| api_key | str \| None | Bearer token sent in Authorization header | GLMOCR_REMOTE_OCR_API_KEY env or None (no header) |
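To make the retry options concrete: a common exponential-backoff pattern sleeps factor ** attempt seconds between retries. The exact delay formula the plugin uses is an assumption here; this sketch only illustrates what a backoff factor of 2.0 with 3 retries would mean.

```python
# Illustrative exponential backoff schedule. The real plugin's delay formula
# is an assumption; factor ** attempt is one common convention.
def backoff_delays(max_retries: int, factor: float) -> list[float]:
    return [factor ** attempt for attempt in range(max_retries)]

print(backoff_delays(3, 2.0))  # [1.0, 2.0, 4.0] seconds between attempts
```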
Default prompt:
Recognize the text in the image and output in Markdown format.
Preserve the original layout (headings/paragraphs/tables/formulas).
Do not fabricate content that does not exist in the image.
Architecture
flowchart LR
subgraph docling
Pipeline --> GlmOcrRemoteModel
end
subgraph vLLM
GLMOCR["zai-org/GLM-OCR"]
end
GlmOcrRemoteModel -- "POST /v1/chat/completions\n(base64 image)" --> GLMOCR
GLMOCR -- "Markdown text" --> GlmOcrRemoteModel
For each page the model:
- Collects OCR regions from the docling layout analysis
- Renders each region using the page backend (scale configurable, default 3×)
- Encodes the crop as a base64 PNG data URI
- POSTs concurrent chat completion requests to the vLLM endpoint (with retry logic)
- Returns the recognized text as TextCell objects for docling to merge
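The concurrent fan-out step can be sketched with an asyncio semaphore capped at max_concurrent_requests. Both function names below are hypothetical stand-ins; recognize_crop represents the real HTTP call to vLLM.

```python
import asyncio

# Sketch of sending a page's crops concurrently, capped at
# max_concurrent_requests. recognize_crop stands in for the POST to vLLM.
async def recognize_crop(crop_id: int) -> str:
    await asyncio.sleep(0)  # placeholder for the real network round trip
    return f"markdown for crop {crop_id}"

async def recognize_page(crop_ids: list[int], max_concurrent: int = 10) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(cid: int) -> str:
        async with sem:  # at most max_concurrent requests in flight
            return await recognize_crop(cid)

    # gather preserves input order, so results line up with the crops
    return await asyncio.gather(*(bounded(c) for c in crop_ids))

results = asyncio.run(recognize_page(list(range(4)), max_concurrent=2))
print(results[0])
```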
Starting a GLM-OCR vLLM server
docker run -d \
--rm --name ocr-glm \
--gpus device=1 \
--ipc=host \
-p 8001:8000 \
-v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
-e "HF_TOKEN=${HF_TOKEN:-}" \
-e "LD_LIBRARY_PATH=/lib/x86_64-linux-gnu" \
vllm/vllm-openai:v0.16.0-cu130 \
zai-org/GLM-OCR \
--port 8000 \
--trust-remote-code \
--max-num-batched-tokens 8192
The plugin will connect to http://localhost:8001/v1/chat/completions by default.
Required: --max-num-batched-tokens 8192
Without this flag, vLLM will reject any high-resolution image with HTTP 400.
In vLLM 0.16.0+ (v1 engine), the encoder cache size is derived from
max_num_batched_tokens (default 2048 when chunked prefill is enabled):
encoder_cache_size = max(max_num_batched_tokens, model_max_tokens_per_image)
= max(2048, 4800) ← 4800 is GLM-OCR's model floor
= 4800 tokens ← too small for real documents
The Glm46VImageProcessor encodes images at approximately 784 pixels per token
(patch_size=14 × merge_size=2, squared). A typical A4 page rendered at scale 3×
(1785 × 2526 px) produces 5760 tokens; a phone-photo crop at scale 3× can reach
6120 tokens — both exceed the default 4800-token cache and are rejected.
Setting --max-num-batched-tokens 8192 raises the encoder cache to
max(8192, 4800) = 8192 tokens, which covers all real-world inputs with comfortable
headroom.
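The token counts above can be sanity-checked with back-of-envelope arithmetic: one visual token per 28×28-pixel patch (patch_size 14 × merge_size 2 along each axis). The exact rounding the Glm46VImageProcessor applies is an assumption in this sketch.

```python
# Back-of-envelope check of the token counts above: one visual token per
# 28x28-pixel patch (patch_size=14 x merge_size=2, so 784 px per token).
# The exact rounding Glm46VImageProcessor applies is an assumption.
PATCH = 14 * 2  # 28 px per token along each axis

def approx_tokens(width_px: int, height_px: int) -> int:
    return round(width_px / PATCH) * round(height_px / PATCH)

a4_tokens = approx_tokens(1785, 2526)  # A4 page rendered at scale 3x
print(a4_tokens)           # 5760 with this rounding convention
print(a4_tokens > 4800)    # exceeds the default encoder cache, hence HTTP 400
print(a4_tokens <= 8192)   # fits once --max-num-batched-tokens 8192 is set
```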
Note: --limit-mm-per-prompt does not control the encoder cache size in vLLM 0.16.0. That flag only limits the count of images per request.
Development
Setup
git clone https://github.com/DCC-BS/docling-glm-ocr.git
cd docling-glm-ocr
make install
Available commands
make install Install dependencies and pre-commit hooks
make check Run all quality checks (ruff lint, format, ty type check)
make test Run tests with coverage report
make build Build distribution packages
make publish Publish to PyPI
Running tests
make test
Tests are in tests/ and use pytest.
Coverage reports are generated at coverage.xml and printed to the terminal.
End-to-end tests
The e2e tests hit a real vLLM server and are skipped by default.
To run them, set the server URL and use the e2e marker:
GLMOCR_REMOTE_OCR_API_URL=http://localhost:8001/v1/chat/completions pytest -m e2e
Code quality
This project uses:
- ruff – linting and formatting
- ty – type checking
- pre-commit – pre-commit hooks
Run all checks:
make check
Releasing
Releases are published to PyPI automatically.
Update the version in pyproject.toml, then trigger the Publish workflow from GitHub Actions:
GitHub → Actions → Publish to PyPI → Run workflow
The workflow tags the commit, builds the package, and publishes to PyPI via trusted publishing.
License
MIT © Data Competence Center Basel-Stadt