docling-glm-ocr
A docling OCR plugin that delegates text recognition to a remote GLM-OCR model served by vLLM.
Overview
docling-glm-ocr is a docling plugin that replaces the built-in OCR stage with a call to a remote GLM-OCR model hosted on a vLLM server.
Each page crop is sent to the vLLM OpenAI-compatible chat completion endpoint as a base64-encoded image. The model returns Markdown-formatted text, which docling merges back into the document structure.
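The request body follows the standard OpenAI chat-completions shape with an image content part. The sketch below is illustrative only; the helper name and its parameters are assumptions, not the plugin's actual API.

```python
# Illustrative sketch of the per-crop request body: a base64 PNG data URI
# plus the text prompt, in the OpenAI chat-completions format that vLLM accepts.
import base64


def make_payload(png_bytes: bytes, prompt: str, model_name: str) -> dict:
    # Encode the rendered crop as a data URI so it can travel inline in JSON.
    data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model_name,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_uri}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }
```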
The plugin registers itself under the "glm-ocr-remote" OCR engine key so it
can be selected per-request through docling or docling-serve without changing
application code.
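docling discovers external plugins through a Python entry point and a small factory function. The sketch below assumes docling's documented plugin convention (a module exposing an ocr_engines() factory, declared under a docling entry point in pyproject.toml); the placeholder class is purely illustrative.

```python
# Hypothetical sketch of an external docling OCR plugin module. docling
# imports the module via its entry point and calls ocr_engines() to learn
# which engine classes it provides.

class GlmOcrRemoteModel:  # placeholder standing in for the real plugin class
    """OCR model that forwards page crops to a remote GLM-OCR endpoint."""


def ocr_engines() -> dict:
    # docling collects the classes listed here and makes each selectable
    # by its engine key (e.g. "glm-ocr-remote").
    return {"ocr_engines": [GlmOcrRemoteModel]}
```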
Requirements
- Python 3.13+
- A running vLLM server hosting zai-org/GLM-OCR (or any compatible model)
Installation
# with uv (recommended)
uv add docling-glm-ocr
# with pip
pip install docling-glm-ocr
Usage
Python SDK
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling_glm_ocr import GlmOcrRemoteOptions
pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    ocr_options=GlmOcrRemoteOptions(
        api_url="http://localhost:8001/v1/chat/completions",
        model_name="zai-org/GLM-OCR",
    ),
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
result = converter.convert("document.pdf")
print(result.document.export_to_markdown())
docling-serve
Select the engine per-request via the standard API:
curl -X POST http://localhost:5001/v1/convert/source \
-H 'Content-Type: application/json' \
-d '{
"options": {
"ocr_engine": "glm-ocr-remote"
},
"sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
}'
The server must have DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS=true set so the
plugin is loaded automatically.
Configuration
Environment variables
| Variable | Description | Default |
|---|---|---|
| GLMOCR_REMOTE_OCR_API_URL | vLLM chat completion URL | http://localhost:8001/v1/chat/completions |
| GLMOCR_REMOTE_OCR_PROMPT | Text prompt sent with each image crop | see below |
GlmOcrRemoteOptions
All options can be set programmatically via GlmOcrRemoteOptions:
| Option | Type | Description | Default |
|---|---|---|---|
| api_url | str | OpenAI-compatible chat completion URL | GLMOCR_REMOTE_OCR_API_URL env or http://localhost:8001/v1/chat/completions |
| model_name | str | Model name sent to vLLM | zai-org/GLM-OCR |
| prompt | str | Text prompt for each image crop | GLMOCR_REMOTE_OCR_PROMPT env or default prompt |
| timeout | float | HTTP timeout per crop (seconds) | 120 |
| max_tokens | int | Max tokens per completion | 16384 |
| scale | float | Image crop rendering scale | 3.0 |
| max_concurrent_requests | int | Max concurrent API requests | 10 |
| max_retries | int | Max retry attempts for HTTP errors | 3 |
| retry_backoff_factor | float | Exponential backoff factor for retries | 2.0 |
| lang | list[str] | Language hint (passed to docling) | ["en"] |
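With the defaults max_retries=3 and retry_backoff_factor=2.0, a plain exponential schedule would wait roughly 1, 2, and 4 seconds before successive retries. A sketch of that arithmetic (the plugin's exact schedule may differ, e.g. with jitter):

```python
# Illustrative exponential backoff: delay before retry attempt n is factor**n.
def backoff_delays(max_retries: int, factor: float) -> list[float]:
    return [factor**attempt for attempt in range(max_retries)]
```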
Default prompt:
Recognize the text in the image and output in Markdown format.
Preserve the original layout (headings/paragraphs/tables/formulas).
Do not fabricate content that does not exist in the image.
Architecture
flowchart LR
subgraph docling
Pipeline --> GlmOcrRemoteModel
end
subgraph vLLM
GLMOCR["zai-org/GLM-OCR"]
end
GlmOcrRemoteModel -- "POST /v1/chat/completions\n(base64 image)" --> GLMOCR
GLMOCR -- "Markdown text" --> GlmOcrRemoteModel
For each page the model:
- Collects OCR regions from the docling layout analysis
- Renders each region using the page backend (scale configurable, default 3×)
- Encodes the crop as a base64 PNG data URI
- POSTs concurrent chat completion requests to the vLLM endpoint (with retry logic)
- Returns the recognised text as TextCell objects for docling to merge
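The concurrency bound from max_concurrent_requests can be sketched with a semaphore gating the in-flight calls. Everything below is illustrative: recognize stands in for the real HTTP call, and the function names are assumptions.

```python
# Minimal sketch of bounding concurrent OCR requests with a semaphore,
# as the max_concurrent_requests option suggests.
import asyncio


async def recognize(crop_id: int) -> str:
    await asyncio.sleep(0)  # placeholder for the actual POST to vLLM
    return f"text-{crop_id}"


async def run_all(crop_ids: list[int], max_concurrent: int = 10) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(cid: int) -> str:
        # At most max_concurrent coroutines pass this gate at once.
        async with sem:
            return await recognize(cid)

    # gather preserves input order, so results line up with the crops.
    return list(await asyncio.gather(*(bounded(c) for c in crop_ids)))
```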
Starting a GLM-OCR vLLM server
docker run -d \
--rm --name ocr-glm \
--gpus device=0 \
--ipc=host \
-p 8001:8000 \
-v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
-e "HF_TOKEN=${HF_TOKEN}" \
--entrypoint /bin/bash \
vllm/vllm-openai:latest \
-c "uv pip install --system --upgrade transformers && \
exec vllm serve zai-org/GLM-OCR \
--served-model-name zai-org/GLM-OCR \
--port 8000 \
--trust-remote-code"
The plugin will connect to http://localhost:8001/v1/chat/completions by default.
Development
Setup
git clone https://github.com/DCC-BS/docling-glm-ocr.git
cd docling-glm-ocr
make install
Available commands
- make install – Install dependencies and pre-commit hooks
- make check – Run all quality checks (ruff lint, format, ty type check)
- make test – Run tests with coverage report
- make build – Build distribution packages
- make publish – Publish to PyPI
Running tests
make test
Tests are in tests/ and use pytest.
Coverage reports are generated at coverage.xml and printed to the terminal.
End-to-end tests
The e2e tests hit a real vLLM server and are skipped by default.
To run them, set the server URL and use the e2e marker:
GLMOCR_REMOTE_OCR_API_URL=http://localhost:8001/v1/chat/completions pytest -m e2e
Code quality
This project uses:
- ruff – linting and formatting
- ty – type checking
- pre-commit – pre-commit hooks
Run all checks:
make check
Releasing
Releases are published to PyPI automatically.
Update the version in pyproject.toml, then trigger the Publish workflow from GitHub Actions:
GitHub → Actions → Publish to PyPI → Run workflow
The workflow tags the commit, builds the package, and publishes to PyPI via trusted publishing.
License
MIT © Data Competence Center Basel-Stadt