Docling Jobkit
Running a distributed job processing documents with Docling.
How to use it
Local Multiprocessing CLI
The docling-jobkit-multiproc CLI processes documents in parallel batches using Python's multiprocessing module. Each batch of documents is handled in a separate subprocess, so a single machine can work on several batches at once.
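The batching idea described above can be sketched in a few lines of Python. This is an illustration only, not the jobkit implementation: `chunked`, `process_batch`, and `run` are hypothetical names, the worker merely counts documents instead of converting them, and the sketch uses a worker pool that reuses processes across batches, which may differ from how the CLI schedules its subprocesses.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, Optional


def chunked(items: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def process_batch(batch: list[str]) -> int:
    """Hypothetical stand-in for converting one batch of documents."""
    # A real worker would run the document conversion here; this one only counts.
    return len(batch)


def run(
    documents: list[str],
    batch_size: int = 10,
    num_processes: Optional[int] = None,
) -> int:
    """Process batches in worker processes; return the number of documents handled."""
    batches = list(chunked(documents, batch_size))
    # max_workers=None lets the executor choose a default based on the CPU count.
    with ProcessPoolExecutor(max_workers=num_processes) as pool:
        return sum(pool.map(process_batch, batches))
```

When calling `run(...)` from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.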
Usage
# Basic usage with default settings (batch_size=10, num_processes=CPU count)
docling-jobkit-multiproc config.yaml
# Custom batch size and number of processes
docling-jobkit-multiproc config.yaml --batch-size 20 --num-processes 4
# With model artifacts
docling-jobkit-multiproc config.yaml --artifacts-path /path/to/models
# Quiet mode (suppress progress bar)
docling-jobkit-multiproc config.yaml --quiet
# Full options
docling-jobkit-multiproc config.yaml \
  --batch-size 30 \
  --num-processes 8 \
  --artifacts-path /path/to/models \
  --enable-remote-services \
  --allow-external-plugins
Configuration
The configuration file format is the same as docling-jobkit-local. See example configurations:
- S3 source/target: dev/configs/run_multiproc_s3_example.yaml
- Local path source/target: dev/configs/run_local_folder_example.yaml
Note: Only S3, Google Drive, and local_path sources support batch processing. File and HTTP sources do not support chunking.
CLI Options
- --batch-size, -b: Number of documents to process in each batch (default: 10)
- --num-processes, -n: Number of parallel processes (default: CPU count)
- --artifacts-path: Path to model artifacts directory
- --enable-remote-services: Enable models connecting to remote services
- --allow-external-plugins: Enable loading modules from third-party plugins
- --quiet, -q: Suppress progress bar and detailed output
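When the CLI is driven from a Python script, the options above can be assembled into an argument list programmatically. The helper below is a minimal sketch: `build_multiproc_command` is a hypothetical name, not part of docling-jobkit, and it only uses the flags listed in this section.

```python
from typing import Optional


def build_multiproc_command(
    config_path: str,
    batch_size: Optional[int] = None,
    num_processes: Optional[int] = None,
    artifacts_path: Optional[str] = None,
    enable_remote_services: bool = False,
    allow_external_plugins: bool = False,
    quiet: bool = False,
) -> list[str]:
    """Build the argv list for docling-jobkit-multiproc from keyword options."""
    cmd = ["docling-jobkit-multiproc", config_path]
    # Options left as None fall back to the CLI defaults.
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if num_processes is not None:
        cmd += ["--num-processes", str(num_processes)]
    if artifacts_path is not None:
        cmd += ["--artifacts-path", artifacts_path]
    # Boolean switches are appended only when enabled.
    if enable_remote_services:
        cmd.append("--enable-remote-services")
    if allow_external_plugins:
        cmd.append("--allow-external-plugins")
    if quiet:
        cmd.append("--quiet")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.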
Local Sequential CLI
The docling-jobkit-local CLI processes documents sequentially in a single process.
docling-jobkit-local config.yaml
Using Local Path Sources and Targets
Both CLIs support local file system sources and targets. Example configuration:
sources:
  - kind: local_path
    path: ./input_documents/
    recursive: true   # optional, default true
    pattern: "*.pdf"  # optional glob pattern
target:
  kind: local_path
  path: ./output_documents/
See dev/configs/run_local_folder_example.yaml for a complete example.
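To preview which files a local_path source with a given pattern and recursive setting is likely to pick up, a small standard-library sketch can help. It assumes the options behave like ordinary pathlib globbing; the actual matching rules in docling-jobkit may differ, and `select_files` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def select_files(root: str, pattern: str = "*", recursive: bool = True) -> list[Path]:
    """Return the files under root that match a glob pattern, in sorted order."""
    base = Path(root)
    # rglob descends into subdirectories; glob looks at the top level only.
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(p for p in matches if p.is_file())
```

For example, `select_files("./input_documents/", "*.pdf")` lists the PDF files that a configuration like the one above would be expected to cover under this assumption.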
Get help and support
Please feel free to connect with us using the discussion section of the main Docling repository.
Contributing
Please read Contributing to Docling Serve for details.
References
If you use Docling in your projects, please consider citing the following:
@techreport{Docling,
  author = {Deep Search Team},
  month = {1},
  title = {Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion},
  url = {https://arxiv.org/abs/2501.17887},
  eprint = {2501.17887},
  doi = {10.48550/arXiv.2501.17887},
  version = {2.0.0},
  year = {2025}
}
License
The Docling Serve codebase is under MIT license.
LF AI & Data
Docling is hosted as a project in the LF AI & Data Foundation.
IBM ❤️ Open Source AI
The project was started by the AI for Knowledge team at IBM Research Zurich.