Docling Jobkit

Running a distributed job processing documents with Docling.

How to use it

Local Multiprocessing CLI

The docling-jobkit-multiproc CLI processes batches of documents in parallel using Python's multiprocessing module. Each batch runs in a separate subprocess, so a single machine can work on several batches at once.
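The batching model described above can be illustrated with a minimal standard-library sketch. This is not the docling-jobkit implementation: `make_batches`, `process_batch`, and `run_batches` are hypothetical names, the worker only counts documents instead of converting them, and a process pool reuses worker processes across batches, which may differ from how the CLI starts its subprocesses.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Iterator, List, Optional, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def process_batch(batch: List[str]) -> int:
    """Hypothetical worker: a real job would convert each document here."""
    return len(batch)


def run_batches(
    items: Sequence[str],
    batch_size: int = 10,
    num_processes: Optional[int] = None,
) -> int:
    """Hand each batch to a worker process and return the documents handled."""
    batches = list(make_batches(items, batch_size))
    # max_workers=None lets the executor choose a default from the CPU count.
    with ProcessPoolExecutor(max_workers=num_processes) as pool:
        return sum(pool.map(process_batch, batches))
```

When trying this out, call `run_batches` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on some platforms.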

Usage

# Basic usage with default settings (batch_size=10, num_processes=CPU count)
docling-jobkit-multiproc config.yaml

# Custom batch size and number of processes
docling-jobkit-multiproc config.yaml --batch-size 20 --num-processes 4

# With model artifacts
docling-jobkit-multiproc config.yaml --artifacts-path /path/to/models

# Quiet mode (suppress progress bar)
docling-jobkit-multiproc config.yaml --quiet

# Full options
docling-jobkit-multiproc config.yaml \
  --batch-size 30 \
  --num-processes 8 \
  --artifacts-path /path/to/models \
  --enable-remote-services \
  --allow-external-plugins

Configuration

The configuration file format is the same as the one used by docling-jobkit-local. Example configurations:

  • S3 source/target: dev/configs/run_multiproc_s3_example.yaml
  • Local path source/target: dev/configs/run_local_folder_example.yaml

Note: Only S3, Google Drive, and local_path sources support batch processing. File and HTTP sources do not support chunking, that is, being split into batches.
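A quick pre-flight check along the lines of the note above might look like the following sketch. It works on an already-parsed configuration dictionary. Only the `local_path` kind appears verbatim in this document; the spellings `s3` and `google_drive`, and the helper name `unbatchable_sources`, are assumptions made for illustration.

```python
from typing import List

# Kinds the note lists as batchable. Only "local_path" is confirmed by this
# document; "s3" and "google_drive" are assumed spellings.
BATCHABLE_KINDS = {"s3", "google_drive", "local_path"}


def unbatchable_sources(config: dict) -> List[str]:
    """Return the kinds of all sources that cannot be split into batches."""
    return [
        src.get("kind", "<missing>")
        for src in config.get("sources", [])
        if src.get("kind") not in BATCHABLE_KINDS
    ]
```

An empty result suggests every source in the configuration can be batched; anything else names the kinds to move to a different workflow.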

CLI Options

  • --batch-size, -b: Number of documents to process in each batch (default: 10)
  • --num-processes, -n: Number of parallel processes (default: CPU count)
  • --artifacts-path: Path to model artifacts directory
  • --enable-remote-services: Enable models connecting to remote services
  • --allow-external-plugins: Enable loading modules from third-party plugins
  • --quiet, -q: Suppress progress bar and detailed output
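When the CLI is driven from a script, the options above can be assembled into an argument list programmatically. The sketch below uses only the flags listed in this section; the helper name `build_multiproc_command` and its keyword arguments are hypothetical, and options left unset are omitted so the CLI defaults apply.

```python
from typing import List, Optional


def build_multiproc_command(
    config_path: str,
    batch_size: Optional[int] = None,
    num_processes: Optional[int] = None,
    artifacts_path: Optional[str] = None,
    enable_remote_services: bool = False,
    allow_external_plugins: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Build an argv list for docling-jobkit-multiproc from keyword options."""
    cmd = ["docling-jobkit-multiproc", config_path]
    # Value options are added only when set, so the CLI defaults stay in effect.
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if num_processes is not None:
        cmd += ["--num-processes", str(num_processes)]
    if artifacts_path is not None:
        cmd += ["--artifacts-path", artifacts_path]
    # Boolean options are plain switches without a value.
    if enable_remote_services:
        cmd.append("--enable-remote-services")
    if allow_external_plugins:
        cmd.append("--allow-external-plugins")
    if quiet:
        cmd.append("--quiet")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.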

Local Sequential CLI

The docling-jobkit-local CLI processes documents sequentially in a single process.

docling-jobkit-local config.yaml

Using Local Path Sources and Targets

Both CLIs support local file system sources and targets. Example configuration:

sources:
  - kind: local_path
    path: ./input_documents/
    recursive: true  # optional, default true
    pattern: "*.pdf"  # optional glob pattern

target:
  kind: local_path
  path: ./output_documents/

See dev/configs/run_local_folder_example.yaml for a complete example.
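To preview which files a `local_path` source would likely pick up, the `path`, `recursive`, and `pattern` fields can be approximated with `pathlib`. This is a sketch under assumptions: the function name `match_local_files` is hypothetical, the `"*"` default pattern is a guess, and the exact matching rules used by docling-jobkit may differ from `Path.glob` and `Path.rglob`.

```python
from pathlib import Path
from typing import List


def match_local_files(path: str, pattern: str = "*", recursive: bool = True) -> List[Path]:
    """List files under path whose names match the glob pattern."""
    root = Path(path)
    # rglob descends into subdirectories; glob stays in the top-level directory.
    candidates = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in candidates if p.is_file())
```

For the example configuration, `match_local_files("./input_documents/", "*.pdf")` would list the PDF files in that folder and its subfolders.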

Get help and support

Please feel free to connect with us using the discussion section of the main Docling repository.

Contributing

Please read Contributing to Docling Serve for details.

References

If you use Docling in your projects, please consider citing the following:

@techreport{Docling,
  author = {Deep Search Team},
  month = {1},
  title = {Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion},
  url = {https://arxiv.org/abs/2501.17887},
  eprint = {2501.17887},
  doi = {10.48550/arXiv.2501.17887},
  version = {2.0.0},
  year = {2025}
}

License

The Docling Jobkit codebase is under the MIT license.

LF AI & Data

Docling is hosted as a project in the LF AI & Data Foundation.

IBM ❤️ Open Source AI

The project was started by the AI for Knowledge team at IBM Research Zurich.
