Skip to main content

UCX communication module for Dask Distributed

Project description

UCX Communication Module for Distributed

This is the UCX communication backend for Dask Distributed, providing high-performance communication capabilities using the UCX (Unified Communication X) framework. It enables efficient GPU-to-GPU communication via NVLink (CUDA IPC), InfiniBand support, and various other high-speed interconnects.

Installation

This package is typically installed as part of the UCXX project build process. It can also be installed separately via conda-forge:

mamba install -c conda-forge distributed-ucxx

Or via PyPI (requires selection of CUDA version):

pip install distributed-ucxx-cu13  # For CUDA 13.x
pip install distributed-ucxx-cu12  # For CUDA 12.x

Configuration

This package provides its own configuration system that replaces the UCX configuration previously found in the main Distributed package. Configuration can be provided via:

  1. YAML configuration files: distributed-ucxx.yaml
  2. Environment variables: Using the DASK_DISTRIBUTED_UCXX_ prefix
  3. Programmatic configuration: Using Dask's configuration system

Configuration Schema

The configuration schema is defined in distributed-ucxx-schema.yaml and supports various options:

  • UCX transport configuration: tcp, nvlink, infiniband, cuda-copy, etc.
  • RMM configuration: rmm.pool-size
  • Advanced options: multi-buffer, environment

Example Configuration

New schema:

distributed-ucxx:
  tcp: true
  nvlink: true
  infiniband: false
  cuda-copy: true
  create-cuda-context: true
  multi-buffer: false
  environment:
    log-level: "info"
  rmm:
    pool-size: "1GB"

Legacy schema (may be removed in the future):

distributed:
  comm:
    ucx:
      tcp: true
      nvlink: true
      infiniband: false
      cuda-copy: true
      create-cuda-context: true
      multi-buffer: false
      environment:
        log-level: "info"
      rmm:
        pool-size: "1GB"

Environment Variables

New schema:

export DASK_DISTRIBUTED_UCXX__TCP=true
export DASK_DISTRIBUTED_UCXX__NVLINK=true
export DASK_DISTRIBUTED_UCXX__RMM__POOL_SIZE=1GB

Legacy schema (may be removed in the future):

export DASK_DISTRIBUTED__COMM__UCX__TCP=true
export DASK_DISTRIBUTED__COMM__UCX__NVLINK=true
export DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB

Python Configuration

New schema:

import dask

dask.config.set({
    "distributed-ucxx.tcp": True,
    "distributed-ucxx.nvlink": True,
    "distributed-ucxx.rmm.pool-size": "1GB"
})

Legacy schema (may be removed in the future):

import dask

dask.config.set({
    "distributed.comm.ucx.tcp": True,
    "distributed.comm.ucx.nvlink": True,
    "distributed.rmm.pool-size": "1GB"
})

Usage

The package automatically registers itself as a communication backend for Distributed using the entry point ucxx. Once installed, you can use it by specifying the protocol:

from distributed import Client

# Connect using UCXX protocol
client = Client("ucxx://scheduler-address:8786")

Or when starting a scheduler/worker:

dask scheduler --protocol ucxx
dask worker ucxx://scheduler-address:8786

Migration from Distributed

If you're migrating from the legacy UCX configuration in the main Distributed package, update your configuration keys:

  • distributed.comm.ucx.* is now distributed-ucxx.*
  • distributed.rmm.pool-size is now distributed-ucxx.rmm.pool-size

The old configuration schema is still valid for convenience, but may be removed in a future version.

See Also

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

distributed_ucxx_cu13-0.49.0-py3-none-manylinux_2_28_aarch64.manylinux_2_28_x86_64.whl (31.9 kB view details)

Uploaded Python 3manylinux: glibc 2.28+ ARM64manylinux: glibc 2.28+ x86-64

File details

Details for the file distributed_ucxx_cu13-0.49.0-py3-none-manylinux_2_28_aarch64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for distributed_ucxx_cu13-0.49.0-py3-none-manylinux_2_28_aarch64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 30a161ce74f087810a3b9806a7f62dffb40b67dc503b99a070c9bac1d99dd3df
MD5 42b6ddf6ea6a0f6f75b33ef11089c09c
BLAKE2b-256 1fe20fb03000cdf9a451f113894974fc13595ebe11ec4192a36129ca72f996b4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page