UCX communication module for Dask Distributed

UCX Communication Module for Distributed

This is the UCX communication backend for Dask Distributed. It provides high-performance communication through the UCX (Unified Communication X) framework, enabling efficient GPU-to-GPU transfers over NVLink (CUDA IPC), InfiniBand, and other high-speed interconnects.

Installation

This package is typically installed as part of the UCXX project build process. It can also be installed separately via conda-forge:

mamba install -c conda-forge distributed-ucxx

Or via PyPI, choosing the package that matches your CUDA version:

pip install distributed-ucxx-cu13  # For CUDA 13.x
pip install distributed-ucxx-cu12  # For CUDA 12.x
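
The package name encodes the CUDA major version, so a provisioning script can derive it from a version string. The following is a minimal sketch, assuming that only the two suffixes listed above exist; `pip_package_for_cuda` is a hypothetical helper and not part of this package.

```python
# Hypothetical helper: maps a CUDA version string to the PyPI package name.
# Assumption: only the cu12 and cu13 variants named in this README exist.
SUPPORTED_CUDA_MAJORS = (12, 13)


def pip_package_for_cuda(cuda_version: str) -> str:
    """Return the distributed-ucxx package name for a version such as '12.4'."""
    major = int(cuda_version.split(".")[0])
    if major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"no distributed-ucxx package listed for CUDA {major}.x")
    return f"distributed-ucxx-cu{major}"
```

For example, `pip_package_for_cuda("12.4")` yields the name to pass to `pip install` on a CUDA 12.x machine.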

Configuration

This package provides its own configuration system that replaces the UCX configuration previously found in the main Distributed package. Configuration can be provided via:

  1. YAML configuration files: distributed-ucxx.yaml
  2. Environment variables: Using the DASK_DISTRIBUTED_UCXX__ prefix (the double underscore separates nesting levels)
  3. Programmatic configuration: Using Dask's configuration system

Configuration Schema

The configuration schema is defined in distributed-ucxx-schema.yaml and supports various options:

  • UCX transport configuration: tcp, nvlink, infiniband, cuda-copy, etc.
  • RMM configuration: rmm.pool-size
  • Advanced options: multi-buffer, environment

Example Configuration

New schema:

distributed-ucxx:
  tcp: true
  nvlink: true
  infiniband: false
  cuda-copy: true
  create-cuda-context: true
  multi-buffer: false
  environment:
    log-level: "info"
  rmm:
    pool-size: "1GB"
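
Dask addresses nested configuration with dotted keys, so each leaf in the YAML above corresponds to one key such as distributed-ucxx.rmm.pool-size. The sketch below illustrates that correspondence with plain Python; `flatten` and `new_schema` are illustrative names, not part of Dask or of this package.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested mapping into dotted keys: {'a': {'b': 1}} -> {'a.b': 1}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted path along.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# The same values as the YAML example, written as a Python dict.
new_schema = {
    "distributed-ucxx": {
        "tcp": True,
        "nvlink": True,
        "infiniband": False,
        "cuda-copy": True,
        "create-cuda-context": True,
        "multi-buffer": False,
        "environment": {"log-level": "info"},
        "rmm": {"pool-size": "1GB"},
    }
}

dotted_keys = flatten(new_schema)
```

Each entry of `dotted_keys` has the form that Dask's configuration system accepts as a key.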

Legacy schema (may be removed in the future):

distributed:
  comm:
    ucx:
      tcp: true
      nvlink: true
      infiniband: false
      cuda-copy: true
      create-cuda-context: true
      multi-buffer: false
      environment:
        log-level: "info"
  rmm:
    pool-size: "1GB"

Environment Variables

New schema:

export DASK_DISTRIBUTED_UCXX__TCP=true
export DASK_DISTRIBUTED_UCXX__NVLINK=true
export DASK_DISTRIBUTED_UCXX__RMM__POOL_SIZE=1GB

Legacy schema (may be removed in the future):

export DASK_DISTRIBUTED__COMM__UCX__TCP=true
export DASK_DISTRIBUTED__COMM__UCX__NVLINK=true
export DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB
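
The variable names above follow Dask's documented convention: the DASK_ prefix is dropped, the rest is lowercased, and each double underscore becomes a nesting level. Dask also treats hyphens and underscores in keys as interchangeable. The sketch below approximates that rule in plain Python to show which key a given variable sets; it is not Dask's implementation, and both function names are hypothetical.

```python
def env_var_to_config_key(name: str) -> str:
    """Approximate Dask's rule: drop 'DASK_', lowercase, turn '__' into '.'."""
    prefix = "DASK_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} is not a Dask configuration variable")
    return name[len(prefix):].lower().replace("__", ".")


def same_key(a: str, b: str) -> bool:
    """Compare dotted keys, treating '-' and '_' as interchangeable."""
    return a.replace("-", "_") == b.replace("-", "_")
```

Under these assumptions, `DASK_DISTRIBUTED_UCXX__RMM__POOL_SIZE` maps to `distributed_ucxx.rmm.pool_size`, which names the same setting as `distributed-ucxx.rmm.pool-size`.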

Python Configuration

New schema:

import dask

dask.config.set({
    "distributed-ucxx.tcp": True,
    "distributed-ucxx.nvlink": True,
    "distributed-ucxx.rmm.pool-size": "1GB"
})

Legacy schema (may be removed in the future):

import dask

dask.config.set({
    "distributed.comm.ucx.tcp": True,
    "distributed.comm.ucx.nvlink": True,
    "distributed.rmm.pool-size": "1GB"
})

Usage

The package automatically registers itself as a communication backend for Distributed using the entry point ucxx. Once installed, you can use it by specifying the protocol:

from distributed import Client

# Connect using UCXX protocol
client = Client("ucxx://scheduler-address:8786")

Or when starting a scheduler/worker:

dask scheduler --protocol ucxx
dask worker ucxx://scheduler-address:8786

Migration from Distributed

If you're migrating from the legacy UCX configuration in the main Distributed package, update your configuration keys:

  • distributed.comm.ucx.* is now distributed-ucxx.*
  • distributed.rmm.pool-size is now distributed-ucxx.rmm.pool-size

The old configuration schema is still valid for convenience, but may be removed in a future version.
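
The two renaming rules above can be applied mechanically to an existing set of dotted keys. This is a minimal sketch that assumes the configuration is held as a flat dict of dotted keys; `migrate_keys` is a hypothetical helper, not something this package provides.

```python
from typing import Any, Dict

# Renaming rules taken from the migration notes above.
LEGACY_PREFIX = "distributed.comm.ucx."
NEW_PREFIX = "distributed-ucxx."
EXACT_RENAMES = {"distributed.rmm.pool-size": "distributed-ucxx.rmm.pool-size"}


def migrate_keys(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a flat, dotted-key config with legacy keys renamed."""
    migrated: Dict[str, Any] = {}
    for key, value in config.items():
        if key in EXACT_RENAMES:
            key = EXACT_RENAMES[key]
        elif key.startswith(LEGACY_PREFIX):
            key = NEW_PREFIX + key[len(LEGACY_PREFIX):]
        # Keys that match neither rule are copied unchanged.
        migrated[key] = value
    return migrated
```

The returned dict can then be passed to `dask.config.set` in place of the legacy keys.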
