Skip to main content

Python Bindings for the Unified Communication X library (UCX)

Project description

UCXX

UCXX is an object-oriented C++ interface for UCX, with native support for Python bindings.

Building

Environment setup

Before starting it is necessary to have the necessary dependencies installed. The simplest way to get started is to install Miniforge and then to create and activate an environment with the provided development file, for CUDA 13.x:

$ conda env create -n ucxx -f conda/environments/all_cuda-130_arch-$(uname -m).yaml

And then activate the newly created environment:

$ conda activate ucxx

Faster conda dependency resolution

The procedure aforementioned should complete without issues, but it may be slower than necessary. One alternative to speed up dependency resolution is to install mamba before creating the new environment. After installing Miniforge, mamba can be installed with:

$ conda install -c conda-forge mamba

After that, one can proceed as before, but simply replacing conda with mamba in the environment creation command:

$ mamba env create -n ucxx -f conda/environments/all_cuda-130_arch-$(uname -m).yaml
$ conda activate ucxx

Convenience Script

For convenience, we provide the ./build.sh script. By default, it will build and install both C++ and Python libraries. For a detailed description on available options please check ./build.sh --help.

Building C++ and Python libraries manually is also possible, see instructions on building C++ and Python.

Additionally, there is a ./build_and_run.sh script that will call ./build.sh to build everything as well as running C++ and Python tests and a few benchmarks. Similarly, details on existing options can be queried with ./build_and_run.sh.

C++

To build and install C++ library to ${CONDA_PREFIX}, with both Python and RMM support, as well as building all tests and benchmarks (with CUDA support) run:

mkdir cpp/build
cd cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} \
      -DBUILD_TESTS=ON \
      -DBUILD_BENCHMARKS=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DUCXX_ENABLE_PYTHON=ON \
      -DUCXX_ENABLE_RMM=ON \
      -DUCXX_BENCHMARKS_ENABLE_CUDA=ON
make -j install

Python

cd python
python setup.py install

Running benchmarks

C++

Currently there is one C++ benchmark with comprehensive options. It can be found under cpp/build/benchmarks/ucxx_perftest and for a full list of options -h argument can be used.

The benchmark is composed of two processes: a server and a client. The server must not specify an IP address or hostname and will bind to all available interfaces, whereas the client must specify the IP address or hostname where the server can be reached.

Basic Usage

Below is an example of running a server first, followed by the client connecting to the server on the localhost (same as 127.0.0.1). Both processes specify a list of parameters, which are the message size in bytes (-s 1000000000), the number of iterations to perform (-n 10) and the progress mode (-P polling).

$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling &
$ ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling localhost

CUDA Memory Support

When built with UCXX_BENCHMARKS_ENABLE_CUDA=ON, the benchmark supports multiple CUDA memory types using the -m flag:

# Server with CUDA device memory
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 &

# Client with CUDA device memory
$ ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 127.0.0.1

# Server with CUDA managed memory (unified memory)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 &

# Client with CUDA managed memory
$ ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 127.0.0.1

# Server with CUDA async memory (with streams)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 &

# Client with CUDA async memory
$ ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 127.0.0.1

Available Memory Types:

  • host - Standard host memory allocation (default)
  • cuda - CUDA device memory allocation
  • cuda-managed - CUDA unified/managed memory allocation
  • cuda-async - CUDA device memory with asynchronous operations

Requirements for CUDA Support:

  • UCXX compiled with UCXX_BENCHMARKS_ENABLE_CUDA=ON (if building benchmarks)
  • CUDA runtime available
  • UCX configured with CUDA transport support
  • Compatible CUDA devices on both endpoints

It is recommended to use UCX_TCP_CM_REUSEADDR=y when binding to interfaces with TCP support to prevent waiting for the process' TIME_WAIT state to complete, which often takes 60 seconds after the server has terminated.

Python

Benchmarks are available for both the Python "core" (synchronous) API and the "high-level" (asynchronous) API.

Synchronous

# Thread progress without delayed notification NumPy transfer, 100 iterations
# of single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100

# Blocking progress without delayed notification RMM transfer between GPUs 0
# and 3, 100 iterations of 2 buffers (using multi-buffer interface) each with
# 1 MiB
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 100 \
    --progress-mode blocking

Asynchronous

# NumPy transfer, 100 iterations of 8 buffers (using multi-buffer interface)
# each with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100 \
    --n-buffers 8

# RMM transfer between GPUs 0 and 3, 100 iterations of 2 buffers (using
# multi-buffer interface) each with 1 MiB
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 1MiB \
    --n-buffers 2

# Polling progress mode without delayed notification NumPy transfer,
# 100 iterations of single buffer with 1 MiB
UCXPY_ENABLE_DELAYED_SUBMISSION=0 \
    python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 1MiB \
    --progress-mode polling

Logging

Logging is independently available for both C++ and Python APIs. Since the Python interface uses the C++ backend, C++ logging can be enabled when running Python code as well.

C++

The C++ interface reuses the UCX logger and provides the same log levels and can be enabled via the UCXX_LOG_LEVEL environment variable. However, it will not enable UCX logging, one must still set UCX_LOG_LEVEL for UCX logging. A few examples are below:

# Request trace log level
UCXX_LOG_LEVEL=TRACE_REQ

# Debug log level
UCXX_LOG_LEVEL=DEBUG

Python

The UCXX Python interface uses the logging library included in Python. The only used levels currently are INFO and DEBUG, and can be enabled via the UCXPY_LOG_LEVEL environment variable. A few examples are below:

# Enable Python info log level
UCXPY_LOG_LEVEL=INFO

# Enable Python debug log level, UCXX request trace log level and UCX data log level
UCXPY_LOG_LEVEL=DEBUG UCXX_LOG_LEVEL=TRACE_REQ UCX_LOG_LEVEL=DATA

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

ucxx_cu12-0.47.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (518.1 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

ucxx_cu12-0.47.0-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl (484.5 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.26+ ARM64manylinux: glibc 2.28+ ARM64

ucxx_cu12-0.47.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (518.5 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

ucxx_cu12-0.47.0-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl (484.8 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.26+ ARM64manylinux: glibc 2.28+ ARM64

ucxx_cu12-0.47.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (527.8 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

ucxx_cu12-0.47.0-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl (496.7 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.26+ ARM64manylinux: glibc 2.28+ ARM64

ucxx_cu12-0.47.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (524.9 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

ucxx_cu12-0.47.0-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl (494.9 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.26+ ARM64manylinux: glibc 2.28+ ARM64

File details

Details for the file ucxx_cu12-0.47.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 221dd7a2982f72f5e570dd5993674cb9e259662e54b93c4fa55d9327f440cd9f
MD5 9190dbdecba9a29d4afee885e07db7f6
BLAKE2b-256 122005a81ab541c93b56a9f2765b52865f171bc4878d04979879d7578a1f2fa6

See more details on using hashes here.

File details

Details for the file ucxx_cu12-0.47.0-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b54a615574fbcda5cd649670a1c9054916e0afcddc875a8fc2a58fb9f403141b
MD5 af5dffab999165012790c61ac679a5fe
BLAKE2b-256 3d017d0067303c78ae78ff301639875c73ce0fb3f13e58dbb3c9af3d8860c5eb

See more details on using hashes here.

File details

Details for the file ucxx_cu12-0.47.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 27c8c445b8686691f31821482b559de69cc349bb6191fe1c4654031a82070fef
MD5 27c6fe22b5c09376672881bcead43b2c
BLAKE2b-256 a306783a627b4b2e1c4ce10b43ecd36a85219a29b681a0e7177e7af2710c8bd5

See more details on using hashes here.

File details

Details for the file ucxx_cu12-0.47.0-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2dc1ba680137ebc93623a8d2b6c8c576adf65cfbd20aae8a92e7148b4bfe1354
MD5 ca38b6c955bf93afac1baee1957436c5
BLAKE2b-256 f6442a43d0bcd0c37c81f2d0faecb1faad6f0b1c1cf20b730c7f0dca76826999

See more details on using hashes here.

File details

Details for the file ucxx_cu12-0.47.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 3ea8fad385d12449c15da8b3913270acdeae073c4ba10eb5c1c15b602cbdd4fd
MD5 29f4c8017fe6622f4e7ce860140077f6
BLAKE2b-256 2ba5877571fd5f7d7c23995f6342393770c1eafe9f5814f9be99a60c48eb94f1

See more details on using hashes here.

File details

Details for the file ucxx_cu12-0.47.0-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 101362c0a0dabbb1d1cb8d1fff460b4117c0af070dcdbcb0ddbcd5c2c175681f
MD5 e0c7704839df4de762e7de4c8db80072
BLAKE2b-256 417340f75a63057eabf76d368da7479ee27d78523bd20627115bc77ee4394348

See more details on using hashes here.

File details

Details for the file ucxx_cu12-0.47.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b2ac92d8d71ff8d37e3c0ec829074b094502c8f7fcf0e7a642648db2e125644d
MD5 29128539be2a852c80613eadd0827f32
BLAKE2b-256 18a03f02a9c294a43676ffba60eb1f4884f87219ee7da51c98fbc85015264bec

See more details on using hashes here.

File details

Details for the file ucxx_cu12-0.47.0-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for ucxx_cu12-0.47.0-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 9b4a7d48b966292c7e69f730c5ce401719be00dea5bc6cdb0f88e09f5d05cab1
MD5 f6fde101b7cfb37e6b71f84f5d51caa0
BLAKE2b-256 e4c6d14d2c02a00c5c52925cd68266d8dbcf1b7afef01eafa44f3584ded07dcd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page