
Python Bindings for the Unified Communication X library (UCX)

UCXX

UCXX is an object-oriented C++ interface for UCX, with native support for Python bindings.

Building

Environment setup

Before starting, make sure the required dependencies are installed. The simplest way to get started is to install Miniforge and then create an environment from the provided development file. For CUDA 13.x:

$ conda env create -n ucxx -f conda/environments/all_cuda-131_arch-$(uname -m).yaml

Then activate the newly created environment:

$ conda activate ucxx
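
The environment file name used above embeds the machine architecture through $(uname -m). As a minimal sketch, the same path can be assembled in Python. It assumes that platform.machine() reports the same string as uname -m on the target system (typically the case on Linux); the helper name env_file_for is hypothetical and is not part of UCXX.

```python
import platform
from pathlib import PurePosixPath
from typing import Optional


def env_file_for(cuda_tag: str = "131", arch: Optional[str] = None) -> str:
    """Return the conda environment file path for a CUDA tag and architecture.

    Hypothetical helper: it only mirrors the file naming pattern used in the
    command above and does not check that the file exists.
    """
    # Fall back to the local machine architecture, e.g. "x86_64" or "aarch64".
    arch = arch or platform.machine()
    name = f"all_cuda-{cuda_tag}_arch-{arch}.yaml"
    return str(PurePosixPath("conda") / "environments" / name)


if __name__ == "__main__":
    print(env_file_for())
```

The printed path can then be passed to conda env create -n ucxx -f.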

Faster conda dependency resolution

The procedure above should complete without issues, but dependency resolution may be slower than necessary. One way to speed it up is to install mamba before creating the new environment. After installing Miniforge, mamba can be installed with:

$ conda install -c conda-forge mamba

After that, proceed as before, replacing conda with mamba in the environment creation command:

$ mamba env create -n ucxx -f conda/environments/all_cuda-131_arch-$(uname -m).yaml
$ conda activate ucxx

Convenience Script

For convenience, the ./build.sh script is provided. By default, it builds and installs both the C++ and Python libraries. For a detailed description of the available options, see ./build.sh --help.

Building the C++ and Python libraries manually is also possible; see the C++ and Python sections below.

Additionally, the ./build_and_run.sh script calls ./build.sh to build everything and then runs the C++ and Python tests and a few benchmarks. Similarly, details on the available options can be queried with ./build_and_run.sh.

C++

To build and install the C++ library to ${CONDA_PREFIX} with both Python and RMM support, and to build all tests and benchmarks (with CUDA support), run:

mkdir cpp/build
cd cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} \
      -DBUILD_TESTS=ON \
      -DBUILD_BENCHMARKS=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DUCXX_ENABLE_PYTHON=ON \
      -DUCXX_ENABLE_RMM=ON \
      -DUCXX_BENCHMARKS_ENABLE_CUDA=ON
make -j install

Python

cd python
python setup.py install

Running benchmarks

C++

Currently there is one C++ benchmark with comprehensive options. It can be found at cpp/build/benchmarks/ucxx_perftest, and the -h argument prints the full list of options.

The benchmark is composed of two processes: a server and a client. The server must not specify an IP address or hostname and will bind to all available interfaces, whereas the client must specify the IP address or hostname where the server can be reached.

Basic Usage

Below is an example of starting a server first, followed by a client connecting to the server on localhost (same as 127.0.0.1). Both processes take the same parameters: the message size in bytes (-s 1000000000), the number of iterations to perform (-n 10) and the progress mode (-P polling).

$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling &
$ ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling localhost
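
When the benchmark is driven from a script, it can help to build both command lines from one set of parameters so that server and client cannot drift apart. The sketch below only assembles argument lists from the flags shown above (-s, -n, -P); the function perftest_commands and the SERVER_ENV constant are hypothetical names, and nothing is executed.

```python
import shlex

# Environment override used for the server in the example above.
SERVER_ENV = {"UCX_TCP_CM_REUSEADDR": "y"}


def perftest_commands(size_bytes, iterations, progress_mode="polling",
                      server_host="localhost",
                      binary="./benchmarks/ucxx_perftest"):
    """Return (server_argv, client_argv) sharing the same benchmark parameters.

    Hypothetical helper: only the -s, -n and -P flags from the example above
    are used. The server takes no address; the client appends the server host.
    """
    common = [binary, "-s", str(size_bytes), "-n", str(iterations),
              "-P", progress_mode]
    server = list(common)
    client = common + [server_host]
    return server, client


if __name__ == "__main__":
    server, client = perftest_commands(1_000_000_000, 10)
    print("server:", shlex.join(server))
    print("client:", shlex.join(client))
```

The two lists can be handed to subprocess.Popen and subprocess.run respectively, merging SERVER_ENV into the server's environment.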

CUDA Memory Support

When built with UCXX_BENCHMARKS_ENABLE_CUDA=ON, the benchmark supports multiple CUDA memory types using the -m flag:

# Server with CUDA device memory
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 &

# Client with CUDA device memory
$ ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 127.0.0.1

# Server with CUDA managed memory (unified memory)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 &

# Client with CUDA managed memory
$ ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 127.0.0.1

# Server with CUDA async memory (with streams)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 &

# Client with CUDA async memory
$ ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 127.0.0.1

Available Memory Types:

  • host - Standard host memory allocation (default)
  • cuda - CUDA device memory allocation
  • cuda-managed - CUDA unified/managed memory allocation
  • cuda-async - CUDA device memory with asynchronous operations

Requirements for CUDA Support:

  • UCXX compiled with UCXX_BENCHMARKS_ENABLE_CUDA=ON (if building benchmarks)
  • CUDA runtime available
  • UCX configured with CUDA transport support
  • Compatible CUDA devices on both endpoints

Setting UCX_TCP_CM_REUSEADDR=y is recommended when binding to interfaces with TCP support. It avoids having to wait for the TIME_WAIT state left behind by a previous server process to expire, which often takes 60 seconds after the server has terminated.
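
TIME_WAIT is a general TCP behaviour rather than something specific to UCX. As an illustration only, the sketch below uses plain Python sockets with SO_REUSEADDR to re-bind a listener to a port that was just released. It assumes that UCX_TCP_CM_REUSEADDR=y has a comparable effect on the UCX TCP listener; that is not verified here, and the helper listen_with_reuseaddr is hypothetical.

```python
import socket


def listen_with_reuseaddr(port: int = 0, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket with SO_REUSEADDR set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Permit binding even if an earlier socket on this address is in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.listen()
    return sock


if __name__ == "__main__":
    first = listen_with_reuseaddr()
    port = first.getsockname()[1]
    first.close()
    # Re-bind to the port that was just released.
    second = listen_with_reuseaddr(port)
    print("re-bound to port", second.getsockname()[1])
    second.close()
```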

Python

Benchmarks are available for both the Python "core" (synchronous) API and the "high-level" (asynchronous) API.

Synchronous

# Thread progress without delayed notification, NumPy transfer, 100 iterations
# of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100

# Blocking progress without delayed notification, RMM transfer between GPUs 0
# and 3, 100 iterations of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 100 \
    --progress-mode blocking

Asynchronous

# NumPy transfer, 100 iterations of 8 buffers (using multi-buffer interface)
# each with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100 \
    --n-buffers 8

# RMM transfer between GPUs 0 and 3, 100 iterations of 2 buffers (using
# multi-buffer interface) each with 1 MiB
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 1MiB \
    --n-buffers 2

# Polling progress mode without delayed notification, NumPy transfer,
# 100 iterations of a single buffer with 1 MiB
UCXPY_ENABLE_DELAYED_SUBMISSION=0 \
    python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 1MiB \
    --progress-mode polling
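
In the examples above, sizes are given both as plain byte counts (100) and with a binary suffix (1MiB), while the C++ benchmark takes raw bytes (1048576). How the Python benchmark parses these strings is not shown here. The helper parse_size below is a hypothetical stand-in that converts between the two forms, for example to reuse one size across both benchmarks; it supports only the suffixes listed in its table.

```python
import re

# Binary unit multipliers; an empty suffix means plain bytes.
_UNITS = {"": 1, "B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text: str) -> int:
    """Convert strings such as "100" or "1MiB" to a number of bytes.

    Hypothetical helper, not the parser used by ucxx.benchmarks.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * _UNITS[unit]


if __name__ == "__main__":
    for value in ("100", "1MiB"):
        print(value, "=", parse_size(value), "bytes")
```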

Logging

Logging is independently available for both C++ and Python APIs. Since the Python interface uses the C++ backend, C++ logging can be enabled when running Python code as well.

C++

The C++ interface reuses the UCX logger and provides the same log levels, which are selected via the UCXX_LOG_LEVEL environment variable. However, this does not enable UCX logging; UCX_LOG_LEVEL must still be set for that. A few examples are below:

# Request trace log level
UCXX_LOG_LEVEL=TRACE_REQ

# Debug log level
UCXX_LOG_LEVEL=DEBUG

Python

The UCXX Python interface uses Python's built-in logging library. Currently only the INFO and DEBUG levels are used, and they can be enabled via the UCXPY_LOG_LEVEL environment variable. A few examples are below:

# Enable Python info log level
UCXPY_LOG_LEVEL=INFO

# Enable Python debug log level, UCXX request trace log level and UCX data log level
UCXPY_LOG_LEVEL=DEBUG UCXX_LOG_LEVEL=TRACE_REQ UCX_LOG_LEVEL=DATA
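
For application code that wants to follow the same convention, the sketch below shows one way an environment variable such as UCXPY_LOG_LEVEL could be mapped onto Python's logging levels. This is not UCXX's own implementation; the function level_from_env, the fallback to WARNING and the logger name "my_app" are assumptions made for illustration.

```python
import logging
import os

# Only the two levels mentioned above are recognised in this sketch.
_LEVELS = {"INFO": logging.INFO, "DEBUG": logging.DEBUG}


def level_from_env(var: str = "UCXPY_LOG_LEVEL",
                   default: int = logging.WARNING) -> int:
    """Return the logging level named by an environment variable.

    Hypothetical helper: unknown or unset values fall back to `default`.
    """
    name = os.environ.get(var, "").strip().upper()
    return _LEVELS.get(name, default)


if __name__ == "__main__":
    logging.basicConfig(level=level_from_env())
    log = logging.getLogger("my_app")  # hypothetical logger name
    log.info("shown when UCXPY_LOG_LEVEL is INFO or DEBUG")
    log.debug("shown only when UCXPY_LOG_LEVEL is DEBUG")
```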
