
Python Bindings for the Unified Communication X library (UCX)
UCXX

UCXX is an object-oriented C++ interface for UCX, with native support for Python bindings.

Building

Environment setup

Before starting, it is necessary to have the required dependencies installed. The simplest way to get started is to install Miniforge, then create and activate an environment from the provided development file, here for CUDA 13.x:

$ conda env create -n ucxx -f conda/environments/all_cuda-131_arch-$(uname -m).yaml

And then activate the newly created environment:

$ conda activate ucxx

Faster conda dependency resolution

The procedure described above should complete without issues, but it may be slower than necessary. One way to speed up dependency resolution is to install mamba before creating the new environment. After installing Miniforge, mamba can be installed with:

$ conda install -c conda-forge mamba

After that, proceed as before, simply replacing conda with mamba in the environment creation command:

$ mamba env create -n ucxx -f conda/environments/all_cuda-131_arch-$(uname -m).yaml
$ conda activate ucxx

Convenience Script

For convenience, we provide the ./build.sh script. By default, it builds and installs both the C++ and Python libraries. For a detailed description of the available options, please check ./build.sh --help.

Building the C++ and Python libraries manually is also possible; see the instructions on building C++ and Python below.

Additionally, there is a ./build_and_run.sh script that calls ./build.sh to build everything and then runs the C++ and Python tests as well as a few benchmarks. Similarly, details on the available options can be queried with ./build_and_run.sh --help.

C++

To build and install the C++ library to ${CONDA_PREFIX}, with both Python and RMM support, as well as all tests and benchmarks (with CUDA support), run:

mkdir cpp/build
cd cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} \
      -DBUILD_TESTS=ON \
      -DBUILD_BENCHMARKS=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DUCXX_ENABLE_PYTHON=ON \
      -DUCXX_ENABLE_RMM=ON \
      -DUCXX_BENCHMARKS_ENABLE_CUDA=ON
make -j install

Python

cd python
python setup.py install
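After installation, a quick import check confirms the package is visible to the interpreter (a generic Python check, nothing UCXX-specific):

```python
import importlib.util

# True when the ucxx package is importable in the current environment.
installed = importlib.util.find_spec("ucxx") is not None
print("ucxx importable:", installed)
```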

Running benchmarks

C++

Currently there is one C++ benchmark with comprehensive options. It can be found at cpp/build/benchmarks/ucxx_perftest; for a full list of options, use the -h argument.

The benchmark is composed of two processes: a server and a client. The server must not specify an IP address or hostname and binds to all available interfaces, whereas the client must specify the IP address or hostname where the server can be reached.

Basic Usage

Below is an example of running a server first, followed by the client connecting to the server on localhost (the same as 127.0.0.1). Both processes specify the same parameters: the message size in bytes (-s 1000000000), the number of iterations to perform (-n 10), and the progress mode (-P polling).

$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling &
$ ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling localhost
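As a quick sanity check on the parameters above, the total volume moved per direction is just the message size times the iteration count (back-of-the-envelope arithmetic only; the benchmark itself reports latency and bandwidth):

```python
# Total bytes moved per direction for the example run above.
msg_size = 1_000_000_000  # -s 1000000000 (1 GB per message)
n_iter = 10               # -n 10
total_bytes = msg_size * n_iter
print(f"{total_bytes / 1e9:.0f} GB per direction")
```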

CUDA Memory Support

When built with UCXX_BENCHMARKS_ENABLE_CUDA=ON, the benchmark supports multiple CUDA memory types using the -m flag:

# Server with CUDA device memory
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 &

# Client with CUDA device memory
$ ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 127.0.0.1

# Server with CUDA managed memory (unified memory)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 &

# Client with CUDA managed memory
$ ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 127.0.0.1

# Server with CUDA async memory (with streams)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 &

# Client with CUDA async memory
$ ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 127.0.0.1

Available Memory Types:

  • host - Standard host memory allocation (default)
  • cuda - CUDA device memory allocation
  • cuda-managed - CUDA unified/managed memory allocation
  • cuda-async - CUDA device memory with asynchronous operations

Requirements for CUDA Support:

  • UCXX compiled with UCXX_BENCHMARKS_ENABLE_CUDA=ON (if building benchmarks)
  • CUDA runtime available
  • UCX configured with CUDA transport support
  • Compatible CUDA devices on both endpoints

It is recommended to set UCX_TCP_CM_REUSEADDR=y when binding to interfaces with TCP support, to avoid waiting on the socket's TIME_WAIT state, which often takes 60 seconds to expire after the server has terminated.

Python

Benchmarks are available for both the Python "core" (synchronous) API and the "high-level" (asynchronous) API.

Synchronous

# Thread progress mode without delayed notification, NumPy transfer,
# 100 iterations of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100

# Blocking progress mode without delayed notification, RMM transfer between
# GPUs 0 and 3, 100 iterations of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 100 \
    --progress-mode blocking

Asynchronous

# NumPy transfer, 100 iterations of 8 buffers (using multi-buffer interface)
# each with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100 \
    --n-buffers 8

# RMM transfer between GPUs 0 and 3, 100 iterations of 2 buffers (using
# multi-buffer interface) each with 1 MiB
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 1MiB \
    --n-buffers 2

# Polling progress mode without delayed notification NumPy transfer,
# 100 iterations of single buffer with 1 MiB
UCXPY_ENABLE_DELAYED_SUBMISSION=0 \
    python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 1MiB \
    --progress-mode polling
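The --n-bytes values above may be plain byte counts or human-readable sizes such as 1MiB. Purely as an illustration (this is a hypothetical helper, not the benchmark's actual parser), such strings can be converted like:

```python
import re

# Binary (KiB/MiB/GiB) and decimal (kB/MB/GB) multipliers.
_UNITS = {"": 1, "b": 1, "kib": 2**10, "mib": 2**20, "gib": 2**30,
          "kb": 10**3, "mb": 10**6, "gb": 10**9}

def parse_nbytes(text):
    """Parse sizes like '100', '1MiB', or '2 GB' into a byte count."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]*)\s*", str(text))
    if not m or m.group(2).lower() not in _UNITS:
        raise ValueError(f"cannot parse {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2).lower()])
```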

Logging

Logging is independently available for both C++ and Python APIs. Since the Python interface uses the C++ backend, C++ logging can be enabled when running Python code as well.

C++

The C++ interface reuses the UCX logger, providing the same log levels, and can be enabled via the UCXX_LOG_LEVEL environment variable. Note that this does not enable UCX logging itself; UCX_LOG_LEVEL must still be set for UCX logging. A few examples are below:

# Request trace log level
UCXX_LOG_LEVEL=TRACE_REQ

# Debug log level
UCXX_LOG_LEVEL=DEBUG

Python

The UCXX Python interface uses Python's built-in logging library. Currently only the INFO and DEBUG levels are used; they can be enabled via the UCXPY_LOG_LEVEL environment variable. A few examples are below:

# Enable Python info log level
UCXPY_LOG_LEVEL=INFO

# Enable Python debug log level, UCXX request trace log level and UCX data log level
UCXPY_LOG_LEVEL=DEBUG UCXX_LOG_LEVEL=TRACE_REQ UCX_LOG_LEVEL=DATA
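For reference, the env-var-to-level translation on the Python side can be sketched with the standard logging module (the logger name "ucxx" and the exact fallback behavior are assumptions for illustration, not guaranteed API):

```python
import logging
import os

# Translate an env var such as UCXPY_LOG_LEVEL into a logging level,
# falling back to WARNING when unset or unrecognized.
level_name = os.environ.get("UCXPY_LOG_LEVEL", "WARNING").upper()
logger = logging.getLogger("ucxx")  # assumed logger name
logger.setLevel(getattr(logging, level_name, logging.WARNING))
```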
