Python Bindings for the Unified Communication X library (UCX)

UCXX

UCXX is an object-oriented C++ interface for UCX, with native support for Python bindings.

Building

Environment setup

Before starting, install the required dependencies. The simplest way to get started is to install Miniforge and then create an environment from the provided development file. For CUDA 13.x:

$ conda env create -n ucxx -f conda/environments/all_cuda-130_arch-x86_64.yaml

And then activate the newly created environment:

$ conda activate ucxx

Faster conda dependency resolution

The procedure above should complete without issues, but dependency resolution may be slow. One way to speed it up is to install mamba before creating the new environment. After installing Miniforge, mamba can be installed with:

$ conda install -c conda-forge mamba

After that, proceed as before, replacing conda with mamba in the environment creation command:

$ mamba env create -n ucxx -f conda/environments/all_cuda-130_arch-x86_64.yaml
$ conda activate ucxx

Convenience Script

For convenience, we provide the ./build.sh script. By default, it builds and installs both the C++ and Python libraries. For a detailed description of the available options, run ./build.sh --help.

Building the C++ and Python libraries manually is also possible; see the C++ and Python instructions below.

Additionally, there is a ./build_and_run.sh script that calls ./build.sh to build everything and then runs the C++ and Python tests and a few benchmarks. Similarly, details on existing options can be queried with ./build_and_run.sh.

C++

To build and install the C++ library to ${CONDA_PREFIX} with both Python and RMM support, and to build all tests and benchmarks (with CUDA support), run:

mkdir cpp/build
cd cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} \
      -DBUILD_TESTS=ON \
      -DBUILD_BENCHMARKS=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DUCXX_ENABLE_PYTHON=ON \
      -DUCXX_ENABLE_RMM=ON \
      -DUCXX_BENCHMARKS_ENABLE_CUDA=ON
make -j install
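
The configure step above can also be assembled programmatically, which helps when toggling options in a script. The following is a minimal sketch, assuming Python 3.8 or newer; the helper name cmake_configure_args is hypothetical and not part of UCXX, and only the CMake options shown above are used.

```python
import os
import shlex


def cmake_configure_args(prefix, options):
    """Return the argv list for the CMake configure step shown above.

    Hypothetical helper for illustration only; not part of UCXX.
    """
    args = ["cmake", "..", f"-DCMAKE_INSTALL_PREFIX={prefix}"]
    for name, value in options.items():
        # CMake booleans are conventionally spelled ON/OFF.
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    return args


options = {
    "BUILD_TESTS": True,
    "BUILD_BENCHMARKS": True,
    "CMAKE_BUILD_TYPE": "Release",
    "UCXX_ENABLE_PYTHON": True,
    "UCXX_ENABLE_RMM": True,
    "UCXX_BENCHMARKS_ENABLE_CUDA": True,
}

# Fall back to a placeholder path when no conda environment is active.
prefix = os.environ.get("CONDA_PREFIX", "/path/to/prefix")
print(shlex.join(cmake_configure_args(prefix, options)))
```

The printed command is meant to be run from cpp/build, followed by make -j install as above.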

Python

cd python
python setup.py install

Running benchmarks

C++

Currently there is one C++ benchmark with comprehensive options. It can be found under cpp/build/benchmarks/ucxx_perftest, and the -h argument prints the full list of options.

The benchmark is composed of two processes: a server and a client. The server must not specify an IP address or hostname and will bind to all available interfaces, whereas the client must specify the IP address or hostname where the server can be reached.

Basic Usage

Below is an example of running a server first, followed by a client connecting to the server on localhost (same as 127.0.0.1). Both processes specify the same parameters: the message size in bytes (-s 1000000000), the number of iterations to perform (-n 10) and the progress mode (-P polling).

$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling &
$ ./benchmarks/ucxx_perftest -s 1000000000 -n 10 -P polling localhost
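
The same server and client pair can be driven from a script. The following is a minimal sketch using only Python's standard library; the helper perftest_args is hypothetical, the one-second startup delay is a crude assumption, and the sketch assumes the server exits on its own once the client finishes. Only the flags shown above are used.

```python
import os
import subprocess
import time

PERFTEST = "./benchmarks/ucxx_perftest"  # path used in the example above


def perftest_args(size, n_iter, progress_mode, server_host=None):
    """Build the ucxx_perftest argv; pass server_host only for the client."""
    args = [PERFTEST, "-s", str(size), "-n", str(n_iter), "-P", progress_mode]
    if server_host is not None:
        args.append(server_host)
    return args


server_cmd = perftest_args(1_000_000_000, 10, "polling")
client_cmd = perftest_args(1_000_000_000, 10, "polling", "localhost")

if os.path.exists(PERFTEST):
    # Only the server binds a listening address, so only it sets the variable.
    server_env = dict(os.environ, UCX_TCP_CM_REUSEADDR="y")
    server = subprocess.Popen(server_cmd, env=server_env)
    try:
        time.sleep(1)  # assumption: enough time for the server to listen
        subprocess.run(client_cmd, check=True)
        server.wait(timeout=60)  # assumption: server exits after one client
    finally:
        if server.poll() is None:
            server.terminate()
else:
    print("ucxx_perftest not found; would run:", server_cmd, client_cmd)
```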

CUDA Memory Support

When built with UCXX_BENCHMARKS_ENABLE_CUDA=ON, the benchmark supports multiple CUDA memory types using the -m flag:

# Server with CUDA device memory
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 &

# Client with CUDA device memory
$ ./benchmarks/ucxx_perftest -m cuda -s 1048576 -n 10 127.0.0.1

# Server with CUDA managed memory (unified memory)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 &

# Client with CUDA managed memory
$ ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -n 10 127.0.0.1

# Server with CUDA async memory (with streams)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 &

# Client with CUDA async memory
$ ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -n 10 127.0.0.1

Available Memory Types:

  • host - Standard host memory allocation (default)
  • cuda - CUDA device memory allocation
  • cuda-managed - CUDA unified/managed memory allocation
  • cuda-async - CUDA device memory with asynchronous operations

Requirements for CUDA Support:

  • UCXX compiled with UCXX_BENCHMARKS_ENABLE_CUDA=ON (if building benchmarks)
  • CUDA runtime available
  • UCX configured with CUDA transport support
  • Compatible CUDA devices on both endpoints

It is recommended to use UCX_TCP_CM_REUSEADDR=y when binding to interfaces with TCP support. This avoids waiting for the TIME_WAIT state to expire, which often takes 60 seconds after the server has terminated.
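
To illustrate the address-reuse idea in general terms, here is a minimal sketch using Python's standard socket module. It shows the plain TCP SO_REUSEADDR socket option, which is assumed here to be analogous to what UCX_TCP_CM_REUSEADDR=y requests; it is not UCX's implementation, and listen_with_reuseaddr is a hypothetical helper.

```python
import socket


def listen_with_reuseaddr(port=0, host="127.0.0.1"):
    """Open a listening TCP socket with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Without this option, bind() may fail with "Address already in use"
    # while an earlier connection on the same port is still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


first = listen_with_reuseaddr()       # port 0 lets the OS pick a free port
port = first.getsockname()[1]
first.close()

second = listen_with_reuseaddr(port)  # rebind the same port right away
print("rebound port", port)
second.close()
```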

Python

Benchmarks are available for both the Python "core" (synchronous) API and the "high-level" (asynchronous) API.

Synchronous

# Thread progress without delayed notification: NumPy transfer, 100 iterations
# of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100

# Blocking progress without delayed notification: RMM transfer between GPUs 0
# and 3, 100 iterations of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 100 \
    --progress-mode blocking

Asynchronous

# NumPy transfer, 100 iterations of 8 buffers (using multi-buffer interface)
# each with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100 \
    --n-buffers 8

# RMM transfer between GPUs 0 and 3, 100 iterations of 2 buffers (using
# multi-buffer interface) each with 1 MiB
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 1MiB \
    --n-buffers 2

# Polling progress mode without delayed notification NumPy transfer,
# 100 iterations of single buffer with 1 MiB
UCXPY_ENABLE_DELAYED_SUBMISSION=0 \
    python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 1MiB \
    --progress-mode polling
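
The size arguments in the examples above mix plain integers (100) and suffixed strings (1MiB). The sketch below shows how such strings map to byte counts. The helper parse_size is hypothetical and written for illustration only; it is not the parser the benchmark uses, and it assumes binary suffixes (KiB, MiB, GiB). The total at the end further assumes that every iteration transfers each buffer once.

```python
import re

_UNITS = {"": 1, "B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def parse_size(text):
    """Convert strings such as '100' or '1MiB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


n_iter, n_buffers = 100, 2
per_buffer = parse_size("1MiB")
print(per_buffer)                       # 1048576
print(n_iter * n_buffers * per_buffer)  # 209715200
```

The value 1048576 is the same size passed as -s 1048576 in the C++ CUDA examples.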

Logging

Logging is independently available for both C++ and Python APIs. Since the Python interface uses the C++ backend, C++ logging can be enabled when running Python code as well.

C++

The C++ interface reuses the UCX logger and provides the same log levels, which can be enabled via the UCXX_LOG_LEVEL environment variable. However, this does not enable UCX logging; UCX_LOG_LEVEL must still be set for that. A few examples are below:

# Request trace log level
UCXX_LOG_LEVEL=TRACE_REQ

# Debug log level
UCXX_LOG_LEVEL=DEBUG

Python

The UCXX Python interface uses the logging library included in Python. Currently only the INFO and DEBUG levels are used, and they can be enabled via the UCXPY_LOG_LEVEL environment variable. A few examples are below:

# Enable Python info log level
UCXPY_LOG_LEVEL=INFO

# Enable Python debug log level, UCXX request trace log level and UCX data log level
UCXPY_LOG_LEVEL=DEBUG UCXX_LOG_LEVEL=TRACE_REQ UCX_LOG_LEVEL=DATA
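
When launching a UCXX program from another Python script, the same variables can be passed through the child's environment. This is a minimal sketch using only the standard library; the child command here is a stand-in that merely echoes the variables and would be replaced by the real program, for example one of the benchmark invocations above.

```python
import logging
import os
import subprocess
import sys

log_env = dict(
    os.environ,
    UCXPY_LOG_LEVEL="DEBUG",     # Python-side logging
    UCXX_LOG_LEVEL="TRACE_REQ",  # UCXX C++ logging
    UCX_LOG_LEVEL="DATA",        # UCX logging
)

# Stand-in child process (assumption): it only prints the three variables.
echo = (
    "import os; print(os.environ['UCXPY_LOG_LEVEL'], "
    "os.environ['UCXX_LOG_LEVEL'], os.environ['UCX_LOG_LEVEL'])"
)
result = subprocess.run(
    [sys.executable, "-c", echo],
    env=log_env, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # DEBUG TRACE_REQ DATA

# INFO and DEBUG are standard levels of Python's logging module.
print(logging.INFO, logging.DEBUG)  # 20 10
```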

