Python Bindings for the Unified Communication X library (UCX)

UCXX

UCXX is an object-oriented C++ interface for UCX, with native support for Python bindings.

Building

Environment setup

Before starting, make sure the required dependencies are installed. The simplest way to get started is to install Miniforge and then create an environment from the provided development file, for example for CUDA 12.x:

$ conda env create -n ucxx -f conda/environments/all_cuda-128_arch-x86_64.yaml

Then activate the newly created environment:

$ conda activate ucxx

Faster conda dependency resolution

The procedure above should complete without issues, but it may be slower than necessary. One way to speed up dependency resolution is to install mamba before creating the new environment. After installing Miniforge, mamba can be installed with:

$ conda install -c conda-forge mamba

After that, proceed as before, replacing conda with mamba in the environment creation command:

$ mamba env create -n ucxx -f conda/environments/all_cuda-128_arch-x86_64.yaml
$ conda activate ucxx

Convenience Script

For convenience, we provide the ./build.sh script. By default, it builds and installs both the C++ and Python libraries. For a detailed description of the available options, see ./build.sh --help.

Building the C++ and Python libraries manually is also possible; see the instructions on building C++ and Python.

Additionally, there is a ./build_and_run.sh script that calls ./build.sh to build everything and then runs the C++ and Python tests and a few benchmarks. Similarly, details on existing options can be queried with ./build_and_run.sh.

C++

To build and install the C++ library to ${CONDA_PREFIX}, with both Python and RMM support, and to build all tests and benchmarks (with CUDA support), run:

mkdir cpp/build
cd cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} \
      -DBUILD_TESTS=ON \
      -DBUILD_BENCHMARKS=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DUCXX_ENABLE_PYTHON=ON \
      -DUCXX_ENABLE_RMM=ON \
      -DUCXX_BENCHMARKS_ENABLE_CUDA=ON
make -j install

Python

cd python
python setup.py install

Running benchmarks

C++

Currently there is one C++ benchmark with comprehensive options. It can be found under cpp/build/benchmarks/ucxx_perftest, and the --help argument prints the full list of options.

The benchmark is composed of two processes: a server and a client. The server must not specify an IP address or hostname and will bind to all available interfaces, whereas the client must specify the IP address or hostname where the server can be reached.

Basic Usage

Below is an example of starting a server and then a client that connects to it on localhost (equivalent to 127.0.0.1). Both processes pass the same parameters: the message size in bytes (-s 1000000000), reuse of allocations (-r), the number of iterations to perform (-n 10), and the progress mode (-P polling).

$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -s 1000000000 -r -n 10 -P polling &
$ ./benchmarks/ucxx_perftest -s 1000000000 -r -n 10 -P polling localhost
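When the server and client need to be launched from a script, it can help to build both command lines from one set of parameters so they cannot drift apart. The following is a minimal sketch, not part of UCXX: the helper name perftest_command and its arguments are hypothetical, and it only uses the flags shown in the example above (-s, -r, -n, -P).

```python
import shlex


def perftest_command(binary, size_bytes, iterations, progress_mode,
                     reuse_alloc=True, server_host=None):
    """Build the argv list for one ucxx_perftest process.

    Hypothetical helper: only flags that appear in this README are used.
    Omitting server_host yields a server command, passing it yields a
    client command.
    """
    argv = [binary, "-s", str(size_bytes)]
    if reuse_alloc:
        argv.append("-r")
    argv += ["-n", str(iterations), "-P", progress_mode]
    if server_host is not None:
        # The client takes the server address as a positional argument.
        argv.append(server_host)
    return argv


binary = "./benchmarks/ucxx_perftest"
server = perftest_command(binary, 1_000_000_000, 10, "polling")
client = perftest_command(binary, 1_000_000_000, 10, "polling",
                          server_host="localhost")

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(server))
print(shlex.join(client))
```

The resulting lists can be passed directly to subprocess.Popen, with UCX_TCP_CM_REUSEADDR=y added to the server's environment as in the shell example.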

CUDA Memory Support

When built with UCXX_BENCHMARKS_ENABLE_CUDA=ON, the benchmark supports multiple CUDA memory types using the -m flag:

# Server with CUDA device memory
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda -s 1048576 -r -n 10 -P polling &

# Client with CUDA device memory
$ ./benchmarks/ucxx_perftest -m cuda -s 1048576 -r -n 10 -P polling 127.0.0.1

# Server with CUDA managed memory (unified memory)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -r -n 10 -P polling &

# Client with CUDA managed memory
$ ./benchmarks/ucxx_perftest -m cuda-managed -s 1048576 -r -n 10 -P polling 127.0.0.1

# Server with CUDA async memory (with streams)
$ UCX_TCP_CM_REUSEADDR=y ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -r -n 10 -P polling &

# Client with CUDA async memory
$ ./benchmarks/ucxx_perftest -m cuda-async -s 1048576 -r -n 10 -P polling 127.0.0.1

Available Memory Types:

  • host - Standard host memory allocation (default)
  • cuda - CUDA device memory allocation
  • cuda-managed - CUDA unified/managed memory allocation
  • cuda-async - CUDA device memory with asynchronous operations

Requirements for CUDA Support:

  • UCXX compiled with UCXX_BENCHMARKS_ENABLE_CUDA=ON (if building benchmarks)
  • CUDA runtime available
  • UCX configured with CUDA transport support
  • Compatible CUDA devices on both endpoints

It is recommended to set UCX_TCP_CM_REUSEADDR=y when binding to interfaces with TCP support. This avoids waiting for the TIME_WAIT state of the previous server's socket to expire, which often takes 60 seconds after the server has terminated.
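As background, the TIME_WAIT delay is general TCP behaviour rather than something specific to UCX. The sketch below uses only Python's standard socket module to show the SO_REUSEADDR socket option, which is the usual way to let a listener rebind to a recently used address. That UCX_TCP_CM_REUSEADDR maps onto this option is an assumption made for illustration; this README does not state it. The helper name make_listener is hypothetical.

```python
import socket


def make_listener(host="127.0.0.1", port=0, reuse_addr=True):
    """Create a listening TCP socket, optionally with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # Must be set before bind() to have any effect on rebinding.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.listen(1)
    return sock


listener = make_listener()
reuse_enabled = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
bound_port = listener.getsockname()[1]
listener.close()

print(f"SO_REUSEADDR enabled: {reuse_enabled}, bound to port {bound_port}")
```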

Python

Benchmarks are available for both the Python "core" (synchronous) API and the "high-level" (asynchronous) API.

Synchronous

# Thread progress mode without delayed notification: NumPy transfer,
# 100 iterations of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100

# Blocking progress mode without delayed notification: RMM transfer between
# GPUs 0 and 3, 100 iterations of a single buffer with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-core \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 100 \
    --progress-mode blocking

Asynchronous

# NumPy transfer, 100 iterations of 8 buffers (using multi-buffer interface)
# each with 100 bytes
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 100 \
    --n-buffers 8

# RMM transfer between GPUs 0 and 3, 100 iterations of 2 buffers (using
# multi-buffer interface) each with 1 MiB
python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type rmm \
    --server-dev 0 \
    --client-dev 3 \
    --n-iter 100 \
    --n-bytes 1MiB \
    --n-buffers 2

# Polling progress mode without delayed notification NumPy transfer,
# 100 iterations of single buffer with 1 MiB
UCXPY_ENABLE_DELAYED_SUBMISSION=0 \
    python -m ucxx.benchmarks.send_recv \
    --backend ucxx-async \
    --object_type numpy \
    --n-iter 100 \
    --n-bytes 1MiB \
    --progress-mode polling
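The invocations above differ only in a handful of options, so a small wrapper can make parameter sweeps less error-prone. The following is a sketch under stated assumptions: the function send_recv_command is hypothetical, and it only emits options that appear in the examples above (--backend, --object_type, --n-iter, --n-bytes, --n-buffers, --server-dev, --client-dev, --progress-mode). It builds the command but does not run it.

```python
import shlex


def send_recv_command(backend, object_type, n_iter, n_bytes,
                      n_buffers=None, server_dev=None, client_dev=None,
                      progress_mode=None):
    """Build the argv list for `python -m ucxx.benchmarks.send_recv`.

    Hypothetical helper: optional arguments left as None are omitted so
    the benchmark's own defaults apply.
    """
    argv = ["python", "-m", "ucxx.benchmarks.send_recv",
            "--backend", backend,
            "--object_type", object_type,
            "--n-iter", str(n_iter),
            "--n-bytes", str(n_bytes)]
    optional = [("--n-buffers", n_buffers),
                ("--server-dev", server_dev),
                ("--client-dev", client_dev),
                ("--progress-mode", progress_mode)]
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv


# Same options as the first asynchronous example above.
cmd = send_recv_command("ucxx-async", "numpy", 100, 100, n_buffers=8)
print(shlex.join(cmd))

# A small sweep over message sizes, using sizes from the examples above.
sweep = [send_recv_command("ucxx-async", "numpy", 100, size)
         for size in ("100", "1MiB")]
for argv in sweep:
    print(shlex.join(argv))
```

Each list can be handed to subprocess.run, optionally with an env mapping that sets variables such as UCXPY_ENABLE_DELAYED_SUBMISSION.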

Logging

Logging is independently available for both C++ and Python APIs. Since the Python interface uses the C++ backend, C++ logging can be enabled when running Python code as well.

C++

The C++ interface reuses the UCX logger and provides the same log levels. Logging is enabled via the UCXX_LOG_LEVEL environment variable. This does not enable UCX logging, however; UCX_LOG_LEVEL must still be set separately for that. A few examples are below:

# Request trace log level
UCXX_LOG_LEVEL=TRACE_REQ

# Debug log level
UCXX_LOG_LEVEL=DEBUG

Python

The UCXX Python interface uses the logging library included in Python. The only levels currently used are INFO and DEBUG, and they can be enabled via the UCXPY_LOG_LEVEL environment variable. A few examples are below:

# Enable Python info log level
UCXPY_LOG_LEVEL=INFO

# Enable Python debug log level, UCXX request trace log level and UCX data log level
UCXPY_LOG_LEVEL=DEBUG UCXX_LOG_LEVEL=TRACE_REQ UCX_LOG_LEVEL=DATA
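Because the three log levels are controlled by independent environment variables, a launcher script may want to set them for a child process without touching its own environment. The following is a minimal sketch: the helper logging_env is hypothetical, and only the variable names and level values come from the examples above.

```python
import os


def logging_env(python_level=None, ucxx_level=None, ucx_level=None, base=None):
    """Return a copy of an environment with the requested log levels set.

    Hypothetical helper: levels left as None are not added, so the
    corresponding layer keeps its default logging behaviour.
    """
    env = dict(os.environ if base is None else base)
    levels = {
        "UCXPY_LOG_LEVEL": python_level,  # Python layer
        "UCXX_LOG_LEVEL": ucxx_level,     # C++ layer
        "UCX_LOG_LEVEL": ucx_level,       # UCX itself
    }
    for name, level in levels.items():
        if level is not None:
            env[name] = level
    return env


# Start from an empty base here so the printed result is easy to read.
env = logging_env("DEBUG", "TRACE_REQ", "DATA", base={})
for name, level in sorted(env.items()):
    print(f"{name}={level}")

# To apply it, pass the mapping when starting the child process, e.g.
# subprocess.run(argv, env=logging_env("INFO")) with argv of your choice.
```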
