NCCL device-initiated API for the CuTe Python DSL

CuTe-comm

NCCL device-initiated API bindings for the CuTe Python DSL.

Exposes NCCL's GIN (GPU-initiated networking) and LSA (local symmetric access) APIs via cute.ffi, with a thin host-side layer for communicator setup.

Requirements

  • Python >= 3.10, CUDA 13, NCCL >= 2.29.7
  • GPU with deviceApiSupport (check via query_properties)
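The version minimums above can be checked before importing the package. The following is a minimal sketch, not part of cute_comm: the helper names `parse_version` and `meets_minimums` are hypothetical, and the NCCL version string is assumed to come from whatever source is available in your environment. It covers neither the CUDA 13 requirement nor `deviceApiSupport`, which the list above says to check via `query_properties`.

```python
# Hypothetical helpers (not part of cute_comm) that encode the stated minimums.
MIN_PYTHON = (3, 10)
MIN_NCCL = (2, 29, 7)


def parse_version(text):
    """Turn a dotted version such as '2.29.7' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimums(python_version, nccl_version):
    """Return True when both versions satisfy the stated minimums.

    python_version: a sequence such as sys.version_info or (3, 11).
    nccl_version:   a dotted string such as '2.29.7' (assumed format).
    """
    python_ok = tuple(python_version[:2]) >= MIN_PYTHON
    nccl_ok = parse_version(nccl_version) >= MIN_NCCL
    return python_ok and nccl_ok
```

Comparing integer tuples rather than strings matters here: as strings, "2.29.10" sorts before "2.29.7", while as tuples it correctly compares as newer.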

Installation

pip install cute-comm
# or from source
uv sync

Usage

import cutlass.cute as cute
import cute_comm.nccl as cn

# Host: bootstrap communicator, register memory
uid = cn.get_unique_id()          # create on rank 0, then broadcast to every rank
comm = cn.init_rank(nranks, rank, uid)
dev_comm = cn.create_dev_comm(comm, gin_contexts=1, gin_signals=1, lsa_barriers=1)
win = cn.register_window(comm, buf.data_ptr(), buf.nbytes)

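# Device: use inside @cute.jit kernels
# (offset, dst_win, dst_off, src_win, src_off, nbytes and signal_idx are placeholders defined elsewhere)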
@cute.jit
def my_kernel(dev_comm, ctx, win, peer):
    ptr = cn.lsa_ptr(win, offset, peer)
    cn.gin_put_siginc(dev_comm, ctx, peer, dst_win, dst_off, src_win, src_off, nbytes, signal_idx)
    cn.gin_wait_signal(dev_comm, ctx, signal_idx, threshold=1)

# Pass link_options to bundle the bridge PTX (compiled lazily, cached in ~/.cache/cute_comm/)
my_kernel[grid, block, cn.link_options("sm_90a")](dev_comm, ctx, win, peer)

# Cleanup
cn.deregister_window(comm, win)
cn.destroy_dev_comm(comm, dev_comm)
cn.finalize(comm)
cn.destroy(comm)
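The cleanup calls above run in the reverse order of setup. One way to keep that order, even when the kernel launch raises, is `contextlib.ExitStack`. This is a sketch under assumptions rather than an API the package provides: `run_with_cleanup` and its `body` callback are hypothetical names, and the cute_comm calls use only the argument shapes shown in the usage example.

```python
from contextlib import ExitStack


def run_with_cleanup(cn, nranks, rank, uid, ptr, nbytes, body):
    """Set up NCCL resources, run body, then release them in reverse order.

    cn is the cute_comm.nccl module (or a stand-in with the same functions).
    body is a hypothetical callback receiving (comm, dev_comm, win).
    """
    with ExitStack() as stack:
        comm = cn.init_rank(nranks, rank, uid)
        # Callbacks run last-in, first-out, so destroy is registered first.
        stack.callback(cn.destroy, comm)
        stack.callback(cn.finalize, comm)

        dev_comm = cn.create_dev_comm(
            comm, gin_contexts=1, gin_signals=1, lsa_barriers=1
        )
        stack.callback(cn.destroy_dev_comm, comm, dev_comm)

        win = cn.register_window(comm, ptr, nbytes)
        stack.callback(cn.deregister_window, comm, win)

        # On exit: deregister_window, destroy_dev_comm, finalize, destroy.
        return body(comm, dev_comm, win)
```

A caller would pass `cn`, the buffer pointer and size (for example `buf.data_ptr()` and `buf.nbytes`), and a function that launches the kernel. If a setup step fails partway, only the resources already created are released.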

API

Host: get_unique_id, init_rank, query_properties, create_dev_comm, destroy_dev_comm, register_window, deregister_window, finalize, destroy

Device (inside @cute.jit): local_ptr, lsa_ptr, lsa_barrier, gin_put, gin_put_siginc, gin_put_sigadd, gin_read_signal, gin_wait_signal

License

BSD 3-Clause. See LICENSE.
