# DLSlime Transfer Engine

A peer-to-peer RDMA transfer engine.

## Usage

### RDMA READ

- Details in `p2p.py`
```python
import asyncio

import torch
from dlslime import RDMAEndpoint, available_nic  # import path assumed from the package name

devices = available_nic()
assert devices, "No RDMA devices."

# Initialize the initiator RDMA endpoint
initiator = RDMAEndpoint(device_name=devices[0], ib_port=1, link_type="RoCE")

# Register local GPU memory with the RDMA subsystem
# (register_memory_region takes a key, a base pointer, and a length in bytes)
local_tensor = torch.zeros([8], dtype=torch.uint8)
initiator.register_memory_region("buffer", local_tensor.data_ptr(), local_tensor.numel())

# Initialize the target endpoint on a different NIC
target = RDMAEndpoint(device_name=devices[-1], ib_port=1, link_type="RoCE")

# Register the target's GPU memory
remote_tensor = torch.ones([8], dtype=torch.uint8)
target.register_memory_region("buffer", remote_tensor.data_ptr(), remote_tensor.numel())

# Establish a bidirectional RDMA connection:
#   1. The target connects using the initiator's endpoint information
#   2. The initiator connects using the target's endpoint information
# Note: real-world deployments typically exchange this information
# out of band (e.g., over TCP)
target.connect(initiator.local_endpoint_info)
initiator.connect(target.local_endpoint_info)

# Execute an asynchronous batched read operation
asyncio.run(initiator.async_read_batch("buffer", [0], [8], 8))
```
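The note above mentions that endpoint information is usually exchanged out of band. A minimal sketch of such an exchange over a plain TCP socket is shown below; the function name `exchange_endpoint_info` and the length-prefixed JSON framing are illustrative choices, not part of the DLSlime API.

```python
import json
import socket


def _recv_exactly(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket, looping over partial reads."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def exchange_endpoint_info(local_info: dict, peer_host: str, port: int,
                           is_server: bool) -> dict:
    """Swap JSON-serialized endpoint info with a peer over TCP.

    One side passes is_server=True (listens), the other is_server=False
    (connects). Returns the peer's endpoint-info dict.
    """
    if is_server:
        with socket.create_server(("", port)) as srv:
            conn, _ = srv.accept()
    else:
        conn = socket.create_connection((peer_host, port))
    with conn:
        # Send our info with a 4-byte big-endian length prefix
        payload = json.dumps(local_info).encode()
        conn.sendall(len(payload).to_bytes(4, "big") + payload)
        # Receive the peer's info the same way
        size = int.from_bytes(_recv_exactly(conn, 4), "big")
        return json.loads(_recv_exactly(conn, size))
```

Each side would then pass the dict it received to `endpoint.connect(...)` in place of the in-process handoff shown above.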
### SendRecv

- Details in `sendrecv.py`

#### Sender
```python
# RDMA init and RDMA connect, just like RDMA READ
...

# RDMA Send
ones = torch.ones([16], dtype=torch.uint8)
endpoint.register_memory_region("buffer", ones.data_ptr(), 16)
asyncio.run(endpoint.send_async("buffer", 0, 8))
```
#### Receiver
```python
# RDMA init and RDMA connect, just like RDMA READ
...

# RDMA Recv
zeros = torch.zeros([16], dtype=torch.uint8)
endpoint.register_memory_region("buffer", zeros.data_ptr(), 16)
asyncio.run(endpoint.recv_async("buffer", 8, 8))
```
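Assuming the `send_async`/`recv_async` arguments are `(key, offset, length)`, consistent with the calls above, the net effect of this pair is to copy 8 bytes from offset 0 of the sender's buffer into offset 8 of the receiver's. The byte movement can be modeled in plain Python, with no RDMA involved:

```python
# Pure-Python model of the SendRecv example above (an illustration,
# not DLSlime code): the sender pushes 8 bytes starting at offset 0;
# the receiver lands them at offset 8 of its 16-byte buffer.
sender_buf = bytearray([1] * 16)   # mirrors torch.ones([16], dtype=torch.uint8)
recv_buf = bytearray(16)           # mirrors torch.zeros([16], dtype=torch.uint8)

send_offset, recv_offset, length = 0, 8, 8
recv_buf[recv_offset:recv_offset + length] = sender_buf[send_offset:send_offset + length]

print(list(recv_buf))  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```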
## Build

```shell
# On CentOS
sudo yum install cppzmq-devel gflags-devel cmake

# On Ubuntu
sudo apt install libzmq3-dev libgflags-dev cmake

# Build from source
mkdir build; cd build
cmake -DBUILD_BENCH=ON -DBUILD_PYTHON=ON ..; make
```
## Benchmark

```shell
# Target
./bench/transfer_bench \
    --remote-endpoint=10.130.8.138:8000 \
    --local-endpoint=10.130.8.139:8000 \
    --device-name="mlx5_bond_0" \
    --mode target \
    --block-size=2048000 \
    --batch-size=160

# Initiator
./bench/transfer_bench \
    --remote-endpoint=10.130.8.139:8000 \
    --local-endpoint=10.130.8.138:8000 \
    --device-name="mlx5_bond_0" \
    --mode initiator \
    --block-size=16384 \
    --batch-size=16 \
    --duration 10
```
### Cross-node performance

#### Single Device

- NVIDIA ConnectX-7 HHHL adapter card; 200GbE (default mode) / NDR200 IB; dual-port QSFP112; PCIe 5.0 x16 with x16 PCIe extension option
- RoCE v2
| Block Size | Batch Size | Total Trips | Total Transferred (MiB) | Duration (seconds) | Average Latency (ms/trip) | Throughput (MiB/s) |
|---|---|---|---|---|---|---|
| 8192 | 160 | 249920 | 312400 | 10.0006 | 0.0400154 | 31238 |
| 16384 | 160 | 153300 | 383250 | 10.0012 | 0.0652392 | 38320.5 |
| 32768 | 160 | 85280 | 426400 | 10.0013 | 0.117276 | 42634.4 |
| 65536 | 160 | 44680 | 446800 | 10.0033 | 0.223887 | 44665.3 |
| 128000 | 160 | 23340 | 455859 | 10.0023 | 0.428546 | 45575.6 |
| 128000 | 160 | 23340 | 455859 | 10.0028 | 0.428571 | 45573 |
| 256000 | 160 | 11820 | 461718 | 10.0135 | 0.847166 | 46109.6 |
| 512000 | 160 | 5940 | 464062 | 10.002 | 1.68384 | 46396.8 |
| 1024000 | 160 | 2980 | 465625 | 10.0049 | 3.35735 | 46539.6 |
| 2048000 | 160 | 1500 | 468750 | 10.0555 | 6.70364 | 46616.5 |
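As a sanity check on the table, each row's throughput is simply the total bytes moved (block size × batch size × trips) divided by the measured duration, and the average latency is the duration divided by the trip count. Reproducing the last row:

```python
# Last row of the single-device table: 2048000-byte blocks, batches of 160,
# 1500 round trips in 10.0555 s.
block_size = 2048000   # bytes per block
batch_size = 160       # blocks per trip
trips = 1500
duration_s = 10.0555

transferred_mib = block_size * batch_size * trips / 2**20
throughput_mib_s = transferred_mib / duration_s
latency_ms = duration_s / trips * 1e3

print(f"{transferred_mib:.0f} MiB, {throughput_mib_s:.1f} MiB/s, {latency_ms:.5f} ms/trip")
```

The computed values match the table to within rounding of the reported duration.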
#### Aggregated Transfer

- NVIDIA ConnectX-7 HHHL adapter card × 8
- RoCE v2