
DLSlime Transfer Engine

A peer-to-peer RDMA transfer engine.

Usage

RDMA READ

import asyncio

import torch

from dlslime import RDMAEndpoint, available_nic

devices = available_nic()
assert devices, "No RDMA devices."

# Initialize the initiator's RDMA endpoint
initiator = RDMAEndpoint(device_name=devices[0], ib_port=1, link_type="RoCE")
# Register local GPU memory with the RDMA subsystem
local_tensor = torch.zeros([16], device="cuda", dtype=torch.uint8)
initiator.register_memory_region("buffer", local_tensor.data_ptr(), 16)

# Initialize the target endpoint on a different NIC
target = RDMAEndpoint(device_name=devices[-1], ib_port=1, link_type="RoCE")
# Register the target's GPU memory
remote_tensor = torch.ones([16], device="cuda", dtype=torch.uint8)
target.register_memory_region("buffer", remote_tensor.data_ptr(), 16)

# Establish a bidirectional RDMA connection:
# 1. Target connects using the initiator's endpoint information
# 2. Initiator connects using the target's endpoint information
# Note: real-world deployments exchange this information out of band (e.g., via TCP)
target.connect(initiator.local_endpoint_info)
initiator.connect(target.local_endpoint_info)

# Execute an asynchronous batched read operation
asyncio.run(initiator.async_read_batch("buffer", [0], [8], 8))
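The note above mentions that real deployments exchange endpoint information out of band, e.g. over TCP. Below is a minimal sketch of such an exchange using a TCP socket and JSON; the endpoint-info dicts are placeholders, not dlslime's actual wire format, and in practice you would serialize each side's `local_endpoint_info`.

```python
import json
import socket
import threading

def serve(server_sock, my_info, out):
    """Accept one peer, receive its endpoint info, reply with ours."""
    conn, _ = server_sock.accept()
    with conn:
        out.append(json.loads(conn.recv(4096).decode()))
        conn.sendall(json.dumps(my_info).encode())

# Placeholder endpoint descriptions (stand-ins for local_endpoint_info)
target_info = {"gid": "fe80::1", "qp_num": 42}
initiator_info = {"gid": "fe80::2", "qp_num": 43}

# Target side listens; initiator side connects
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
received = []
t = threading.Thread(target=serve, args=(server, target_info, received))
t.start()

client = socket.create_connection(server.getsockname())
client.sendall(json.dumps(initiator_info).encode())
got_target = json.loads(client.recv(4096).decode())
client.close()
t.join()
server.close()

# Each side now holds the other's info and could call endpoint.connect(...)
print(got_target["qp_num"], received[0]["qp_num"])
```

After this exchange, the `target.connect(...)` / `initiator.connect(...)` calls above would take the received dicts instead of reading the peer's attribute directly.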

SendRecv

Sender

# RDMA initialization and connection setup are the same as in the RDMA READ example
...

# RDMA Send
ones = torch.ones([16], dtype=torch.uint8)
endpoint.register_memory_region("buffer", ones.data_ptr(), 16)
asyncio.run(endpoint.send_async("buffer", 0, 8))

Receiver

# RDMA initialization and connection setup are the same as in the RDMA READ example
...

# RDMA Recv
zeros = torch.zeros([16], dtype=torch.uint8)
endpoint.register_memory_region("buffer", zeros.data_ptr(), 16)
asyncio.run(endpoint.recv_async("buffer", 8, 8))
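Assuming the second and third arguments of `send_async`/`recv_async` are a byte offset and a length (mirroring the offsets and sizes in the READ example), the net effect of the sender/receiver pair above can be pictured with plain buffer indexing:

```python
# 16-byte stand-ins for the sender's and receiver's registered buffers
ones = bytearray([1] * 16)   # sender: torch.ones([16], dtype=torch.uint8)
zeros = bytearray(16)        # receiver: torch.zeros([16], dtype=torch.uint8)

# send_async("buffer", 0, 8) paired with recv_async("buffer", 8, 8) would move
# the first 8 bytes of the sender into bytes 8..15 of the receiver:
zeros[8:16] = ones[0:8]
print(list(zeros))
```

That is, after the transfer the receiver's buffer holds zeros in its first half and the sent ones in its second half.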

Build

# on CentOS
sudo yum install cppzmq-devel gflags-devel cmake

# on Ubuntu
sudo apt install libzmq3-dev libgflags-dev cmake

# build from source
mkdir build && cd build
cmake -DBUILD_BENCH=ON -DBUILD_PYTHON=ON .. && make

Benchmark

# Target
./bench/transfer_bench                \
  --remote-endpoint=10.130.8.138:8000 \
  --local-endpoint=10.130.8.139:8000  \
  --device-name="mlx5_bond_0"         \
  --mode target                       \
  --block-size=2048000                \
  --batch-size=160

# Initiator
./bench/transfer_bench                \
  --remote-endpoint=10.130.8.139:8000 \
  --local-endpoint=10.130.8.138:8000  \
  --device-name="mlx5_bond_0"         \
  --mode initiator                    \
  --block-size=16384                  \
  --batch-size=16                     \
  --duration 10

Cross-node performance

Single Device

  • NVIDIA ConnectX-7 HHHL adapter card; 200GbE (default mode) / NDR200 IB; dual-port QSFP112; PCIe 5.0 x16 with x16 PCIe extension option
  • RoCE v2
Block Size  Batch Size  Total Trips  Total Transferred (MiB)  Duration (s)  Avg Latency (ms/trip)  Throughput (MiB/s)
8192        160         249920       312400                   10.0006       0.0400154              31238
16384       160         153300       383250                   10.0012       0.0652392              38320.5
32768       160         85280        426400                   10.0013       0.117276               42634.4
65536       160         44680        446800                   10.0033       0.223887               44665.3
128000      160         23340        455859                   10.0023       0.428546               45575.6
128000      160         23340        455859                   10.0028       0.428571               45573
256000      160         11820        461718                   10.0135       0.847166               46109.6
512000      160         5940         464062                   10.002        1.68384                46396.8
1024000     160         2980         465625                   10.0049       3.35735                46539.6
2048000     160         1500         468750                   10.0555       6.70364                46616.5
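The derived columns in the table above follow directly from block size, batch size, trip count, and duration. A quick check against the last row (small rounding differences are expected, since the printed duration is itself rounded):

```python
# Values from the last table row
block_size, batch_size, trips, duration = 2048000, 160, 1500, 10.0555

total_mib = block_size * batch_size * trips / 2**20  # Total Transferred (MiB)
latency_ms = duration / trips * 1000                 # Avg Latency (ms/trip)
throughput = total_mib / duration                    # Throughput (MiB/s)

print(total_mib, latency_ms, throughput)
```

Each trip moves block_size × batch_size bytes, so total transfer and per-trip latency reproduce the table to within rounding of the reported duration.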

Aggregated Transfer

  • NVIDIA ConnectX-7 HHHL adapter card × 8
  • RoCE v2
