DLSlime Transfer Engine
Flexible & Efficient Heterogeneous Transfer Toolkit
DLSlime is a heterogeneous transfer toolkit for distributed deep learning systems. It provides Python and C++ APIs for peer-to-peer data movement across RDMA, NVLink, Ascend Direct, and torch-distributed-style backends. Higher-level PeerAgent, SlimeRPC, and cache-service utilities are built on top of the same data plane.
Highlights
| Area | What DLSlime Provides |
|---|---|
| P2P transfer | RDMA RC read/write/send-recv, NVLink transfer, Ascend Direct transfer |
| Python control | RDMAEndpoint, PeerAgent, memory-region registration, async futures |
| RPC | SlimeRPC service/proxy helpers over PeerAgent mailbox transport |
| Cache service | dlslime-cache service for assignment-directory-backed RDMA cache slabs |
| Torch integration | Optional torch backend and torchrun examples |
| Benchmarks | Transfer, endpoint, cache, and RPC microbenchmarks under bench/ |
Install
From PyPI
pip install dlslime==0.0.3.rc2
The PyPI package is built with the default CMake flags. Build from source when you need optional transports or local C++ changes.
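To confirm which release ended up in the active environment, a small standard-library check can help. This is a minimal sketch: it only assumes the distribution name `dlslime` used in the pip command above, and `installed_version` is a helper name chosen here for illustration, not part of DLSlime.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "dlslime") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "dlslime is not installed")
```

The check reads package metadata only, so it does not import the compiled extension and works even when no RDMA device is present.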
From Source
git clone https://github.com/deeplink-org/DLSlime.git
cd DLSlime
pip install -v --no-build-isolation -e .
Pass CMake flags through the environment when enabling optional components:
BUILD_NVLINK=ON BUILD_TORCH_PLUGIN=ON \
pip install -v --no-build-isolation -e .
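When the build is driven from a script rather than a shell, the same flags can be placed in the child process environment. The sketch below only prepares the environment and argument list; `editable_install_command` is a hypothetical helper name, and it assumes the flags are read as `NAME=ON` environment variables exactly as in the shell command above.

```python
import os
import sys
from typing import Dict, Iterable, List, Tuple


def editable_install_command(
    enabled_flags: Iterable[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Build the environment and argv for an editable install with flags set to ON."""
    env = dict(os.environ)  # copy, so the current process environment is untouched
    for name in enabled_flags:
        env[name] = "ON"
    argv = [
        sys.executable, "-m", "pip", "install",
        "-v", "--no-build-isolation", "-e", ".",
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = editable_install_command(["BUILD_NVLINK", "BUILD_TORCH_PLUGIN"])
    print(" ".join(argv))
```

To actually run the build, pass the result to `subprocess.run(argv, env=env, check=True)` from the root of the DLSlime checkout.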
For a pure C++ build:
cmake -S . -B build -GNinja -DBUILD_PYTHON=OFF -DBUILD_RDMA=ON
cmake --build build
Build Flags
| Flag | Default | Description |
|---|---|---|
| BUILD_RDMA | ON | Build the RDMA transfer engine |
| BUILD_PYTHON | OFF in CMake, ON in pyproject.toml | Build Python bindings |
| BUILD_NVLINK | OFF | Build the NVLink transfer engine |
| BUILD_ASCEND_DIRECT | OFF | Build Ascend Direct transport |
| BUILD_TORCH_PLUGIN | OFF | Build DLSlime as a torch backend |
| BUILD_BENCH | OFF | Build C++ transfer-engine benchmarks |
| BUILD_TEST | OFF | Build C++ tests |
| USE_MACA | OFF | Enable Metax platform support for torch backend builds |
Quick Start
RDMA Endpoint
The low-level endpoint API registers local memory regions, exchanges endpoint metadata out of band, connects peers, and issues RDMA operations.
python examples/python/p2p_rdma_rc_read.py
python examples/python/p2p_rdma_rc_write.py
python examples/python/p2p_rdma_rc_write_with_imm_data.py
python examples/python/p2p_rdma_rc_send_recv_gdr.py
The examples use available_nic() to find RDMA devices and RDMAEndpoint to register local and remote memory regions.
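The out-of-band exchange step can be illustrated without any RDMA hardware. The sketch below is not the DLSlime API: it swaps length-prefixed JSON blobs over an ordinary socket, which is one common way to carry endpoint metadata between peers. The `exchange_metadata` helper and the fields in the example dictionaries (`name`, `nic`, `mr`) are hypothetical placeholders; the real metadata is whatever the endpoint object produces.

```python
import json
import socket
import struct
import threading


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before sending a full message")
        buf.extend(chunk)
    return bytes(buf)


def exchange_metadata(sock: socket.socket, local_info: dict) -> dict:
    """Send local metadata, then return the metadata received from the peer."""
    payload = json.dumps(local_info).encode("utf-8")
    # 4-byte big-endian length prefix followed by the JSON body.
    sock.sendall(struct.pack("!I", len(payload)) + payload)
    (size,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, size).decode("utf-8"))


def demo() -> dict:
    """Run both sides of the exchange in one process over a socket pair."""
    a, b = socket.socketpair()
    results = {}

    def run(role: str, sock: socket.socket, info: dict) -> None:
        results[role] = exchange_metadata(sock, info)

    # Placeholder metadata; field names are illustrative only.
    init_info = {"name": "initiator", "nic": "nic0", "mr": {"addr": 4096, "length": 1024}}
    tgt_info = {"name": "target", "nic": "nic1", "mr": {"addr": 8192, "length": 1024}}
    threads = [
        threading.Thread(target=run, args=("initiator", a, init_info)),
        threading.Thread(target=run, args=("target", b, tgt_info)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    a.close()
    b.close()
    return results


if __name__ == "__main__":
    print(demo())
```

After such an exchange each side holds the other's description, which is the information a connect step needs before any RDMA operation is issued.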
PeerAgent and SlimeRPC
PeerAgent adds control-plane based discovery and connection management. Start a NanoCtrl instance first, then run the RPC example:
nanoctrl start
python examples/python/rpc_example.py --ctrl http://127.0.0.1:3000
The example defines a Python service, serves it on a worker PeerAgent, and calls it from a driver PeerAgent through the SlimeRPC proxy API.
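The service/proxy pattern described above can be sketched in a single process. This toy version is not SlimeRPC and uses none of its API: a `queue.Queue` stands in for the PeerAgent mailbox, a thread stands in for the worker, and `ToyProxy`, `serve`, and `MathService` are names invented for illustration.

```python
import queue
import threading


class MathService:
    """A plain Python object whose methods are exposed to callers."""

    def add(self, a, b):
        return a + b

    def greet(self, name):
        return f"hello, {name}"


def serve(service, inbox: queue.Queue) -> None:
    """Worker loop: pop requests from the mailbox and reply with the result."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown sentinel
            break
        method, args, reply = msg
        try:
            reply.put(("ok", getattr(service, method)(*args)))
        except Exception as exc:  # return the failure to the caller
            reply.put(("error", exc))


class ToyProxy:
    """Driver-side stub: attribute access becomes a request on the mailbox."""

    def __init__(self, inbox: queue.Queue) -> None:
        self._inbox = inbox

    def __getattr__(self, method):
        def call(*args):
            reply = queue.Queue(maxsize=1)
            self._inbox.put((method, args, reply))
            status, value = reply.get()
            if status == "error":
                raise value
            return value

        return call


def demo():
    inbox = queue.Queue()
    worker = threading.Thread(target=serve, args=(MathService(), inbox))
    worker.start()
    proxy = ToyProxy(inbox)
    try:
        return proxy.add(2, 3), proxy.greet("slime")
    finally:
        inbox.put(None)
        worker.join()


if __name__ == "__main__":
    print(demo())
```

The point of the proxy is that driver code reads like a local method call while the request and reply travel through a transport. In DLSlime that transport is the PeerAgent mailbox rather than an in-process queue.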
DLSlimeCache
DLSlimeCache owns a preallocated memory region and stores assignment manifests so clients can write bytes into cache slabs and read them back through normal RDMA operations.
nanoctrl start
dlslime-cache start --ctrl http://127.0.0.1:3000 \
--host 127.0.0.1 --port 8765 --memory-size 1G
python examples/python/cache_client_example.py --url http://127.0.0.1:8765
dlslime-cache stop
See docs/design/dlslime-cache.md for the cache service design and API.
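The idea of a preallocated region carved into slabs and tracked by per-key manifests can be shown with a toy in-memory model. This is a sketch under stated assumptions, not the DLSlimeCache implementation or its API: `ToySlabCache`, `Assignment`, and the fixed slab size are hypothetical, and plain `bytearray` copies stand in for RDMA reads and writes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Assignment:
    """One manifest entry: which slab holds a chunk and how many bytes are valid."""

    slab: int
    length: int


class ToySlabCache:
    def __init__(self, memory_size: int, slab_size: int) -> None:
        self.slab_size = slab_size
        self.region = bytearray(memory_size)  # preallocated once, never resized
        self.free_slabs = list(range(memory_size // slab_size))
        self.manifests: Dict[str, List[Assignment]] = {}

    def put(self, key: str, data: bytes) -> List[Assignment]:
        """Assign slabs to key, copy data into them, and record the manifest."""
        needed = -(-len(data) // self.slab_size)  # ceiling division
        reclaimable = len(self.manifests.get(key, []))
        if needed > len(self.free_slabs) + reclaimable:
            raise MemoryError("not enough free slabs")
        self.evict(key)  # overwrite semantics: release any previous assignment
        manifest = []
        for i in range(needed):
            slab = self.free_slabs.pop()
            chunk = data[i * self.slab_size:(i + 1) * self.slab_size]
            start = slab * self.slab_size
            self.region[start:start + len(chunk)] = chunk
            manifest.append(Assignment(slab, len(chunk)))
        self.manifests[key] = manifest
        return manifest

    def get(self, key: str) -> bytes:
        """Reassemble the bytes for key by following its manifest in order."""
        out = bytearray()
        for a in self.manifests[key]:
            start = a.slab * self.slab_size
            out += self.region[start:start + a.length]
        return bytes(out)

    def evict(self, key: str) -> None:
        """Drop the manifest for key and return its slabs to the free list."""
        for a in self.manifests.pop(key, []):
            self.free_slabs.append(a.slab)


if __name__ == "__main__":
    cache = ToySlabCache(memory_size=64, slab_size=16)
    cache.put("kv-block-0", b"x" * 40)
    print(len(cache.get("kv-block-0")))
```

Keeping the manifest separate from the data is what lets a client move bytes directly into and out of the assigned slabs; the service only has to hand out and remember assignments.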
NVLink and Ascend
torchrun --nproc_per_node=2 examples/python/p2p_nvlink.py
python examples/python/p2p_ascend_read.py
Ascend Direct setup details live in docs/huawei_ascend/README.md.
Benchmarks
Benchmark commands and historical performance tables now live in dedicated files:
- bench/README.md - transfer, endpoint, cache, and RPC benchmark entry point
- docs/benchmark-rpc.md - focused SlimeRPC vs Ray benchmark guide
Common entry points:
# Aggregated RDMA transfer benchmark, two nodes
torchrun --master-addr <addr> --master-port 6006 \
--nnodes 2 --nproc-per-node 8 --node-rank <rank> \
bench/python/agg_transfer_bench_spmd.py \
--qp-num 8 --transfer-engine dlslime \
--batch-size 64 --num-iteration 100 --num-concurrency 8
# SlimeRPC vs Ray local benchmark
bash bench/python/run_rpc_bench.sh
Repository Layout
- dlslime/ - Python package and C++ sources
- examples/python/ - Runnable Python examples
- bench/ - Benchmark scripts, result files, and benchmark README
- docs/ - Design notes, roadmap, platform docs, and benchmark notes
- tests/ - Python and C++ tests
- cmake/ - CMake helper modules
- scripts/ - Development scripts
Documentation
- Documentation index
- Roadmap
- DLSlimeCache design
- Endpoint ownership model
- Endpoint DeviceSignal refactor
- Huawei Ascend guide
- Chinese README
License
See LICENSE.