Large DNN training framework for consumer GPUs
High Performance · Easy to Use · Built for Gaming GPUs
Documentation · Chinese Documentation · Benchmarks · Examples
RoundPipe is a training framework for large DNNs that lets you train very large models on consumer-grade GPUs. On a single 24 GB GPU, you can fully fine-tune 32B-parameter models, LoRA fine-tune models of up to 235B parameters, and handle sequences of 64K+ tokens, with throughput approaching that of datacenter-class hardware.
Highlights
- Train bigger than ever: Full fine-tune 32B models or LoRA fine-tune up to 235B on a single 24 GB GPU. Up to 7× longer sequence length than PyTorch FSDP.
- High performance: Push a 4090 close to A800 NVLink-class throughput. Up to 6× faster than FSDP Offload in typical workloads.
- Linear multi-GPU scaling: Scale to multiple GPUs within a node without rewriting your training loop. Throughput grows close to linearly for the 1.7B to 32B models benchmarked below (7.4× to 7.7× on 8 GPUs), while max sequence length per GPU stays unchanged.
- Feels like PyTorch: Sequential programming interface with a low learning curve. Works well in Jupyter Notebook for rapid iteration.
- General by design: No constraints on layer structure, training flow, or parameter update strategy.
- Portable across accelerators: Pure PyTorch implementation. Runs on NVIDIA, AMD, and Ascend platforms.
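A back-of-the-envelope estimate shows why 24 GB is a tight budget for full fine-tuning. The sketch below assumes bf16 weights and gradients (2 bytes per parameter each) and an Adam-style optimizer that keeps fp32 master weights plus two fp32 moments (12 bytes per parameter). These byte counts are common mixed-precision assumptions, not figures taken from RoundPipe, and activation memory is ignored. The function name `training_state_gib` is made up for this illustration.

```python
GIB = 1024 ** 3

def training_state_gib(num_params, weight_bytes=2, grad_bytes=2, optim_bytes=12):
    """Return (weights, gradients, optimizer state) in GiB for a dense model.

    The default byte counts are assumptions: bf16 weights and gradients,
    fp32 master weights plus two fp32 Adam moments.
    """
    return tuple(num_params * b / GIB for b in (weight_bytes, grad_bytes, optim_bytes))

# Parameter counts are nominal (taken from the model names), not exact.
for name, num_params in [("1.7B", 1.7e9), ("8B", 8e9), ("32B", 32e9)]:
    weights, grads, optim = training_state_gib(num_params)
    total = weights + grads + optim
    print(f"{name}: weights {weights:.1f} GiB, grads {grads:.1f} GiB, "
          f"optimizer {optim:.1f} GiB, total {total:.1f} GiB")
```

Under these assumptions a 32B model needs roughly 60 GiB for its bf16 weights alone, so the full training state cannot stay resident on one 24 GB card.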
Benchmarks
Unless stated otherwise, all benchmarks below are measured on a single node with 8 GPUs. "OOM" means the framework cannot fit the model under that configuration.
Maximum Input Sequence Length (tokens)
| Device · Framework | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| 4090 · FSDP Offload | 11 K | 11 K | OOM | OOM |
| 4090 · RoundPipe | 73 K | 49 K | 28 K | 31 K |
| A800 · FSDP | 39 K | 29 K | 11 K | OOM |
| A800 · RoundPipe | 288 K | 226 K | 126 K | 118 K |
Training Throughput (tokens/s)
| Device · Framework | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| 4090 · FSDP Offload | 35,074 | 4,071 | OOM | OOM |
| 4090 · RoundPipe | 65,417 | 24,275 | 5,516 | 1,820 |
| A800 · FSDP | 85,829 | 29,148 | 3,455 | OOM |
| A800 · RoundPipe | 84,692 | 28,427 | 6,301 | 1,796 |
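The headline ratios can be recomputed directly from the throughput table. The sketch below copies the tokens/s figures from that table and derives two comparisons: RoundPipe against FSDP Offload on the 4090, and RoundPipe on the 4090 against FSDP on the A800. The dictionary layout and the helper name `ratios` are choices made for this illustration only.

```python
# Throughput in tokens/s, copied from the table above; None marks an OOM entry.
models = ["Qwen3-1.7B", "Llama3.1-8B", "Qwen3-32B", "Qwen3-235B (LoRA)"]
throughput = {
    ("4090", "FSDP Offload"): [35074, 4071, None, None],
    ("4090", "RoundPipe"): [65417, 24275, 5516, 1820],
    ("A800", "FSDP"): [85829, 29148, 3455, None],
    ("A800", "RoundPipe"): [84692, 28427, 6301, 1796],
}

def ratios(numerator, denominator):
    """Element-wise ratio; None wherever either configuration ran out of memory."""
    return [None if a is None or b is None else a / b
            for a, b in zip(numerator, denominator)]

speedup_4090 = ratios(throughput[("4090", "RoundPipe")],
                      throughput[("4090", "FSDP Offload")])
rp4090_vs_a800_fsdp = ratios(throughput[("4090", "RoundPipe")],
                             throughput[("A800", "FSDP")])

for model, s, r in zip(models, speedup_4090, rp4090_vs_a800_fsdp):
    s_txt = "n/a (OOM)" if s is None else f"{s:.2f}x"
    r_txt = "n/a (OOM)" if r is None else f"{r:.2f}x"
    print(f"{model}: vs FSDP Offload on 4090 {s_txt}, vs FSDP on A800 {r_txt}")
```

The Llama3.1-8B column gives a ratio of about 6× over FSDP Offload on the 4090, which is where the "up to 6×" figure in the highlights comes from.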
Multi-GPU Scaling (8× RTX 4090 node, tokens/s)
| GPUs | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| 1 | 8,881 | 3,142 | 740 | 480 |
| 2 | 17,026 | 6,259 | 1,476 | 808 |
| 4 | 33,178 | 12,278 | 2,897 | 1,281 |
| 8 | 65,417 | 24,275 | 5,516 | 1,820 |
Max sequence length per GPU stays constant across all GPU counts (73 K, 49 K, 28 K, and 31 K respectively).
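One way to read the scaling table is as parallel efficiency: the speedup over the 1-GPU run divided by the number of GPUs, where 1.0 means perfectly linear scaling. The sketch below computes it from the figures in the table. The helper name `scaling_efficiency` is made up for this illustration.

```python
# Throughput in tokens/s for 1, 2, 4, and 8 GPUs, copied from the table above.
gpus = [1, 2, 4, 8]
scaling = {
    "Qwen3-1.7B": [8881, 17026, 33178, 65417],
    "Llama3.1-8B": [3142, 6259, 12278, 24275],
    "Qwen3-32B": [740, 1476, 2897, 5516],
    "Qwen3-235B (LoRA)": [480, 808, 1281, 1820],
}

def scaling_efficiency(tokens_per_s, gpu_counts):
    """Speedup over the first entry divided by relative GPU count (1.0 = linear)."""
    per_gpu_base = tokens_per_s[0] / gpu_counts[0]
    return [t / (per_gpu_base * n) for t, n in zip(tokens_per_s, gpu_counts)]

efficiency = {model: scaling_efficiency(values, gpus)
              for model, values in scaling.items()}

for model, eff in efficiency.items():
    cells = ", ".join(f"{n} GPU: {e:.0%}" for n, e in zip(gpus, eff))
    print(f"{model}: {cells}")
```

At 8 GPUs this gives roughly 92% to 97% efficiency for the first three models and about 47% for the Qwen3-235B LoRA run.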
Cross-Platform (tokens/s)
| Device | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| AMD W7800 | 17,852 | 5,915 | 1,450 | 665 |
| Ascend 910B | 50,599 | 23,253 | 5,028 | 459 |
| RTX 4090 | 65,417 | 24,275 | 5,516 | 1,820 |
Quick Start
Installation
pip install roundpipe
Requirements: Python ≥ 3.8, PyTorch ≥ 2.4
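Before installing, it can help to confirm that the environment meets these minimums. The sketch below uses only the Python standard library and checks the interpreter version and the installed `torch` distribution. It does not call any RoundPipe API, and the helper names `parse_major_minor` and `check_requirements` are made up for this illustration.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # from the requirements line above
MIN_TORCH = (2, 4)   # from the requirements line above

def parse_major_minor(version):
    """Extract (major, minor) from version strings such as '2.4.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def check_requirements():
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:2]))
        problems.append(f"Python {found} found, need >= 3.8")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_major_minor(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {torch_version} found, need >= 2.4")
    return problems

for line in check_requirements() or ["All minimum requirements are met"]:
    print(line)
```

Reading the version through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and works even when PyTorch is missing.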
Examples
See the example/ directory. More examples and tutorials will be added soon.
Documentation
Full documentation is available at itcarrot.github.io/RoundPipe.
Chinese documentation is available at itcarrot.github.io/RoundPipe/index.zh.html.
License
RoundPipe is licensed under LGPL-3.0.