
Large DNN training framework for consumer GPUs


High Performance · Easy to Use · Built for Gaming GPUs



RoundPipe is a large DNN training framework that lets you train huge models on consumer-grade GPUs. On a single 24 GB GPU, you can fully fine-tune 32B-parameter models, LoRA fine-tune models of up to 235B parameters, and handle sequences of 64K+ tokens, with throughput approaching that of datacenter-class hardware.

Highlights

  • Train bigger than ever: Fully fine-tune 32B models or LoRA fine-tune models of up to 235B parameters on a single 24 GB GPU. Up to 7× longer sequence length than PyTorch FSDP.
  • High performance: Push a 4090 close to A800 NVLink-class throughput. Up to 6× faster than FSDP Offload in typical workloads.
  • Linear multi-GPU scaling: Scale to multiple GPUs within a node without rewriting your training loop. In the benchmarks below, throughput grows close to linearly for the 1.7B to 32B models, while max sequence length per GPU stays unchanged.
  • Feels like PyTorch: Sequential programming interface with a low learning curve. Works well in Jupyter Notebook for rapid iteration.
  • General by design: No constraints on layer structure, training flow, or parameter update strategy.
  • Portable across accelerators: Pure PyTorch implementation. Runs on NVIDIA, AMD, and Ascend platforms.

Benchmarks

Unless a different GPU count is listed, all benchmarks below are measured on a single node with 8 GPUs. "OOM" (out of memory) means the framework cannot fit the model under that configuration.

Maximum Input Sequence Length

| Framework | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| 4090 · FSDP Offload | 11 K | 11 K | OOM | OOM |
| 4090 · RoundPipe | 73 K | 49 K | 28 K | 31 K |
| A800 · FSDP | 39 K | 29 K | 11 K | OOM |
| A800 · RoundPipe | 288 K | 226 K | 126 K | 118 K |

Training Throughput (tokens/s)

| Framework | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| 4090 · FSDP Offload | 35,074 | 4,071 | OOM | OOM |
| 4090 · RoundPipe | 65,417 | 24,275 | 5,516 | 1,820 |
| A800 · FSDP | 85,829 | 29,148 | 3,455 | OOM |
| A800 · RoundPipe | 84,692 | 28,427 | 6,301 | 1,796 |
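
To make the comparison between rows concrete, the following sketch computes throughput ratios from the table above. The numbers are copied from the table; the `MODELS` and `THROUGHPUT` layout and the `ratio` and `fmt` helpers are illustrative names chosen here, not part of RoundPipe.

```python
# Throughput in tokens/s, copied from the table above.
# None marks an OOM entry. All names here are illustrative only.
MODELS = ["Qwen3-1.7B", "Llama3.1-8B", "Qwen3-32B", "Qwen3-235B (LoRA)"]

THROUGHPUT = {
    "4090 · FSDP Offload": [35074, 4071, None, None],
    "4090 · RoundPipe": [65417, 24275, 5516, 1820],
    "A800 · FSDP": [85829, 29148, 3455, None],
    "A800 · RoundPipe": [84692, 28427, 6301, 1796],
}


def ratio(numerator_row, denominator_row, model):
    """Return numerator/denominator throughput, or None if either run was OOM."""
    i = MODELS.index(model)
    a = THROUGHPUT[numerator_row][i]
    b = THROUGHPUT[denominator_row][i]
    if a is None or b is None:
        return None
    return a / b


def fmt(value):
    """Format a ratio for printing, handling the OOM case."""
    return "n/a (OOM)" if value is None else f"{value:.2f}x"


for model in MODELS:
    vs_offload = ratio("4090 · RoundPipe", "4090 · FSDP Offload", model)
    vs_a800 = ratio("4090 · RoundPipe", "A800 · FSDP", model)
    print(f"{model:<18} vs 4090 FSDP Offload: {fmt(vs_offload):<10} "
          f"vs A800 FSDP: {fmt(vs_a800)}")
```

With these figures, RoundPipe on the 4090 node comes out at roughly 1.9× and 6.0× the FSDP Offload throughput for Qwen3-1.7B and Llama3.1-8B, and at about 76% and 83% of the A800 FSDP throughput for the same two models.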

Multi-GPU Scaling (RTX 4090, up to 8 GPUs, tokens/s)

| GPUs | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| 1 | 8,881 | 3,142 | 740 | 480 |
| 2 | 17,026 | 6,259 | 1,476 | 808 |
| 4 | 33,178 | 12,278 | 2,897 | 1,281 |
| 8 | 65,417 | 24,275 | 5,516 | 1,820 |

Max sequence length per GPU stays constant across all GPU counts (73 K, 49 K, 28 K, and 31 K for the four models, respectively).
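
One way to read the scaling table is as parallel efficiency: throughput on N GPUs divided by N times the single-GPU throughput. The sketch below computes it from the numbers above; the `SCALING` dictionary and the `scaling_efficiency` helper are illustrative names, not part of RoundPipe.

```python
# Throughput in tokens/s per GPU count, copied from the scaling table above.
# Names are illustrative only.
SCALING = {
    "Qwen3-1.7B": {1: 8881, 2: 17026, 4: 33178, 8: 65417},
    "Llama3.1-8B": {1: 3142, 2: 6259, 4: 12278, 8: 24275},
    "Qwen3-32B": {1: 740, 2: 1476, 4: 2897, 8: 5516},
    "Qwen3-235B (LoRA)": {1: 480, 2: 808, 4: 1281, 8: 1820},
}


def scaling_efficiency(by_gpu_count, n_gpus):
    """Measured throughput on n_gpus relative to ideal linear scaling (1.0 = ideal)."""
    ideal = n_gpus * by_gpu_count[1]
    return by_gpu_count[n_gpus] / ideal


for model, by_gpu_count in SCALING.items():
    cells = ", ".join(
        f"{n} GPU: {scaling_efficiency(by_gpu_count, n):.0%}"
        for n in sorted(by_gpu_count)
    )
    print(f"{model:<18} {cells}")
```

On these figures, 8-GPU efficiency is about 92%, 97%, and 93% for the three smaller models and about 47% for the Qwen3-235B LoRA run.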

Cross-Platform Throughput (tokens/s)

| Device | Qwen3-1.7B | Llama3.1-8B | Qwen3-32B | Qwen3-235B (LoRA) |
|---|---|---|---|---|
| AMD W7800 | 17,852 | 5,915 | 1,450 | 665 |
| Ascend 910B | 50,599 | 23,253 | 5,028 | 459 |
| RTX 4090 | 65,417 | 24,275 | 5,516 | 1,820 |

Quick Start

Installation

pip install roundpipe

Requirements: Python ≥ 3.8, PyTorch ≥ 2.4
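
Before installing, it can help to confirm that the environment meets these minimums. The following is a minimal sketch using only the standard library; it assumes PyTorch was installed under its usual distribution name `torch`, and the helper names `parse_major_minor` and `check_requirements` are hypothetical, not part of RoundPipe.

```python
import re
import sys

MIN_PYTHON = (3, 8)  # from the requirements line above
MIN_TORCH = (2, 4)   # from the requirements line above


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '2.4.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
        return problems  # importlib.metadata below needs Python 3.8+

    from importlib import metadata

    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_major_minor(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch >= 2.4 is required, found {torch_version}")
    return problems


issues = check_requirements()
if issues:
    for issue in issues:
        print("Requirement not met:", issue)
else:
    print("Python and PyTorch versions meet the stated minimums.")
```

The check reads installed package metadata rather than importing `torch`, so it stays fast and works even when no accelerator is available.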

Examples

See the example/ directory. More examples and tutorials will be added soon.

Documentation

Full documentation is available at itcarrot.github.io/RoundPipe.

Chinese documentation is available at itcarrot.github.io/RoundPipe/index.zh.html.

License

RoundPipe is licensed under the LGPL-3.0.


