A high-performance framework for fine-tuning large language models with multi-GPU support
OpenSloth 🦥⚡
Scale Unsloth to multiple GPUs with just torchrun. No configuration files, no custom frameworks - pure PyTorch DDP.
- 🚀 2-4x faster than single GPU
- 🎯 Zero configuration - works out of the box
- 💾 Same VRAM per GPU as single GPU Unsloth
- 🔧 Any Unsloth model - Qwen, Llama, Gemma, etc.
Installation
# Install dependencies
uv add torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
uv add unsloth datasets transformers trl
uv add git+https://github.com/anhvth/opensloth.git
Quick Start
Replace python with torchrun:
# Single GPU
python train_scripts/train_ddp.py
# Multi-GPU
torchrun --nproc_per_node=2 train_scripts/train_ddp.py # 2 GPUs
torchrun --nproc_per_node=4 train_scripts/train_ddp.py # 4 GPUs
OpenSloth automatically handles GPU distribution, gradient sync, and batch sizing.
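For orientation, `torchrun` starts one process per GPU and sets the `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` environment variables in each of them. The sketch below is not OpenSloth's internal code. It only illustrates, under standard DDP assumptions, how those variables map to a device and how the effective global batch size grows with the number of processes. The names `DDPEnv`, `read_ddp_env`, and `effective_batch_size`, and the batch-size values, are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass
class DDPEnv:
    """Process layout as exposed by torchrun's environment variables."""

    local_rank: int  # index of this process on the current node
    rank: int        # global index of this process across all nodes
    world_size: int  # total number of processes (one per GPU)

    @property
    def is_main_process(self) -> bool:
        # By convention, rank 0 handles logging and checkpoint saving.
        return self.rank == 0

    @property
    def device(self) -> str:
        # Each process drives the GPU that matches its local rank.
        return f"cuda:{self.local_rank}"


def read_ddp_env(environ=os.environ) -> DDPEnv:
    """Read the torchrun variables, falling back to single-process defaults."""
    return DDPEnv(
        local_rank=int(environ.get("LOCAL_RANK", 0)),
        rank=int(environ.get("RANK", 0)),
        world_size=int(environ.get("WORLD_SIZE", 1)),
    )


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, world_size: int) -> int:
    """Samples contributing to one optimizer step across all DDP processes."""
    return per_device_batch * grad_accum_steps * world_size


if __name__ == "__main__":
    env = read_ddp_env()
    # Hypothetical values: 2 samples per device, 4 accumulation steps.
    print(env.device, env.is_main_process, effective_batch_size(2, 4, env.world_size))
```

Run with plain `python`, this reports a world size of 1; run under `torchrun --nproc_per_node=2`, each process reports its own device and a doubled effective batch size.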
Performance
| Setup | Time | Speedup |
|---|---|---|
| 1 GPU | 19m 34s | 1.0x |
| 2 GPUs | 8m 28s | 2.3x |
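Expected scaling: 2 GPUs = ~2.3x, 4 GPUs = ~4.5x, 8 GPUs = ~9x. The table above reports measurements for 1 and 2 GPUs only; the 4- and 8-GPU figures are expectations, not measured results.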
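The speedup column can be reproduced from the two wall-clock times in the table. The short sketch below does that arithmetic and also derives parallel efficiency (speedup divided by GPU count), which is not a figure the README reports. The helper names are hypothetical.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'XXm YYs' duration into seconds."""
    return minutes * 60 + seconds


def speedup(baseline_s: float, parallel_s: float) -> float:
    """How many times faster the parallel run is than the baseline."""
    return baseline_s / parallel_s


def parallel_efficiency(speedup_value: float, n_gpus: int) -> float:
    """Speedup per GPU; 1.0 corresponds to perfectly linear scaling."""
    return speedup_value / n_gpus


one_gpu = to_seconds(19, 34)  # 1174 seconds
two_gpu = to_seconds(8, 28)   # 508 seconds

s = speedup(one_gpu, two_gpu)
print(f"speedup: {s:.1f}x")                                # speedup: 2.3x
print(f"efficiency: {parallel_efficiency(s, 2):.2f}")      # efficiency: 1.16
```

An efficiency above 1.0 means the measured 2-GPU run scaled better than linearly; the README does not state why.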
Usage
import os

from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from opensloth.patching.ddp_patch import ddp_patch

ddp_patch()  # Enable DDP compatibility

# Standard Unsloth setup
# torchrun sets LOCAL_RANK for each process; default to 0 for single-GPU runs
local_rank = int(os.environ.get("LOCAL_RANK", 0))
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-0.6B",
    device_map={"": local_rank},  # place the whole model on this process's GPU
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16)
trainer = SFTTrainer(model=model, tokenizer=tokenizer, ...)
trainer.train()
Run: torchrun --nproc_per_node=4 your_script.py
Migration from Old Approach
Current (Recommended): Simple torchrun + DDP patch
from opensloth.patching.ddp_patch import ddp_patch
ddp_patch()
# ... standard Unsloth code
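Old Approach (v0.1.8): If you still need the configuration-file workflow, check out the v0.1.8 tag from inside a clone of the repository (release page: https://github.com/anhvth/opensloth/releases/tag/v0.1.8):
git checkout v0.1.8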
Links
- Unsloth - 2x faster training library
- TRL - Transformer Reinforcement Learning
- PyTorch DDP - Distributed training
git clone https://github.com/anhvth/opensloth.git
cd opensloth
torchrun --nproc_per_node=4 train_scripts/train_ddp.py
Happy training! 🦥⚡
File details
Details for the file opensloth-0.2.1.tar.gz.
File metadata
- Download URL: opensloth-0.2.1.tar.gz
- Upload date:
- Size: 3.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.0 CPython/3.12.11 Linux/5.4.0-216-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 751b124a5b18c1fb8082c59174fb2a1fd246337e6c837d7a7ae39aa710726397 |
| MD5 | d4ffbc1c706f6ac1a56feb29aad8b5cd |
| BLAKE2b-256 | 0af85db26144a96ceb08b3e85aaac711b0bcb9efe8c548d7f577f94f53770b1b |
File details
Details for the file opensloth-0.2.1-py3-none-any.whl.
File metadata
- Download URL: opensloth-0.2.1-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.0 CPython/3.12.11 Linux/5.4.0-216-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bd0dd918add6324c15be5b4acb7cc7964e761fc4a691774cd5fba8c1f0255e9 |
| MD5 | dfa02881794218e14b3998cea02b533b |
| BLAKE2b-256 | 9178debf3a8a5b1f2207aa4a756b1554ba8a2b2606e311dd083645eafc198d6f |