Deep Learning Training Acceleration with Bagua and Lightning AI
Lightning ⚡ Bagua
Bagua is a deep learning training acceleration framework that supports multiple advanced distributed training algorithms, including:
- Gradient AllReduce for centralized synchronous communication, where gradients are averaged among all workers.
- Decentralized SGD for decentralized synchronous communication, where each worker exchanges data with one or a few specific workers.
- ByteGrad and QAdam for low precision communication, where data is compressed into low precision before communication.
- Asynchronous Model Average for asynchronous communication, where workers are not required to synchronize with each other in lock-step at every iteration.
By default, Bagua uses the Gradient AllReduce algorithm, which is also the algorithm implemented in PyTorch's DistributedDataParallel (DDP); however, Bagua can usually achieve higher training throughput thanks to its communication backend written in Rust.
Installation
pip install -U lightning-bagua
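Note that Bagua itself must also be available in the environment; assuming the upstream project keeps its PyPI package name, it can be installed the same way (Bagua targets Linux with a CUDA-enabled PyTorch build):

pip install bagua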
Usage
Simply set the strategy argument in the Trainer:
from lightning import Trainer
# train on 4 GPUs (using Bagua mode)
trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)
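By passing a strategy object instead of the string shorthand, you can select one of the advanced algorithms listed above. A minimal sketch, assuming the package exposes a BaguaStrategy class whose algorithm argument takes the algorithm name and forwards extra keyword arguments to it (as in the PyTorch Lightning Bagua integration; the import path and the sync_interval_ms option are assumptions, not confirmed by this page):

from lightning import Trainer
from lightning_bagua import BaguaStrategy  # assumed import path

# train on 4 GPUs using the asynchronous model average algorithm;
# sync_interval_ms (assumed algorithm-specific option) controls how often
# workers average their models in the background
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="async", sync_interval_ms=100),
    accelerator="gpu",
    devices=4,
)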
See Bagua Tutorials for more details on installation and advanced features.