# Lightning ⚡ Bagua

Deep Learning Training Acceleration with Bagua and Lightning AI
Bagua is a deep learning training acceleration framework that supports multiple advanced distributed training algorithms, including:
- Gradient AllReduce for centralized synchronous communication, where gradients are averaged among all workers.
- Decentralized SGD for decentralized synchronous communication, where each worker exchanges data with one or a few specific workers.
- ByteGrad and QAdam for low precision communication, where data is compressed into low precision before communication.
- Asynchronous Model Average for asynchronous communication, where workers are not required to be synchronized in the same iteration in a lock-step style.
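To make the low-precision idea concrete, here is a minimal sketch of 8-bit min-max quantization applied to a gradient before communication. This is illustrative only, with hypothetical helper names; Bagua's actual ByteGrad kernel is implemented differently.

```python
# Illustrative 8-bit min-max quantization sketch (hypothetical helpers;
# not Bagua's real ByteGrad implementation).

def quantize_8bit(values):
    """Map floats onto integers in [0, 255] using a shared offset and scale."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    quantized = [round((v - lo) / scale) for v in values]
    return quantized, lo, scale


def dequantize_8bit(quantized, lo, scale):
    """Recover approximate floats from the 8-bit representation."""
    return [lo + q * scale for q in quantized]


grad = [-1.0, 0.0, 0.5, 1.0]
q, lo, scale = quantize_8bit(grad)
approx = dequantize_8bit(q, lo, scale)
# each recovered value is within one quantization step of the original
assert all(abs(a - g) <= scale for a, g in zip(approx, grad))
```

The receiver only needs the 8-bit integers plus the offset and scale, which is what makes the communication volume roughly a quarter of full-precision float32.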
By default, Bagua uses the Gradient AllReduce algorithm, which is also the algorithm implemented in PyTorch's DistributedDataParallel (DDP), but Bagua can usually achieve higher training throughput thanks to its backend written in Rust.
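As a rough illustration of what an averaging Gradient AllReduce computes (plain Python, not Bagua's Rust backend): after the collective, every worker holds the elementwise mean of all workers' local gradients.

```python
# Plain-Python sketch of an averaging allreduce (illustration only;
# real frameworks use collective communication primitives, not lists).

def allreduce_mean(worker_grads):
    """Return the gradient every worker holds after an averaging allreduce."""
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [
        sum(grads[i] for grads in worker_grads) / n_workers
        for i in range(n_params)
    ]


# two workers, each with a two-parameter local gradient
local_grads = [[1.0, 2.0], [3.0, 4.0]]
print(allreduce_mean(local_grads))  # [2.0, 3.0]
```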
## Installation

```bash
pip install -U lightning-bagua
```
## Usage

Simply set the `strategy` argument in the `Trainer`:

```python
from lightning import Trainer

# train on 4 GPUs (using Bagua mode)
trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)
```
By specifying the `algorithm` in the `BaguaStrategy`, you can select more advanced training algorithms featured by Bagua:
```python
from lightning import Trainer
from lightning_bagua import BaguaStrategy

# train on 4 GPUs, using the Bagua Gradient AllReduce algorithm
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="gradient_allreduce"),
    accelerator="gpu",
    devices=4,
)

# train on 4 GPUs, using the Bagua ByteGrad algorithm
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="bytegrad"),
    accelerator="gpu",
    devices=4,
)

# train on 4 GPUs, using Bagua Decentralized SGD
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="decentralized"),
    accelerator="gpu",
    devices=4,
)

# train on 4 GPUs, using Bagua Low Precision Decentralized SGD
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="low_precision_decentralized"),
    accelerator="gpu",
    devices=4,
)

# train on 4 GPUs, using the Asynchronous Model Average algorithm,
# with a synchronization interval of 100 ms
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="async", sync_interval_ms=100),
    accelerator="gpu",
    devices=4,
)
```
To use QAdam, we need to initialize the `QAdamOptimizer` first:

```python
import lightning as L
from lightning_bagua import BaguaStrategy
from bagua.torch_api.algorithms.q_adam import QAdamOptimizer


class MyModel(L.LightningModule):
    ...

    def configure_optimizers(self):
        # initialize the QAdam optimizer
        return QAdamOptimizer(self.parameters(), lr=0.05, warmup_steps=100)


model = MyModel()
trainer = L.Trainer(
    accelerator="gpu",
    devices=4,
    strategy=BaguaStrategy(algorithm="qadam"),
)
trainer.fit(model)
```
Bagua relies on its own launcher to schedule jobs. Below are examples using `bagua.distributed.launch`, which follows the `torch.distributed.launch` API:
```bash
# start training with 8 GPUs on a single node
python -m bagua.distributed.launch --nproc_per_node=8 train.py
```
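Since the launcher follows the `torch.distributed.launch` interface, a multi-node run can be sketched as follows. This is a hypothetical example assuming the usual `torch.distributed.launch` flags (`--nnodes`, `--node_rank`, `--master_addr`, `--master_port`) are accepted; the address and ports are placeholders.

```shell
# on node 0 (192.168.1.1 is a placeholder for node 0's address)
python -m bagua.distributed.launch \
    --nnodes=2 --node_rank=0 --nproc_per_node=8 \
    --master_addr=192.168.1.1 --master_port=29500 \
    train.py

# on node 1, same command with its own rank
python -m bagua.distributed.launch \
    --nnodes=2 --node_rank=1 --nproc_per_node=8 \
    --master_addr=192.168.1.1 --master_port=29500 \
    train.py
```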
If passwordless SSH login is available on every node, you can launch the distributed job from a single node with `baguarun`, which has a syntax similar to `mpirun`. When starting the job, `baguarun` automatically spawns processes on each of the training nodes given in the `--host_list` option, where each node is described as an IP address followed by an SSH port.
```bash
# Run on node1 (or node2) to start training on two nodes (node1 and node2), 8 GPUs per node
baguarun --host_list hostname1:ssh_port1,hostname2:ssh_port2 --nproc_per_node=8 --master_port=port1 train.py
```
> **Note:** You can also start training in the same way as with Distributed Data Parallel. However, system optimizations such as Bagua-Net and performance autotuning can only be enabled through the Bagua launcher. It is worth noting that with Bagua-Net, Distributed Data Parallel can also achieve better performance without modifying the training script.
See Bagua Tutorials for more details on installation and advanced features.