Decoupled Torch Network-Aware Training on Interlinked Online Nodes (DeToNATION)
Installation
Installation from PyPI:
pip install detonation
Installation from source:
git clone https://github.com/schneiderkamplab/DeToNATION
cd DeToNATION
pip install .
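To confirm that either installation route worked, you can query the installed version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of DeToNATION.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "detonation") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if the package is installed, otherwise None.
    print(installed_version())
```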
Usage
Using the FlexDeMo optimizer requires three steps, shown below.
First, you need to wrap your model with FSDP and the hybrid sharding strategy:
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
# HYBRID_SHARD shards parameters within a node and replicates them across nodes.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
Second, import and instantiate the FlexDeMo optimizer:
from detonation import DeMo
optim = DeMo(
    compression_topk=16,
    compression_chunk=128,
    sharding_parallel_group=model.process_group,  # intra-node sharding group set up by FSDP
    replication_parallel_group=model._inter_node_pg,  # inter-node replication group set up by FSDP
)
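To make the two process groups passed to the optimizer concrete, the following sketch computes which ranks would share a sharding group (within a node) and which would share a replication group (across nodes) under hybrid sharding. It is a pure-Python illustration that does not call PyTorch. The function name `hybrid_shard_groups` is hypothetical, and the layout assumes that ranks are numbered contiguously within each node and that every node has the same number of GPUs.

```python
from typing import List, Tuple


def hybrid_shard_groups(
    world_size: int, gpus_per_node: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (sharding_groups, replication_groups) as lists of global ranks.

    Assumption: ranks 0..gpus_per_node-1 live on node 0, the next
    gpus_per_node ranks on node 1, and so on.
    """
    if gpus_per_node <= 0 or world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a positive multiple of gpus_per_node")
    num_nodes = world_size // gpus_per_node

    # One sharding group per node: all ranks on that node.
    sharding = [
        list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        for node in range(num_nodes)
    ]
    # One replication group per local rank: the same local rank on every node.
    replication = [
        [local + node * gpus_per_node for node in range(num_nodes)]
        for local in range(gpus_per_node)
    ]
    return sharding, replication


if __name__ == "__main__":
    shard, repl = hybrid_shard_groups(world_size=8, gpus_per_node=4)
    print(shard)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(repl)   # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With two nodes of four GPUs each, every rank belongs to exactly one group of each kind: for example, rank 5 shards with ranks 4 to 7 and replicates with rank 1.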
Third and last, wrap the forward and backward pass in a no_sync context manager to avoid automatic full gradient synchronization:
with model.no_sync():  # Disable gradient synchronizations across FSDP instances.
    loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
    loss.backward()
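The forward and backward pass above can be placed inside a complete training step, sketched below. The helper `train_step` is hypothetical. It assumes the optimizer exposes the usual `zero_grad()` and `step()` methods of a PyTorch optimizer, and that `step()` is called after leaving the `no_sync` block; neither detail is stated in this description, so check it against the project's examples.

```python
def train_step(model, optim, batch):
    """Run one training step with gradient synchronization disabled.

    `model` is expected to be the FSDP-wrapped model and `optim` the
    optimizer instantiated earlier; both are passed in by the caller.
    """
    # Assumption: standard optimizer API (zero_grad / step).
    optim.zero_grad()
    with model.no_sync():  # Skip FSDP's automatic full gradient synchronization.
        loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
        loss.backward()
    # Assumption: the optimizer step runs outside the no_sync block.
    optim.step()
    return loss
```

A training loop would then call `train_step(model, optim, batch)` once per batch.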
Download files
Source Distribution
detonation-0.2.0.tar.gz (7.7 kB)
File details
Details for the file detonation-0.2.0.tar.gz.
File metadata
- Download URL: detonation-0.2.0.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0264cb221bb5fba5ff69d8fc67c287105aac2e3e35b15da42483bdcc70ef3ac4 |
| MD5 | d8fc49358f7ba907d66a51d8da775c30 |
| BLAKE2b-256 | b5056928bad75bcd14b426fe4859f5298353e4571b2b7fd33c0925751d9c3be4 |
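A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; the helper names `sha256_of` and `matches_expected` are hypothetical, and the archive path in the usage example assumes the file sits in the current directory.

```python
import hashlib

# SHA256 digest listed for detonation-0.2.0.tar.gz.
EXPECTED_SHA256 = "0264cb221bb5fba5ff69d8fc67c287105aac2e3e35b15da42483bdcc70ef3ac4"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches_expected("detonation-0.2.0.tar.gz")` returns `True` only if the local file is byte-for-byte identical to the published archive.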