Decoupled Torch Network-Aware Training on Interlinked Online Nodes (DeToNATION)

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Decoupled Torch Network-Aware Training on Interlinked Online Nodes (DeToNATION)

This package currently implements the methods described in the paper "FlexDeMo: Decoupled Momentum Optimization for Fully and Hybrid Sharded Training". Code to run all experiments from the paper is in the benchmarks folder.

Installation

Installation from PyPI:

pip install detonation

Installation from source:

git clone https://github.com/schneiderkamplab/DeToNATION
cd DeToNATION
pip install .
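
To confirm that either installation route worked, the installed version can be queried from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of DeToNATION.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version of detonation, or None if it is not installed.
    print(installed_version("detonation"))
```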

Example

There is a full example for language model training using FlexDeMo in the examples folder. Please refer to the documentation:

examples/t5/README.md

This example demonstrates the use of the prepare_detonation function for obtaining a distributed model and optimizer.

Benchmarks

There is a full benchmarking example for language model training using FlexDeMo in the benchmarks folder. Please refer to the documentation:

benchmarks/t5/README.md

This benchmarking example demonstrates the use of the prepare_detonation function for obtaining a distributed model and optimizer, and uses aim and mltiming to track model parameters and performance.

Usage

Using DeToNATION directly, without prepare_detonation, takes three steps. They are shown below for the FlexDeMo optimizer, i.e., DeToNATION with node-based hybrid sharding using DeMo replication.

First, you need to wrap your model with FSDP and the hybrid sharding strategy:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# HYBRID_SHARD shards within each node and replicates across nodes.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
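
To make the hybrid sharding layout concrete, the sketch below computes which ranks would share a sharding group (within a node) and which would share a replication group (across nodes). It is an illustration only: the function hybrid_shard_groups is hypothetical, and it assumes ranks are numbered consecutively per node, which is typical for torchrun launches but not guaranteed.

```python
def hybrid_shard_groups(world_size: int, gpus_per_node: int):
    """Return (sharding_groups, replication_groups) as lists of rank lists.

    Assumes ranks 0..gpus_per_node-1 live on node 0, the next block on node 1, etc.
    """
    if gpus_per_node <= 0 or world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a positive multiple of gpus_per_node")
    num_nodes = world_size // gpus_per_node
    # One sharding group per node: all ranks on that node.
    sharding = [
        list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        for node in range(num_nodes)
    ]
    # One replication group per local rank: the same local rank on every node.
    replication = [
        list(range(local, world_size, gpus_per_node))
        for local in range(gpus_per_node)
    ]
    return sharding, replication


if __name__ == "__main__":
    shard, repl = hybrid_shard_groups(world_size=8, gpus_per_node=4)
    print(shard)
    print(repl)
```

With two nodes of four GPUs each, parameters are sharded across the four ranks of a node, and each rank exchanges replication traffic only with its counterpart on the other node.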

Then, you can import and instantiate the FlexDeMo optimizer:

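from detonation import DeMo

optim = DeMo(
    # Assumption: parameters are passed first, as with other torch.optim optimizers.
    model.parameters(),
    compression_topk=16,
    compression_chunk=128,
    # Process groups created by FSDP for intra-node sharding and inter-node replication.
    sharding_parallel_group=model.process_group,
    replication_parallel_group=model._inter_node_pg,
)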
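
The compression_topk and compression_chunk arguments suggest a chunk-wise top-k selection. The sketch below illustrates that idea on a flat list of numbers: split into fixed-size chunks and keep only the largest-magnitude entries of each. It is a simplified illustration, not the library's implementation: DeMo applies a frequency transform before selecting components, which is omitted here, and the exact semantics of the two parameters in detonation may differ. The function name chunked_topk is hypothetical.

```python
def chunked_topk(values, chunk: int, topk: int):
    """Keep the `topk` largest-magnitude entries of every `chunk`-sized block.

    Returns a list of (global_index, value) pairs in index order.
    """
    if chunk <= 0 or topk <= 0:
        raise ValueError("chunk and topk must be positive")
    kept = []
    for start in range(0, len(values), chunk):
        block = values[start:start + chunk]
        # Indices of the block sorted by descending magnitude (stable for ties).
        order = sorted(range(len(block)), key=lambda i: abs(block[i]), reverse=True)
        for i in sorted(order[:topk]):
            kept.append((start + i, block[i]))
    return kept


if __name__ == "__main__":
    data = [0.1, -3.0, 0.2, 2.0, 5.0, 0.0, -0.5, 1.0]
    print(chunked_topk(data, chunk=4, topk=2))
```

Under this simplified 1-D reading, topk=16 with chunk=128 would keep 16 of every 128 values.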

Finally, wrap the forward and backward passes in a no_sync context manager to avoid automatic full gradient synchronization:

with model.no_sync():  # Disable gradient synchronization across FSDP instances.
    loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
    loss.backward()
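
The three steps can be combined into a single training step, sketched below. This is an assumption-laden outline rather than the project's own loop: train_step is a hypothetical helper, and calling optim.step() and optim.zero_grad() after the no_sync block follows the usual PyTorch pattern, which the text above does not spell out.

```python
def train_step(model, optim, batch):
    """Run one forward/backward pass without FSDP gradient sync, then step.

    `model` is expected to be FSDP-wrapped and `optim` a DeMo instance,
    but any objects with the same methods work.
    """
    with model.no_sync():  # Skip the automatic full gradient synchronization.
        loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
        loss.backward()
    # Assumption: the optimizer handles its own communication inside step().
    optim.step()
    optim.zero_grad()
    return loss
```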

Release history

- 0.5.2: May 26, 2025
- 0.5.1: May 23, 2025
- 0.5.0 (yanked; reason: discovered bug): May 23, 2025
- 0.4.0b0 (pre-release): May 1, 2025
- 0.3.0: Apr 10, 2025
- 0.2.2 (this version): Mar 18, 2025
- 0.2.1: Feb 12, 2025
- 0.2.0: Feb 12, 2025
- 0.1.1: Feb 11, 2025
- 0.1.0: Feb 10, 2025
- 0.0.2: Feb 6, 2025
- 0.0.1: Feb 6, 2025
- 0.0.0: Jan 23, 2025

Download files


Source Distribution

detonation-0.2.2.tar.gz (11.4 kB)

Uploaded Mar 18, 2025 Source

File details

Details for the file detonation-0.2.2.tar.gz.

File metadata

Download URL: detonation-0.2.2.tar.gz
Upload date: Mar 18, 2025
Size: 11.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for detonation-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`79d893412f1e38a3c612fbae5088f54cd5e9160e33e8183b7f21dcc343933180`
MD5	`57736dce3dd5012d68b869aa216c8a97`
BLAKE2b-256	`7e6b75936f7c3f054e255d40ef6833795600505dc683c2a2a1fedce1b280731f`
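
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library; the helper names sha256_of and matches_expected are hypothetical, and the file path in the usage section assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest listed for detonation-0.2.2.tar.gz.
EXPECTED_SHA256 = "79d893412f1e38a3c612fbae5088f54cd5e9160e33e8183b7f21dcc343933180"


def sha256_of(path, block_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the archive was downloaded to the current directory.
    print(matches_expected("detonation-0.2.2.tar.gz"))
```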
