Decoupled Torch Network-Aware Training on Interlinked Online Nodes (DeToNATION)

Reason this release was yanked:

Discovered bug

Project description

This code currently implements the method described in "FlexDeMo: Decoupled Momentum Optimization for Fully and Hybrid Sharded Training". Code to run all experiments from the paper is in the benchmarks folder.

Installation

Installation from PyPI:

pip install detonation

Installation from source:

git clone https://github.com/schneiderkamplab/DeToNATION
cd DeToNATION
pip install .
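
Because release 0.5.0 is marked as yanked, it can be useful to check which version actually ended up in your environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of DeToNATION. As far as pip's handling of yanked releases goes, a yanked version is normally skipped unless it is pinned exactly (for example detonation==0.5.0).

```python
from importlib import metadata


def installed_version(package="detonation"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("detonation is not installed")
else:
    print(f"detonation {version} is installed")
```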

Example

There is a full example for language model training using FlexDeMo in the examples folder. Please refer to the documentation:

examples/t5/README.md

This example demonstrates the use of the prepare_detonation function for obtaining a distributed model and optimizer.

Benchmarks

There is a full benchmarking example for language model training using FlexDeMo in the benchmarks folder. Please refer to the documentation:

benchmarks/t5/README.md

This benchmarking example demonstrates the use of the prepare_detonation function for obtaining a distributed model and optimizer, and uses aim and mltiming to track model parameters and performance.

Usage

Using DeToNATION directly, without prepare_detonation, requires three steps. They are shown below for the FlexDeMo optimizer, i.e., DeToNATION with node-based hybrid sharding using DeMo replication.

First, you need to wrap your model with FSDP and the hybrid sharding strategy:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
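
To make the hybrid sharding strategy more concrete, here is a small pure-Python sketch of how ranks are typically partitioned: parameters are sharded among the ranks of one node and replicated across nodes. The function hybrid_shard_groups is hypothetical and only illustrates the grouping. It assumes that ranks are numbered contiguously per node and that every node has the same number of GPUs. It does not create any process groups.

```python
def hybrid_shard_groups(world_size, gpus_per_node):
    """Return (sharding_groups, replication_groups) as lists of rank lists."""
    if world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a multiple of gpus_per_node")
    num_nodes = world_size // gpus_per_node
    # Sharding groups: all ranks that live on the same node.
    sharding_groups = [
        list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        for node in range(num_nodes)
    ]
    # Replication groups: ranks with the same local index on every node.
    replication_groups = [
        list(range(local_rank, world_size, gpus_per_node))
        for local_rank in range(gpus_per_node)
    ]
    return sharding_groups, replication_groups


# Two nodes with four GPUs each.
shard, repl = hybrid_shard_groups(world_size=8, gpus_per_node=4)
print(shard)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(repl)   # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

In this picture, the expensive full synchronization stays inside a node, while only the replication groups communicate across the slower inter-node links.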

Then, you can import and instantiate the FlexDeMo optimizer:

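from detonation import DeMo
optim = DeMo(
    compression_topk=16,
    compression_chunk=128,
    # Process group used by FSDP for sharding (within a node under HYBRID_SHARD).
    sharding_parallel_group=model.process_group,
    # Process group used by FSDP for replication across nodes (private FSDP attribute).
    replication_parallel_group=model._inter_node_pg,
)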
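
The two compression parameters can be illustrated with a simplified sketch. It assumes, based on the parameter names, that compression_chunk is the chunk length and compression_topk is the number of entries kept per chunk. DeMo as described in its paper selects the top-k components of a chunked frequency transform (DCT) of the momentum; this sketch omits the transform and selects by magnitude on the raw values, so it is not the DeToNATION implementation. The function topk_per_chunk is hypothetical.

```python
def topk_per_chunk(values, chunk_size, k):
    """Keep the k largest-magnitude entries of each chunk as (index, value) pairs."""
    kept = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # Positions of the k largest magnitudes within this chunk.
        top = sorted(range(len(chunk)), key=lambda i: abs(chunk[i]), reverse=True)[:k]
        # Report them in their original order, with global indices.
        kept.extend((start + i, chunk[i]) for i in sorted(top))
    return kept


values = [0.1, -3.0, 0.5, 2.0, 1.0, -0.2, 0.0, 4.0]
print(topk_per_chunk(values, chunk_size=4, k=1))  # [(1, -3.0), (7, 4.0)]

# With compression_topk=16 and compression_chunk=128, 16 of every 128
# entries would be kept under this reading of the parameters.
print(16 / 128)  # 0.125
```

Only the kept index/value pairs would need to be exchanged between replicas, which is where the communication savings come from.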

Finally, wrap the forward and backward pass in a no_sync context manager to avoid automatic full gradient synchronization:

with model.no_sync():  # Disable gradient synchronization across FSDP instances.
    loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
    loss.backward()
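
Put together, one training step might look like the sketch below. It assumes that model is the FSDP-wrapped model and optim the DeMo optimizer from the previous steps, and that the optimizer follows the usual zero_grad/step interface of PyTorch optimizers. The function train_step is hypothetical and is not part of DeToNATION.

```python
def train_step(model, optim, batch):
    """Run one forward/backward pass without FSDP gradient sync, then step."""
    optim.zero_grad()
    # Skip FSDP's automatic gradient synchronization for this pass.
    with model.no_sync():
        loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
        loss.backward()
    # Assumption: communication between replicas is left to the optimizer step.
    optim.step()
    return loss
```

A training loop would then call train_step(model, optim, batch) for every batch from the data loader.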

Download files

Source Distribution

detonation-0.5.0.tar.gz (19.5 kB)

Uploaded Source

Built Distribution

detonation-0.5.0-py3-none-any.whl (34.3 kB)

Uploaded Python 3

File details

Details for the file detonation-0.5.0.tar.gz.

File metadata

  • Download URL: detonation-0.5.0.tar.gz
  • Upload date:
  • Size: 19.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for detonation-0.5.0.tar.gz
Algorithm Hash digest
SHA256 084b5c7a281aec7133c7f9e335e7af6c4d701094f051b5b1c7fccdc62388a9ff
MD5 cf29aa4de3440eecc16e6224292af95a
BLAKE2b-256 77f8a87fa33c8088a992ee40274e75203add3f1ad688ae6cbbbae0c226dd7c4e
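
A downloaded file can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch; the helper name sha256_of and the local file path are assumptions, and the same function can be used for the wheel with its own digest.

```python
import hashlib


def sha256_of(path, block_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Digest of detonation-0.5.0.tar.gz as listed above.
EXPECTED_SHA256 = "084b5c7a281aec7133c7f9e335e7af6c4d701094f051b5b1c7fccdc62388a9ff"

# Usage, assuming the archive is in the current directory:
# assert sha256_of("detonation-0.5.0.tar.gz") == EXPECTED_SHA256
```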

File details

Details for the file detonation-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: detonation-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 34.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for detonation-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e9363cebcd2bdaba280bc60311fdbded96887908009cc14471456d0a29606d2e
MD5 2ebcf6d0ec07818e8e5744093e7d1f99
BLAKE2b-256 0c21fbaaa11f00824d0b5f273eb3a00fe6277faeeef8d7508913230f79d0b320
