Decoupled Torch Network-Aware Training on Interlinked Online Nodes (DeToNATION)

Installation

Installation from PyPI:

pip install detonation

Installation from source:

git clone https://github.com/schneiderkamplab/DeToNATION
cd DeToNATION
pip install .
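
To confirm that either installation route worked, a quick check along the following lines can help. It is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of DeToNATION.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="detonation"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"detonation {found}" if found else "detonation is not installed")
```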

Usage

Using the FlexDeMo optimizer requires three steps, shown below.

First, wrap your model with FSDP using the hybrid sharding strategy:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
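
With hybrid sharding, parameters are sharded across the ranks within a node and replicated across nodes. The pure-Python sketch below illustrates how ranks fall into these two kinds of groups. It assumes that ranks are numbered consecutively node by node and that every node has the same number of GPUs. The function `hybrid_groups` is hypothetical: it only illustrates the layout and does not create any process groups.

```python
def hybrid_groups(world_size, gpus_per_node):
    """Partition ranks into intra-node (sharding) and inter-node (replication) groups."""
    if world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a multiple of gpus_per_node")
    num_nodes = world_size // gpus_per_node
    # Sharding groups: all ranks that live on the same node.
    sharding = [
        list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        for node in range(num_nodes)
    ]
    # Replication groups: ranks with the same local index on every node.
    replication = [
        [node * gpus_per_node + local for node in range(num_nodes)]
        for local in range(gpus_per_node)
    ]
    return sharding, replication


if __name__ == "__main__":
    sharding, replication = hybrid_groups(world_size=8, gpus_per_node=4)
    print(sharding)     # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(replication)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```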

Second, import and instantiate the FlexDeMo optimizer, which the package exposes as the DeMo class:

from detonation import DeMo
optim = DeMo(
    compression_topk=16,
    compression_chunk=128,
    sharding_parallel_group=model.process_group,
    replication_parallel_group=model._inter_node_pg,
)
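
The parameter names suggest that the optimizer compresses what it communicates by splitting tensors into chunks and keeping only the largest-magnitude entries of each chunk. The sketch below illustrates that general idea on a flat list of numbers in plain Python. It rests on an assumption about the meaning of `compression_chunk` (chunk size) and `compression_topk` (entries kept per chunk), it is not DeToNATION's implementation, and it leaves out any transform the optimizer may apply before selection. `topk_per_chunk` is a hypothetical helper.

```python
def topk_per_chunk(values, chunk, topk):
    """Keep the `topk` largest-magnitude entries of each `chunk`-sized slice.

    Returns (index, value) pairs; every other entry is treated as zero.
    """
    kept = []
    for start in range(0, len(values), chunk):
        block = values[start:start + chunk]
        # Rank positions in this chunk by absolute value, largest first.
        order = sorted(range(len(block)), key=lambda i: abs(block[i]), reverse=True)
        for i in sorted(order[:topk]):
            kept.append((start + i, block[i]))
    return kept


if __name__ == "__main__":
    data = [0.1, -3.0, 0.2, 2.5, 0.0, 0.4, -0.3, 1.0]
    # Two chunks of four values, keeping two entries from each.
    print(topk_per_chunk(data, chunk=4, topk=2))
    # [(1, -3.0), (3, 2.5), (5, 0.4), (7, 1.0)]
```

Under this reading, the values used above (16 entries kept per chunk of 128) would retain one eighth of the entries.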

Third and last, wrap the forward and backward pass in the model's no_sync context manager to avoid automatic full gradient synchronization:

with model.no_sync():  # Disable gradient synchronization across FSDP instances.
    loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
    loss.backward()
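
Putting the three steps together, one training step might look like the sketch below. The forward and backward pass run inside `no_sync()`, and the optimizer step runs afterwards. `train_step` is a hypothetical helper, not part of DeToNATION. It assumes that `model` is the FSDP-wrapped model, that `optim` is the optimizer instantiated earlier, and that the model's output exposes the loss under the `"loss"` key, as in the snippet above.

```python
def train_step(model, optim, batch):
    """Run one forward/backward pass without FSDP gradient sync, then step."""
    # Skip FSDP's automatic gradient synchronization for this pass.
    with model.no_sync():
        loss = model(input_ids=batch["input_ids"], labels=batch["labels"])["loss"]
        loss.backward()
    # Assumption: the optimizer performs whatever communication it needs in step().
    optim.step()
    optim.zero_grad()
    return loss
```

In a training loop, this would be called once per batch, for example `loss = train_step(model, optim, batch)`.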
