
PowerSGD: Gradient Compression Algorithm for Distributed Computation

Project description

PowerSGD

Practical Low-Rank Gradient Compression for Distributed Optimization

Video

Abstract: We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve the target test accuracy. We propose a new low-rank gradient compressor based on power iteration that can i) compress gradients rapidly, ii) efficiently aggregate the compressed gradients using all-reduce, and iii) achieve test performance on par with SGD. The proposed algorithm is the only method evaluated that achieves consistent wall-clock speedups when benchmarked against regular SGD with an optimized communication backend. We demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets.
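
To make the idea concrete, here is a minimal, illustrative sketch of how a rank-r approximation of a gradient matrix can be obtained with power iteration. The function name and structure are ours for exposition only, not the library's API; the point is that only two small factors need to be communicated, which is what makes all-reduce aggregation possible.

import torch

def power_iteration_compress(grad_matrix, rank=1, num_iters=1):
    # Illustrative rank-`rank` approximation of an (n x m) gradient matrix.
    # Returns factors p (n x rank) and q (m x rank) with p @ q.T ~= grad_matrix;
    # only p and q need to be communicated, not the full gradient.
    n, m = grad_matrix.shape
    q = torch.randn(m, rank)          # random start for the right factor
    for _ in range(num_iters):
        p = grad_matrix @ q           # left factor
        p, _ = torch.linalg.qr(p)     # orthogonalize the left factor
        q = grad_matrix.T @ p         # right factor
    return p, q

# Decompression is the outer product of the two factors:
# approx_grad = p @ q.T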

Reference implementation

This is a reference implementation for the PowerSGD algorithm.

Installation:

pip install git+https://github.com/epfml/powersgd.git

Usage:

+ from powersgd import PowerSGD, Config, optimizer_step

  model = torchvision.models.resnet50(pretrained=True)
  params = list(model.parameters())
  optimizer = torch.optim.SGD(params, lr=0.1)

+ powersgd = PowerSGD(params, config=Config(
+     rank=1,  # lower rank => more aggressive compression
+     min_compression_rate=10,  # don't compress gradients whose compression rate would fall below this
+     num_iters_per_step=2,  # lower number => more aggressive compression
+     start_compressing_after_num_steps=0,
+ ))

  for each batch:
      loss = ...
-     optimizer.zero_grad()
      loss.backward()
-     optimizer.step()
+     optimizer_step(optimizer, powersgd)
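
For readers who prefer a plain (non-diff) version, the same setup and loop could be written roughly as follows. The dummy batches and the cross-entropy loss are placeholders of our choosing, not part of the library; note that optimizer.zero_grad() is intentionally omitted, because the leftover compression error kept in .grad is used for error feedback.

import torch
import torchvision
from powersgd import PowerSGD, Config, optimizer_step

model = torchvision.models.resnet50(pretrained=True)
params = list(model.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

powersgd = PowerSGD(params, config=Config(
    rank=1,
    min_compression_rate=10,
    num_iters_per_step=2,
    start_compressing_after_num_steps=0,
))

criterion = torch.nn.CrossEntropyLoss()

# Dummy batches for illustration; replace with a real DataLoader.
data_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))
               for _ in range(2)]

for inputs, targets in data_loader:
    loss = criterion(model(inputs), targets)
    loss.backward()
    # Compresses and aggregates gradients, then takes the optimizer step.
    # Do not call optimizer.zero_grad(); the remaining error stays in .grad.
    optimizer_step(optimizer, powersgd)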

PyTorch implementation

PyTorch includes an implementation of PowerSGD as a communication hook for DistributedDataParallel models. Because it is integrated with DDP, that code is more involved than the code in this repository.
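
As a rough sketch of how that hook is used (exact arguments may differ across PyTorch versions; the process group is assumed to be initialized already, and `model` and `local_rank` are assumed to be defined):

import torch
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group(...) has already been called.
ddp_model = DDP(model.to(local_rank), device_ids=[local_rank])

state = powerSGD.PowerSGDState(
    process_group=None,            # None => use the default process group
    matrix_approximation_rank=1,   # analogous to `rank` in the Config above
    start_powerSGD_iter=1_000,     # plain all-reduce for the first iterations
)
ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)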

Research code

Research code for the experiments in the PowerSGD paper is located in the paper-code directory.

Selected follow-up work

  • Ramesh et al. (2021, DALL-E) share valuable recommendations on using PowerSGD for large-scale transformer training.
  • (Please submit a PR if you want to be included in this list.)

Reference

If you use this code, please cite the following paper:

@inproceedings{vkj2019powersgd,
  author = {Vogels, Thijs and Karimireddy, Sai Praneeth and Jaggi, Martin},
  title = "{{PowerSGD}: Practical Low-Rank Gradient Compression for Distributed Optimization}",
  booktitle = {NeurIPS 2019 - Advances in Neural Information Processing Systems},
  year = 2019,
  url = {https://arxiv.org/abs/1905.13727}
}

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distribution

powersgd-0.0.2-py3-none-any.whl (3.4 kB, Python 3)

File details

Details for the file powersgd-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: powersgd-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 3.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.7 tqdm/4.62.3 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7

File hashes

Hashes for powersgd-0.0.2-py3-none-any.whl:

  • SHA256: 7e016491949108f3c93c9276241b223721ecb665e93541471f6ed46825e0b74e
  • MD5: 89c9a29b37a24f883971483a42b4f12e
  • BLAKE2b-256: a7a258507f0c659209a2522645b4b705d291d05a9cdd8cb567a262e7359762a5

