PyTorch Lightning Strategy for Horovod.

Project description

Lightning extension: Horovod

Horovod allows the same training script to be used for single-GPU, multi-GPU, and multi-node training.

Like Distributed Data Parallel, every process in Horovod operates on a single GPU with a fixed subset of the data. Gradients are averaged across all GPUs in parallel during the backward pass, then synchronously applied before beginning the next step.
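The synchronous pattern described above can be illustrated without Horovod at all. The following is a minimal pure-Python simulation, not the library's implementation: the function names (`local_gradient`, `allreduce_average`, `synchronous_step`) and the toy one-parameter model are hypothetical and chosen only for illustration.

```python
# Simulated data-parallel training step in pure Python (no Horovod required).
# All names here are hypothetical; they only illustrate the pattern.

def local_gradient(w, shard):
    """Gradient of the loss 0.5 * (w * x - y) ** 2, averaged over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_average(values):
    """Stand-in for an allreduce: every worker receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def synchronous_step(weights, shards, lr):
    """Each worker computes a local gradient, the gradients are averaged,
    and every worker applies the identical update before the next step."""
    grads = [local_gradient(w, shard) for w, shard in zip(weights, shards)]
    averaged = allreduce_average(grads)
    return [w - lr * g for w, g in zip(weights, averaged)]


# Toy data generated by y = 2 * x, split into a fixed subset per worker.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[0:2], data[2:4]]
weights = [0.0, 0.0]  # every worker starts from the same weights

for _ in range(50):
    weights = synchronous_step(weights, shards, lr=0.05)
```

Because every worker applies the same averaged gradient, the per-worker copies of the weights stay identical from step to step, which is the property the synchronous update is meant to preserve.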

The number of worker processes is configured by a driver application (horovodrun or mpirun). In the training script, Horovod will detect the number of workers from the environment, and automatically scale the learning rate to compensate for the increased total batch size.
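As a rough illustration of why the learning rate is scaled, the sketch below assumes a simple linear scaling rule (learning rate multiplied by the number of workers). The helper `scale_for_workers` is hypothetical and is not part of Horovod or Lightning; the exact rule the strategy applies may differ.

```python
def scale_for_workers(base_lr, per_worker_batch_size, num_workers):
    """Return (scaled_lr, total_batch_size) under an assumed linear scaling rule.

    In a real Horovod script the worker count would come from the Horovod
    runtime (for example horovod.torch.size()); here it is passed in
    explicitly so the sketch runs without Horovod installed.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    total_batch_size = per_worker_batch_size * num_workers
    scaled_lr = base_lr * num_workers
    return scaled_lr, total_batch_size


# Four workers, each processing 32 samples per step.
scaled_lr, total_batch = scale_for_workers(0.01, 32, 4)
```

With four workers the total batch is four times larger, so under this assumption the learning rate grows by the same factor.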

Horovod can be configured in the training script to run with any number of GPUs / processes as follows:

# train Horovod on GPU (number of GPUs / machines provided on command-line)
trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1)

# train Horovod on CPU (number of processes / machines provided on command-line)
trainer = Trainer(strategy="horovod")

When starting the training job, the driver application will then be used to specify the total number of worker processes:

# run training with 4 GPUs on a single machine
horovodrun -np 4 python train.py

# run training with 8 GPUs on two machines (4 GPUs each)
horovodrun -np 8 -H hostname1:4,hostname2:4 python train.py
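When launches are scripted, the same command lines can be assembled programmatically. The helper below is a hypothetical sketch (`build_horovodrun_command` is not part of Horovod); it uses only the `-np` and `-H` flags shown in the commands above and derives the process count from the per-host slot counts.

```python
def build_horovodrun_command(hosts, script):
    """Build a horovodrun argument list from a {hostname: slots} mapping.

    The total process count passed to -np is the sum of the slots, and the
    -H value is the comma-separated host:slots list.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    for name, slots in hosts.items():
        if slots < 1:
            raise ValueError(f"host {name!r} must have at least one slot")
    total = sum(hosts.values())
    host_spec = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    return ["horovodrun", "-np", str(total), "-H", host_spec, "python", script]


# Two machines with 4 GPUs each, matching the second command above.
command = build_horovodrun_command({"hostname1": 4, "hostname2": 4}, "train.py")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where `horovodrun` is installed.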

See the official Horovod documentation for details on installation and performance tuning.

Download files

Source Distribution

lightning-Horovod-0.1.0.dev0.tar.gz (11.8 kB)

Uploaded Source

Built Distribution

lightning_Horovod-0.1.0.dev0-py3-none-any.whl (11.1 kB)

Uploaded Python 3

File details

Details for the file lightning-Horovod-0.1.0.dev0.tar.gz.

File hashes

Hashes for lightning-Horovod-0.1.0.dev0.tar.gz
Algorithm Hash digest
SHA256 fd00a4b80066658061b96e8b5b01bc44d75cdc48c48ed63022003ccd66f8ab78
MD5 d02daa6e551b17698244943001883ee7
BLAKE2b-256 0e32ea4fdfb432f5d53f1093bc84271d5afa55f1b904eaf77a8b7f615190810d

File details

Details for the file lightning_Horovod-0.1.0.dev0-py3-none-any.whl.

File hashes

Hashes for lightning_Horovod-0.1.0.dev0-py3-none-any.whl
Algorithm Hash digest
SHA256 a5c53dd7e6e08377ba68c32778b903090f2eed9cc5cb2a799d6892bafaa86f1a
MD5 fc2801d542d311a99dfa8856e0011b7c
BLAKE2b-256 34fdc595efb2d7b627538d62100af9922eb5b2743a247a8ec6ff8659f65e6dde
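
A downloaded file can be checked against the SHA256 digests listed above using the Python standard library. This is a minimal sketch: the helper name `sha256_matches` is hypothetical, and the path in the usage comment assumes the file was saved to the current directory.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the SHA256 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Usage (assumes the wheel is in the current directory):
# sha256_matches(
#     "lightning_Horovod-0.1.0.dev0-py3-none-any.whl",
#     "a5c53dd7e6e08377ba68c32778b903090f2eed9cc5cb2a799d6892bafaa86f1a",
# )
```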
