Lightning strategy extension for Hivemind.
Lightning ⚡ Hivemind
Collaborative Training aims to remove the need for top-tier multi-GPU servers by letting you train across unreliable machines, such as local machines or even preemptible cloud compute, over the internet.
Under the hood, we use Hivemind, which provides decentralized training across the internet.
To use Collaborative Training, you first need to install this extension.
pip install -U lightning-Hivemind
The HivemindStrategy accumulates gradients from all collaborating processes until together they reach a target_batch_size. By default, the size of the first batch determines how much each subsequent local batch contributes towards the target_batch_size. Once the target_batch_size is reached, an optimizer step is taken on all processes.
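To make the accumulation rule concrete, here is a toy, single-process sketch. It is not Hivemind's implementation: CollaborativeAccumulator, report_batch, and the peer names are hypothetical, and the counter simply resets after each step, which simplifies how real collaborative progress is tracked. It does follow the rule described above: each peer's first batch size fixes what that peer's later batches count for, and one global step fires once the shared total reaches target_batch_size.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CollaborativeAccumulator:
    """Toy model of collaborative accumulation (hypothetical; not Hivemind's API)."""

    target_batch_size: int
    samples_accumulated: int = 0
    optimizer_steps: int = 0
    # Per-peer contribution, fixed by the size of that peer's first batch.
    contribution: Dict[str, int] = field(default_factory=dict)

    def report_batch(self, peer: str, batch_size: int) -> bool:
        """Record one local batch from `peer`; return True if a global step fires."""
        # Mirror the default rule: the first batch size seen from a peer is
        # what every later batch from that peer counts for.
        per_batch = self.contribution.setdefault(peer, batch_size)
        self.samples_accumulated += per_batch
        if self.samples_accumulated >= self.target_batch_size:
            # Target reached: every peer takes an optimizer step together,
            # then a new accumulation round begins (simplified to a reset).
            self.optimizer_steps += 1
            self.samples_accumulated = 0
            return True
        return False


if __name__ == "__main__":
    acc = CollaborativeAccumulator(target_batch_size=8192)
    # Two peers, each feeding local batches of 64 samples.
    for _ in range(64):
        for peer in ("peer-a", "peer-b"):
            acc.report_batch(peer, 64)
    print(acc.optimizer_steps)  # → 1 (64 rounds x 2 peers x 64 samples = 8192)
```

With target_batch_size=8192, two peers using 64-sample batches each process 64 local batches per optimizer step (8192 / (2 × 64)); doubling the number of peers halves that.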
When using the HivemindStrategy, note that you cannot use gradient accumulation (accumulate_grad_batches), because Hivemind manages accumulation internally.
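# train.py
from lightning import Trainer
from lightning_hivemind.strategy import HivemindStrategy

# Accumulate gradients across all collaborating peers until 8192 samples
# have been processed, then take an optimizer step on every peer.
trainer = Trainer(strategy=HivemindStrategy(target_batch_size=8192), accelerator="gpu", devices=1)
# Define your LightningModule and call trainer.fit(model) as usual.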
Followed by:
python train.py
# Other machines can connect by running the same command:
# INITIAL_PEERS=... python train.py
# or by passing the peers to the strategy:
# HivemindStrategy(initial_peers=...)
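The two ways of supplying peers can be sketched with a small helper. resolve_initial_peers is hypothetical and not part of lightning-Hivemind. It assumes that INITIAL_PEERS holds comma-separated addresses and that the environment variable takes precedence over peers passed in code. Both the separator and the precedence are assumptions, so check the strategy's own documentation before relying on them.

```python
import os
from typing import List, Optional


def resolve_initial_peers(
    explicit: Optional[List[str]] = None,
    env_var: str = "INITIAL_PEERS",
) -> Optional[List[str]]:
    """Hypothetical helper (not part of lightning-Hivemind).

    Returns the peer addresses to join, or None to start a new collaboration.
    Assumption: the environment variable, when set, wins over `explicit`
    and holds comma-separated addresses.
    """
    raw = os.environ.get(env_var, "").strip()
    if raw:
        # Split on commas, dropping empty entries and surrounding whitespace.
        return [addr.strip() for addr in raw.split(",") if addr.strip()]
    if explicit:
        return list(explicit)
    # No peers anywhere: presumably the first machine of the collaboration.
    return None


if __name__ == "__main__":
    peers = resolve_initial_peers()
    print(f"initial peers: {peers}")
    # In train.py the result could be passed as HivemindStrategy(initial_peers=peers).
```

Reading the peers from the environment keeps train.py identical on every machine, which matches the "run the same command" workflow above.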
Once training begins, a helper message is printed that shows how to start training on other machines using the same code.
File details
Details for the file lightning-Hivemind-0.1.0rc1.tar.gz.
File metadata
- Download URL: lightning-Hivemind-0.1.0rc1.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4e08856ab751222619df0cacd7b6f610061425993a270faf195c9c9a2064fafe
MD5 | 1a6554b3cd651d297a59b792bfacf0ee
BLAKE2b-256 | 0fb21820805c2d028c69ddef6fe2eb1db81f8344f857210f112eb3ee4ceeee4a
File details
Details for the file lightning_Hivemind-0.1.0rc1-py3-none-any.whl.
File metadata
- Download URL: lightning_Hivemind-0.1.0rc1-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2577ddddb3277aa6b03519a45a97dc2be3197eb72eafa8b701068261d707786a
MD5 | 2da9c0bcd98b5a2cbd4cc0b074cef9db
BLAKE2b-256 | 202e26c062cc4d60e07883334812d2ce77f9300bcca57c1d18bd833391a9762f