Lightning strategy extension for Hivemind.

Lightning ⚡ Hivemind

Collaborative Training aims to remove the need for top-tier multi-GPU servers by letting you train across unreliable machines, such as local workstations or even preemptible cloud instances, connected over the internet.

Under the hood, it uses Hivemind, which provides decentralized training across the internet.

To use Collaborative Training, you first need to install this extension:

pip install -U lightning-Hivemind

The HivemindStrategy accumulates gradients from all collaborating processes until the total number of samples they have processed reaches target_batch_size. By default, the size of the first batch is used to determine how much each local batch contributes towards target_batch_size. Once target_batch_size is reached, an optimizer step is made on all processes.
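The accumulation rule described above can be illustrated with a toy, single-process simulation. This is a sketch under stated assumptions, not Hivemind's implementation: there is no networking or averaging protocol, gradients are plain floats, peers are simulated by string IDs that report in turn, and the class name ToyCollaborativeOptimizer is hypothetical.

```python
from typing import Dict, List


class ToyCollaborativeOptimizer:
    """Hypothetical toy model of target_batch_size accumulation.

    Assumption: this only mimics the behaviour described in the README.
    It is not Hivemind's code and performs no communication.
    """

    def __init__(self, target_batch_size: int, lr: float = 0.1) -> None:
        self.target_batch_size = target_batch_size
        self.lr = lr
        self.param = 0.0           # a single scalar "model parameter"
        self.samples_seen = 0      # progress towards target_batch_size
        self.grad_sum = 0.0        # sample-weighted sum of reported gradients
        self.steps = 0             # number of global optimizer steps so far
        self._peer_batch_size: Dict[str, int] = {}

    def local_step(self, peer: str, batch: List[float], grad: float) -> bool:
        """Record one local batch; return True if a global step was made."""
        # Mirror the default described above: a peer's contribution is fixed
        # to the size of the first batch it reports, even if later batches
        # are smaller.
        contrib = self._peer_batch_size.setdefault(peer, len(batch))
        self.samples_seen += contrib
        self.grad_sum += grad * contrib
        if self.samples_seen < self.target_batch_size:
            return False  # keep accumulating across peers
        # Target reached: apply one step with the sample-averaged gradient.
        # Any overshoot past the target is simply dropped in this toy.
        self.param -= self.lr * self.grad_sum / self.samples_seen
        self.steps += 1
        self.samples_seen = 0
        self.grad_sum = 0.0
        return True
```

For example, with target_batch_size=8 and two peers each reporting batches of 4 samples, the first call only accumulates and the second triggers the single optimizer step, using the gradient averaged over all 8 samples.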

When using HivemindStrategy, note that you cannot use gradient accumulation (accumulate_grad_batches), because Hivemind manages accumulation internally.

from lightning import Trainer
from lightning_hivemind.strategy import HivemindStrategy

trainer = Trainer(strategy=HivemindStrategy(target_batch_size=8192), accelerator="gpu", devices=1)

Followed by:

python train.py
# Other machines can connect by running the same command:
# INITIAL_PEERS=... python train.py
# or by passing the peers to the strategy:
# HivemindStrategy(initial_peers=...)
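If you want to control how peers are looked up before building the strategy, a small helper can combine the two options above: an explicit argument wins, otherwise the environment variable is read. This is a hypothetical sketch, not part of lightning_hivemind. The variable name INITIAL_PEERS is taken from the comment above, and the comma-separated format for multiple peers is an assumption.

```python
import os
from typing import List, Optional, Sequence, Union


def resolve_initial_peers(
    initial_peers: Optional[Union[str, Sequence[str]]] = None,
    env_var: str = "INITIAL_PEERS",  # assumption: name taken from the README
) -> Optional[List[str]]:
    """Hypothetical helper: explicit peers first, else the environment.

    Assumes several peers are written as one comma-separated string.
    """
    if initial_peers is None:
        initial_peers = os.environ.get(env_var)
    if initial_peers is None:
        # No peers given: like the first machine in the README, start fresh.
        return None
    if isinstance(initial_peers, str):
        initial_peers = initial_peers.split(",")
    # Drop whitespace and empty entries left over from splitting.
    peers = [p.strip() for p in initial_peers if p.strip()]
    return peers or None
```

With lightning_hivemind installed, the result could then be passed on as HivemindStrategy(target_batch_size=8192, initial_peers=resolve_initial_peers()). Since the strategy already accepts initial_peers directly, the helper only adds value when you want to decide the lookup order yourself.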

A helper message is printed once your training begins, which shows you how to start training on other machines using the same code.

Download files

Source Distribution

lightning-Hivemind-0.1.0rc1.tar.gz (13.2 kB)

Uploaded Source

Built Distribution

lightning_Hivemind-0.1.0rc1-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file lightning-Hivemind-0.1.0rc1.tar.gz.

File metadata

  • Download URL: lightning-Hivemind-0.1.0rc1.tar.gz
  • Upload date:
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for lightning-Hivemind-0.1.0rc1.tar.gz
Algorithm Hash digest
SHA256 4e08856ab751222619df0cacd7b6f610061425993a270faf195c9c9a2064fafe
MD5 1a6554b3cd651d297a59b792bfacf0ee
BLAKE2b-256 0fb21820805c2d028c69ddef6fe2eb1db81f8344f857210f112eb3ee4ceeee4a

File details

Details for the file lightning_Hivemind-0.1.0rc1-py3-none-any.whl.

File hashes

Hashes for lightning_Hivemind-0.1.0rc1-py3-none-any.whl
Algorithm Hash digest
SHA256 2577ddddb3277aa6b03519a45a97dc2be3197eb72eafa8b701068261d707786a
MD5 2da9c0bcd98b5a2cbd4cc0b074cef9db
BLAKE2b-256 202e26c062cc4d60e07883334812d2ce77f9300bcca57c1d18bd833391a9762f
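To check a downloaded file against the SHA256 digests listed above, the standard library is enough. The function names sha256_of and verify_sha256 are illustrative and not part of any package.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify_sha256(Path("lightning-Hivemind-0.1.0rc1.tar.gz"), "4e08856ab751222619df0cacd7b6f610061425993a270faf195c9c9a2064fafe") should return True for an intact download. pip can also enforce digests itself through its hash-checking mode (--require-hashes, with --hash=sha256:<digest> entries in a requirements file).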
