grove
Distributed ML training across MacBooks. Zero config.
pip install grove-ml
Mac A:
grove start train.py -n 2
Mac B:
grove join
Both machines discover each other automatically, sync gradients, and train together. No SSH, no IP addresses, no configuration files.
Grove discovers peers over AWDL (the protocol behind AirDrop), then upgrades to direct WiFi when both devices share a network. If WiFi isn't available (e.g. eduroam, or no network at all), everything stays on AWDL.
Quick start
Write a training script with a main() function:
# train.py
import grove
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
def main():
    world = grove.init()
    model = nn.Linear(64, 64)
    optimizer = optim.SGD(learning_rate=0.01)
    for step in range(100):
        x = mx.random.normal((8, 64))
        y = mx.random.normal((8, 64))
        loss, grads = nn.value_and_grad(model, lambda m, x, y: mx.mean((m(x) - y) ** 2))(model, x, y)
        grads = grove.average_gradients(grads)
        optimizer.update(model, grads)
        mx.eval(model.state, optimizer.state)
Single device:
grove run train.py
Multiple devices:
grove start train.py -n 2 # coordinator
grove join # worker (shows interactive picker)
Workers receive the training script from the coordinator automatically.
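To make the gradient sync concrete, here is a minimal pure-Python simulation of what an "all-reduce + average" does to a gradient tree. It does not use grove or MLX; `average_gradients_sim` is a hypothetical helper written for illustration, and the nested-dict-of-lists layout is an assumption standing in for real MLX arrays.

```python
def average_gradients_sim(per_device_grads):
    """Average a list of gradient trees (nested dicts of float lists), one per device."""
    world_size = len(per_device_grads)

    def combine(nodes):
        first = nodes[0]
        if isinstance(first, dict):
            # Recurse into each named sub-tree (e.g. "weight", "bias").
            return {key: combine([node[key] for node in nodes]) for key in first}
        # Leaf: element-wise sum across devices, then divide by the device count.
        return [sum(values) / world_size for values in zip(*nodes)]

    return combine(per_device_grads)


# Two simulated devices, each holding gradients for the same parameters.
grads_a = {"weight": [1.0, 2.0], "bias": [0.0]}
grads_b = {"weight": [3.0, 6.0], "bias": [2.0]}

averaged = average_gradients_sim([grads_a, grads_b])
print(averaged)  # {'weight': [2.0, 4.0], 'bias': [1.0]}
```

After the averaging step every device holds the same gradients, so identical optimizer updates keep the model replicas in sync.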
Algorithms
DiLoCo
Each device trains independently for H steps, then syncs pseudo-gradients with Nesterov momentum. Good default for most setups.
diloco = grove.diloco(model, H=500, outer_lr=0.7)
for step in range(total_steps):
    loss, grads = loss_and_grad(model, batch)
    optimizer.update(model, grads)
    mx.eval(model.state, optimizer.state)
    diloco.step(model)
| Parameter | Default | Description |
|---|---|---|
| `H` | `500` | Inner steps between syncs |
| `outer_lr` | `0.7` | Outer optimizer learning rate |
| `outer_momentum` | `0.9` | Nesterov momentum |
| `overlap` | `False` | Async overlap (sync in background) |
| `quantize` | `False` | E3M0 4-bit pseudo-gradients |
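The outer update can be sketched in plain Python. This is an illustration of the general DiLoCo idea, not grove's implementation: `diloco_outer_step` is a hypothetical name, parameters are flat lists of floats, and the Nesterov step uses the common "velocity plus look-ahead" formulation, which is an assumption about the exact variant.

```python
def diloco_outer_step(global_params, worker_params, velocity,
                      outer_lr=0.7, outer_momentum=0.9):
    """Apply one outer step to the shared parameters after H inner steps.

    global_params: parameters every worker started the round from
    worker_params: one parameter list per worker, after local training
    velocity:      outer momentum buffer, same length as global_params
    """
    num_workers = len(worker_params)
    new_params, new_velocity = [], []
    for i, theta in enumerate(global_params):
        # Pseudo-gradient: average distance each worker moved from the start.
        delta = sum(theta - w[i] for w in worker_params) / num_workers
        v = outer_momentum * velocity[i] + delta
        # Nesterov look-ahead: step along the pseudo-gradient plus momentum.
        step = delta + outer_momentum * v
        new_params.append(theta - outer_lr * step)
        new_velocity.append(v)
    return new_params, new_velocity


start = [1.0, 2.0]
workers = [[0.0, 2.0], [1.0, 0.0]]
params, vel = diloco_outer_step(start, workers, [0.0, 0.0])
print(params, vel)
```

With `outer_lr=1.0` and `outer_momentum=0.0` this reduces to plain parameter averaging across workers, which is a useful sanity check.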
SparseLoCo
DiLoCo with top-k compression and error feedback. Sends only the largest 1-3% of values each round; unsent values accumulate in an error buffer and carry forward to later rounds. ~32x less communication than dense DiLoCo.
sloco = grove.sparseloco(model, H=500, topk=64, chunk=4096)
for step in range(total_steps):
    loss, grads = loss_and_grad(model, batch)
    optimizer.update(model, grads)
    mx.eval(model.state, optimizer.state)
    sloco.step(model)
| Parameter | Default | Description |
|---|---|---|
| `H` | `30` | Inner steps between syncs |
| `outer_lr` | `1.0` | Outer optimizer learning rate |
| `topk` | `64` | Values kept per chunk |
| `chunk` | `4096` | Chunk size for top-k selection |
| `error_decay` | `0.95` | Decay on error buffer |
| `overlap` | `True` | Async overlap (on by default) |
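The interaction of `topk`, `chunk`, and `error_decay` can be illustrated with a small pure-Python sketch. With the defaults, 64 of every 4096 values are kept, about 1.6% per round. The function below is a hypothetical illustration of chunked top-k selection with error feedback, not grove's actual code, and it assumes the decay is applied to the buffer before the new pseudo-gradient is added.

```python
def topk_with_error_feedback(delta, error, topk=64, chunk=4096, error_decay=0.95):
    """Select the top-k largest-magnitude values per chunk; keep the rest as error.

    Returns (sent, new_error), where `sent` is dense with zeros for dropped values.
    """
    # Fold the decayed leftovers from earlier rounds into this round's values.
    acc = [error_decay * e + d for e, d in zip(error, delta)]
    sent = [0.0] * len(acc)
    for start in range(0, len(acc), chunk):
        indices = range(start, min(start + chunk, len(acc)))
        # Keep the k entries with the largest absolute value in this chunk.
        keep = sorted(indices, key=lambda i: abs(acc[i]), reverse=True)[:topk]
        for i in keep:
            sent[i] = acc[i]
    # Whatever was not sent stays in the error buffer for the next round.
    new_error = [a - s for a, s in zip(acc, sent)]
    return sent, new_error


delta = [0.1, -5.0, 0.2, 3.0]
sent, err = topk_with_error_feedback(delta, [0.0] * 4, topk=1, chunk=2)
print(sent)  # only the largest value of each 2-element chunk is non-zero
print(err)   # the small values wait in the error buffer
```

Calling the function again with a zero `delta` and the returned `err` shows the carry-forward: the previously dropped values, scaled by `error_decay`, become the ones that get sent.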
DeMo
DCT-compressed per-step sync. Transforms gradients to frequency space and sends the most significant components. Syncs every step rather than every H steps. Better suited for fast local networks.
demo = grove.demo(model, lr=1e-3, topk=32)
for step in range(total_steps):
    loss, grads = loss_and_grad(model, batch)
    demo.step(model, grads)
| Parameter | Default | Description |
|---|---|---|
| `lr` | `1e-3` | Learning rate |
| `decay` | `0.999` | EMA decay |
| `topk` | `32` | DCT components kept per chunk |
| `chunk` | `64` | Chunk size |
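The compression idea (transform a chunk to frequency space, keep the `topk` largest components, reconstruct on the receiver) can be sketched with a hand-written orthonormal DCT. This covers only the compress/decompress step and is an assumption-laden illustration: the helper names are hypothetical, and grove's optimizer logic around it (EMA, update rule) is not shown here.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * coeffs[k]
            * math.cos(math.pi / n * (i + 0.5) * k)
            for k in range(n)
        )
        for i in range(n)
    ]


def compress_chunk(x, topk):
    """Return (index, value) pairs for the topk largest-magnitude DCT components."""
    coeffs = dct(x)
    keep = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)[:topk]
    return [(k, coeffs[k]) for k in keep]


def decompress_chunk(pairs, n):
    """Rebuild an n-element chunk from the transmitted components."""
    coeffs = [0.0] * n
    for k, value in pairs:
        coeffs[k] = value
    return idct(coeffs)


chunk_values = [2.0] * 8                 # a constant chunk has a single DCT component
pairs = compress_chunk(chunk_values, topk=1)
restored = decompress_chunk(pairs, len(chunk_values))
print(pairs[0][0], [round(v, 6) for v in restored])
```

Smooth chunks concentrate their energy in a few components, which is why a small `topk` can still carry most of the signal.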
API
Initialization
world = grove.init()
world.rank() # this device's rank (0 = coordinator)
world.size() # total number of devices
Collective operations
grove.average_gradients(grads) # all-reduce + average
grove.all_sum(x) # sum an MLX array across devices
grove.all_gather(x) # gather an MLX array from all devices
grove.send(x, dst) # send to a specific rank
grove.recv(shape, dtype, src) # receive from a specific rank
grove.barrier() # wait for all devices
grove.report(loss) # report loss to dashboard
Status
grove.rank # int
grove.world_size # int
grove.is_available() # True if world_size > 1
CLI
grove run <script>              Run on a single device
grove start <script> -n N       Start a cluster with N nodes
grove start <script> --name X   Start with a specific cluster name
grove join [name]               Join a cluster (interactive picker if no name)
grove status                    System info and nearby clusters
Add --logs to any command to see raw log output instead of the dashboard.
Environment variables
| Variable | Effect |
|---|---|
| `GROVE_NO_WIFI` | Skip WiFi upgrade probe, use AWDL only |
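The transport choice described earlier (AWDL for discovery, WiFi when both devices share a network, AWDL only when `GROVE_NO_WIFI` is set) amounts to a small decision rule. The sketch below is a hypothetical model of that rule, not grove's code: `pick_transport` is an invented name, and treating any non-empty value of `GROVE_NO_WIFI` as "set" is an assumption.

```python
import os


def pick_transport(shared_wifi, env=None):
    """Return "wifi" or "awdl" for the data path after peers have been discovered."""
    env = os.environ if env is None else env
    # Assumption: any non-empty value disables the WiFi upgrade probe.
    if env.get("GROVE_NO_WIFI"):
        return "awdl"
    # Upgrade only when both devices are on the same WiFi network.
    return "wifi" if shared_wifi else "awdl"


print(pick_transport(shared_wifi=True, env={}))                      # wifi
print(pick_transport(shared_wifi=False, env={}))                     # awdl
print(pick_transport(shared_wifi=True, env={"GROVE_NO_WIFI": "1"}))  # awdl
```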
Requirements
- macOS with Apple Silicon (M1+)
- Python 3.10+
- MLX
- Xcode command-line tools (for compiling the Swift helper on first run)
License
MIT