Distributed training for PyTorch
distbelief
Implementing Google's DistBelief paper.
Installation/Development instructions
You'll want to create a python3 virtualenv first by running make setup, after which you should run make install.
You'll then be able to use distbelief by importing it:
from distbelief.optim import DownpourSGD
optimizer = DownpourSGD(net.parameters(), lr=0.1, n_push=5, n_pull=5, model=net)
As an example, you can see our implementation running by using the script provided in example/main.py.
To run a setup with two training nodes locally, open three terminal windows, source the venv in each, and then run make first, make second, and make server, one per window.
This will begin training AlexNet on CIFAR10 locally with all default params.
Benchmarking
NOTE: we graph the train/test accuracy of each node, hence node1, node2, node3. A better comparison would be to evaluate the parameter server's params and use that value. However, the accuracy across the three nodes is fairly consistent, and adding an evaluator might put too much stress on our server.
We scale the learning rate of the nodes to be learning_rate/freq (0.03).
We used AWS c4.xlarge instances to compare the CPU runs, and a GTX 1060 for the GPU run.
DownpourSGD for PyTorch
Diagram
Here, steps 2 and 3 of the diagram happen concurrently.
You can read more about our implementation here.
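To make the roles of n_push and n_pull more concrete, here is a minimal single-process sketch of the Downpour SGD push/pull cadence. It is not the distbelief implementation: the names ToyParameterServer, ToyDownpourWorker, quadratic_grad and run_simulation are hypothetical, the objective is a toy quadratic, and a round-robin loop stands in for real concurrency between nodes. It assumes each worker accumulates its learning-rate-scaled updates, pushes them to the server every n_push steps, and refreshes its local parameters every n_pull steps.

```python
"""Toy, single-process simulation of the Downpour SGD push/pull cadence."""


class ToyParameterServer:
    """Hypothetical stand-in for a parameter server holding shared params."""

    def __init__(self, params):
        self.params = list(params)

    def push(self, accumulated_update):
        # Apply a worker's accumulated (already lr-scaled) update.
        self.params = [p + u for p, u in zip(self.params, accumulated_update)]

    def pull(self):
        # Return a copy so workers never alias the server's list.
        return list(self.params)


class ToyDownpourWorker:
    """Hypothetical worker: local SGD steps with periodic push and pull."""

    def __init__(self, server, lr, n_push, n_pull):
        self.server = server
        self.lr = lr
        self.n_push = n_push
        self.n_pull = n_pull
        self.params = server.pull()
        self.accumulated = [0.0] * len(self.params)
        self.steps = 0

    def step(self, grads):
        # Refresh the local copy from the server every n_pull steps.
        if self.steps % self.n_pull == 0:
            self.params = self.server.pull()
        # Take a local SGD step and remember the update for the next push.
        updates = [-self.lr * g for g in grads]
        self.params = [p + u for p, u in zip(self.params, updates)]
        self.accumulated = [a + u for a, u in zip(self.accumulated, updates)]
        self.steps += 1
        # Send the accumulated update to the server every n_push steps.
        if self.steps % self.n_push == 0:
            self.server.push(self.accumulated)
            self.accumulated = [0.0] * len(self.params)


def quadratic_grad(params, target):
    # Gradient of 0.5 * sum((p - t) ** 2) with respect to params.
    return [p - t for p, t in zip(params, target)]


def run_simulation(n_workers=2, n_steps=200, lr=0.05, n_push=5, n_pull=5):
    target = [1.0, -2.0]
    server = ToyParameterServer([0.0, 0.0])
    workers = [
        ToyDownpourWorker(server, lr, n_push, n_pull) for _ in range(n_workers)
    ]
    for _ in range(n_steps):
        for worker in workers:  # round-robin stands in for concurrency
            worker.step(quadratic_grad(worker.params, target))
    return server.params, target


if __name__ == "__main__":
    final_params, target = run_simulation()
    print("server params:", final_params, "target:", target)
```

In this sketch the server simply sums the updates from all workers, so with more workers or a larger learning rate the combined step can overshoot. That is one intuition for why the per-node learning rate is scaled down in a multi-node setup.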