Distributed training for PyTorch

Project description

distbelief

Implementing Google's DistBelief paper.

Installation/Development instructions

First create a Python 3 virtualenv by running make setup, then run make install.

You can then use distbelief by importing it:

from distbelief.optim import DownpourSGD

optimizer = DownpourSGD(net.parameters(), lr=0.1, n_push=5, n_pull=5, model=net)
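Beyond the constructor call, a training loop looks like ordinary PyTorch with DownpourSGD swapped in for torch.optim.SGD. The sketch below is illustrative: only the DownpourSGD import and its arguments come from distbelief; the toy model, the data loader, and the assumption that a parameter server is already running and that the optimizer follows the usual torch.optim.Optimizer interface are ours.

# Illustrative sketch: a standard PyTorch loop with DownpourSGD in place of SGD.
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from distbelief.optim import DownpourSGD

# Stand-in model; the bundled example trains AlexNet on CIFAR-10.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

train_loader = DataLoader(
    datasets.CIFAR10('./data', train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=64, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = DownpourSGD(net.parameters(), lr=0.1, n_push=5, n_pull=5, model=net)

for inputs, labels in train_loader:
    optimizer.zero_grad()                  # assumes the usual Optimizer interface
    loss = criterion(net(inputs), labels)
    loss.backward()
    optimizer.step()                       # exchanges gradients/params with the parameter server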

An example training script is provided in example/main.py.

To run a two-training-node setup locally, open three terminal windows, source the virtualenv in each, and run make first, make second, and make server. This starts training AlexNet on CIFAR-10 locally with all default parameters.
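Those make targets correspond to one parameter-server process plus two trainer processes. As a rough sketch of how such a three-process group is typically initialized with torch.distributed, see below; the rank assignment, rendezvous address, and backend are illustrative assumptions, not the package's actual configuration (the real entry point is example/main.py).

# Illustrative only: initializing a 1-server / 2-worker process group.
import torch.distributed as dist

def init_node(rank, world_size=3):
    dist.init_process_group(
        backend='gloo',                       # CPU-friendly backend
        init_method='tcp://127.0.0.1:29500',  # assumed local rendezvous address
        rank=rank,
        world_size=world_size)
    # e.g. rank 0 = parameter server ("make server"),
    #      rank 1 = first trainer ("make first"),
    #      rank 2 = second trainer ("make second")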

Benchmarking

NOTE: we graph the train/test accuracy of each node separately, hence node1, node2, and node3. A better comparison would be to evaluate the parameter server's parameters and use that value; however, the accuracy across the three nodes is fairly consistent, and adding an evaluator might put too much stress on our server.

We scale each node's learning rate to learning_rate/freq (0.03).

[Figure: per-node training accuracy]

[Figure: per-node test accuracy]

We used AWS c4.xlarge instances to compare the CPU runs, and a GTX 1060 for the GPU run.

DownpourSGD for PyTorch

Diagram

[Diagram of the DownpourSGD update loop; steps 2 and 3 happen concurrently.]

You can read more about our implementation here.
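For intuition, here is a single-process sketch of the Downpour-style schedule the diagram describes: the worker trains on a local copy of the parameters, pulling a fresh copy every n_pull steps and pushing its accumulated gradient every n_push steps, so gradient computation and communication can overlap. All names and the in-memory "server" are illustrative; this is not the package's actual implementation.

# Illustrative, single-process simulation of the Downpour update schedule.
import torch

n_push, n_pull, lr = 5, 5, 0.1
server_params = torch.zeros(10)                 # stands in for the parameter server
local_params = server_params.clone()
accumulated_grad = torch.zeros_like(local_params)

for step in range(1, 101):
    if step % n_pull == 0:
        local_params = server_params.clone()    # pull fresh parameters
    grad = torch.randn_like(local_params)       # stand-in for a backprop gradient
    accumulated_grad += grad
    local_params -= lr * grad                   # local SGD step
    if step % n_push == 0:
        server_params -= lr * accumulated_grad  # push: server applies the accumulated update
        accumulated_grad.zero_()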

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pytorch-distbelief-0.1.0.tar.gz (4.6 kB)

Uploaded Source

Built Distribution

pytorch_distbelief-0.1.0-py3-none-any.whl (6.3 kB)

Uploaded Python 3

File details

Details for the file pytorch-distbelief-0.1.0.tar.gz.

File metadata

  • Download URL: pytorch-distbelief-0.1.0.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.1

File hashes

Hashes for pytorch-distbelief-0.1.0.tar.gz
  • SHA256: 4aeb894824d758181b32539d09ca19af698af7e8ce51e4421ac7fcc970f4f0d9
  • MD5: 0684c0733c179a2d5c2e2eb689dc01be
  • BLAKE2b-256: e2ffdabfd30c3cc70c3c6fd51b19fd095aad5eaa13fe55ca08f370f65842443b

See more details on using hashes here.
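To check a downloaded file against the SHA256 digest listed above, something like the following works (the local filename is assumed to be the default download name):

# Verify the sdist against the published SHA256 digest.
import hashlib

expected = "4aeb894824d758181b32539d09ca19af698af7e8ce51e4421ac7fcc970f4f0d9"
with open("pytorch-distbelief-0.1.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "hash mismatch")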

File details

Details for the file pytorch_distbelief-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pytorch_distbelief-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.1

File hashes

Hashes for pytorch_distbelief-0.1.0-py3-none-any.whl
  • SHA256: e478e7ddbe68d014bc4baaca06bcccceca0c4d592f447ada0ca0b52c00834702
  • MD5: edc999b550dd420d807b2cd556eecf78
  • BLAKE2b-256: 0e0c110aa501aa32573bc2f9a485da7c6ca7eba2b4cf1871b2d70e897723d2ff

See more details on using hashes here.
