Project description

DoG Optimizer

This repository contains the implementation of the algorithms in the paper DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule by Maor Ivgi, Oliver Hinder and Yair Carmon.

Installation

To install the package, simply run pip install dog-optimizer.

Usage

DoG and LDoG are implemented using the standard PyTorch optimizer interface. After installing the package with pip install dog-optimizer, all you need to do is replace the line that creates your optimizer with

from dog_optimizer import DoG
optimizer = DoG(model.parameters())

for DoG, or

from dog_optimizer import LDoG
optimizer = LDoG(model.parameters())

for LDoG, where the constructor arguments follow the standard PyTorch optimizer syntax. To see the list of all available parameters, run help(DoG) or help(LDoG).
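
Since DoG follows the standard PyTorch optimizer interface, it drops into an ordinary training step unchanged. The following is a minimal sketch; the toy model, loss, and batch are purely illustrative:

import torch
from dog_optimizer import DoG

model = torch.nn.Linear(10, 2)           # toy model, for illustration only
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = DoG(model.parameters())      # note: no learning rate to tune

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
optimizer.zero_grad()
loss_fn(model(x), y).backward()
optimizer.step()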

Using the polynomial decay averager is also easy. Simply create it with

from dog_optimizer import PolynomialDecayAverager
averager = PolynomialDecayAverager(model)

Then, after each optimizer.step(), call averager.step() as well. You can then access both the current model and the averaged model via averager.base_model and averager.averaged_model, respectively.
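
Putting it together, a sketch of a training loop that keeps the averager in sync (assuming the model, loss_fn, and optimizer from the sketch above, plus a hypothetical train_loader):

from dog_optimizer import PolynomialDecayAverager

averager = PolynomialDecayAverager(model)
for x, y in train_loader:
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    averager.step()                      # update the weight average after every step

eval_model = averager.averaged_model     # averaged weights, typically used for evaluation
live_model = averager.base_model         # the unaveraged model being trained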

An example of how to use the above to train a simple CNN on MNIST can be found in examples/mnist.py (based on the official PyTorch MNIST example).

Choosing reps_rel

DoG is parameter-free by design, so there is no need to tune a learning rate. However, as discussed in the paper, DoG has an initial movement parameter $r_{\epsilon}$ that must be small enough to avoid destructive updates that cause divergence, yet an extremely small value of $r_{\epsilon}$ slows down training. We recommend choosing $r_{\epsilon}$ relative to the norm of the initial weights $x_0$: in particular, we set $r_{\epsilon}$ to reps_rel times $(1 + \|x_0\|)$, where reps_rel is a configurable parameter of the optimizer. The default value of reps_rel is 1e-6, and we have found it to work well most of the time. However, in our experiments we did encounter some situations that required different values of reps_rel (a code sketch of overriding it follows the list):

  • If optimization diverges early, it is likely that reps_rel (and hence $r_{\epsilon}$) is too large: try decreasing it by factors of 100 until the divergence no longer occurs. This happened when applying LDoG to fine-tune T5, which has large pre-trained weights; setting reps_rel to 1e-8 eliminated the divergence.
  • If the DoG step size (eta) does not substantially increase from its initial value for a few hundred steps, reps_rel may be too small: try increasing it by factors of 100 until you see eta start to increase within the first few steps. This happened when training models with batch normalization; setting reps_rel to 1e-4 eliminated the problem.
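
Since reps_rel is a constructor parameter, overriding it is a one-line change; the values below are the ones that resolved the situations described above:

from dog_optimizer import DoG, LDoG

# Early divergence (e.g. fine-tuning T5, whose pre-trained weights are large):
optimizer = LDoG(model.parameters(), reps_rel=1e-8)

# eta stuck near its initial value (e.g. models with batch normalization):
optimizer = DoG(model.parameters(), reps_rel=1e-4)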

Citation

@article{ivgi2023dog,
  title={{D}o{G} is {SGD}'s Best Friend: A Parameter-Free Dynamic Step Size Schedule}, 
  author={Maor Ivgi and Oliver Hinder and Yair Carmon}, 
  journal={arXiv:2302.12022}, 
  year={2023},
}  

Download files

Download the file for your platform.

Source Distribution

dog_optimizer-1.0.tar.gz (8.3 kB)

Uploaded Source

Built Distribution

dog_optimizer-1.0-py3-none-any.whl (7.9 kB)

Uploaded Python 3

File details

Details for the file dog_optimizer-1.0.tar.gz.

File metadata

  • Download URL: dog_optimizer-1.0.tar.gz
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for dog_optimizer-1.0.tar.gz:

  • SHA256: 88bdcad740129e3b7060205db9c29e1615fe1061aa996a6f8ea5b308d12dc5fe
  • MD5: 52fb545306c28a6faeeb5272102ec4e1
  • BLAKE2b-256: dcd69d5e68e57c0b7bb2946f864d4b55df4fcc9a29f2cdebd272670f19e583b3


File details

Details for the file dog_optimizer-1.0-py3-none-any.whl.

File metadata

  • Download URL: dog_optimizer-1.0-py3-none-any.whl
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for dog_optimizer-1.0-py3-none-any.whl:

  • SHA256: e2de62aba639385822373f720adb492b378b4e7d4bd2c7873102940807330454
  • MD5: 5d38563a636cab6d135a915bcce78fb0
  • BLAKE2b-256: 6c4b805de2a7e90c8c2f9ae5aef83e9960f325537ca66c6218a962ec8987b820

