# DoG Optimizer
This repository contains the implementation of the algorithms in the paper DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule by Maor Ivgi, Oliver Hinder and Yair Carmon.
## Installation
To install the package, simply run `pip install dog-optimizer`.
## Usage
DoG (and LDoG) are implemented using the standard PyTorch optimizer interface. After installing the package with `pip install dog-optimizer`, all you need to do is replace the line that creates your optimizer with

```python
from dog import DoG
optimizer = DoG(optimizer args)
```

for DoG, or

```python
from dog import LDoG
optimizer = LDoG(optimizer args)
```

for LDoG, where `optimizer args` follows the standard PyTorch optimizer syntax. To see the list of all available parameters, run `help(DoG)` or `help(LDoG)`.
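For concreteness, here is a minimal sketch of a single training step with DoG; the model, batch, and loss function are placeholders, and the optimizer is constructed with its defaults:

```python
import torch
from dog import DoG

model = torch.nn.Linear(10, 2)           # placeholder model
criterion = torch.nn.CrossEntropyLoss()  # placeholder loss
optimizer = DoG(model.parameters())      # no learning rate to tune

inputs = torch.randn(32, 10)             # placeholder batch
targets = torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```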
Using the polynomial decay averager is also easy. Simply create it with

```python
from dog import PolynomialDecayAverager
averager = PolynomialDecayAverager(model)
```

then, after each `optimizer.step()`, call `averager.step()` as well. You can then get both the current model and the averaged model with `averager.base_model` and `averager.averaged_model`, respectively.
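Putting the pieces together, a training loop with averaging might look like the following sketch (the data loader, model, and loss are placeholders):

```python
from dog import DoG, PolynomialDecayAverager

optimizer = DoG(model.parameters())
averager = PolynomialDecayAverager(model)

for inputs, targets in loader:               # placeholder data loader
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    averager.step()                          # update the averaged model

eval_model = averager.averaged_model         # averaged weights for evaluation
```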
An example of how to use the above to train a simple CNN on MNIST can be found in `examples/mnist.py` (based on this PyTorch example).
## Choosing `reps_rel`
DoG is parameter-free by design, so there is no need to tune a learning rate parameter. However, as discussed in the paper, DoG has an initial movement parameter $r_{\epsilon}$ that must be small enough to avoid destructive updates that cause divergence, but an extremely small value of $r_{\epsilon}$ would slow down training. We recommend choosing $r_{\epsilon}$ relative to the norm of the initial weights $x_0$. In particular, we set $r_{\epsilon}$ to be `reps_rel` times $(1+\|x_0\|)$, where `reps_rel` is a configurable parameter of the optimizer. The default value of `reps_rel` is `1e-6`, and we have found it to work well most of the time. However, in our experiments we did encounter some situations that required different values of `reps_rel`:
- If optimization diverges early, it is likely that `reps_rel` (and hence $r_{\epsilon}$) is too large: try decreasing it by factors of 100 until divergence no longer occurs. This happened when applying LDoG to fine-tune T5, which had large pre-trained weights; setting `reps_rel` to `1e-8` eliminated the divergence.
- If the DoG step size (`eta`) does not substantially increase from its initial value for a few hundred steps, it could be that `reps_rel` is too small: try increasing it by factors of 100 until you see `eta` starting to increase in the first few steps. This happened when training models with batch normalization; setting `reps_rel` to `1e-4` eliminated the problem.
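For example, overriding the default is a single keyword argument to the optimizer; the `1e-8` value here mirrors the T5 fine-tuning case described above:

```python
from dog import LDoG

# Large pre-trained weights diverged with the default reps_rel of 1e-6;
# a smaller initial movement eliminated the divergence (first bullet above).
optimizer = LDoG(model.parameters(), reps_rel=1e-8)
```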
## Citation
```bibtex
@article{ivgi2023dog,
  title={{D}o{G} is {SGD}'s Best Friend: A Parameter-Free Dynamic Step Size Schedule},
  author={Maor Ivgi and Oliver Hinder and Yair Carmon},
  journal={arXiv:2302.12022},
  year={2023},
}
```