Implementation of the algorithms in the paper "DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule"


License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

DoG Optimizer

This repository contains the implementation of the algorithms in the paper DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule by Maor Ivgi, Oliver Hinder and Yair Carmon.

IMPORTANT: For best performance (and for fair comparison to other methods), DoG/LDoG must be combined with iterate averaging! This package includes an easy-to-use averager class; its default configuration should work well out of the box.

Algorithm

DoG ("Distance over Gradients") is a parameter-free stochastic optimizer. DoG updates parameters $x_t$ with stochastic gradients $g_t$ according to:

\begin{aligned}
   \eta_t & = \frac{ \bar{r}_t }{ \sqrt{\sum_{i \le t} \lVert g_i \rVert^2 + \epsilon} } \\
   x_{t+1} & = x_{t} - \eta_t \cdot g_t
\end{aligned}

where

\begin{equation*}
\bar{r}_t = \begin{cases}
\max_{i \le t} \lVert x_i - x_0 \rVert & t \ge 1 \\
r_{\epsilon} & t=0.
\end{cases}
\end{equation*}
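Unrolling the first step shows why $r_{\epsilon}$ is called the initial movement parameter: at $t=0$ the step size is $r_{\epsilon}$ divided by (essentially) the first gradient norm, so the first update moves the parameters by almost exactly $r_{\epsilon}$ whenever $\epsilon \ll \lVert g_0 \rVert^2$.

\begin{aligned}
   \eta_0 & = \frac{ r_{\epsilon} }{ \sqrt{\lVert g_0 \rVert^2 + \epsilon} } \\
   \lVert x_1 - x_0 \rVert & = \eta_0 \lVert g_0 \rVert = r_{\epsilon} \cdot \frac{ \lVert g_0 \rVert }{ \sqrt{\lVert g_0 \rVert^2 + \epsilon} } \approx r_{\epsilon}
\end{aligned}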

The initial movement parameter $r_{\epsilon}$ should be chosen small relative to the distance between $x_0$ and the nearest optimum $x^\star$ (see additional discussion below).
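To make the update rule concrete, here is a minimal pure-Python sketch of DoG on a toy quadratic. It is not the package's PyTorch implementation: the parameters are a plain list of floats, `dog_minimize` is a hypothetical helper, and `eps=1e-8` is an assumed value. It follows the definition of $\bar{r}_t$ above literally ($r_{\epsilon}$ at $t=0$, the running maximum distance from $x_0$ afterwards).

```python
import math


def dog_minimize(grad_fn, x0, steps, r_eps, eps=1e-8):
    """Minimal DoG loop on a parameter vector stored as a list of floats.

    grad_fn(x) returns a (possibly stochastic) gradient at x as a list.
    Returns the last iterate and the step sizes eta_t used at each step.
    """
    x0 = [float(v) for v in x0]
    x = list(x0)
    max_dist = 0.0      # max_{i <= t} ||x_i - x_0||
    grad_sq_sum = 0.0   # sum_{i <= t} ||g_i||^2
    etas = []
    for t in range(steps):
        g = grad_fn(x)
        grad_sq_sum += sum(gi * gi for gi in g)
        # \bar r_0 = r_eps; afterwards the largest distance travelled so far.
        rbar = r_eps if t == 0 else max_dist
        eta = rbar / math.sqrt(grad_sq_sum + eps)
        etas.append(eta)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        dist = math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, x0)))
        max_dist = max(max_dist, dist)
    return x, etas


# Example: f(x) = 0.5 * ||x - target||^2, whose gradient is x - target.
target = [3.0, 4.0]
x_final, etas = dog_minimize(
    lambda x: [xi - ti for xi, ti in zip(x, target)],
    x0=[0.0, 0.0],
    steps=1000,
    r_eps=1e-6,
)
print(x_final, etas[0], etas[-1])
```

On this toy problem the recorded step size starts near $r_{\epsilon} / \lVert g_0 \rVert$ and ends many orders of magnitude larger: eta grows while the iterate travels away from $x_0$ and levels off once the gradients shrink.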

LDoG (layerwise DoG) is a variant of DoG that applies the above update rule separately to every element in the list of parameters provided to the optimizer object.
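A similar sketch of LDoG, again in pure Python and not the package's code: each "layer" (one list of floats, standing in for one element of the parameter list) keeps its own $x_0$, running maximum distance and gradient-norm sum, and therefore gets its own step size. The class name `LDoGSketch` is hypothetical, and setting $r_{\epsilon}$ per layer from that layer's initial norm is an assumption about how the `reps_rel` recommendation carries over to LDoG.

```python
import math


def _norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


class LDoGSketch:
    """Layer-wise DoG over a list of layers, each a list of floats."""

    def __init__(self, layers, reps_rel=1e-6, eps=1e-8):
        self.layers = [[float(v) for v in layer] for layer in layers]
        self.x0 = [list(layer) for layer in self.layers]
        # Assumption: r_eps is set per layer from that layer's initial norm.
        self.r_eps = [reps_rel * (1.0 + _norm(layer)) for layer in self.layers]
        self.max_dist = [0.0] * len(self.layers)     # per-layer max ||x_i - x_0||
        self.grad_sq_sum = [0.0] * len(self.layers)  # per-layer sum of ||g_i||^2
        self.eps = eps
        self.t = 0

    def step(self, grads):
        """Apply one LDoG update; grads[k] is the gradient of layer k.

        Returns the step size used for each layer.
        """
        etas = []
        for k, g in enumerate(grads):
            rbar = self.r_eps[k] if self.t == 0 else self.max_dist[k]
            self.grad_sq_sum[k] += sum(gi * gi for gi in g)
            eta = rbar / math.sqrt(self.grad_sq_sum[k] + self.eps)
            self.layers[k] = [xi - eta * gi for xi, gi in zip(self.layers[k], g)]
            dist = _norm([a - b for a, b in zip(self.layers[k], self.x0[k])])
            self.max_dist[k] = max(self.max_dist[k], dist)
            etas.append(eta)
        self.t += 1
        return etas


# Toy problem: two "layers" whose optima live on very different scales.
targets = [[1.0], [100.0, 100.0]]
opt = LDoGSketch([[0.0], [0.0, 0.0]])
for _ in range(1000):
    grads = [[xi - ci for xi, ci in zip(layer, tgt)]
             for layer, tgt in zip(opt.layers, targets)]
    etas = opt.step(grads)
print(opt.layers, etas)
```

Because each layer has its own $\bar{r}_t$ and gradient sum, the large-scale second layer does not influence the step size used for the first layer; with plain DoG both layers would share a single $\eta_t$.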

Installation

To install the package, simply run `pip install dog-optimizer`.

Usage

DoG and LDoG are implemented using the standard PyTorch optimizer interface. After installing the package with `pip install dog-optimizer`, all you need to do is replace the line that creates your optimizer with

from dog import DoG
optimizer = DoG(model.parameters())

for DoG, or

from dog import LDoG
optimizer = LDoG(model.parameters())

for LDoG, where the arguments follow the standard PyTorch optimizer syntax (an iterable of parameters or parameter groups, followed by optional keyword arguments) and `model` is your `torch.nn.Module`. To see the list of all available parameters, run `help(DoG)` or `help(LDoG)`.

Iterate averaging

We provide an implementation of the polynomial decay averaging used throughout our experiments. To use it, simply create a `PolynomialDecayAverager` with

from dog import PolynomialDecayAverager
averager = PolynomialDecayAverager(model)

Then, after each `optimizer.step()`, call `averager.step()` as well. You can then access the current model and the averaged model with `averager.base_model` and `averager.averaged_model`, respectively.
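The averager updates a running average of every model parameter each time `averager.step()` is called. The sketch below assumes it uses the standard polynomial-decay rule $\bar{x}_t = (1 - w_t)\bar{x}_{t-1} + w_t x_t$ with $w_t = (1+\gamma)/(t+\gamma)$ and $\gamma = 8$; the rule, the default $\gamma$ and the function name `polynomial_decay_average` are assumptions for illustration, so check `help(PolynomialDecayAverager)` for the actual behavior.

```python
def polynomial_decay_average(iterates, gamma=8.0):
    """Polynomial-decay average of a sequence of iterates (lists of floats).

    Assumed rule: avg_t = (1 - w_t) * avg_{t-1} + w_t * x_t with
    w_t = (1 + gamma) / (t + gamma). Since w_1 = 1, avg_1 = x_1.
    gamma = 0 recovers the plain running mean; larger gamma puts more
    weight on recent iterates.
    """
    avg = None
    for t, x in enumerate(iterates, start=1):
        w = (1.0 + gamma) / (t + gamma)
        if avg is None:
            avg = [float(v) for v in x]
        else:
            avg = [(1.0 - w) * a + w * v for a, v in zip(avg, x)]
    return avg


# With gamma = 0 this is the ordinary mean of the iterates.
print(polynomial_decay_average([[1.0], [2.0], [3.0]], gamma=0.0))
# With the assumed default gamma = 8, later iterates dominate.
print(polynomial_decay_average([[float(i)] for i in range(1, 11)]))
```

Weighting recent iterates more heavily than a uniform average lets the averaged model forget the early, far-from-optimal iterates, while still smoothing out the noise of the most recent steps.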

Example script

An example of how to use the above to train a simple CNN on MNIST can be found in `examples/mnist.py` (based on the PyTorch MNIST example).

Choosing `reps_rel`

DoG is parameter-free by design, so there is no need to tune a learning rate parameter. However, as discussed in the paper, DoG has an initial movement parameter $r_{\epsilon}$ that must be small enough to avoid destructive updates that cause divergence, while an extremely small value of $r_{\epsilon}$ would slow down training. We recommend choosing $r_{\epsilon}$ relative to the norm of the initial weights $x_0$. In particular, we set $r_{\epsilon}$ to `reps_rel` $\times (1+\lVert x_0 \rVert)$, where `reps_rel` is a configurable parameter of the optimizer. The default value of `reps_rel` is 1e-6, and we have found it to work well most of the time. However, in our experiments we did encounter some situations that required different values of `reps_rel`:

- If optimization diverges early, `reps_rel` (and hence $r_{\epsilon}$) is likely too large: try decreasing it by factors of 100 until divergence no longer occurs. This happened when applying LDoG to fine-tune T5, which has large pre-trained weights; setting `reps_rel` to 1e-8 eliminated the divergence.
- If the DoG step size (eta) does not substantially increase from its initial value for a few hundred steps, `reps_rel` may be too small: try increasing it by factors of 100 until you see eta start to increase within the first few steps. This happened when training models with batch normalization; setting `reps_rel` to 1e-4 eliminated the problem.
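The recommendations above can be expressed as a small sketch: `initial_movement` computes $r_{\epsilon}$ = `reps_rel` $\times (1+\lVert x_0 \rVert)$ from the initial weights, and `suggest_reps_rel` encodes the two rules of thumb. Both helpers are hypothetical (they are not part of the package), and the `window` and `growth` thresholds are assumed stand-ins for "a few hundred steps" and "substantially increase". How you record eta during training is up to you.

```python
import math


def initial_movement(params, reps_rel=1e-6):
    """r_eps = reps_rel * (1 + ||x_0||).

    params: initial weights as a list of tensors, each a list of floats.
    ||x_0|| is the Euclidean norm of all weights taken together.
    """
    norm_x0 = math.sqrt(sum(w * w for tensor in params for w in tensor))
    return reps_rel * (1.0 + norm_x0)


def suggest_reps_rel(reps_rel, etas, diverged, window=300, growth=10.0):
    """Hypothetical helper encoding the two rules of thumb.

    etas: DoG step sizes recorded during training (etas[0] is the initial one).
    window, growth: assumed meaning of "a few hundred steps" and
    "substantially increase"; these are not package settings.
    """
    if diverged:
        # Early divergence: reps_rel (and hence r_eps) is likely too large.
        return reps_rel / 100.0
    if len(etas) >= window and max(etas[:window]) < growth * etas[0]:
        # eta stayed near its initial value: reps_rel is likely too small.
        return reps_rel * 100.0
    return reps_rel


# Weights with norm 5 give r_eps = 1e-6 * (1 + 5).
print(initial_movement([[3.0], [4.0]]))
# A flat eta history over the first 300 steps suggests raising reps_rel.
print(suggest_reps_rel(1e-6, [1e-7] * 300, diverged=False))
```

Scaling $r_{\epsilon}$ with $1+\lVert x_0 \rVert$ is what makes large pre-trained weights (as in the T5 case) produce a large first step, which is why that case needed a smaller `reps_rel`.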

Citation

@article{ivgi2023dog,
  title={{D}o{G} is {SGD}'s Best Friend: A Parameter-Free Dynamic Step Size Schedule}, 
  author={Maor Ivgi and Oliver Hinder and Yair Carmon}, 
  journal={arXiv:2302.12022}, 
  year={2023},
}


Release history

- 1.0.3 (this version): Jun 14, 2023
- 1.0.2: Feb 26, 2023
- 1.0.1: Feb 26, 2023
- 1.0: Feb 25, 2023

Download files


Source Distribution

dog-optimizer-1.0.3.tar.gz (9.4 kB)

Uploaded Jun 14, 2023 Source

File details

Details for the file dog-optimizer-1.0.3.tar.gz.

File metadata

Download URL: dog-optimizer-1.0.3.tar.gz
Upload date: Jun 14, 2023
Size: 9.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for dog-optimizer-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`44a670e010084c26d7035dce1b4f9a0a087b52527906756192c1042102e05bd2`
MD5	`376b41900010a92a7f39f7893baa957f`
BLAKE2b-256	`ed9cba7aaca66425d16c481cc9e879ba5148f7334d38ed893665ee9b78304b96`

