pytorch-optimizer
A collection of optimizer implementations in PyTorch with clean code and strict typing. Highly inspired by pytorch-optimizer.
Usage
Install
$ pip3 install pytorch-optimizer
Supported Optimizers
Optimizer | Description | Official Code | Paper |
---|---|---|---|
AdamP | Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | github | https://arxiv.org/abs/2006.08217 |
MADGRAD | A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization | github | https://arxiv.org/abs/2101.11075 |
RAdam | On the Variance of the Adaptive Learning Rate and Beyond | github | https://arxiv.org/abs/1908.03265 |
Ranger | A synergistic optimizer combining RAdam, LookAhead, and now GC in one optimizer | github | |
Ranger21 | A synergistic deep learning optimizer | github | https://arxiv.org/abs/2106.13731 |
Useful Resources
Several optimization ideas to regularize and stabilize training. Most of these ideas are applied in the Ranger21 optimizer, and most of the figures are taken from the Ranger21 paper.
Adaptive Gradient Clipping (AGC)
This idea was originally proposed in the NFNet (Normalizer-Free Network) paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
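The clipping rule described above can be sketched in plain Python, without any framework dependency. This is a minimal illustration and not this package's implementation: the function names, the choice to treat each row of a weight matrix as one unit, and the default values of `clip_factor` and `eps` are all assumptions made for the example.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_norm(row: List[float]) -> float:
    """Euclidean norm of one unit (here: one row of a weight matrix)."""
    return math.sqrt(sum(x * x for x in row))


def adaptive_gradient_clip(
    weights: Matrix,
    grads: Matrix,
    clip_factor: float = 0.01,  # assumed threshold on the grad/param norm ratio
    eps: float = 1e-3,          # assumed floor so near-zero units can still be updated
) -> Matrix:
    """Return a clipped copy of `grads`, treating each row as one unit."""
    clipped = []
    for w_row, g_row in zip(weights, grads):
        w_norm = max(l2_norm(w_row), eps)
        g_norm = l2_norm(g_row)
        max_norm = clip_factor * w_norm
        if g_norm > max_norm:
            # Rescale so the unit's gradient norm equals the allowed maximum.
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            # The ratio is already below the threshold: leave the gradient as is.
            clipped.append(list(g_row))
    return clipped
```

A unit whose gradient is large relative to its weights is scaled down, while a unit with a small ratio passes through unchanged, so the clipping adapts to the size of each unit's parameters instead of using one global threshold.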
Gradient Centralization (GC)
Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
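As a rough sketch of the idea, the snippet below centralizes the gradient of a weight matrix in plain Python. It assumes the mean is taken per output unit (per row), and the function name `centralize_gradient` is hypothetical; it is not taken from this package.

```python
from typing import List


def centralize_gradient(grad: List[List[float]]) -> List[List[float]]:
    """Subtract each row's mean so every row of the gradient has zero mean."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered
```

In this sketch only matrix-shaped gradients are handled; one-dimensional parameters such as biases would typically be left untouched, since a single vector has no per-unit slice to average over.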
Softplus Transformation
Running the final variance denominator through the softplus function lifts extremely tiny values to keep them viable.
- paper : arXiv
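To make the effect concrete, here is a small plain-Python sketch of a softplus with a sharpness parameter. The name `softplus_denom` and the defaults `beta=50.0` and `threshold=20.0` are assumptions for illustration; the threshold shortcut mirrors the common trick of falling back to the identity for large inputs to avoid overflow.

```python
import math


def softplus_denom(sqrt_variance: float, beta: float = 50.0, threshold: float = 20.0) -> float:
    """Smoothly lift tiny denominators: (1 / beta) * log(1 + exp(beta * x))."""
    scaled = beta * sqrt_variance
    if scaled > threshold:
        # For large inputs softplus is numerically the identity.
        return sqrt_variance
    return math.log1p(math.exp(scaled)) / beta
```

With these assumed defaults, a denominator close to zero is lifted to roughly `log(2) / beta`, which bounds the effective step size, while denominators that are already large pass through unchanged.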
Gradient Normalization
Norm Loss
- paper : arXiv
Positive-Negative Momentum
Linear learning-rate warm-up
- paper : arXiv
Stable weight decay
Explore-exploit learning-rate schedule
Lookahead
k steps forward, 1 step back. Lookahead consists of keeping an exponential moving average of the weights that is updated and substituted for the current weights every k_{lookahead} steps (5 by default).
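The update described above can be sketched around plain SGD in a few lines of Python. This is a toy illustration under stated assumptions, not this package's implementation: the function name `lookahead_sgd`, the inner SGD step, and the interpolation factor `alpha=0.5` are choices made for the example, while `k=5` follows the default mentioned above.

```python
from typing import Callable, List


def lookahead_sgd(
    params: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    steps: int,
    lr: float = 0.1,
    k: int = 5,          # synchronise every k steps (5 by default, as stated above)
    alpha: float = 0.5,  # assumed interpolation factor for the slow weights
) -> List[float]:
    """Run `steps` SGD updates, pulling the weights back to a slow average every k steps."""
    fast = list(params)
    slow = list(params)
    for step in range(1, steps + 1):
        grads = grad_fn(fast)
        # k steps forward: ordinary SGD on the fast weights.
        fast = [p - lr * g for p, g in zip(fast, grads)]
        if step % k == 0:
            # 1 step back: move the slow weights toward the fast ones,
            # then substitute them for the current weights.
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)
    return fast


# Example: minimise f(x) = x^2, whose gradient is 2x.
result = lookahead_sgd([1.0], lambda p: [2.0 * x for x in p], steps=5)
```

Between synchronisation points the fast weights behave exactly like the inner optimizer; the slow weights only change every `k` steps, which is what gives the method its smoothing effect.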
Chebyshev learning rate schedule
Acceleration via Fractal Learning Rate Schedules
- paper : arXiv
Citations
AdamP
@inproceedings{heo2021adamp,
title={AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights},
author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Kim, Gyuwan and Uh, Youngjung and Ha, Jung-Woo},
year={2021},
booktitle={International Conference on Learning Representations (ICLR)},
}
Adaptive Gradient Clipping (AGC)
@article{brock2021high,
author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
title={High-Performance Large-Scale Image Recognition Without Normalization},
journal={arXiv preprint arXiv:2102.06171},
year={2021}
}
Chebyshev LR Schedules
@article{agarwal2021acceleration,
title={Acceleration via Fractal Learning Rate Schedules},
author={Agarwal, Naman and Goel, Surbhi and Zhang, Cyril},
journal={arXiv preprint arXiv:2103.01338},
year={2021}
}
Gradient Centralization (GC)
@inproceedings{yong2020gradient,
title={Gradient centralization: A new optimization technique for deep neural networks},
author={Yong, Hongwei and Huang, Jianqiang and Hua, Xiansheng and Zhang, Lei},
booktitle={European Conference on Computer Vision},
pages={635--652},
year={2020},
organization={Springer}
}
Lookahead
@article{zhang2019lookahead,
title={Lookahead optimizer: k steps forward, 1 step back},
author={Zhang, Michael R and Lucas, James and Hinton, Geoffrey and Ba, Jimmy},
journal={arXiv preprint arXiv:1907.08610},
year={2019}
}
RAdam
@inproceedings{liu2019radam,
author = {Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
booktitle = {Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020)},
month = {April},
title = {On the Variance of the Adaptive Learning Rate and Beyond},
year = {2020}
}
Norm Loss
@inproceedings{georgiou2021norm,
title={Norm Loss: An efficient yet effective regularization method for deep neural networks},
author={Georgiou, Theodoros and Schmitt, Sebastian and B{\"a}ck, Thomas and Chen, Wei and Lew, Michael},
booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
pages={8812--8818},
year={2021},
organization={IEEE}
}
Positive-Negative Momentum
@article{xie2021positive,
title={Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
author={Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
journal={arXiv preprint arXiv:2103.17182},
year={2021}
}
Explore-Exploit learning rate schedule
@article{iyer2020wide,
title={Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule},
author={Iyer, Nikhil and Thejas, V and Kwatra, Nipun and Ramjee, Ramachandran and Sivathanu, Muthian},
journal={arXiv preprint arXiv:2003.03977},
year={2020}
}
Linear learning-rate warm-up
@article{ma2019adequacy,
title={On the adequacy of untuned warmup for adaptive optimization},
author={Ma, Jerry and Yarats, Denis},
journal={arXiv preprint arXiv:1910.04209},
volume={7},
year={2019}
}
Stable weight decay
@article{xie2020stable,
title={Stable weight decay regularization},
author={Xie, Zeke and Sato, Issei and Sugiyama, Masashi},
journal={arXiv preprint arXiv:2011.11152},
year={2020}
}
Softplus transformation
@article{tong2019calibrating,
title={Calibrating the adaptive learning rate to improve convergence of adam},
author={Tong, Qianqian and Liang, Guannan and Bi, Jinbo},
journal={arXiv preprint arXiv:1908.00700},
year={2019}
}
MADGRAD
@article{defazio2021adaptivity,
title={Adaptivity without compromise: a momentumized, adaptive, dual averaged gradient method for stochastic optimization},
author={Defazio, Aaron and Jelassi, Samy},
journal={arXiv preprint arXiv:2101.11075},
year={2021}
}
Author
Hyeongchan Kim / @kozistr