
pytorch-optimizer

Project description

A collection of optimizer implementations in PyTorch, with clean code and strict types. Highly inspired by pytorch-optimizer.

Usage

Install

$ pip3 install pytorch-optimizer

Simple Usage

from pytorch_optimizer import Ranger21

...
model = YourModel()
optimizer = Ranger21(model.parameters())
...

for input, target in data:
    optimizer.zero_grad()                       # clear gradients from the previous step
    loss = loss_function(model(input), target)  # forward pass and loss
    loss.backward()                             # back-propagate
    optimizer.step()                            # update parameters

Supported Optimizers

| Optimizer | Description | Official Code | Paper |
|---|---|---|---|
| AdaBelief | Adapting Stepsizes by the Belief in Observed Gradients | github | https://arxiv.org/abs/2010.07468 |
| AdaBound | Adaptive Gradient Methods with Dynamic Bound of Learning Rate | github | https://openreview.net/forum?id=Bkg3g2R9FX |
| AdaHessian | An Adaptive Second Order Optimizer for Machine Learning | github | https://arxiv.org/abs/2006.00719 |
| AdamP | Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | github | https://arxiv.org/abs/2006.08217 |
| MADGRAD | A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization | github | https://arxiv.org/abs/2101.11075 |
| RAdam | On the Variance of the Adaptive Learning Rate and Beyond | github | https://arxiv.org/abs/1908.03265 |
| Ranger | a synergistic optimizer combining RAdam and LookAhead, and now GC in one optimizer | github | |
| Ranger21 | a synergistic deep learning optimizer | github | https://arxiv.org/abs/2106.13731 |

Useful Resources

Several optimization ideas for regularizing and stabilizing training. Most of these ideas are applied in the Ranger21 optimizer.

Also, most of the accompanying figures are taken from the Ranger21 paper.

Adaptive Gradient Clipping (AGC)

This idea was originally proposed in the NFNet (Normalizer-Free Network) paper. AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
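For intuition, here is a minimal PyTorch sketch of unit-wise AGC; the clip factor and epsilon values are illustrative defaults, not necessarily the ones this library uses.

```python
import torch


def unitwise_norm(x: torch.Tensor) -> torch.Tensor:
    """L2 norm per 'unit' (first-dim slice) for >=2D tensors, over the whole tensor otherwise."""
    if x.ndim <= 1:
        return x.norm(p=2)
    return x.norm(p=2, dim=tuple(range(1, x.ndim)), keepdim=True)


def adaptive_gradient_clipping(parameters, clip_factor: float = 0.01, eps: float = 1e-3):
    """Rescale each gradient whose unit-wise norm exceeds clip_factor times the parameter norm."""
    for p in parameters:
        if p.grad is None:
            continue
        param_norm = torch.clamp(unitwise_norm(p.detach()), min=eps)
        grad_norm = unitwise_norm(p.grad.detach())
        max_norm = param_norm * clip_factor
        # Clip only the units whose gradient norm exceeds the allowed ratio.
        clipped = p.grad * (max_norm / torch.clamp(grad_norm, min=1e-6))
        p.grad.data.copy_(torch.where(grad_norm > max_norm, clipped, p.grad))
```

With this sketch, calling `adaptive_gradient_clipping(model.parameters())` between `loss.backward()` and `optimizer.step()` applies the clipping to every parameter that has a gradient.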

Gradient Centralization (GC)


Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
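As a rough sketch (not necessarily the library's exact implementation), centralizing a gradient tensor amounts to subtracting its mean over all dimensions except the first:

```python
import torch


def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    """Subtract the per-unit mean so each unit's gradient has zero mean."""
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad
```

In the GC paper this is applied only to weights with more than one dimension (convolutional and fully-connected layers), right before the gradient is consumed by the update rule.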

Softplus Transformation

By running the final variance denominator through the softplus function, extremely tiny values are lifted so they remain numerically viable.
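A small illustrative sketch of the idea from the cited paper (the beta value here is arbitrary): replace the usual sqrt(v) + eps denominator of an Adam-style update with softplus(sqrt(v)), which behaves like the identity for large values but smoothly lower-bounds very small ones.

```python
import torch
import torch.nn.functional as F

# exp_avg_sq stands in for the (bias-corrected) second-moment estimate of an Adam-style optimizer.
exp_avg_sq = torch.tensor([1e-12, 1e-4, 1.0])

standard_denom = exp_avg_sq.sqrt() + 1e-8                   # usual Adam denominator
softplus_denom = F.softplus(exp_avg_sq.sqrt(), beta=50.0)   # softplus-transformed denominator

# The rest of the update is unchanged: step = lr * exp_avg / softplus_denom
```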

Gradient Normalization

Norm Loss


Positive-Negative Momentum


Linear learning-rate warm-up


Stable weight decay


Explore-exploit learning-rate schedule


Lookahead

k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights, which is updated and substituted for the current weights every k_lookahead steps (5 by default).
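A minimal sketch of a Lookahead wrapper around any PyTorch optimizer (names and defaults here are illustrative; per the table above, the library's Ranger optimizer combines this mechanism with RAdam):

```python
import torch


class Lookahead:
    """Every k steps, move the 'slow' weights toward the 'fast' weights by a factor alpha."""

    def __init__(self, optimizer, k: int = 5, alpha: float = 0.5):
        self.optimizer = optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # Snapshot of the slow weights, one copy per parameter group.
        self.slow_weights = [
            [p.clone().detach() for p in group["params"]]
            for group in optimizer.param_groups
        ]

    def zero_grad(self):
        self.optimizer.zero_grad()

    def step(self):
        loss = self.optimizer.step()  # fast update by the inner optimizer
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.optimizer.param_groups, self.slow_weights):
                for fast, slow in zip(group["params"], slow_group):
                    # slow <- slow + alpha * (fast - slow); then fast <- slow
                    slow.add_(fast.data - slow, alpha=self.alpha)
                    fast.data.copy_(slow)
        return loss
```

For example, `optimizer = Lookahead(torch.optim.SGD(model.parameters(), lr=1e-1), k=5, alpha=0.5)` can then be used like a normal optimizer in the training loop above.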

Chebyshev learning rate schedule

Proposed in Acceleration via Fractal Learning Rate Schedules.

Citations

AdamP
@inproceedings{heo2021adamp,
  title={AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights},
  author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Kim, Gyuwan and Uh, Youngjung and Ha, Jung-Woo},
  year={2021},
  booktitle={International Conference on Learning Representations (ICLR)}
}
Adaptive Gradient Clipping (AGC)
@article{brock2021high,
  author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
  title={High-Performance Large-Scale Image Recognition Without Normalization},
  journal={arXiv preprint arXiv:2102.06171},
  year={2021}
}
Chebyshev LR Schedules
@article{agarwal2021acceleration,
  title={Acceleration via Fractal Learning Rate Schedules},
  author={Agarwal, Naman and Goel, Surbhi and Zhang, Cyril},
  journal={arXiv preprint arXiv:2103.01338},
  year={2021}
}
Gradient Centralization (GC)
@inproceedings{yong2020gradient,
  title={Gradient centralization: A new optimization technique for deep neural networks},
  author={Yong, Hongwei and Huang, Jianqiang and Hua, Xiansheng and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  pages={635--652},
  year={2020},
  organization={Springer}
}
Lookahead
@article{zhang2019lookahead,
  title={Lookahead optimizer: k steps forward, 1 step back},
  author={Zhang, Michael R and Lucas, James and Hinton, Geoffrey and Ba, Jimmy},
  journal={arXiv preprint arXiv:1907.08610},
  year={2019}
}
RAdam
@inproceedings{liu2019radam,
  author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
  booktitle={Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020)},
  month={April},
  title={On the Variance of the Adaptive Learning Rate and Beyond},
  year={2020}
}
Norm Loss
@inproceedings{georgiou2021norm,
  title={Norm Loss: An efficient yet effective regularization method for deep neural networks},
  author={Georgiou, Theodoros and Schmitt, Sebastian and B{\"a}ck, Thomas and Chen, Wei and Lew, Michael},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  pages={8812--8818},
  year={2021},
  organization={IEEE}
}
Positive-Negative Momentum
@article{xie2021positive,
  title={Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author={Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2103.17182},
  year={2021}
}
Explore-Exploit learning rate schedule
@article{iyer2020wide,
  title={Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule},
  author={Iyer, Nikhil and Thejas, V and Kwatra, Nipun and Ramjee, Ramachandran and Sivathanu, Muthian},
  journal={arXiv preprint arXiv:2003.03977},
  year={2020}
}
Linear learning-rate warm-up
@article{ma2019adequacy,
  title={On the adequacy of untuned warmup for adaptive optimization},
  author={Ma, Jerry and Yarats, Denis},
  journal={arXiv preprint arXiv:1910.04209},
  volume={7},
  year={2019}
}
Stable weight decay
@article{xie2020stable,
  title={Stable weight decay regularization},
  author={Xie, Zeke and Sato, Issei and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2011.11152},
  year={2020}
}
Softplus transformation
@article{tong2019calibrating,
  title={Calibrating the adaptive learning rate to improve convergence of adam},
  author={Tong, Qianqian and Liang, Guannan and Bi, Jinbo},
  journal={arXiv preprint arXiv:1908.00700},
  year={2019}
}
MADGRAD
@article{defazio2021adaptivity,
  title={Adaptivity without compromise: a momentumized, adaptive, dual averaged gradient method for stochastic optimization},
  author={Defazio, Aaron and Jelassi, Samy},
  journal={arXiv preprint arXiv:2101.11075},
  year={2021}
}
AdaHessian
@article{yao2020adahessian,
  title={ADAHESSIAN: An adaptive second order optimizer for machine learning},
  author={Yao, Zhewei and Gholami, Amir and Shen, Sheng and Mustafa, Mustafa and Keutzer, Kurt and Mahoney, Michael W},
  journal={arXiv preprint arXiv:2006.00719},
  year={2020}
}
AdaBound
@inproceedings{Luo2019AdaBound,
  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}
}
AdaBelief
@article{zhuang2020adabelief,
  title={Adabelief optimizer: Adapting stepsizes by the belief in observed gradients},
  author={Zhuang, Juntang and Tang, Tommy and Ding, Yifan and Tatikonda, Sekhar and Dvornek, Nicha and Papademetris, Xenophon and Duncan, James S},
  journal={arXiv preprint arXiv:2010.07468},
  year={2020}
}

Author

Hyeongchan Kim / @kozistr

