pytorch_optimizer

A collection of optimizer implementations in PyTorch with clean code and strict types, along with useful optimization ideas.

pytorch-optimizer is a collection of optimizers for PyTorch, along with useful optimization ideas.
Most of the implementations follow the original papers, with some tweaks added by the author.
The project is highly inspired by pytorch-optimizer.

Documentation

https://pytorch-optimizers.readthedocs.io/en/latest/

Usage

Install

$ pip3 install -U pytorch-optimizer

To install the package without installing or upgrading its dependencies, pass --no-deps:

$ pip3 install -U --no-deps pytorch-optimizer

Simple Usage

from pytorch_optimizer import AdamP

model = YourModel()
optimizer = AdamP(model.parameters())

# Alternatively, use the optimizer loader and pass the name of the optimizer.

from pytorch_optimizer import load_optimizer

model = YourModel()
opt = load_optimizer(optimizer='adamp')
optimizer = opt(model.parameters())

You can also load the optimizer via torch.hub:

import torch

model = YourModel()
opt = torch.hub.load('kozistr/pytorch_optimizer', 'adamp')
optimizer = opt(model.parameters())

You can also list the supported optimizers and learning rate schedulers:

from pytorch_optimizer import get_supported_optimizers, get_supported_lr_schedulers

supported_optimizers = get_supported_optimizers()
supported_lr_schedulers = get_supported_lr_schedulers()

Supported Optimizers

Optimizer	Description	Official Code	Paper
AdaBelief	Adapting Step-sizes by the Belief in Observed Gradients	github	https://arxiv.org/abs/2010.07468
AdaBound	Adaptive Gradient Methods with Dynamic Bound of Learning Rate	github	https://openreview.net/forum?id=Bkg3g2R9FX
AdaHessian	An Adaptive Second Order Optimizer for Machine Learning	github	https://arxiv.org/abs/2006.00719
AdamD	Improved bias-correction in Adam		https://arxiv.org/abs/2110.10828
AdamP	Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights	github	https://arxiv.org/abs/2006.08217
diffGrad	An Optimization Method for Convolutional Neural Networks	github	https://arxiv.org/abs/1909.11015v3
MADGRAD	A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization	github	https://arxiv.org/abs/2101.11075
RAdam	On the Variance of the Adaptive Learning Rate and Beyond	github	https://arxiv.org/abs/1908.03265
Ranger	a synergistic optimizer combining RAdam and LookAhead, and now GC in one optimizer	github	https://bit.ly/3zyspC3
Ranger21	a synergistic deep learning optimizer	github	https://arxiv.org/abs/2106.13731
Lamb	Large Batch Optimization for Deep Learning	github	https://arxiv.org/abs/1904.00962
Shampoo	Preconditioned Stochastic Tensor Optimization	github	https://arxiv.org/abs/1802.09568
Nero	Learning by Turning: Neural Architecture Aware Optimisation	github	https://arxiv.org/abs/2102.07227
Adan	Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models	github	https://arxiv.org/abs/2208.06677
Adai	Disentangling the Effects of Adaptive Learning Rate and Momentum	github	https://arxiv.org/abs/2006.15815

Useful Resources

Several optimization ideas for regularizing and stabilizing training. Most of them are applied in the Ranger21 optimizer.

Also, most of the figures are taken from the Ranger21 paper.

Adaptive Gradient Clipping	Gradient Centralization	Softplus Transformation
Gradient Normalization	Norm Loss	Positive-Negative Momentum
Linear learning rate warmup	Stable weight decay	Explore-exploit learning rate schedule
Lookahead	Chebyshev learning rate schedule	(Adaptive) Sharpness-Aware Minimization
On the Convergence of Adam and Beyond	Gradient Surgery for Multi-Task Learning

Adaptive Gradient Clipping

This idea was originally proposed in the NFNet (Normalizer-Free Network) paper.

AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.

code : github
paper : arXiv
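
The clipping rule described above can be sketched without any framework. This is a minimal illustration, not the library's implementation: plain nested lists stand in for tensors, each row is assumed to be one unit, and the names `unitwise_norm`, `adaptive_gradient_clip`, `clip_factor`, and `eps` (with their default values) are hypothetical choices made for this example.

```python
import math


def unitwise_norm(matrix):
    """L2 norm of each row; a row is treated as one unit."""
    return [math.sqrt(sum(v * v for v in row)) for row in matrix]


def adaptive_gradient_clip(weights, grads, clip_factor=0.01, eps=1e-3):
    """Return a copy of grads where each row's norm is at most
    clip_factor * max(norm of the matching weight row, eps)."""
    clipped = []
    for w_norm, g_norm, g_row in zip(unitwise_norm(weights), unitwise_norm(grads), grads):
        max_norm = clip_factor * max(w_norm, eps)
        if g_norm > max_norm:
            # Rescale the row so its norm equals the allowed maximum.
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [0.6, 0.8]]    # row norms: 5.0 and 1.0
grads = [[0.03, 0.04], [0.3, 0.4]]    # row norms: 0.05 and 0.5
clipped = adaptive_gradient_clip(weights, grads, clip_factor=0.1)
```

With `clip_factor=0.1`, the first row is left untouched because its gradient-to-parameter norm ratio is 0.01, while the second row (ratio 0.5) is scaled down until the ratio is 0.1.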

Gradient Centralization

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.

code : github
paper : arXiv
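
As a rough sketch of the centralization step, the snippet below subtracts the mean from each row of a gradient matrix. It assumes that each row holds the gradient for one output unit, and it uses plain lists instead of tensors; `centralize_gradient` is a hypothetical helper name, not a function from this package.

```python
def centralize_gradient(grad):
    """Subtract each row's mean so that every row of the result has zero mean."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


grad = [[1.0, 2.0, 3.0], [-4.0, 0.0, 10.0]]
centered = centralize_gradient(grad)
```

Both rows in this example have a mean of 2.0, so the centralized rows each sum to zero.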

Softplus Transformation

Running the final variance denominator through the softplus function lifts extremely small values to keep them viable.

paper : arXiv
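
To make the effect concrete, here is a small sketch of a scaled softplus, softplus(x) = log(1 + exp(beta * x)) / beta. The `beta` and `threshold` values are assumptions picked for illustration, and the function is a stand-alone helper rather than code from this package.

```python
import math


def softplus(x, beta=50.0, threshold=20.0):
    """Scaled softplus: log(1 + exp(beta * x)) / beta.

    Above the threshold the function is numerically equal to x,
    so x is returned directly to avoid overflow in exp().
    """
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta


tiny = softplus(1e-12)   # a near-zero denominator is lifted to about log(2) / beta
large = softplus(2.0)    # large values pass through unchanged
```

A denominator that would otherwise be close to zero ends up near log(2) / beta, which bounds the size of the resulting update.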

Gradient Normalization

Norm Loss

paper : arXiv

Positive-Negative Momentum

code : github
paper : arXiv

Linear learning rate warmup

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png

paper : arXiv
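
A linear warmup can be written as a one-line rule: scale the base learning rate by the fraction of warmup steps completed, capped at 1. The sketch below assumes a fixed `warmup_steps` supplied by the caller; `linear_warmup_lr` and its parameters are hypothetical names, and the package may derive the warmup length differently.

```python
def linear_warmup_lr(step, base_lr, warmup_steps):
    """Learning rate at a 1-based step, rising linearly to base_lr."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, step / warmup_steps)


# Learning rates for steps 1..6 with a 4-step warmup.
schedule = [linear_warmup_lr(s, base_lr=0.1, warmup_steps=4) for s in range(1, 7)]
```

The rate climbs in equal increments for the first four steps and then stays at `base_lr`.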

Stable weight decay

code : github
paper : arXiv

Explore-exploit learning rate schedule

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png

code : github
paper : arXiv
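
One common reading of an explore-exploit schedule is a constant learning rate during an explore phase followed by a linear decay to zero during the exploit phase. The sketch below follows that reading; it is an assumption about the shape of the schedule, and `explore_exploit_lr`, `explore_steps`, and `total_steps` are hypothetical names rather than this package's API.

```python
def explore_exploit_lr(step, base_lr, explore_steps, total_steps):
    """Constant base_lr while exploring, then linear decay to zero."""
    if step < explore_steps:
        return base_lr
    exploit_steps = total_steps - explore_steps
    if exploit_steps <= 0:
        return base_lr
    # Fraction of the exploit phase completed, capped at 1.
    progress = min(1.0, (step - explore_steps) / exploit_steps)
    return base_lr * (1.0 - progress)


# Steps 0..8 with 4 explore steps out of 8 total.
schedule = [explore_exploit_lr(s, 0.1, explore_steps=4, total_steps=8) for s in range(9)]
```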

Lookahead

k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights that is updated and substituted for the current weights every k_{lookahead} steps (5 by default).

code : github
paper : arXiv
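
The update can be sketched around plain SGD on a toy problem. This is an illustration only: the inner optimizer, the learning rate, and the interpolation factor `alpha=0.5` are assumptions for the example, and `lookahead_sgd` is a hypothetical helper, not the wrapper shipped in this package.

```python
def lookahead_sgd(grad_fn, w0, lr=0.1, k=5, alpha=0.5, steps=20):
    """Run SGD on the fast weights and sync with the slow weights every k steps."""
    fast = list(w0)
    slow = list(w0)
    for step in range(1, steps + 1):
        grads = grad_fn(fast)
        fast = [w - lr * g for w, g in zip(fast, grads)]  # k steps forward
        if step % k == 0:
            # 1 step back: move the slow weights toward the fast weights,
            # then restart the fast weights from the slow ones.
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)
    return slow


def quadratic_grad(w):
    """Gradient of f(w) = sum(w_i ** 2)."""
    return [2.0 * v for v in w]


result = lookahead_sgd(quadratic_grad, [1.0, -2.0])
```

On this quadratic, each block of five SGD steps shrinks the fast weights by 0.8 ** 5, and the slow weights move halfway toward that point, so they contract by a factor of 0.66384 per block.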

Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules

paper : arXiv

(Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.

In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.

SAM paper : paper
ASAM paper : paper
A/SAM code : github
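
A single SAM update can be sketched in two gradient evaluations: first move to a nearby point in the direction of the gradient, then update the original weights with the gradient measured at that point. The code below is a framework-free illustration on a toy quadratic; `sam_step`, `rho`, and `lr` are hypothetical names and values, and the adaptive (ASAM) scaling is not shown.

```python
import math


def sam_step(grad_fn, w, lr=0.1, rho=0.05):
    """One SAM update using plain gradient descent as the base step."""
    grads = grad_fn(w)
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0:
        # Zero gradient: no ascent direction, so leave the weights unchanged.
        return list(w)
    # Ascent step: move to the approximate worst point within radius rho.
    perturbed = [v + rho * g / norm for v, g in zip(w, grads)]
    # Descent step: apply the gradient from the perturbed point to the original weights.
    sharp_grads = grad_fn(perturbed)
    return [v - lr * g for v, g in zip(w, sharp_grads)]


def quadratic_grad(w):
    """Gradient of f(w) = sum(w_i ** 2)."""
    return [2.0 * v for v in w]


w = [3.0, 4.0]
w_next = sam_step(quadratic_grad, w)
```

Here the gradient at `w` is `[6.0, 8.0]` with norm 10, so the perturbed point is `[3.03, 4.04]` and the update uses the gradient `[6.06, 8.08]` taken there.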

On the Convergence of Adam and Beyond

paper : paper

Gradient Surgery for Multi-Task Learning

paper : paper

Author

Hyeongchan Kim / @kozistr

Download files

Source Distribution

pytorch_optimizer-2.1.0.tar.gz (39.1 kB view details)

Uploaded Jan 1, 2023 Source

Built Distribution

pytorch_optimizer-2.1.0-py3-none-any.whl (62.0 kB view details)

Uploaded Jan 1, 2023 Python 3

File details

Details for the file pytorch_optimizer-2.1.0.tar.gz.

File metadata

Download URL: pytorch_optimizer-2.1.0.tar.gz
Upload date: Jan 1, 2023
Size: 39.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.3.1 CPython/3.11.1 Linux/5.15.0-1024-azure

File hashes

Hashes for pytorch_optimizer-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`72d3a7e6057b6ec30a14a593013e7179ef3d2905d2fe37ee0362ab8fbc312a5a`
MD5	`27d14384632e951eea1854041ce4c941`
BLAKE2b-256	`b1d7478667ef095544b414e4ee85a7c36318243b9a2a5b8418dac0fd9f9235f1`

File details

Details for the file pytorch_optimizer-2.1.0-py3-none-any.whl.

File metadata

Download URL: pytorch_optimizer-2.1.0-py3-none-any.whl
Upload date: Jan 1, 2023
Size: 62.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.3.1 CPython/3.11.1 Linux/5.15.0-1024-azure

File hashes

Hashes for pytorch_optimizer-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ff43262f70c8e449f7bff303f7ea6d88fe0bc3e68f298bafb5fb8c15ae453405`
MD5	`7978ba5ab7e303d7ec16e715ea92b9d7`
BLAKE2b-256	`5028bb81842bd975efadfd60ad519da9c86e70dcf31c2719509f3bd725cd2157`