
Optimizer & LR scheduler implementations in PyTorch with clean code and strict types, plus some useful optimization ideas.


License: Apache

pytorch-optimizer is a collection of optimizers for PyTorch, along with some useful optimization ideas.
Most of the implementations follow the original papers, with some tweaks added.
Highly inspired by jettify/pytorch-optimizer.

Getting Started

For more, see the documentation.

Installation

$ pip3 install -U pytorch-optimizer

If you run into a version conflict when installing the package, try the --no-deps option.

$ pip3 install -U --no-deps pytorch-optimizer

Simple Usage

from pytorch_optimizer import AdamP

model = YourModel()
optimizer = AdamP(model.parameters())

# Or use the optimizer loader by passing the name of the optimizer.

from pytorch_optimizer import load_optimizer

model = YourModel()
opt = load_optimizer(optimizer='adamp')
optimizer = opt(model.parameters())

You can also load the optimizer via torch.hub:

import torch

model = YourModel()
opt = torch.hub.load('kozistr/pytorch_optimizer', 'adamp')
optimizer = opt(model.parameters())

To build an optimizer with parameters and configs, use the create_optimizer() API.

from pytorch_optimizer import create_optimizer

optimizer = create_optimizer(
    model,
    'adamp',
    lr=1e-3,
    weight_decay=1e-3,
    use_gc=True,
    use_lookahead=True,
)

Supported Optimizers

You can list the supported optimizers and LR schedulers:

from pytorch_optimizer import get_supported_optimizers, get_supported_lr_schedulers

supported_optimizers = get_supported_optimizers()
supported_lr_schedulers = get_supported_lr_schedulers()

| Optimizer | Description | Official Code | Paper |
|---|---|---|---|
| AdaBelief | Adapting Step-sizes by the Belief in Observed Gradients | github | https://arxiv.org/abs/2010.07468 |
| AdaBound | Adaptive Gradient Methods with Dynamic Bound of Learning Rate | github | https://openreview.net/forum?id=Bkg3g2R9FX |
| AdaHessian | An Adaptive Second Order Optimizer for Machine Learning | github | https://arxiv.org/abs/2006.00719 |
| AdamD | Improved bias-correction in Adam | | https://arxiv.org/abs/2110.10828 |
| AdamP | Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | github | https://arxiv.org/abs/2006.08217 |
| diffGrad | An Optimization Method for Convolutional Neural Networks | github | https://arxiv.org/abs/1909.11015v3 |
| MADGRAD | A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization | github | https://arxiv.org/abs/2101.11075 |
| RAdam | On the Variance of the Adaptive Learning Rate and Beyond | github | https://arxiv.org/abs/1908.03265 |
| Ranger | A synergistic optimizer combining RAdam and Lookahead, and now GC, in one optimizer | github | https://bit.ly/3zyspC3 |
| Ranger21 | A synergistic deep learning optimizer | github | https://arxiv.org/abs/2106.13731 |
| Lamb | Large Batch Optimization for Deep Learning | github | https://arxiv.org/abs/1904.00962 |
| Shampoo | Preconditioned Stochastic Tensor Optimization | github | https://arxiv.org/abs/1802.09568 |
| Nero | Learning by Turning: Neural Architecture Aware Optimisation | github | https://arxiv.org/abs/2102.07227 |
| Adan | Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models | github | https://arxiv.org/abs/2208.06677 |
| Adai | Disentangling the Effects of Adaptive Learning Rate and Momentum | github | https://arxiv.org/abs/2006.15815 |
| GSAM | Surrogate Gap Guided Sharpness-Aware Minimization | github | https://openreview.net/pdf?id=edONMAnhLu- |
| D-Adaptation | Learning-Rate-Free Learning by D-Adaptation | github | https://arxiv.org/abs/2301.07733 |
| AdaFactor | Adaptive Learning Rates with Sublinear Memory Cost | github | https://arxiv.org/abs/1804.04235 |
| Apollo | An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization | github | https://arxiv.org/abs/2009.13586 |
| NovoGrad | Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | github | https://arxiv.org/abs/1905.11286 |
| Lion | Symbolic Discovery of Optimization Algorithms | github | https://arxiv.org/abs/2302.06675 |
| Ali-G | Adaptive Learning Rates for Interpolation with Gradients | github | https://arxiv.org/abs/1906.05661 |
| SM3 | Memory-Efficient Adaptive Optimization | github | https://arxiv.org/abs/1901.11150 |
| AdaNorm | Adaptive Gradient Norm Correction based Optimizer for CNNs | github | https://arxiv.org/abs/2210.06364 |
| RotoGrad | Gradient Homogenization in Multitask Learning | github | https://openreview.net/pdf?id=T8wHz4rnuGL |

Useful Resources

Several optimization ideas to regularize and stabilize training. Most of these ideas are applied in the Ranger21 optimizer.

Most of the figures are taken from the Ranger21 paper.

Adaptive Gradient Clipping

Gradient Centralization

Softplus Transformation

Gradient Normalization

Norm Loss

Positive-Negative Momentum

Linear learning rate warmup

Stable weight decay

Explore-exploit learning rate schedule

Lookahead

Chebyshev learning rate schedule

(Adaptive) Sharpness-Aware Minimization

On the Convergence of Adam and Beyond

Gradient Surgery for Multi-Task Learning

Adaptive Gradient Clipping

This idea was originally proposed in the NFNet (Normalizer-Free Networks) paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
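A minimal, framework-free sketch of unit-wise clipping, assuming each row of a weight matrix is one "unit" and using plain Python lists in place of tensors. The helper name `adaptive_gradient_clip` and the defaults `clip=0.01` and `eps=1e-3` are illustrative assumptions, not this library's API. Each gradient row is rescaled so that its norm does not exceed `clip * max(||w_row||, eps)`.

```python
import math
from typing import List

Matrix = List[List[float]]


def unit_norm(row: List[float]) -> float:
    """L2 norm of one unit (here: one row of a weight matrix)."""
    return math.sqrt(sum(x * x for x in row))


def adaptive_gradient_clip(
    params: Matrix, grads: Matrix, clip: float = 0.01, eps: float = 1e-3
) -> Matrix:
    """Hypothetical helper: rescale each gradient row whose norm is too large
    relative to the norm of the matching parameter row."""
    clipped: Matrix = []
    for w_row, g_row in zip(params, grads):
        # eps keeps zero-initialized units (e.g. biases) from being clipped to nothing
        max_norm = clip * max(unit_norm(w_row), eps)
        g_norm = unit_norm(g_row)
        if g_norm > max_norm:
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            clipped.append(list(g_row))
    return clipped


params = [[3.0, 4.0], [0.0, 0.0]]
grads = [[0.3, 0.4], [1e-6, 0.0]]
print(adaptive_gradient_clip(params, grads))
```

In this toy input, the first gradient row (norm 0.5) is scaled down to norm 0.01 * 5 = 0.05, roughly [0.03, 0.04], while the second row is already below its limit and passes through unchanged.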

Gradient Centralization

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
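For a weight gradient with more than one dimension, centralization subtracts the mean taken over every dimension except the first (output) one. The sketch below assumes a 2-D gradient stored as a list of rows (one row per output unit) and uses a hypothetical `centralize_gradient` helper; it only illustrates the operation and is not this library's implementation.

```python
from typing import List

Matrix = List[List[float]]


def centralize_gradient(grad: Matrix) -> Matrix:
    """Hypothetical helper: give each output unit's gradient (one row) zero mean."""
    centered: Matrix = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


grad = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
print(centralize_gradient(grad))  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```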

Softplus Transformation

Passing the final variance denominator through the softplus function lifts extremely tiny values so they remain numerically viable.
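A small sketch of the idea, assuming an Adam-style denominator `sqrt(v_hat) + eps` and a sharpness parameter `beta=50` chosen for illustration. The `softplus` helper follows the usual definition `log(1 + exp(beta * x)) / beta` and switches to the identity once `beta * x` exceeds a threshold of 20 to avoid overflow, the same convention `torch.nn.functional.softplus` uses. The helper `softplus_denominator` is hypothetical.

```python
import math


def softplus(x: float, beta: float = 50.0, threshold: float = 20.0) -> float:
    """Smooth approximation of max(x, 0); linear once beta * x > threshold."""
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta


def softplus_denominator(v_hat: float, eps: float = 1e-8, beta: float = 50.0) -> float:
    """Hypothetical helper: pass an Adam-style denominator through softplus."""
    return softplus(math.sqrt(v_hat) + eps, beta=beta)


# A tiny second-moment estimate is lifted to roughly log(2) / beta (about 0.0139),
# so the step size lr / denom stays bounded instead of blowing up.
print(softplus_denominator(1e-20))
# A large denominator passes through unchanged.
print(softplus_denominator(4.0))
```

The larger `beta` is, the closer softplus stays to the identity for ordinary values while still putting a floor of about `log(2) / beta` under near-zero ones.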

Gradient Normalization

Norm Loss

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png

Positive-Negative Momentum

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png

Linear learning rate warmup

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png

Stable weight decay

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png

Explore-exploit learning rate schedule

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png

Lookahead

k steps forward, 1 step back. Lookahead keeps a set of slow weights that tracks an exponential moving average of the fast (current) weights;
every k_{lookahead} steps (5 by default), the slow weights are updated and copied back into the current weights.
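A minimal sketch of the k-steps-forward, 1-step-back loop, assuming plain Python lists for the weights and an arbitrary inner update function. The class name `LookaheadSketch` and the interpolation factor `alpha=0.5` are illustrative assumptions; `k=5` matches the default stated above. Every k inner steps, the slow weights move a fraction `alpha` toward the fast weights, and the fast weights are reset to the slow ones.

```python
from typing import Callable, List

Vector = List[float]


class LookaheadSketch:
    """Hypothetical wrapper: runs an inner update k times, then pulls back."""

    def __init__(
        self,
        params: Vector,
        inner_step: Callable[[Vector], Vector],
        k: int = 5,
        alpha: float = 0.5,
    ) -> None:
        self.fast = list(params)  # weights updated by the inner optimizer
        self.slow = list(params)  # weights updated only every k steps
        self.inner_step = inner_step
        self.k = k
        self.alpha = alpha
        self.counter = 0

    def step(self) -> Vector:
        self.fast = self.inner_step(self.fast)
        self.counter += 1
        if self.counter % self.k == 0:
            # slow <- slow + alpha * (fast - slow), then sync fast to slow
            self.slow = [s + self.alpha * (f - s) for s, f in zip(self.slow, self.fast)]
            self.fast = list(self.slow)
        return self.fast


# Toy inner update: always move each weight by -1.
opt = LookaheadSketch([0.0], lambda p: [x - 1.0 for x in p])
print([opt.step()[0] for _ in range(5)])  # [-1.0, -2.0, -3.0, -4.0, -2.5]
```

On the fifth step the fast weight reaches -5.0, and the sync pulls it back halfway toward the slow weight (0.0), giving -2.5.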

Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules

(Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
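One SAM update can be sketched with two gradient evaluations: first move to the approximate worst-case point `w + rho * g / ||g||` within radius `rho`, then apply the gradient computed there to the original weights. The function `sam_step`, the plain-SGD base update, and the defaults `lr=0.1` and `rho=0.05` are assumptions for illustration. The adaptive variant (ASAM) additionally rescales the perturbation by the parameter magnitudes and is not shown.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float = 0.1,
    rho: float = 0.05,
    eps: float = 1e-12,
) -> Vector:
    """Hypothetical helper: one sharpness-aware step with plain SGD as the base update."""
    g = grad_fn(w)
    g_norm = math.sqrt(sum(x * x for x in g))
    # Ascent step: move to the (first-order) highest-loss point within radius rho.
    scale = rho / (g_norm + eps)
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]
    # Descent step: use the gradient at the perturbed point to update the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def quadratic_grad(w: Vector) -> Vector:
    """Gradient of the toy loss 0.5 * ||w||^2, which is w itself."""
    return list(w)


print(sam_step([3.0, 4.0], quadratic_grad))
```

On this toy loss the perturbed point is about [3.03, 4.04], so the update lands near [2.697, 3.596], slightly further than plain SGD's [2.7, 3.6]. Each SAM step costs two forward/backward passes instead of one.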

On the Convergence of Adam and Beyond

Gradient Surgery for Multi-Task Learning

Citations

AdamP

Adaptive Gradient Clipping

Chebyshev LR Schedules

Gradient Centralization

Lookahead

RAdam

Norm Loss

Positive-Negative Momentum

Explore-Exploit Learning Rate Schedule

On the adequacy of untuned warmup for adaptive optimization

Stable weight decay regularization

Softplus transformation

MADGRAD

AdaHessian

AdaBound

AdaBelief

Sharpness-aware minimization

Adaptive Sharpness-aware minimization

diffGrad

On the Convergence of Adam and Beyond

Gradient surgery for multi-task learning

AdamD

Shampoo

Nero

Adan

Adai

GSAM

D-Adaptation

AdaFactor

Apollo

NovoGrad

Lion

Ali-G

SM3

AdaNorm

RotoGrad

Citation

Please cite the original authors of the optimization algorithms. If you use this software, please cite it as below, or use the “Cite this repository” button.

@software{Kim_pytorch_optimizer_Bunch_of_2022,
    author = {Kim, Hyeongchan},
    month = {1},
    title = {{pytorch_optimizer: optimizer \& lr scheduler implementations in PyTorch}},
    version = {1.0.0},
    year = {2022}
}

Author

Hyeongchan Kim / @kozistr


Download files


Source Distribution

pytorch_optimizer-2.7.0.tar.gz (64.1 kB)

Built Distribution

pytorch_optimizer-2.7.0-py3-none-any.whl (95.6 kB)

File details

Details for the file pytorch_optimizer-2.7.0.tar.gz.

File metadata

  • Download URL: pytorch_optimizer-2.7.0.tar.gz
  • Size: 64.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.3 Linux/5.15.0-1035-azure

File hashes

Hashes for pytorch_optimizer-2.7.0.tar.gz
Algorithm Hash digest
SHA256 2757f7826e50ff3d0e90cdf141d65a04cab9f8b0224e5caaf21a67867d3680d2
MD5 00601fed131febd905de9737a2c870db
BLAKE2b-256 a9571ee1619752ebfd0cc3244b86ebd8b0d43356a297cb7e88b3bdcec9102355


File details

Details for the file pytorch_optimizer-2.7.0-py3-none-any.whl.

File metadata

  • Download URL: pytorch_optimizer-2.7.0-py3-none-any.whl
  • Size: 95.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.3 Linux/5.15.0-1035-azure

File hashes

Hashes for pytorch_optimizer-2.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7da4a3e5bb2748cc50a4db3a4d92c3369c14c4f729ec393f6a3bc78932f1a2ec
MD5 0c61aa5e095b70e60bde5bf9c21872d5
BLAKE2b-256 46c807823d8ecc1236a068743d895f936f0bc3d21ebc2cc89b12b908ec8baf41
