A PyTorch Extension for Learning Rate Warmup
This library contains PyTorch implementations of the warmup schedules described in On the adequacy of untuned warmup for adaptive optimization.
Installation
Make sure you have Python 3.6+ and PyTorch 1.1+. Then, run one of the following commands:

```
python setup.py install
```

or

```
pip install -U pytorch_warmup
```
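To confirm the installation succeeded, a quick import check (a minimal sketch; the printed classes are the schedulers described below):

```python
import pytorch_warmup as warmup

# These scheduler classes should be importable after a successful install.
print(warmup.LinearWarmup, warmup.UntunedLinearWarmup)
```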
Usage
Sample Code
The scheduled learning rate is dampened by multiplication with the warmup factor:

lr(t) = w(t) * scheduled_lr(t)

where scheduled_lr(t) is the learning rate set by the LR scheduler at iteration t.
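For illustration, here is a minimal numeric sketch of the dampening rule; all values below are assumptions, not output of the library:

```python
# Hypothetical values at iteration t = 250 with a linear warmup period of 2000:
base_lr = 0.001
schedule_factor = 0.5            # assumed multiplier from, e.g., cosine annealing
w_t = min(1.0, 250 / 2000)       # linear warmup factor, see "Warmup Schedules" below
effective_lr = base_lr * schedule_factor * w_t
print(effective_lr)              # 6.25e-05
```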
Approach 1
When the learning rate schedule uses the global iteration number, the untuned linear warmup can be used as follows:
```python
import torch
import pytorch_warmup as warmup

optimizer = torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), weight_decay=0.01)
num_steps = len(dataloader) * num_epochs
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
for epoch in range(1, num_epochs+1):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = ...
        loss.backward()
        optimizer.step()
        lr_scheduler.step()        # advance the underlying schedule first
        warmup_scheduler.dampen()  # then dampen the learning rate by w(t)
```
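To verify that the warmup is taking effect, the dampened learning rate can be read back from the optimizer after each dampen() call; param_groups is the standard PyTorch mechanism:

```python
# Inside the inner loop, after warmup_scheduler.dampen():
current_lr = optimizer.param_groups[0]['lr']  # the dampened learning rate
```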
Approach 2
When the learning rate schedule uses the epoch number, the warmup schedule can be used as follows (for PyTorch 1.2 or above):
```python
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[num_epochs//3], gamma=0.1)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
warmup_scheduler.last_step = -1  # initialize the step counter
for epoch in range(1, num_epochs+1):
    for batch in dataloader:
        lr_scheduler.step(epoch-1)   # epoch-based schedule
        warmup_scheduler.dampen()    # iteration-based warmup dampening
        optimizer.zero_grad()
        loss = ...
        loss.backward()
        optimizer.step()
```
The user warning about calling lr_scheduler.step() before optimizer.step() may be ignored.
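If you prefer to silence the warning explicitly, Python's standard warnings module can filter it; a sketch, assuming the message wording of recent PyTorch versions (it may change between releases):

```python
import warnings

# Suppress PyTorch's scheduler-order UserWarning (message pattern may vary by version).
warnings.filterwarnings("ignore", message=r"Detected call of `lr_scheduler\.step\(\)`")
```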
Warmup Schedules
Manual Warmup
The warmup factor w(t) depends on the warmup period, which must be manually specified, for LinearWarmup and ExponentialWarmup.
Linear
w(t) = min(1, t / warmup_period)
```python
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2000)
```
Exponential
w(t) = 1 - exp(-t / warmup_period)
```python
warmup_scheduler = warmup.ExponentialWarmup(optimizer, warmup_period=1000)
```
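As a quick comparison of the two factors, a minimal sketch evaluating w(t) at a few arbitrary steps:

```python
import math

warmup_period = 1000
for t in [1, 100, 500, 1000, 2000]:
    w_linear = min(1.0, t / warmup_period)
    w_exp = 1.0 - math.exp(-t / warmup_period)
    print(t, round(w_linear, 3), round(w_exp, 3))
# The linear factor reaches 1 exactly at t = warmup_period, while the
# exponential factor approaches 1 asymptotically (about 0.632 at t = warmup_period).
```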
Untuned Warmup
The warmup period is given by a function of Adam's beta2 parameter for UntunedLinearWarmup and UntunedExponentialWarmup.
Linear
warmup_period = 2 / (1 - beta2)
```python
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
```
Exponential
warmup_period = 1 / (1 - beta2)
```python
warmup_scheduler = warmup.UntunedExponentialWarmup(optimizer)
```
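For example, with the common setting beta2 = 0.999, the untuned linear warmup period works out to 2 / (1 - 0.999) = 2000 steps, and the untuned exponential one to 1 / (1 - 0.999) = 1000 steps. A minimal sketch of this arithmetic, reading beta2 back from the optimizer defined in Approach 1:

```python
beta2 = optimizer.param_groups[0]['betas'][1]  # 0.999 for the AdamW example above
print(2 / (1 - beta2))  # ~2000 steps for UntunedLinearWarmup
print(1 / (1 - beta2))  # ~1000 steps for UntunedExponentialWarmup
```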
RAdam Warmup
The warmup factor depends on Adam's beta2 parameter for RAdamWarmup. Please see the original paper for the details.

```python
warmup_scheduler = warmup.RAdamWarmup(optimizer)
```
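For reference, the variance rectification term from the RAdam paper, on which this warmup factor is based, can be sketched as follows. This is a rough transcription of the paper's formula, not the library's code, and the handling of small t where the term is undefined is an assumption here:

```python
import math

def radam_rectification(t, beta2=0.999):
    """Variance rectification term r_t from the RAdam paper (sketch)."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
    if rho_t <= 4.0:
        return 0.0  # r_t is undefined for rho_t <= 4; treat the factor as 0 (assumption)
    return math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                     / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
```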
Apex's Adam
The Apex library provides an Adam optimizer tuned for CUDA devices, FusedAdam. The FusedAdam optimizer can be used with the warmup schedulers. For example:
```python
optimizer = apex.optimizers.FusedAdam(params, lr=0.001, betas=(0.9, 0.999), weight_decay=0.01)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
```
License
MIT License
Copyright (c) 2019 Takenori Yamamoto