A LARS implementation in PyTorch
from torch import optim
from torchlars import LARS
optimizer = LARS(optim.SGD(model.parameters(), lr=0.1))
What is LARS?
LARS (Layer-wise Adaptive Rate Scaling) is an optimization algorithm for large-batch training, published by You, Gitman, and Ginsburg, that computes a local learning rate for each layer at every optimization step. According to the paper, when ResNet-50 is trained on the ImageNet ILSVRC (2016) classification task with LARS, the learning curve and the best top-1 accuracy stay close to those of the baseline (batch size 256, without LARS) even when the batch size is scaled up to 32K.
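To make the per-layer computation concrete, here is a minimal framework-free sketch of the local learning rate from the paper, trust coefficient times the weight norm divided by the gradient norm plus the weight-decay term. The names `l2_norm` and `local_lr` are hypothetical, the placement of `eps` in the denominator is an assumption, and the layer values are made up for illustration; this is not the torchlars source.

```python
import math

def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))

def local_lr(weights, grads, trust_coef=0.001, weight_decay=0.0, eps=1e-8):
    """Layer-wise factor: trust_coef * ||w|| / (||g|| + weight_decay * ||w|| + eps)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0  # assumption: leave layers with a zero norm unscaled
    return trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)

# Two made-up layers given as (weights, gradients).
layers = {
    "conv1": ([3.0, 4.0], [0.3, 0.4]),  # ||w|| = 5, ||g|| = 0.5
    "fc": ([0.6, 0.8], [6.0, 8.0]),     # ||w|| = 1, ||g|| = 10
}
global_lr = 0.1

# Each layer trains with its own effective rate: global rate times local factor.
effective = {name: global_lr * local_lr(w, g) for name, (w, g) in layers.items()}
```

A layer whose gradient is large relative to its weights (`fc` here) gets a smaller effective rate than one whose gradient is relatively small (`conv1`), which is the behavior the layer-wise scaling is meant to provide.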
LARS was originally formulated in terms of the SGD optimizer, and the paper does
not mention extending it to other optimizers. In contrast, torchlars implements
LARS as a wrapper that can take any optimizer, including SGD, as its base.
Additionally, the LARS implementation in torchlars is designed with CUDA execution in mind more than existing implementations are. As a result, in an environment where no CPU-to-GPU synchronization occurs, it shows only a small slowdown compared to plain SGD.
Currently, torchlars requires the following environment:
- Python 3.6+
- PyTorch 1.1+
- CUDA 10+
To use torchlars, install it via PyPI:
$ pip install torchlars
To use LARS, simply wrap your base optimizer with torchlars.LARS. LARS inherits
from torch.optim.Optimizer, so you can use it as the optimizer in
your code. Then, when you call the step method of LARS, it automatically
calculates the local learning rates before running the base optimizer, such as SGD.
The example code below shows how to use LARS with SGD as the base optimizer.
from torch import optim
from torchlars import LARS

# Wrap the base optimizer; LARS computes local learning rates before SGD runs.
base_optimizer = optim.SGD(model.parameters(), lr=0.1)
optimizer = LARS(optimizer=base_optimizer, eps=1e-8, trust_coef=0.001)

output = model(input)
loss = loss_fn(output, target)
loss.backward()
optimizer.step()
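The wrapper idea described above can be illustrated without PyTorch. The sketch below is a toy model only: `Param`, `ToySGD`, and `ToyLARS` are hypothetical classes, and rescaling each layer's gradient before delegating to the base optimizer is an assumed way to realize a local learning rate, not a copy of the torchlars internals.

```python
import math

class Param:
    """Toy stand-in for one layer's parameters (flat lists of floats)."""
    def __init__(self, data, grad):
        self.data = list(data)
        self.grad = list(grad)

class ToySGD:
    """Minimal base optimizer: w <- w - lr * g."""
    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def step(self):
        for p in self.params:
            p.data = [w - self.lr * g for w, g in zip(p.data, p.grad)]

class ToyLARS:
    """Hypothetical wrapper: rescale each layer's gradient, then delegate."""
    def __init__(self, optimizer, eps=1e-8, trust_coef=0.001):
        self.optim = optimizer
        self.eps = eps
        self.trust_coef = trust_coef

    def step(self):
        saved = []
        for p in self.optim.params:
            saved.append(p.grad)  # keep the original gradient list
            w_norm = math.sqrt(sum(w * w for w in p.data))
            g_norm = math.sqrt(sum(g * g for g in p.grad))
            if w_norm > 0.0 and g_norm > 0.0:
                scale = self.trust_coef * w_norm / (g_norm + self.eps)
                p.grad = [g * scale for g in p.grad]
        self.optim.step()  # the base optimizer sees the rescaled gradients
        for p, g in zip(self.optim.params, saved):
            p.grad = g  # restore the original gradients afterwards

layer = Param(data=[3.0, 4.0], grad=[0.3, 0.4])
optimizer = ToyLARS(ToySGD([layer], lr=0.1))
optimizer.step()
```

Because the wrapper only touches gradients and then calls the base optimizer's own `step`, any optimizer exposing the same interface could be substituted for `ToySGD` in this sketch.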
ResNet-50 on ImageNet classification
| Batch Size | LR policy | lr | warm-up | epoch | Best Top-1 accuracy, % |
The image and table above show the reproduced performance benchmark on ResNet-50, corresponding to Table 4 and Figure 5 of the paper.
The cyan line represents the baseline, trained with batch size 256, and the other lines represent training with batch sizes of 8K, 16K, and 32K respectively. As you can see, every run shows a similar learning curve and best top-1 accuracy.
Most experimental conditions are similar to those used in the paper, but we slightly changed some of them, such as the learning rate, to obtain results comparable to those reported in the LARS paper.
Note: We referred to the log file provided with the paper to obtain the hyper-parameters above.
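For readers unfamiliar with how the "LR policy", "lr", and "warm-up" hyper-parameters interact, the sketch below shows one common combination: a linear warm-up followed by polynomial decay. The function `lr_at`, the choice of a power-2 polynomial, and all numeric values are assumptions for illustration; they are not taken from the benchmark settings above.

```python
def lr_at(epoch, base_lr, warmup_epochs, total_epochs, power=2.0):
    """Learning rate at a (possibly fractional) epoch."""
    if warmup_epochs > 0 and epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr during warm-up.
        return base_lr * epoch / warmup_epochs
    # Polynomial decay from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * (1.0 - progress) ** power

# Hypothetical schedule: 5 warm-up epochs out of 100, peak rate 1.0.
schedule = [lr_at(e, base_lr=1.0, warmup_epochs=5, total_epochs=100) for e in range(101)]
```

The warm-up phase avoids applying the full peak rate to freshly initialized weights, which matters more as the batch size and peak rate grow.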
Authors and Licensing