
Parameter-Free Optimizers for PyTorch


This is a library for parameter-free optimization in PyTorch. Parameter-free is a technical term that denotes a certain ability of an optimization algorithm to adapt to the unknown distance to the optimal solution. In practice, it means "SGD without learning rates" :)

Installation

To install the package, simply run pip install parameterfree.

Usage

All the parameter-free algorithms are implemented using the standard PyTorch optimizer interface. After installing with pip, you can simply import and use any of them in your code. For example, to use COCOB, write

from parameterfree import COCOB
optimizer = COCOB(model.parameters())

where the constructor arguments follow the standard PyTorch optimizer syntax.
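Because these optimizers implement the standard PyTorch optimizer interface, they drop into an ordinary training loop unchanged. Here is a minimal sketch of that loop; it uses `torch.optim.SGD` as a stand-in so the snippet runs without the library installed, and the comment shows where `COCOB` would go instead:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)

# Stand-in optimizer; with parameterfree installed you would write:
#   from parameterfree import COCOB
#   optimizer = COCOB(model.parameters())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 10)
y = torch.randn(64, 1)

first_loss = None
for step in range(50):
    optimizer.zero_grad()           # clear accumulated gradients
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                 # compute gradients
    optimizer.step()                # apply the update
    if first_loss is None:
        first_loss = loss.item()

print(first_loss, loss.item())
```

Nothing in the loop depends on the specific optimizer, which is exactly the point: swapping in a parameter-free method requires changing only the construction line.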

Details

There have been many parameter-free algorithms proposed over the past 10 years. Some of them work very well, even winning Kaggle competitions; others are mainly interesting from a theoretical point of view. However, the original code of many of these algorithms has been lost, or the authors stopped maintaining it. This also makes it difficult to compare recent algorithms with earlier variants.

Moreover, people in this field, like me, know that it is possible to combine tricks and reductions, like LEGO blocks, to obtain even more powerful parameter-free algorithms. However, there is essentially no code available, only long papers full of math :)

So, I decided to write a single library gathering all the parameter-free algorithms I know, and possibly some interesting and easy-to-obtain variants. I'll add them gradually over time. Here are the currently implemented algorithms:

  • COCOB The first parameter-free algorithm specifically designed for deep learning. The original code was in TensorFlow. It was used to win a Kaggle competition. The paper is Francesco Orabona, Tatiana Tommasi. Training Deep Networks without Learning Rates Through Coin Betting. NeurIPS'17

  • KT The first parameter-free algorithm based on the coin-betting framework; its code was never released. Warning: KT might diverge if the stochastic gradients are not bounded by 1 in L2 norm, so it is safe to use only with gradient clipping. The paper is Francesco Orabona, Dávid Pál. Coin Betting and Parameter-Free Online Learning. NeurIPS'16
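
Since KT assumes stochastic gradients bounded by 1 in L2 norm, gradient clipping before each step is the natural safeguard. A sketch of the pattern using PyTorch's built-in `torch.nn.utils.clip_grad_norm_` (again with `torch.optim.SGD` as a stand-in, so the snippet does not assume the library is installed):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
# Stand-in optimizer; with parameterfree you would construct KT here instead.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 10)
y = torch.randn(64, 1)

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# Rescale gradients so their total L2 norm is at most 1,
# matching KT's boundedness assumption.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Verify the post-clip norm before stepping.
grad_norm = torch.norm(
    torch.stack([p.grad.norm() for p in model.parameters()])
)
optimizer.step()
print(grad_norm.item())
```

The clipping call goes between `backward()` and `step()`, so the optimizer only ever sees gradients within the bound it requires.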

Generalization and Parameter-Free Algorithms

It is well known that the learning rate of the optimization algorithm also influences generalization performance. So, what happens to the generalization of a deep network trained with a parameter-free algorithm? It is very difficult to say, because we do not have a strong theory of how generalization works in deep networks. In general, if your network does not overfit the data with the usual optimizers, it won't overfit with parameter-free algorithms. However, if your network tends to overfit and you have to set the learning rate carefully to avoid it, parameter-free algorithms might overfit.

From my personal point of view, slowing down the optimizer to improve generalization is a bad idea, because it entangles two different things: optimization and generalization. I instead suggest using the most aggressive optimization procedure coupled with a regularization method. This way, the optimizer only has to worry about optimizing functions and the regularizer takes care of generalization, separating the two aspects.

On the other hand, there is a recent trend toward training for only 1-2 epochs on massive datasets, particularly for large language models. It is very easy to show that it is impossible to overfit with only 1 epoch: with a single pass over the data, you are directly optimizing the test loss together with the training loss. This is not strictly true after the first epoch, but it remains approximately true if the number of epochs is small, say 2-3. In these cases, parameter-free algorithms make perfect sense!

Other Parameter-Free Algorithms

Recently, parameter-free algorithms have become more popular, especially after the discovery that it is possible to design them in the primal space too. Here are links to some other PyTorch implementations.

Preliminary Experiments

Probably due to the lack of reliable software, a comprehensive benchmark of parameter-free algorithms is missing. I plan to do one, so please contact me if you want to help. In the meantime, here are some toy experiments, just to show that these algorithms do work.

[Figure] FashionMNIST, 2 fully connected hidden layers with 1000 hidden units each and ReLU activations, mini-batch size of 100. Constant default settings for Adam.

License

See the License file.
