Mechanic: black-box tuning of optimizers
Based on the paper: https://arxiv.org/abs/2306.00144
Be aware that all experiments reported in the paper were run using the JAX version of mechanic, which is available in optax via optax.contrib.mechanize.
Mechanic aims to remove the need for tuning a learning rate scalar (i.e. the maximum learning rate in a schedule). You can use it with any PyTorch optimizer and schedule. Simply replace:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
with:
from mechanic_pytorch import mechanize
optimizer = mechanize(torch.optim.SGD)(model.parameters(), lr=1.0)
# You can set the lr to anything here, but excessively small values may cause numerical precision issues.
That's it! The new optimizer should no longer require tuning the learning rate scale; that is, it should now be very robust to heavily mis-specified values of lr.
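For example, mechanize composes with a learning rate schedule in the usual way. The sketch below is only illustrative: the tiny model, synthetic data, AdamW, and CosineAnnealingLR are arbitrary choices for the example, not requirements of the package.

import torch
from mechanic_pytorch import mechanize

# Placeholder model and data; any module and data loader work the same way.
model = torch.nn.Linear(10, 1)
data = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(100)]

# Wrap any optimizer class; mechanic learns the learning rate scale.
optimizer = mechanize(torch.optim.AdamW)(model.parameters(), lr=1.0)

# Schedules attach as usual; CosineAnnealingLR is just an example choice.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for inputs, targets in data:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()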
Installation
pip install mechanic-pytorch
Note that the package name is mechanic-pytorch, but you should import mechanic_pytorch (dash replaced with underscore).
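As a quick sanity check after installing (this snippet is just an illustration):

# Verify the install; note the underscore in the import name.
from mechanic_pytorch import mechanize
print(mechanize)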
Options
It is possible to adjust the configuration of mechanic, although this should be unnecessary:
optimizer = mechanize(torch.optim.SGD, s_decay=0.0, betas=(0.999,0.999999), store_delta=False)(model.parameters(), lr=0.01)
- The option store_delta=False is set to minimize memory usage. At a minimum, we currently keep one extra "slot" of memory (i.e. an extra copy of the weights). If you are ok keeping one more copy, you can set store_delta=True. This will make the first few iterations have a slightly more accurate update, and usually has negligible effect.
- The option s_decay is a bit like a weight-decay term that empirically is helpful for smaller datasets. We use a default of 0.01 in all our experiments. For larger datasets, smaller values (even 0.0) often worked as well.
- The option betas is a list of exponential weighting factors used internally in mechanic. They are NOT related to the beta values found in Adam. In theory, it should be safe to provide a large list of possibilities here. The default settings of (0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999) seem to work well in a range of tasks.
- The option s_init is the initial value for the mechanic learning rate. It should be an underestimate of the correct learning rate, and it can safely be set to a very small value (default 1e-8), although it cannot be set to zero. In particular, the theoretical analysis of mechanic includes a log(1/s_init) term. This is very robust to small values, but will eventually blow up if you make s_init absurdly small.
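Putting the options together, a fully spelled-out configuration might look like the sketch below. The values shown are the documented defaults (with store_delta flipped on), and we are assuming s_init is passed to mechanize alongside the other options:

optimizer = mechanize(
    torch.optim.SGD,
    s_decay=0.01,     # default; smaller values (even 0.0) often work on larger datasets
    betas=(0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999),  # default list
    store_delta=True, # one extra copy of the weights for slightly more accurate early updates
    s_init=1e-8,      # default; a small underestimate of the learning rate, never zero
)(model.parameters(), lr=1.0)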
License
mechanic is distributed under the terms of the Apache-2.0 license.