
Learning Rate Free Learning for Adam, SGD and AdaGrad

Project description

D-Adaptation

Learning rate free learning for SGD, AdaGrad and Adam!

by Aaron Defazio and Konstantin Mishchenko (arXiv)

pip install dadaptation

Details

The provided PyTorch optimizer classes are drop-in replacements: either copy them into your project, or install via pip and use dadaptation.DAdaptSGD, dadaptation.DAdaptAdam, or dadaptation.DAdaptAdaGrad.

  • Set the LR parameter to 1.0. This parameter is not ignored: setting it larger or smaller directly scales the D-Adapted learning rate up or down (see the usage sketch after this list).
  • Use the same learning rate scheduler you would normally use on the problem.
  • The Adam variant supports AdamW-style weight decay; just set decouple=True. It is not turned on by default, so if you are replacing your Adam implementation, make sure to enable decoupled weight decay if you need it.
  • It may be necessary to use larger weight decay than you would normally use; try a factor of 2 or 4 larger if you see overfitting. D-Adaptation uses larger learning rates than people typically hand-choose, which in some cases requires more decay.
  • Use the log_every setting to see the learning rate being used (d*lr) and the current D bound.
  • Only the AdaGrad version supports sparse gradients.
  • The Adam IP variant implements a tighter D bound, which may help on some problems. The IP variants should be considered experimental.
  • Parameter-group level LR values are not fully supported. The only supported use is setting the LR of some groups to zero in order to fine-tune parts of a model.
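
A minimal usage sketch following the notes above. The model, data, scheduler, and the weight_decay value are placeholders; the lr, weight_decay, decouple, and log_every arguments are the ones described in this list, but check your installed version for exact defaults.

import torch
import dadaptation

model = torch.nn.Linear(10, 2)  # stand-in for your model

optimizer = dadaptation.DAdaptAdam(
    model.parameters(),
    lr=1.0,             # keep at 1.0; larger or smaller values scale the D-Adapted rate
    weight_decay=1e-2,  # illustrative; try 2-4x your usual value if you see overfitting
    decouple=True,      # AdamW-style decoupled weight decay (off by default)
    log_every=100,      # periodically log d*lr and the current D bound
)

# Use the same learning rate scheduler you would normally use on the problem.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()

To fine-tune only part of a model, pass parameter groups and set lr to 0.0 for the frozen groups; as noted above, other per-group learning-rate values are not fully supported.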

Change Log

Version 2.0

  • Added DAdaptAdan; it should still be considered experimental.
  • Added support for PyTorch's Fully Sharded Data Parallel.
  • Improved support of edge cases such as learning rate zero.
  • Improved logging: messages now go through Python's logging module rather than print statements (see the snippet after this list).
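
A minimal sketch of making those messages visible, assuming no other logging configuration is in place; the level and handler choice here are just one option:

import logging

# Show INFO-level messages, including the optimizer's periodic d*lr / D-bound
# reports when log_every > 0; adjust the level or handlers as needed.
logging.basicConfig(level=logging.INFO)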

Experimental results

[Figures: experimental results on vision tasks]

License

See the License file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dadaptation-2.0.tar.gz (10.7 kB view details)

Uploaded Source

Built Distribution

dadaptation-2.0-py3-none-any.whl (22.3 kB view details)

Uploaded Python 3

File details

Details for the file dadaptation-2.0.tar.gz.

File metadata

  • Download URL: dadaptation-2.0.tar.gz
  • Upload date:
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/58.0.4 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.9

File hashes

Hashes for dadaptation-2.0.tar.gz
Algorithm Hash digest
SHA256 07c750ca6345bc3fed019e922a40f6f47b1f041e29c9e9ded8b1a2638356fb01
MD5 cd46e65998c79cd725681b5db40124f9
BLAKE2b-256 aeea674a4143de5c54f409ec4ae1ee7133c2a5446b9d81c6c040c3b5514ffe64

See more details on using hashes here.

File details

Details for the file dadaptation-2.0-py3-none-any.whl.

File metadata

  • Download URL: dadaptation-2.0-py3-none-any.whl
  • Upload date:
  • Size: 22.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/58.0.4 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.9

File hashes

Hashes for dadaptation-2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3c746d355390bd49545d6bca544fa5803b77515446465f8f4c889821583c9112
MD5 a2483bf772109ba6ce26f4eca43196d3
BLAKE2b-256 c307750b589521325d74f4144fa8ccff07c9ce61b339e27791d8a06650e07d22

See more details on using hashes here.
