An Adam-like optimizer for neural networks with adaptive estimation of learning rate

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

konstmish

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Prodigy: An Expeditiously Adaptive Parameter-Free Learner

This is the official repository used to run the experiments in the paper that proposed the Prodigy optimizer. The optimizer is implemented in PyTorch. There is also a JAX version of Prodigy in Optax, which currently does not have the slice_p argument.

Prodigy: An Expeditiously Adaptive Parameter-Free Learner
K. Mishchenko, A. Defazio
Paper: https://arxiv.org/pdf/2306.06101.pdf

Installation

To install the package, simply run pip install prodigyopt

How to use

Let net be the neural network you want to train. Then, you can use the method as follows:

from prodigyopt import Prodigy
# choose weight decay value based on your problem, 0 by default
weight_decay = 0.0
# set slice_p to 11 if you have limited memory, 1 by default
slice_p = 1
opt = Prodigy(net.parameters(), lr=1., weight_decay=weight_decay, slice_p=slice_p)

Note that by default, Prodigy uses weight decay as in AdamW. If you want it to use standard $\ell_2$ regularization (as in Adam), use option decouple=False. We recommend using lr=1. (default) for all networks. If you want to force the method to estimate a smaller or larger learning rate, it is better to change the value of d_coef (1.0 by default). Values of d_coef above 1, such as 2 or 10, will force a larger estimate of the learning rate; set it to 0.5 or even 0.1 if you want a smaller learning rate.
Standard values of weight_decay to try are 0 (default in Prodigy), 0.001, 0.01 (default in AdamW), and 0.1.
Use values of slice_p larger than 1 to reduce memory consumption. slice_p=11 should give a good trade-off between the accuracy of the estimated learning rate and memory efficiency.

Scheduler

As a rule of thumb, we recommend either using no scheduler or using cosine annealing with the method:

import torch
# total_steps is the total number of times scheduler.step() will be called
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)

We do not recommend using restarts in cosine annealing, so we suggest setting T_max=total_steps, where total_steps should be the number of times scheduler.step() is called. If you do use restarts, we highly recommend setting safeguard_warmup=True.
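
As an illustration of how total_steps relates to the training setup, the sketch below counts scheduler steps and evaluates the closed-form cosine factor with a minimum of zero. The helper names total_scheduler_steps and cosine_factor are hypothetical and not part of prodigyopt or PyTorch; the formula is the standard cosine annealing expression, written here in plain Python so it can be inspected without a training run.

```python
import math


def total_scheduler_steps(n_epoch, batches_per_epoch, step_every_batch=True):
    """Number of scheduler.step() calls over the whole run.

    If the scheduler is stepped once per batch, this is n_epoch * batches_per_epoch;
    if it is stepped once per epoch, it is just n_epoch.
    """
    return n_epoch * batches_per_epoch if step_every_batch else n_epoch


def cosine_factor(step, total_steps):
    """Multiplier applied to the base lr after `step` scheduler steps (minimum 0)."""
    return 0.5 * (1.0 + math.cos(math.pi * step / total_steps))


total_steps = total_scheduler_steps(n_epoch=20, batches_per_epoch=391)
start, middle, end = (cosine_factor(s, total_steps) for s in (0, total_steps // 2, total_steps))
```

The factor starts at 1, reaches 0.5 halfway, and hits 0 exactly at total_steps. If this function is evaluated past total_steps it rises again, which is one reason to make T_max match the real number of scheduler.step() calls.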

Extra care should be taken if you use linear warm-up at the beginning: The method will see slow progress due to the initially small base learning rate, so it might overestimate d. To avoid issues with warm-up, use option safeguard_warmup=True.
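
For reference, a linear warm-up can be written as a multiplier on the base learning rate. The function below is a hypothetical helper, not part of prodigyopt; it shows how small the base learning rate is during the first steps, which is the situation that safeguard_warmup=True is meant to handle.

```python
def linear_warmup_factor(step, warmup_steps):
    """Multiplier on the base lr: ramps linearly up to 1.0 over warmup_steps steps."""
    if warmup_steps <= 0:
        return 1.0  # no warm-up requested
    return min(1.0, (step + 1) / warmup_steps)


# Possible use with PyTorch (assumes `opt` was built with safeguard_warmup=True):
#   scheduler = torch.optim.lr_scheduler.LambdaLR(
#       opt, lr_lambda=lambda step: linear_warmup_factor(step, warmup_steps=1000)
#   )
factors = [linear_warmup_factor(s, warmup_steps=10) for s in range(12)]
```

With warmup_steps=10, the first step uses only a tenth of the base learning rate, and the factor stays at 1.0 once warm-up is over.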

Diffusion models

Based on feedback from some users, we recommend setting safeguard_warmup=True, use_bias_correction=True, and weight_decay=0.01 when training diffusion models. Sometimes, it is helpful to set betas=(0.9, 0.99).
If the model is not training, keep track of d; if it remains too small, it might be worth increasing d0 to 1e-5 or even 1e-4. That being said, the optimizer was mostly insensitive to d0 in our other experiments.
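
The settings above can be collected in one place, and d can be logged during training. This is a sketch under two assumptions: that the keyword names match the options mentioned in the text, and that the optimizer stores its current estimate under the key "d" in each entry of param_groups. The second point is an assumption about the implementation's internals, so check it against the installed version. The helper current_d is hypothetical.

```python
# Keyword arguments suggested in the text for diffusion models.
diffusion_kwargs = dict(
    lr=1.0,
    weight_decay=0.01,
    safeguard_warmup=True,
    use_bias_correction=True,
    betas=(0.9, 0.99),  # optional; the text says this is only sometimes helpful
)


def current_d(optimizer):
    """Return the "d" entry of every param group (None where the key is absent).

    Assumes the optimizer exposes its estimate as param_groups[i]["d"].
    """
    return [group.get("d") for group in optimizer.param_groups]


# Possible use (assumes `net` exists and prodigyopt is installed):
#   opt = Prodigy(net.parameters(), **diffusion_kwargs)
#   ...inside the training loop...
#   print(current_d(opt))
```

If the logged values stay close to their initial value for many steps, that is the situation in which the text suggests trying a larger d0.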

Examples of using Prodigy

See this Colab Notebook for a toy example of how one can use Prodigy to train ResNet-18 on CIFAR-10 (test accuracy 80% after 20 epochs).
If you are interested in sharing your experience, please consider creating a Colab Notebook and sharing it in the issues.

How to cite

If you find our work useful, please consider citing our paper.

@inproceedings{mishchenko2024prodigy,
    title={Prodigy: An Expeditiously Adaptive Parameter-Free Learner},
    author={Mishchenko, Konstantin and Defazio, Aaron},
    booktitle={Forty-first International Conference on Machine Learning},
    year={2024},
    url={https://openreview.net/forum?id=JJpOssn0uP}
}

Release history

- 1.1.2 (this version): Jan 16, 2025
- 1.1.1: Dec 18, 2024
- 1.1: Dec 11, 2024
- 1.0: Jun 12, 2023

Download files

Source Distribution

prodigyopt-1.1.2.tar.gz (9.4 kB view details)

Uploaded Jan 16, 2025 Source

Built Distribution

prodigyopt-1.1.2-py3-none-any.whl (10.5 kB view details)

Uploaded Jan 16, 2025 Python 3

File details

Details for the file prodigyopt-1.1.2.tar.gz.

File metadata

Download URL: prodigyopt-1.1.2.tar.gz
Upload date: Jan 16, 2025
Size: 9.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for prodigyopt-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`f6ef74944895c9b9a0045e55fdd04d07bdb03b9f09a2c77e2ec772c9d1ece15f`
MD5	`ec54a98aaf80fd6fd23f9e86b787d336`
BLAKE2b-256	`6b8e5ee7ea4f8ca1e1d8d5d868c6c97e53e8dc769caa383b5a16afc6fd7b28c7`
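
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. The sketch below assumes the archive has been saved locally under its original file name; the helper names sha256_of and matches are hypothetical.

```python
import hashlib

# SHA256 digest listed above for prodigyopt-1.1.2.tar.gz.
EXPECTED_SHA256 = "f6ef74944895c9b9a0045e55fdd04d07bdb03b9f09a2c77e2ec772c9d1ece15f"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Possible use (assumes the file is in the current directory):
#   print(matches("prodigyopt-1.1.2.tar.gz"))
```

The same functions work for the wheel if the expected digest is replaced with the one listed for that file.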

Provenance

The following attestation bundles were made for prodigyopt-1.1.2.tar.gz:

Publisher: python-publish.yml on konstmish/prodigy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prodigyopt-1.1.2.tar.gz
- Subject digest: f6ef74944895c9b9a0045e55fdd04d07bdb03b9f09a2c77e2ec772c9d1ece15f
- Sigstore transparency entry: 162962395
- Sigstore integration time: Jan 16, 2025
Source repository:
- Permalink: konstmish/prodigy@3efb213ee8af5a6bf76f28726398433a847b38e9
- Branch / Tag: refs/tags/v1.1.2
- Owner: https://github.com/konstmish
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3efb213ee8af5a6bf76f28726398433a847b38e9
- Trigger Event: release

File details

Details for the file prodigyopt-1.1.2-py3-none-any.whl.

File metadata

Download URL: prodigyopt-1.1.2-py3-none-any.whl
Upload date: Jan 16, 2025
Size: 10.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for prodigyopt-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4f8f79f71d37e4a501527fc30ecc847731369dc7cbe12e6a178157924e30ce03`
MD5	`ebe773173c7aa1c9c63752b423910416`
BLAKE2b-256	`5726a798a2e274d77944e784c1c577b85fcbdb7189e59ff821d9a2d6b462f853`

Provenance

The following attestation bundles were made for prodigyopt-1.1.2-py3-none-any.whl:

Publisher: python-publish.yml on konstmish/prodigy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prodigyopt-1.1.2-py3-none-any.whl
- Subject digest: 4f8f79f71d37e4a501527fc30ecc847731369dc7cbe12e6a178157924e30ce03
- Sigstore transparency entry: 162962397
- Sigstore integration time: Jan 16, 2025
Source repository:
- Permalink: konstmish/prodigy@3efb213ee8af5a6bf76f28726398433a847b38e9
- Branch / Tag: refs/tags/v1.1.2
- Owner: https://github.com/konstmish
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3efb213ee8af5a6bf76f28726398433a847b38e9
- Trigger Event: release
