Automatic learning rate optimiser based on Prodigy and Schedule-Free

These details have not been verified by PyPI

Project links

Homepage

Project description

Prodigy + ScheduleFree

Eliminating hyperparameters, one commit at a time.

Current status: Experimental

Installation

pip install prodigy-plus-schedule-free

Usage

from prodigyplus.prodigy_plus_schedulefree import ProdigyPlusScheduleFree

optimizer = ProdigyPlusScheduleFree(
    model.parameters(), lr=1.0, betas=(0.9, 0.99), beta3=None,
    beta4=0, weight_decay=0.0, use_bias_correction=False,
    d0=1e-6, d_coef=1.0, prodigy_steps=0, warmup_steps=0,
    eps=1e-8, split_groups=True, split_groups_mean="harmonic_mean",
    factored=True, fused_back_pass=False, use_stableadamw=True,
    use_muon_pp=False, use_cautious=False, use_adopt=False,
    stochastic_rounding=True,
)

As with the reference implementation of schedule-free, a constant scheduler should be used, along with the appropriate calls to optimizer.train() and optimizer.eval(). See the schedule-free documentation for more details: https://github.com/facebookresearch/schedule_free
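A minimal sketch of where those calls might sit in a training loop. It assumes PyTorch-style objects; `model`, `loader` and `loss_fn` are placeholders rather than part of this package, and the function names are hypothetical. The pattern follows the schedule-free documentation: call optimizer.train() before training steps and optimizer.eval() before evaluation or saving checkpoints.

```python
def train_one_epoch(model, optimizer, loader, loss_fn):
    """Run one training epoch with a schedule-free style optimiser."""
    model.train()
    optimizer.train()  # put the optimiser into training mode first
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()


def evaluate(model, optimizer, loader, loss_fn):
    """Return the mean loss with model and optimiser in eval mode."""
    model.eval()
    optimizer.eval()  # also do this before saving a checkpoint
    total, batches = 0.0, 0
    # In real PyTorch code this loop would usually run under torch.no_grad().
    for inputs, targets in loader:
        total += float(loss_fn(model(inputs), targets))
        batches += 1
    return total / max(batches, 1)
```

Forgetting the eval() call is an easy mistake to make, so keeping both switches inside helper functions like these can be a reasonable habit.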

Details

An optimiser based on Prodigy that includes schedule-free logic and much, much lower memory usage, the aim being to remove the need to set any hyperparameters. Of course, that's never the case with any optimiser, but hopefully, this comes close!

Hyperparameters eliminated: Learning rate (Prodigy), LR scheduler (ScheduleFree), epsilon (Adam-atan2, optional, not enabled by default). Still working on betas and weight decay, though those are much harder.

Based on code from:

Incorporates improvements from these pull requests (credit to https://github.com/dxqbYD and https://github.com/sangoi-exe):

If you do use another scheduler, linear or cosine is preferred, as a restarting scheduler can confuse Prodigy's adaptation logic.

Leave lr set to 1 unless you encounter instability. Do not combine this optimiser with gradient clipping, as clipping can hamper the optimiser's ability to predict stepsizes. Gradient clipping/normalisation is already handled in the following configurations:

use_stableadamw=True, eps=1e-8 (or any reasonable positive epsilon. This is the default.)
eps=None (Adam-atan2, scale invariant, but can mess with Prodigy's stepsize calculations in some scenarios.)
use_muon_pp=True (Updates are scaled by their root-mean-square. Experimental!)

A new parameter, beta4, allows d to be updated via a moving average, rather than being immediately updated. This can help smooth out learning rate adjustments. Values of 0.9-0.99 are recommended if trying out the feature. If set to None, the square root of beta1 is used, while a setting of 0 (the default) disables the feature.
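As a rough illustration of the three beta4 settings described above, here is a sketch of a moving-average update for d. The function name `smooth_d` and the exact form of the average are assumptions for illustration; the optimiser's internal update may differ in detail.

```python
import math


def smooth_d(d_prev, d_new, beta1=0.9, beta4=0.0):
    """Blend a new d estimate into the previous one (illustrative only)."""
    if beta4 is None:
        beta4 = math.sqrt(beta1)  # None: fall back to sqrt(beta1)
    if beta4 == 0:
        return d_new  # 0 (the default): take the new estimate immediately
    # Exponential moving average: higher beta4 means slower, smoother changes.
    return beta4 * d_prev + (1.0 - beta4) * d_new
```

With beta4=0.9, a jump in the estimate from 1.0 to 2.0 moves d only to 1.1 on that step, which is the smoothing effect the parameter is meant to provide.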

By default, split_groups is set to True, so each parameter group has its own adaptation values. If you're training different networks together, they won't contaminate each other's learning rates. The disadvantage of this approach is that some networks can take a long time to reach a good learning rate when trained alongside others (for example, SDXL's Unet). It's recommended to use a higher d0 (such as 1e-5, 5e-5 or 1e-4) so these networks don't get stuck at a low learning rate.

For Prodigy's reference behaviour, which lumps all parameter groups together, set split_groups to False.
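The sketch below shows one way to give each network its own parameter group, using PyTorch's usual list-of-dicts format. The helper `build_param_groups` and the constant `OPTIMIZER_KWARGS` are hypothetical names; the keyword arguments themselves come from the Usage section, with d0 raised as suggested above.

```python
def build_param_groups(networks):
    """Return one parameter group per network (PyTorch list-of-dicts format)."""
    return [{"params": list(net.parameters())} for net in networks.values()]


# split_groups=True keeps a separate adaptation state per group;
# d0 is raised from its 1e-6 default so slow networks are not stuck too long.
OPTIMIZER_KWARGS = {"lr": 1.0, "split_groups": True, "d0": 1e-5}

# Hypothetical usage (unet and text_encoder stand in for torch.nn.Module objects):
# groups = build_param_groups({"unet": unet, "text_encoder": text_encoder})
# optimizer = ProdigyPlusScheduleFree(groups, **OPTIMIZER_KWARGS)
```

Setting "split_groups" to False in the same dictionary would give Prodigy's reference behaviour instead.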

The optimiser uses low-rank approximations for the second moment, much like Adafactor. There should be little to no difference in training performance, but your mileage may vary. If you encounter problems, you can try disabling factorisation by setting factored to False.
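To show where the memory saving comes from, here is a plain-Python sketch of an Adafactor-style rank-1 factorisation of a second-moment matrix. It works on a static matrix of non-negative values and is only an illustration; the optimiser itself maintains running statistics, and its exact scheme may differ.

```python
def factor_second_moment(v):
    """Reduce an n x m matrix to its row sums and column sums (n + m values)."""
    rows, cols = len(v), len(v[0])
    row_sums = [sum(row) for row in v]
    col_sums = [sum(v[i][j] for i in range(rows)) for j in range(cols)]
    return row_sums, col_sums


def reconstruct(row_sums, col_sums):
    """Rebuild the rank-1 approximation: v[i][j] ~ row[i] * col[j] / total."""
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]
```

For a 4096 x 4096 weight matrix this stores 8192 values instead of roughly 16.8 million. The reconstruction is exact only when the matrix is rank 1, which is why results can occasionally differ from the unfactored optimiser.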

The optimiser also supports fused backward pass to significantly lower gradient memory usage. The fused_back_pass argument must be set to True so the optimiser knows not to perform the regular step. Please note, however, that your training script or UI of choice must support the feature for generic optimisers. As of November 2024, popular trainers such as OneTrainer and Kohya hard-code which optimisers have fused backward pass support, so this optimiser's fused pass will not work out of the box with them.

In some scenarios, it can be advantageous to freeze Prodigy's adaptive stepsize after a certain number of steps. This can be controlled via the prodigy_steps setting. It's been suggested that all Prodigy needs to do is achieve "escape velocity" in terms of finding a good LR, which it usually achieves after ~25% of training, though this is very dependent on batch size and epochs.

This setting can be particularly helpful when training diffusion models, which have very different gradient behaviour than what most optimisers are tuned for. Prodigy in particular will increase the LR forever if it is not stopped or capped in some way (usually via a decaying LR scheduler).
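The effect of freezing can be sketched with a toy simulation. The function `track_d` below is hypothetical and assumes a Prodigy-style rule where d never decreases and stops updating once the step count passes prodigy_steps; the optimiser's exact cut-off and update rule may differ.

```python
def track_d(d_estimates, d0=1e-6, prodigy_steps=0):
    """Return the d value in effect after each step (illustrative only)."""
    d, history = d0, []
    for step, estimate in enumerate(d_estimates, start=1):
        # prodigy_steps=0 means "never freeze", matching the default.
        frozen = prodigy_steps > 0 and step > prodigy_steps
        if not frozen:
            d = max(d, estimate)  # d only ever grows
        history.append(d)
    return history
```

With estimates that keep rising, prodigy_steps=2 holds d at the value reached on step 2, whereas prodigy_steps=0 lets it follow the estimates upwards for the whole run.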

Experimental features

Adam-atan2: Enabled by setting eps to None. As outlined in Scaling Exponents Across Parameterizations and Optimizers, atan2 can be used in place of the regular division plus epsilon found in most Adam-style optimisers. This makes updates scale-invariant, and removes the need to tweak the epsilon. This seems to work fine in some models (SDXL), but cripples Prodigy's stepsize calculations in others (SD3.5 Medium and Large). Disabled by default.
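The scale invariance can be checked with a scalar sketch. Both functions below are simplified illustrations for a single element with first moment m and second moment v; they omit bias correction and any scaling constants the paper or the optimiser may apply.

```python
import math


def adam_update(m, v, eps=1e-8):
    """Regular Adam-style update: division with an epsilon in the denominator."""
    return m / (math.sqrt(v) + eps)


def atan2_update(m, v):
    """Adam-atan2 style update: no epsilon, and well defined even when v is 0."""
    return math.atan2(m, math.sqrt(v))


# Scaling all gradients by c scales m by c and v by c**2.
c = 1e-6
same = math.isclose(atan2_update(0.3, 0.04), atan2_update(0.3 * c, 0.04 * c * c))
```

Here `same` is True, while the epsilon-based update changes noticeably under the same rescaling because eps becomes comparable to the square root of v.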

Orthogonalisation: Enabled by setting use_muon_pp to True. As explained by Keller Jordan, and demonstrated (in various forms) by optimisers such as Shampoo, SOAP and Jordan's Muon, applying orthogonalisation/preconditioning can improve convergence. However, this approach may not work in some situations (small batch sizes, fine-tuning) and as such, is disabled by default.

C-Optim: Enabled by setting use_cautious to True. Outlined in Cautious Optimizers: Improving Training with One Line of Code. Applies a simple modification to parameter updates that promotes values that are aligned with the current gradient. This should result in faster convergence. Note that the proposed changes are not 1:1 compatible with schedule-free, so more testing is required.
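A plain-Python sketch of the masking idea from the Cautious Optimizers paper follows. It assumes `update` is the optimiser's proposed direction before the learning rate and sign are applied (for example, Adam's first moment), so agreement means the element-wise product with the gradient is positive. The function name and the `min_scale` floor are illustrative, and this is not the optimiser's actual code.

```python
def cautious_update(update, grad, min_scale=1e-3):
    """Zero out update elements that disagree in sign with the gradient."""
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Rescale by the kept fraction so the overall update size is preserved.
    scale = max(sum(mask) / len(mask), min_scale)
    return [u * m / scale for u, m in zip(update, mask)]
```

In the example update [1.0, -2.0, 3.0] with gradient [1.0, 1.0, -1.0], only the first element agrees in sign, so it is kept and scaled up by a factor of three while the other two are dropped.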

Recommended usage

The schedule-free component of the optimiser works best with a constant learning rate. In most cases, Prodigy will find the optimal learning rate within the first 25% of training, after which it may continue to increase the learning rate beyond what's best.

It is strongly recommended to set prodigy_steps equal to 25% of your total step count, though you can experiment with values as low as 5-10%, depending on the model and type of training. The most reliable way to choose a value is to monitor the d value(s) during a training run.
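A small helper can turn that recommendation into a number. The function name `suggest_prodigy_steps` is hypothetical, and the sketch assumes one optimiser step per batch with no gradient accumulation.

```python
import math


def suggest_prodigy_steps(num_samples, batch_size, epochs, fraction=0.25):
    """Return a prodigy_steps value as a fraction of the total step count."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    # Keep at least one adaptive step; 0 would disable freezing entirely.
    return max(1, round(total_steps * fraction))
```

For 1000 samples at batch size 4 over 10 epochs (2500 steps), the default fraction gives 625, and fraction=0.05 gives 125.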

Here is an example of an SDXL LoRA run. From left to right are the d values (essentially the learning rate prediction) for TE1, TE2 and the Unet. In this run, prodigy_steps was set to 20, as the optimal LR was found around step 15.

This image shows a different run with the same dataset, but with prodigy_steps set to 0. While the text encoders were mostly stable, the Unet LR continued to grow throughout training.
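To produce curves like the ones described above, the d values need to be read out during training. The sketch below assumes each parameter group stores its current estimate under the key "d", as the reference Prodigy implementation does; this is an assumption to verify against the installed version, and `read_d_values` is a hypothetical helper.

```python
def read_d_values(optimizer):
    """Return the current d for each parameter group (None if absent)."""
    return [group.get("d") for group in optimizer.param_groups]


def format_d_log(step, d_values):
    """Format one log line, e.g. for printing every N steps."""
    parts = ", ".join("n/a" if d is None else f"{d:.3e}" for d in d_values)
    return f"step {step}: d = [{parts}]"
```

Logging this every few steps makes it easier to see the step at which d levels off, which is a sensible candidate for prodigy_steps.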

Project details

Release history

Version	Release date	Notes
2.0.1	Sep 27, 2025
2.0.0	Sep 13, 2025
2.0.0rc2	May 30, 2025	pre-release
2.0.0rc1	May 19, 2025	pre-release
2.0.0b5	May 14, 2025	pre-release
2.0.0b4	May 13, 2025	pre-release
2.0.0b3	May 12, 2025	pre-release
2.0.0b2	May 12, 2025	pre-release
2.0.0b1	May 11, 2025	pre-release
1.9.2	May 13, 2025
1.9.1	Mar 25, 2025
1.9.0	Jan 30, 2025
1.8.51	Jan 13, 2025
1.8.33	Jan 9, 2025
1.8.32	Jan 9, 2025
1.8.31	Jan 9, 2025
1.8.21	Jan 7, 2025
1.8.5	Jan 13, 2025
1.8.4	Jan 10, 2025
1.8.3	Jan 9, 2025
1.8.2	Jan 7, 2025
1.8.1	Jan 6, 2025
1.8.0	Dec 17, 2024
1.7.0	Dec 3, 2024
1.6.31	Nov 28, 2024	this version
1.6.3	Nov 28, 2024
1.6.2	Nov 26, 2024
1.6.1	Nov 26, 2024
1.6.0	Nov 26, 2024
1.5.1	Nov 26, 2024
1.5.0	Nov 24, 2024
1.4.2	Nov 23, 2024
1.4.1	Nov 23, 2024
1.4.0	Nov 23, 2024
1.3.2	Nov 22, 2024
1.3.1	Nov 22, 2024	yanked; reason: Bugged release.
1.3.0	Nov 21, 2024

Download files

Source Distribution

prodigy_plus_schedule_free-1.6.31.tar.gz (17.6 kB)

Uploaded Nov 28, 2024 Source

Built Distribution

prodigy_plus_schedule_free-1.6.31-py3-none-any.whl (19.0 kB)

Uploaded Nov 28, 2024 Python 3

File details

Details for the file prodigy_plus_schedule_free-1.6.31.tar.gz.

File metadata

Download URL: prodigy_plus_schedule_free-1.6.31.tar.gz
Upload date: Nov 28, 2024
Size: 17.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.11

File hashes

Hashes for prodigy_plus_schedule_free-1.6.31.tar.gz
Algorithm	Hash digest
SHA256	`39c01baf36d7a2c641c0d6e96b648bd7d2b11aef69782530cee968de25c3e0fd`
MD5	`1287a02cfd4452a2b5f6981601696f0b`
BLAKE2b-256	`fdba168d1939353d15381f246509d082de8a6837013898782d1e8faee7a1a5f3`

File details

Details for the file prodigy_plus_schedule_free-1.6.31-py3-none-any.whl.

File metadata

Download URL: prodigy_plus_schedule_free-1.6.31-py3-none-any.whl
Upload date: Nov 28, 2024
Size: 19.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.11

File hashes

Hashes for prodigy_plus_schedule_free-1.6.31-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1404fcb728f2aedb8c41ed0013ec13adab642d3999b62aaefe7c6149609f104b`
MD5	`b2757face07d69dda274696f01130e8b`
BLAKE2b-256	`56ed014863bccbb73d3baa0ea6cd5165f9432a1f5a66a0791c9c9b49eddfd08a`

prodigy-plus-schedule-free 1.6.31
