Efficient optimizers
Project description
HeavyBall
A simple package of efficient optimizers
The goal is not to strive for completeness, full maintenance, or abstraction, but instead to provide a simple, largely static alternative to torch.optim with more and better optimizers.
Getting started
pip install heavyball
import torch
import heavyball

# Create a model
model = torch.nn.Linear(16, 1)

# Create an optimizer
optimizer = heavyball.PaLMForeachSFAdamW(model.parameters(), lr=1e-3)

# Random inputs and targets for a toy regression task
x = torch.randn(128, 16)
y = torch.randn(128, 1)

# Standard PyTorch training loop
for _ in range(1000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
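The optimizers follow the standard torch.optim interface, so switching between them is a one-line change. A minimal sketch, continuing from the example above and assuming the other classes accept the same (params, lr) constructor pattern (see the table below for the available names):

```python
# Drop-in swap: ForeachSOAP is assumed to share the (params, lr)
# constructor used in the getting-started example above.
optimizer = heavyball.ForeachSOAP(model.parameters(), lr=1e-3)
```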
Optimizers
Name | Description | Advantages / Disadvantages
---|---|---
ForeachAdamW | More efficient (speed, memory) AdamW | + Faster than AdamW<br>+ Possibly more (numerically) stable
ForeachLaProp | More efficient (speed, memory) LaProp | + Same cost as AdamW<br>+ Marginally better convergence (better proofs)<br>+ Higher hyperparameter stability<br>- Not a guaranteed win (can be neutral)<br>- No "Slingshot"
ForeachADOPT | More efficient (speed, memory) ADOPT | + Same cost as AdamW<br>+ Rigorous mathematical convergence proofs, even for challenging models (GANs)<br>- Empirically underperforms LaProp<br>- No bf16
ForeachSFAdamW | More efficient (speed, memory) ScheduleFree AdamW | + Same cost as AdamW, but better eval performance<br>+ Full control over hyperparameters
PaLMForeachSFAdamW | ForeachSFAdamW with PaLM's beta2 schedule | + Same cost as AdamW, but better eval performance<br>+ Less control, but faster early and more stable late convergence<br>+ ScheduleFree<br>- Slow early convergence
ForeachSOAP | More efficient (speed, memory) SOAP | + Fastest convergence (loss-at-step)<br>+ Full control over hyperparameters<br>- More memory usage<br>- More hyperparameters<br>- Higher overhead than AdamW (can be amortized; better loss-at-second)
PaLMForeachSOAP | ForeachSOAP with PaLM's beta2 schedule | + Fastest convergence (loss-at-step)<br>+ Less control, but faster early and more stable late convergence<br>- More memory usage<br>- More hyperparameters<br>- Higher overhead than AdamW (can be amortized; better loss-at-second)
SFPaLMForeachSOAP | ScheduleFree PaLMForeachSOAP | + Fast convergence (loss-at-step)<br>+ Less memory usage than PaLMForeachSOAP (more than AdamW)<br>- Slower initial convergence than PaLMForeachSOAP (but allows higher LRs)<br>- Higher overhead than AdamW (can be amortized)
PrecondScheduleSFPaLMForeachSOAP | SFPaLMForeachSOAP with a preconditioner schedule, matching the error of PrecondEvery=2 at the cost of PrecondEvery=512 | + Better initial convergence than SFPaLMForeachSOAP<br>+ Significantly faster (sec/it) later<br>+ Less memory usage than PaLMForeachSOAP (more than AdamW)<br>- Slower initial convergence than PaLMForeachSOAP (but allows higher LRs)<br>- Higher overhead than AdamW (can be amortized)
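Because the classes share a constructor pattern, comparing them on a toy problem is straightforward. A minimal sketch, assuming each class listed above accepts (params, lr) exactly as in the getting-started example; final losses on this toy regression task carry no benchmarking weight:

```python
import torch
import heavyball

# Toy regression task shared across all optimizers.
x = torch.randn(128, 16)
y = torch.randn(128, 1)

# Class names taken from the table above; the (params, lr) signature
# is assumed to match the getting-started example.
for name in ["ForeachAdamW", "ForeachLaProp", "PaLMForeachSFAdamW"]:
    torch.manual_seed(0)  # identical initialization for a fair comparison
    model = torch.nn.Linear(16, 1)
    opt = getattr(heavyball, name)(model.parameters(), lr=1e-3)
    for _ in range(1000):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(f"{name}: final loss {loss.item():.4f}")
```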
Project details
Download files
Download the file for your platform.
Source Distribution
heavyball-0.6.0.tar.gz (13.6 kB)
Built Distribution
heavyball-0.6.0-py3-none-any.whl (24.7 kB)
File details
Details for the file heavyball-0.6.0.tar.gz.
File metadata
- Download URL: heavyball-0.6.0.tar.gz
- Upload date:
- Size: 13.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5e9affbd8836d8c8bfecfc3a72e3e64c1778313b6f8869ceb6a296894ee7c204
MD5 | d12489a3393439269c0cea43972c1e59
BLAKE2b-256 | b40ddd1628dc7e00af771610d856c157c47231ae65d33f4c56525d1fd0013151
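To check that a downloaded file matches these digests, a minimal verification sketch using Python's standard hashlib module (the expected digest is copied from the SHA256 row above):

```python
import hashlib

# SHA256 digest copied from the table above for heavyball-0.6.0.tar.gz.
expected = "5e9affbd8836d8c8bfecfc3a72e3e64c1778313b6f8869ceb6a296894ee7c204"

# Hash the downloaded file and compare against the published digest.
with open("heavyball-0.6.0.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

assert actual == expected, "Hash mismatch: the file may be corrupted or tampered with."
```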
File details
Details for the file heavyball-0.6.0-py3-none-any.whl.
File metadata
- Download URL: heavyball-0.6.0-py3-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | eaa19950095b3371bae5f3095f787e6043af41b54e067911d9a2d7f98bcd0f07
MD5 | 46ca5a157e05289b7b042dff2cdd8c5a
BLAKE2b-256 | b19836cb25e0353b8cc61a58a5b9916592cb3bd6608209f285016967eae3b787