Efficient optimizers

Project description

HeavyBall

A simple package of efficient optimizers

The goal is not to strive for completeness, full maintenance, or abstraction, but to provide a simple, largely static alternative to torch.optim with more and better optimizers.

Getting started

pip install heavyball

import torch
import heavyball

# Create a model
model = torch.nn.Linear(16, 1)

# Create an optimizer
optimizer = heavyball.PaLMForeachSFAdamW(model.parameters(), lr=1e-3)

x = torch.randn(128, 16)
y = torch.randn(128, 1)

for _ in range(1000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
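
Schedule-free variants such as PaLMForeachSFAdamW maintain averaged weights for evaluation. A minimal sketch, assuming heavyball follows the usual ScheduleFree train()/eval() convention (an assumption this page does not confirm; verify against the source before relying on it):

# Assumption: heavyball's ScheduleFree optimizers expose train()/eval()
# like the schedulefree reference implementation.
optimizer.eval()   # evaluate with the averaged weights
with torch.no_grad():
    eval_loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.train()  # switch back before further optimizer.step() calls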

Optimizers

Each entry gives the optimizer name, a short description, and its advantages (+) and disadvantages (-).

ForeachAdamW: a more efficient (speed, memory) AdamW
  + Faster than AdamW
  + Possibly more (numerically) stable

ForeachLaProp: a more efficient (speed, memory) LaProp
  + Same cost as AdamW
  + Marginally better convergence (better proofs)
  + Higher hyperparameter stability
  - Not a guaranteed win (can be neutral)
  - No "Slingshot"

ForeachADOPT: a more efficient (speed, memory) ADOPT
  + Same cost as AdamW
  + Rigorous mathematical convergence proofs, even for challenging models (GANs)
  - Empirically underperforms LaProp
  - No bf16 support

ForeachSFAdamW: a more efficient (speed, memory) ScheduleFree AdamW
  + Same cost as AdamW, but better eval performance
  + Full control over hyperparameters

PaLMForeachSFAdamW: ForeachSFAdamW with PaLM's beta2 schedule
  + Same cost as AdamW, but better eval performance
  + Less control, but faster early and more stable late convergence
  + ScheduleFree
  - Slow early convergence

ForeachSOAP: a more efficient (speed, memory) SOAP
  + Fastest convergence (loss-at-step)
  + Full control over hyperparameters
  - More memory usage
  - More hyperparameters
  - Higher overhead than AdamW (can be amortized; better loss-at-second)

PaLMForeachSOAP: ForeachSOAP with PaLM's beta2 schedule
  + Fastest convergence (loss-at-step)
  + Less control, but faster early and more stable late convergence
  - More memory usage
  - More hyperparameters
  - Higher overhead than AdamW (can be amortized; better loss-at-second)

SFPaLMForeachSOAP: ScheduleFree PaLMForeachSOAP
  + Fast convergence (loss-at-step)
  + Less memory usage than PaLMForeachSOAP (more than AdamW)
  - Slower initial convergence than PaLMForeachSOAP (but allows higher LRs)
  - Higher overhead than AdamW (can be amortized)

PrecondScheduleSFPaLMForeachSOAP: SFPaLMForeachSOAP with a preconditioner schedule, matching the error of PrecondEvery=2 at the cost of PrecondEvery=512
  + Better initial convergence than SFPaLMForeachSOAP
  + Significantly faster (sec/it) later in training
  + Less memory usage than PaLMForeachSOAP (more than AdamW)
  - Slower initial convergence than PaLMForeachSOAP (but allows higher LRs)
  - Higher overhead than AdamW (can be amortized)
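
All of the optimizers above share the torch.optim-style constructor used in Getting started, so switching between them is a one-line change. A minimal sketch (the lr value is illustrative; ForeachSOAP takes additional hyperparameters not shown here):

# Trade extra memory and per-step overhead for faster loss-at-step
# convergence by swapping in SOAP (see the list above):
optimizer = heavyball.ForeachSOAP(model.parameters(), lr=1e-3)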

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

heavyball-0.6.0.tar.gz (13.6 kB)

Uploaded Source

Built Distribution

heavyball-0.6.0-py3-none-any.whl (24.7 kB)

Uploaded Python 3

File details

Details for the file heavyball-0.6.0.tar.gz.

File metadata

  • Download URL: heavyball-0.6.0.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for heavyball-0.6.0.tar.gz
  • SHA256: 5e9affbd8836d8c8bfecfc3a72e3e64c1778313b6f8869ceb6a296894ee7c204
  • MD5: d12489a3393439269c0cea43972c1e59
  • BLAKE2b-256: b40ddd1628dc7e00af771610d856c157c47231ae65d33f4c56525d1fd0013151

See more details on using hashes here.
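
To verify a manually downloaded archive against the digests above, a minimal sketch in Python (the local path is assumed, not taken from this page):

import hashlib

# Hypothetical local path to the downloaded source distribution
path = "heavyball-0.6.0.tar.gz"
expected = "5e9affbd8836d8c8bfecfc3a72e3e64c1778313b6f8869ceb6a296894ee7c204"

with open(path, "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

assert actual == expected, "SHA256 mismatch: do not install this file"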

File details

Details for the file heavyball-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: heavyball-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 24.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for heavyball-0.6.0-py3-none-any.whl
  • SHA256: eaa19950095b3371bae5f3095f787e6043af41b54e067911d9a2d7f98bcd0f07
  • MD5: 46ca5a157e05289b7b042dff2cdd8c5a
  • BLAKE2b-256: b19836cb25e0353b8cc61a58a5b9916592cb3bd6608209f285016967eae3b787

See more details on using hashes here.
