Skip to main content

Efficient optimizers

Project description

HeavyBall

[!IMPORTANT]
The SOAP implementation was broken until 0.9.0. Please upgrade to 0.9.0 or later.

A simple package of efficient optimizers

The goal is not to thrive for completeness, full maintenance or abstraction, but instead to provide a simple largely static alternative to torch.optim with more and better optimizers.

Currently (2024-11-13, 0.12.5), the recommended stable optimizer is PrecondSchedulePaLMForeachSOAP (see below). The recommended experimental optimizer is ForeachPSGDKron.

Features

  • Stochastic Rounding: FP32 convergence with BF16 parameters
  • Inplace EMA: Same math, but less memory, less compute and higher stability
  • Foreach: Fast multi-tensor application
  • PaLM Beta2: Fast initial convergence, stable late convergence
  • ScheduleFree: No learning rate schedule, but better convergence
  • Preconditioner Schedule: Improved loss-per-step in early convergence, better step-per-second in late convergence (explained below)

Getting started

pip install heavyball
import torch
import heavyball

# Create a model
model = torch.nn.Linear(16, 1)

# Create an optimizer
optimizer = heavyball.PrecondSchedulePaLMForeachSOAP(model.parameters(), lr=1e-3)

x = torch.randn(128, 16)
y = torch.randn(128, 1)

for _ in range(1000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

Optimizers

Name Description Advantages / Disadvantages
ForeachAdamW More efficient (speed, memory) AdamW + Faster than AdamW
+ Possibly more (numerically) stable
ForeachLaProp More efficient (speed, memory) LaProp + Same cost as AdamW
+ Marginally better converence (better proofs)
+ Higher hyperparameter stability
- Not a guaranteed win (can be neutral)
- No "Slingshot"
ForeachADOPT More efficient (speed, memory) ADOPT + Same cost as AdamW
+ Rigorous mathematical convergence proofs, even for challenging models (GANs)
- Empirically underperforms LaProp
- no bf16
ForeachSFAdamW More efficient (speed, memory) ScheduleFree AdamW + Same cost as AdamW, but better eval perf
+ Full control over hyperparameters
PaLMForeachSFAdamW ForeachSFAdamW with PaLM's beta2 schedule + Same cost as AdamW, but better eval perf
+ Less control, but faster early and more stable late convergence
+ ScheduleFree
- slow early convergence
ForeachSOAP More efficient (speed, memory) SOAP + Faster convergence (loss-at-step)
+ Full control over hyperparameters
- more memory usage
- more hyperparameters
- higher overhead than AdamW (can be ammortized; better loss-at-second)
PaLMForeachSOAP ForeachSOAP with PaLM's beta2 schedule + Faster convergence (loss-at-step)
+ Less control, but faster early and more stable late convergence
- more memory usage
- more hyperparameters
- higher overhead than AdamW (can be ammortized; better loss-at-second)
SFPaLMForeachSOAP ScheduleFree PaLMForeachSOAP + Fast convergence (loss-at-step)
+ less memory usage than PaLMForeachSOAP (more tham AdamW)
- slower initial convergence than PaLMForeachSOAP (but allows higher LRs)
- higher overhead than AdamW (can be ammortized)
PrecondScheduleSFPaLMForeachSOAP SFPaLMForeachSOAP with preconditioner schedule, matching the error of PrecondEvery=2 with the cost of PrecondEvery=512 + Better initial convergence than SFPaLMForeachSOAP
+ Significantly faster (sec/it) later
+ less memory usage than PaLMForeachSOAP (more tham AdamW)
- slower initial convergence than PaLMForeachSOAP (but allows higher LRs)
- higher overhead than AdamW (can be ammortized), goes to 0 with increasing number of step
PrecondSchedulePaLMForeachSOAP PrecondScheduleSFPaLMForeachSOAP without schedule-free + Best initial convergence
+ Significantly faster (sec/it) later
+ high stability
- more memory usage than PrecondScheduleSFPaLMForeachSOAP
- higher overhead than AdamW (can be ammortized), goes to 0 with increasing number of steps
PrecondScheduleForeachSOAP PrecondScheduleSFPaLMForeachSOAP without PaLM's beta2 schedule + Better initial convergence
+ Significantly faster (sec/it) later
- more memory usage than PrecondScheduleSFPaLMForeachSOAP
- higher overhead than AdamW (can be ammortized), goes to 0 with increasing number of steps

Precond Schedule

The default preconditioner schedule (f) would yield the following update intervals:

Steps Interval, f Total (schedule) Total (constant, every 2) Total (constant, every 16)
10 1.00005 10 5 (0.5x) 0 (0.0x)
100 1.026 99 50 (0.5x) 6 (0.1x)
1,000 2.0 738 500 (0.7x) 62 (0.1x)
10,000 14.3 2,168 5,000 (2.3x) 625 (0.3x)
100,000 100.2 4,049 50,000 (12.3x) 6,250 (1.5x)
1,000,000 513 7,245 500,000 (69.0x) 62,500 (8.6x)

Utils

To access heavyball.utils, you need to explicitly import heavyball.utils.
It has several handy functions:

  • set_torch() sets pytorch optimization settings (TF32, opt_einsum, benchmark, ...)
  • compile_mode, a string passed as-is to torch.compile(mode=compile_mode) in all compiled heavyball calls
  • zeroth_power_mode, a string determining whether to use QR, newtonschulz{iterations}, or svd or eigh to approximate the eigenvectors. Eigh has the highest precision and cost

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

heavyball-0.13.1.tar.gz (25.7 kB view details)

Uploaded Source

Built Distribution

heavyball-0.13.1-py3-none-any.whl (39.1 kB view details)

Uploaded Python 3

File details

Details for the file heavyball-0.13.1.tar.gz.

File metadata

  • Download URL: heavyball-0.13.1.tar.gz
  • Upload date:
  • Size: 25.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for heavyball-0.13.1.tar.gz
Algorithm Hash digest
SHA256 ecb68b68d5e6745aac303d2931825629ab73139113d4c9641fd015bfb27eec8c
MD5 e18a9b9a1c28302bdd9ef0ef6ebd7b5d
BLAKE2b-256 3e2c60db719e5f1e6a092ae8149bb67bdaf02fdfd72a20f7bc5f1f692caed49e

See more details on using hashes here.

File details

Details for the file heavyball-0.13.1-py3-none-any.whl.

File metadata

  • Download URL: heavyball-0.13.1-py3-none-any.whl
  • Upload date:
  • Size: 39.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for heavyball-0.13.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e405ad2f367e1da8cc4f544b72a30f254f345fd5b7f8696839f967fadb2a127f
MD5 ba7da4965b36dcb244eb215afa6a6fc3
BLAKE2b-256 42cd5baef0388c8e53ba5d43195e3e81e5e5fed67d2a8f8cf52752ac0d6ce3a7

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page