
Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization


SPS_safe

Safeguarded Polyak step sizes for non-smooth stochastic optimization — robust to vanishing gradients in deep networks.


SPS_safe is the official PyTorch implementation of the optimizers proposed in:

Dimitris Oikonomou, Nicolas Loizou. Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients. ICML 2026.

The package provides two torch.optim.Optimizer subclasses, SPS_safe and IMA_SPS_safe, that adapt the Polyak step size for convex, Lipschitz, non-smooth objectives without requiring interpolation or oracle access to $f_i(x^*)$. Instead of capping the step size from above (as in SPS$_{\max}$), they place a single safeguard $M$ on the denominator $\|g_i^t\|^2$, which (i) prevents step-size blow-up when subgradients vanish, (ii) keeps the rule genuinely adaptive, so it never collapses to a constant update, and (iii) admits an equivalent interpretation as an adaptive clipped subgradient method (Proposition 2.1 of the paper).




Installation

From source:

git clone https://github.com/dimitris-oik/sps_safe.git
cd sps_safe
pip install -e .

Requirements: torch, numpy, scipy (only for the numpy experiments), Python 3.8+.


Quick start

Both optimizers need the (mini-batch) loss value inside .step(). The only change to a standard PyTorch training loop is passing loss= to .step():

import torch
from sps_safe import SPS_safe

model     = MyModel()
criterion = torch.nn.CrossEntropyLoss()
optimizer = SPS_safe(model.parameters(), ell_star=0.0, M=1.0, weight_decay=5e-4)

for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step(loss=loss)               # <-- the only change vs. SGD/Adam

For the momentum variant (Stochastic Heavy Ball / Iterate Moving Average with the safeguarded Polyak step):

from sps_safe import IMA_SPS_safe

optimizer = IMA_SPS_safe(
    model.parameters(),
    ell_star=0.0,
    lambd=1.0,           # IMA averaging param; equivalent SHB momentum is beta = lambd / (1 + lambd)
    M=1.0,               # set M <= 0 to auto-initialise from the first ||g||^2
    weight_decay=0.0,
)

The algorithm

Given a mini-batch loss $f_i$ with stochastic subgradient $g_i^t \in \partial f_i(x^t)$, SPS_safe sets

$$\gamma_t = \frac{f_i(x^t) - \ell_i^*}{\max\{\|g_i^t\|^2,\ M\}}, \qquad x^{t+1} = x^t - \gamma_t g_i^t.$$

The single hyper-parameter $M > 0$ replaces both the clip $\gamma_b$ of SPS$_{\max}$ and the oracle values $f_i(x^*)$ required by SPS$^*$ / IMA-SPS.
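The rule above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the package's code: the function names `sps_safe_step_size` and `sps_safe_update` are hypothetical, and parameters are represented as flat lists of floats rather than tensors.

```python
def sps_safe_step_size(loss, grad, ell_star=0.0, M=1.0):
    """Return gamma_t = (loss - ell_star) / max(||grad||^2, M)."""
    grad_sq_norm = sum(g * g for g in grad)
    return (loss - ell_star) / max(grad_sq_norm, M)


def sps_safe_update(x, loss, grad, ell_star=0.0, M=1.0):
    """One step x <- x - gamma_t * grad on a flat list of floats."""
    gamma = sps_safe_step_size(loss, grad, ell_star, M)
    return [xi - gamma * gi for xi, gi in zip(x, grad)]


# Large subgradient: ||g||^2 = 4 >= M, so this is the plain Polyak step 2 / 4.
gamma_large = sps_safe_step_size(loss=2.0, grad=[2.0, 0.0], M=1.0)

# Tiny subgradient: ||g||^2 = 1e-8 < M, so the denominator is M instead.
gamma_small = sps_safe_step_size(loss=2.0, grad=[1e-4, 0.0], M=1.0)

# Without the safeguard the same inputs would give a step of roughly 2e8.
unsafeguarded = 2.0 / (1e-4) ** 2
```

The second call is the case the safeguard targets: the loss is still far from its lower bound while the subgradient is nearly zero, and the step stays at `(loss - ell_star) / M` rather than growing with `1 / ||g||^2`.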

For momentum, IMA_SPS_safe uses the Iterate Moving Average (IMA) form of Stochastic Heavy Ball (SHB) (Sebbouh et al., 2021) with a safeguarded Polyak step on the auxiliary sequence $z$:

$$\eta_t = \frac{\big[f_i(x^t) - \ell_i^* + \lambda \langle g_i^t,\ x^t - x^{t-1}\rangle\big]_+}{\max\{\|g_i^t\|^2,\ M\}},$$

$$z^{t+1} = z^t - \eta_t g_i^t, \qquad x^{t+1} = \frac{\lambda x^t + z^{t+1}}{\lambda + 1}.$$

The averaging parameter $\lambda \ge 0$ corresponds to SHB momentum $\beta = \lambda / (1 + \lambda)$.
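The correspondence with heavy ball follows by eliminating $z$: since $z^t = (1+\lambda)x^t - \lambda x^{t-1}$, the IMA update is the same as $x^{t+1} = x^t - \frac{\eta_t}{1+\lambda} g_i^t + \beta (x^t - x^{t-1})$ with $\beta = \lambda/(1+\lambda)$. The scalar sketch below checks this numerically on $f(x) = |x - 3|$. It assumes $z^0 = x^0$ and $x^{-1} = x^0$ (the source does not state the initialisation), and all function names are hypothetical.

```python
def ima_step(x, z, eta, g, lambd):
    """IMA form: z <- z - eta * g, then x <- (lambd * x + z) / (lambd + 1)."""
    z_next = z - eta * g
    x_next = (lambd * x + z_next) / (lambd + 1.0)
    return x_next, z_next


def shb_step(x, x_prev, eta, g, lambd):
    """Heavy-ball form with beta = lambd / (1 + lambd), step eta / (1 + lambd)."""
    beta = lambd / (1.0 + lambd)
    return x - eta / (1.0 + lambd) * g + beta * (x - x_prev)


def eta_safe(loss, g, x, x_prev, lambd, ell_star=0.0, M=1.0):
    """Safeguarded step: positive part of the numerator over max(g^2, M)."""
    numerator = max(loss - ell_star + lambd * g * (x - x_prev), 0.0)
    return numerator / max(g * g, M)


lambd = 1.0                      # corresponds to beta = 0.5
x_ima = x_ima_prev = z = 0.0     # assumed initialisation: z^0 = x^0 = x^{-1}
x_shb = x_shb_prev = 0.0
max_gap = 0.0

for _ in range(20):
    # Loss and a subgradient of f(x) = |x - 3| at the IMA iterate.
    loss = abs(x_ima - 3.0)
    g = float((x_ima > 3.0) - (x_ima < 3.0))
    eta = eta_safe(loss, g, x_ima, x_ima_prev, lambd)

    # Feed the same (eta, g) to both forms so only the update rules differ.
    x_new, z = ima_step(x_ima, z, eta, g, lambd)
    x_ima_prev, x_ima = x_ima, x_new
    x_shb_prev, x_shb = x_shb, shb_step(x_shb, x_shb_prev, eta, g, lambd)

    max_gap = max(max_gap, abs(x_ima - x_shb))
```

After the loop, `max_gap` stays at rounding-error level, which is what the algebraic identity predicts.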

Theoretical guarantees

| Setting | Method | Rate (Cesàro / last iterate) |
|---|---|---|
| Convex, Lipschitz, non-smooth | SPS_safe | $O(T^{-1/2})$ to a neighborhood (Theorem 3.1) |
| Convex, Lipschitz + momentum | IMA_SPS_safe | $O(T^{-1/2})$ Cesàro and last-iterate (Theorems 3.4, 3.5) |
| Interpolated ($\sigma^2 = 0$) | both | neighborhood collapses; exact convergence |

All results hold without the interpolation assumption and without oracle access to $f_i(x^*)$ — both of which were required by prior Polyak-type step sizes in the non-smooth setting (Loizou et al., 2021; Garrigos et al., 2023; Gower et al., 2025).
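To make the interpolated row of the table concrete, here is a hand-made one-dimensional example; it is not taken from the paper's experiments. Every component $f_i(x) = a_i |x - 1|$ is minimised at $x = 1$ with value $0$, so `ell_star = 0.0` is exact and $\sigma^2 = 0$. The function name `run_sps_safe` and the scale values are assumptions chosen for illustration.

```python
def run_sps_safe(x0, scales, steps, ell_star=0.0, M=1.0):
    """Run the safeguarded Polyak rule on f_i(x) = a_i * |x - 1|."""
    x = x0
    for t in range(steps):
        a = scales[t % len(scales)]        # cycle through the components
        loss = a * abs(x - 1.0)            # f_i(x)
        g = a * ((x > 1.0) - (x < 1.0))    # a subgradient of f_i at x
        gamma = (loss - ell_star) / max(g * g, M)
        x = x - gamma * g
    return x


# Small scales: a_i^2 < M, so each step shrinks |x - 1| by the factor 1 - a_i^2 / M.
x_final = run_sps_safe(x0=5.0, scales=[0.5, 0.8], steps=50)

# Large scale: a_i^2 >= M, the plain Polyak step applies and lands on x = 1 at once.
x_one_step = run_sps_safe(x0=5.0, scales=[2.0], steps=1)
```

In both regimes the iterate approaches the shared minimiser with no residual neighborhood, in line with the table's last row for this toy problem.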


API reference

SPS_safe(params, ell_star=0.0, M=1.0, weight_decay=0.0)

| Argument | Type | Default | Description |
|---|---|---|---|
| params | iterable | | Parameters to optimize. |
| ell_star | float | 0.0 | Lower bound $\ell_i^*$ on the mini-batch loss. Typically 0.0 for non-negative losses. |
| M | float | 1.0 | Safeguard threshold on $\lVert g_i^t \rVert^2$ in the denominator. |
| weight_decay | float | 0.0 | L2 weight-decay coefficient. |

IMA_SPS_safe(params, ell_star=0.0, lambd=1.0, M=1.0, weight_decay=0.0)

| Argument | Type | Default | Description |
|---|---|---|---|
| params | iterable | | Parameters to optimize. |
| ell_star | float | 0.0 | Lower bound $\ell_i^*$ on the mini-batch loss. |
| lambd | float | 1.0 | IMA averaging parameter $\lambda \ge 0$. Equivalent SHB momentum is $\beta = \lambda / (1 + \lambda)$. |
| M | float | 1.0 | Safeguard threshold on $\lVert g_i^t \rVert^2$. Pass a value $\le 0$ to auto-initialise from the first $\lVert g_i^t \rVert^2$. |
| weight_decay | float | 0.0 | L2 weight-decay coefficient. |
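One plausible reading of the auto-initialisation of `M` is sketched below. This is an assumption about the behaviour described in the table, not the package's actual code: the class `SafeguardThreshold` and its `resolve` method are hypothetical names, and the real optimizer may handle edge cases (such as a zero first gradient) differently.

```python
class SafeguardThreshold:
    """Hypothetical helper: holds M and fills it in lazily when M <= 0."""

    def __init__(self, M=1.0):
        self.M = M

    def resolve(self, grad_sq_norm):
        # Assumption: a non-positive M means "adopt the first squared
        # gradient norm that is seen", after which M stays fixed.
        if self.M <= 0:
            self.M = grad_sq_norm
        return self.M


auto = SafeguardThreshold(M=0.0)
first = auto.resolve(4.0)     # M is set from the first ||g||^2
later = auto.resolve(0.01)    # later gradients no longer change M

fixed = SafeguardThreshold(M=1.0)
unchanged = fixed.resolve(4.0)  # a positive M is used as given
```

Under this reading, a first squared gradient norm of exactly zero would leave `M` unset until a non-zero norm arrives; whether the package does the same is not stated in the source.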

Both classes expose the standard torch.optim.Optimizer interface. The only non-standard requirement is that .step() is called as optimizer.step(loss=loss).


Experiments

The numpy_exps/ directory reproduces the §4.1 plots, which empirically verify the convergence guarantees on non-smooth convex benchmarks.

For the deep-learning experiments (ResNet-20 / ResNet-32 on CIFAR-10 / CIFAR-100), the full hyper-parameter protocol is described in §4.2 and Appendix C.1 of the paper. The training script is intentionally not shipped — SPS_safe and IMA_SPS_safe are drop-in torch.optim.Optimizers and integrate with any standard PyTorch training loop (see Quick start).

The full paper (theory + extended experiments) is checked into this repository as paper.md.


Citation

If you use this code or build on the method, please cite:

@inproceedings{oikonomou2026safeguarded,
  title = {Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients},
  author = {Oikonomou, Dimitris and Loizou, Nicolas},
  booktitle = {ICML},
  year = {2026},
}

License

Released under the MIT License.
