Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization
SPS_safe
Safeguarded Polyak step sizes for non-smooth stochastic optimization — robust to vanishing gradients in deep networks.
SPS_safe is the official PyTorch implementation of the optimizers proposed in:
Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients. Dimitris Oikonomou, Nicolas Loizou. ICML 2026.
The package provides two torch.optim.Optimizer subclasses, SPS_safe and IMA_SPS_safe, that adapt the Polyak step size for convex, Lipschitz, non-smooth objectives without requiring interpolation or oracle access to $f_i(x^*)$. Instead of capping the step size from above (as in SPS$_{\max}$), they place a single safeguard $M$ on the denominator $\lVert g_i^t \rVert^2$, which (i) prevents step-size blow-up when subgradients vanish, (ii) keeps the rule genuinely adaptive, never collapsing to a constant update, and (iii) admits an equivalent interpretation as an adaptive clipped subgradient method (Proposition 2.1 of the paper).
Installation
From source:

    git clone https://github.com/dimitris-oik/sps_safe.git
    cd sps_safe
    pip install -e .

Requirements: torch, numpy, scipy (only for the numpy experiments), Python 3.8+.
Quick start
Both optimizers need the (mini-batch) loss value inside .step(). The only change to a standard PyTorch training loop is passing loss= to .step():

    import torch
    from sps_safe import SPS_safe

    model = MyModel()  # placeholder: any torch.nn.Module
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = SPS_safe(model.parameters(), ell_star=0.0, M=1.0, weight_decay=5e-4)

    for x, y in loader:  # placeholder: any iterable of (input, label) batches
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step(loss=loss)  # <-- the only change vs. SGD/Adam

For the momentum variant (Stochastic Heavy Ball / Iterate Moving Average with the safeguarded Polyak step):

    from sps_safe import IMA_SPS_safe

    optimizer = IMA_SPS_safe(
        model.parameters(),
        ell_star=0.0,
        lambd=1.0,  # IMA averaging param; equivalent SHB momentum is beta = lambd / (1 + lambd)
        M=1.0,      # set M <= 0 to auto-initialise from the first ||g||^2
        weight_decay=0.0,
    )

The algorithm
Given a mini-batch loss $f_i$ with stochastic subgradient $g_i^t \in \partial f_i(x^t)$, SPS_safe sets
$$\gamma_t = \frac{f_i(x^t) - \ell_i^*}{\max\{\lVert g_i^t \rVert^2,\ M\}}, \qquad x^{t+1} = x^t - \gamma_t g_i^t.$$
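One way to read this rule as a clipped step is the identity below. It is a sketch derived directly from the formula above, assuming $g_i^t \neq 0$; the exact statement of Proposition 2.1 in the paper may be phrased differently.

$$\frac{1}{\max\{\lVert g_i^t \rVert^2,\ M\}} = \frac{1}{M}\,\min\left\{1,\ \frac{M}{\lVert g_i^t \rVert^2}\right\} \quad\Longrightarrow\quad x^{t+1} = x^t - \frac{f_i(x^t) - \ell_i^*}{M}\,\min\left\{1,\ \frac{M}{\lVert g_i^t \rVert^2}\right\} g_i^t.$$

When $\lVert g_i^t \rVert^2 \ge M$ the scaling factor recovers the classical Polyak step $(f_i(x^t) - \ell_i^*)/\lVert g_i^t \rVert^2$; when $\lVert g_i^t \rVert^2 < M$ the step size is $(f_i(x^t) - \ell_i^*)/M$, which stays finite as the subgradient vanishes.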
The single hyper-parameter $M > 0$ replaces both the clip $\gamma_b$ of SPS$_{\max}$ and the oracle values $f_i(x^*)$ required by SPS$^*$ / IMA-SPS.
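The update can be sketched in a few lines of plain Python. This is an illustrative sketch, not the package's implementation: the helper names `sps_safe_step`, `abs_loss`, and `abs_subgrad` are hypothetical, and the toy objective (a sum of absolute values) is chosen only because its subgradient vanishes at the minimizer.

```python
def sps_safe_step(x, loss, grad, ell_star=0.0, M=1.0):
    """One safeguarded Polyak update on a list of floats (illustrative only)."""
    grad_sq = sum(g * g for g in grad)            # squared subgradient norm
    gamma = (loss - ell_star) / max(grad_sq, M)   # safeguarded step size
    x_new = [xi - gamma * gi for xi, gi in zip(x, grad)]
    return x_new, gamma


def abs_loss(x):
    """Toy convex, Lipschitz, non-smooth objective: sum of |x_i|."""
    return sum(abs(xi) for xi in x)


def abs_subgrad(x):
    """A subgradient of abs_loss: sign(x_i), taking 0 at the kink."""
    return [float((xi > 0) - (xi < 0)) for xi in x]


# Run a few steps from a fixed starting point.
x = [3.0, -1.0]
for _ in range(5):
    x, gamma = sps_safe_step(x, abs_loss(x), abs_subgrad(x), ell_star=0.0, M=1.0)

# A tiny subgradient with a non-trivial loss: the unsafeguarded Polyak step
# would be 1.0 / 1e-8, while the safeguarded step is 1.0 / M.
_, gamma_small = sps_safe_step([0.5], 1.0, [1e-4], ell_star=0.0, M=1.0)
```

On this toy problem the iterate moves from [3, -1] to [1, 1] and then to [0, 0]. At the minimizer both the loss gap and the subgradient are zero, so the safeguarded step is 0 / M = 0 rather than the undefined 0 / 0 of the plain Polyak rule.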
For momentum, IMA-SPS_safe uses the Iterate Moving Average form of Stochastic Heavy Ball (Sebbouh et al., 2021) with a safeguarded Polyak step on $z$:
$$\eta_t = \frac{\big[f_i(x^t) - \ell_i^* + \lambda \langle g_i^t,\ x^t - x^{t-1}\rangle\big]_+}{\max\{\lVert g_i^t \rVert^2,\ M\}},$$
$$z^{t+1} = z^t - \eta_t g_i^t, \qquad x^{t+1} = \frac{\lambda x^t + z^{t+1}}{\lambda + 1}.$$
The averaging parameter $\lambda \ge 0$ corresponds to SHB momentum $\beta = \lambda / (1 + \lambda)$.
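The momentum update and its heavy-ball reading can be checked numerically. The following is a minimal sketch under stated assumptions, not the package's code: `ima_sps_safe_step` and the toy objective are hypothetical, the iterates are initialised with $z^0 = x^0 = x^{-1}$, and the heavy-ball step size $\alpha_t = \eta_t / (1 + \lambda)$ is derived here by eliminating $z$ from the two update equations rather than quoted from the paper.

```python
def abs_loss(x):
    """Toy convex, non-smooth objective: sum of |x_i|."""
    return sum(abs(xi) for xi in x)


def abs_subgrad(x):
    """A subgradient of abs_loss: sign(x_i), taking 0 at the kink."""
    return [float((xi > 0) - (xi < 0)) for xi in x]


def ima_sps_safe_step(x, x_prev, z, loss, grad, ell_star=0.0, lambd=1.0, M=1.0):
    """One IMA update with a safeguarded Polyak step on z (illustrative only)."""
    grad_sq = sum(g * g for g in grad)
    inner = sum(g * (xi - xp) for g, xi, xp in zip(grad, x, x_prev))
    # Positive part of the numerator, safeguarded denominator.
    eta = max(loss - ell_star + lambd * inner, 0.0) / max(grad_sq, M)
    z_new = [zi - eta * g for zi, g in zip(z, grad)]
    x_new = [(lambd * xi + zi) / (lambd + 1.0) for xi, zi in zip(x, z_new)]
    return x_new, z_new, eta


lambd = 1.0
beta = lambd / (1.0 + lambd)   # equivalent heavy-ball momentum

x = [3.0, -1.0]
x_prev, z = list(x), list(x)   # assumed initialisation: z^0 = x^0 = x^{-1}
max_gap = 0.0
for _ in range(10):
    g = abs_subgrad(x)
    x_new, z, eta = ima_sps_safe_step(x, x_prev, z, abs_loss(x), g, lambd=lambd)
    # Heavy-ball form of the same step, with alpha = eta / (1 + lambd).
    alpha = eta / (1.0 + lambd)
    x_shb = [xi - alpha * gi + beta * (xi - xp) for xi, gi, xp in zip(x, g, x_prev)]
    max_gap = max(max_gap, max(abs(a - b) for a, b in zip(x_new, x_shb)))
    x_prev, x = x, x_new
```

Here `max_gap` records the largest difference between the IMA iterate and the heavy-ball iterate over the run; it stays at floating-point rounding level, which is consistent with the stated correspondence $\beta = \lambda / (1 + \lambda)$.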
Theoretical guarantees
| Setting | Method | Rate (Cesàro / last iterate) |
|---|---|---|
| Convex, Lipschitz, non-smooth | SPS_safe | $O(T^{-1/2})$ to a neighborhood (Theorem 3.1) |
| Convex, Lipschitz + momentum | IMA_SPS_safe | $O(T^{-1/2})$ Cesàro and last-iterate (Theorems 3.4, 3.5) |
| Interpolated ($\sigma^2 = 0$) | both | neighborhood collapses; exact convergence |
All results hold without the interpolation assumption and without oracle access to $f_i(x^*)$ — both of which were required by prior Polyak-type step sizes in the non-smooth setting (Loizou et al., 2021; Garrigos et al., 2023; Gower et al., 2025).
API reference
SPS_safe(params, ell_star=0.0, M=1.0, weight_decay=0.0)
| Argument | Type | Default | Description |
|---|---|---|---|
| params | iterable | (required) | Parameters to optimize. |
| ell_star | float | 0.0 | Lower bound $\ell_i^*$ on the mini-batch loss. Typically 0.0 for non-negative losses. |
| M | float | 1.0 | Safeguard threshold on $\lVert g_i^t \rVert^2$ in the denominator. |
| weight_decay | float | 0.0 | L2 weight-decay coefficient. |
IMA_SPS_safe(params, ell_star=0.0, lambd=1.0, M=1.0, weight_decay=0.0)
| Argument | Type | Default | Description |
|---|---|---|---|
| params | iterable | (required) | Parameters to optimize. |
| ell_star | float | 0.0 | Lower bound $\ell_i^*$ on the mini-batch loss. |
| lambd | float | 1.0 | IMA averaging parameter $\lambda \ge 0$. Equivalent SHB momentum is $\beta = \lambda / (1 + \lambda)$. |
| M | float | 1.0 | Safeguard threshold on $\lVert g_i^t \rVert^2$. Pass a value $\le 0$ to auto-initialise from the first $\lVert g_i^t \rVert^2$. |
| weight_decay | float | 0.0 | L2 weight-decay coefficient. |
Both classes expose the standard torch.optim.Optimizer interface. The only non-standard requirement is that .step() is called as optimizer.step(loss=loss).
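The auto-initialisation rule for M (documented above for IMA_SPS_safe) can be pictured as lazily fixing the threshold on the first step. The sketch below is one plausible reading of that sentence, not the package's internals: the class `SafeguardState` and its method `denominator` are hypothetical names, and how the package handles a first squared gradient norm of exactly zero is not stated in this README.

```python
class SafeguardState:
    """Holds the safeguard M, optionally initialised from the first gradient.

    Hypothetical helper for illustration; not part of the sps_safe API.
    """

    def __init__(self, M=1.0):
        # A non-positive M is treated as "not initialised yet".
        self.M = M if M > 0 else None

    def denominator(self, grad_sq):
        """Return the safeguarded denominator max(grad_sq, M)."""
        if self.M is None:
            # Assumption: the first observed squared norm becomes M.
            self.M = grad_sq
        return max(grad_sq, self.M)


auto = SafeguardState(M=0.0)
first = auto.denominator(4.0)    # M is set from this first value
later = auto.denominator(0.25)   # a small gradient is now safeguarded by M

fixed = SafeguardState(M=2.0)
fixed_small = fixed.denominator(0.5)
```

With a fixed positive M the threshold never changes; with auto-initialisation the scale of the first mini-batch gradient sets it, which avoids choosing M by hand but makes the threshold depend on the initial point.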
Experiments
The numpy_exps/ directory reproduces the §4.1 plots that empirically verify the convergence guarantees on non-smooth convex benchmarks:
- numpy_exps/loss.py: SVM (hinge loss) and PhaseRetrieval objectives.
- numpy_exps/methods.py: every step-size selection compared in the paper, namely sgd, sgd_sps_max (Loizou et al., 2021), sgd_sps_plus (Garrigos et al., 2023), sgd_sps_m (SPS_safe), and the IMA variants ima, ima_sps_plus, ima_sps_m, plus the last-iterate variants used in Theorem 3.5.
- numpy_exps/exps_svm.ipynb: SVM figures.
- numpy_exps/exps_phase.ipynb: Phase retrieval figures.
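To give a feel for the SVM benchmark, here is a small stand-alone sketch of a per-sample hinge loss, one of its subgradients, and the safeguarded step applied to it. It is not the code in numpy_exps/loss.py or numpy_exps/methods.py: the function names and the two-point dataset are hypothetical and exist only to make the example runnable with the standard library.

```python
def hinge_loss(w, a, y):
    """Per-sample hinge loss max(0, 1 - y * <a, w>)."""
    margin = y * sum(ai * wi for ai, wi in zip(a, w))
    return max(0.0, 1.0 - margin)


def hinge_subgrad(w, a, y):
    """A subgradient of the hinge loss: -y * a if the margin is violated, else 0."""
    margin = y * sum(ai * wi for ai, wi in zip(a, w))
    if margin < 1.0:
        return [-y * ai for ai in a]
    return [0.0] * len(w)


# Hypothetical two-sample, linearly separable dataset: (features, label).
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]

w = [0.0, 0.0]
M, ell_star = 1.0, 0.0   # hinge loss is non-negative, so 0 is a valid lower bound
for t in range(4):
    a, y = data[t % len(data)]            # cycle through the samples
    loss = hinge_loss(w, a, y)
    g = hinge_subgrad(w, a, y)
    grad_sq = sum(gi * gi for gi in g)
    gamma = (loss - ell_star) / max(grad_sq, M)   # safeguarded Polyak step
    w = [wi - gamma * gi for wi, gi in zip(w, g)]

final_losses = [hinge_loss(w, a, y) for a, y in data]
```

Once a sample is classified with margin at least 1, both its loss and its subgradient are zero. The plain Polyak ratio is then 0 / 0, while the safeguarded rule gives a step of 0 / M = 0 and leaves the weights unchanged.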
For the deep-learning experiments (ResNet-20 / ResNet-32 on CIFAR-10 / CIFAR-100), the full hyper-parameter protocol is described in §4.2 and Appendix C.1 of the paper. The training script is intentionally not shipped — SPS_safe and IMA_SPS_safe are drop-in torch.optim.Optimizers and integrate with any standard PyTorch training loop (see Quick start).
The full paper (theory + extended experiments) is checked into this repository as paper.md.
Citation
If you use this code or build on the method, please cite:

    @inproceedings{oikonomou2026safeguarded,
      title     = {Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients},
      author    = {Oikonomou, Dimitris and Loizou, Nicolas},
      booktitle = {ICML},
      year      = {2026},
    }

License
Released under the MIT License.