Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization

SPS_safe

Safeguarded Polyak step sizes for non-smooth stochastic optimization — robust to vanishing gradients in deep networks.

SPS_safe is the official PyTorch implementation of the optimizers proposed in:

Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients. Dimitris Oikonomou, Nicolas Loizou. ICML 2026.

The package provides two torch.optim.Optimizer subclasses, SPS_safe and IMA_SPS_safe, that adapt the Polyak step size to convex, Lipschitz, non-smooth objectives without requiring interpolation or oracle access to $f_i(x^*)$. Instead of capping the step size from above (as in SPS$_{\max}$), they place a single safeguard $M$ on the denominator $\|g_i^t\|^2$, which (i) prevents step-size blow-up when subgradients vanish, (ii) keeps the rule genuinely adaptive, so it never collapses to a constant update, and (iii) admits an equivalent interpretation as an adaptive clipped subgradient method (Proposition 2.1 of the paper).

Installation
Quick start
The algorithm
API reference
Experiments
Citation
License

Installation

From source:

git clone https://github.com/dimitris-oik/sps_safe.git
cd sps_safe
pip install -e .

Requirements: torch, numpy, scipy (only for the numpy experiments), Python 3.8+.

Quick start

Both optimizers need the (mini-batch) loss value inside .step(). The only change to a standard PyTorch training loop is passing loss= to .step():

import torch
from sps_safe import SPS_safe

model     = MyModel()                       # placeholder: any torch.nn.Module
criterion = torch.nn.CrossEntropyLoss()
optimizer = SPS_safe(model.parameters(), ell_star=0.0, M=1.0, weight_decay=5e-4)

for x, y in loader:                         # placeholder: any iterable of (input, target) batches
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step(loss=loss)               # <-- the only change vs. SGD/Adam

For the momentum variant (Stochastic Heavy Ball / Iterate Moving Average with the safeguarded Polyak step):

from sps_safe import IMA_SPS_safe

optimizer = IMA_SPS_safe(
    model.parameters(),
    ell_star=0.0,
    lambd=1.0,           # IMA averaging param; equivalent SHB momentum is beta = lambd / (1 + lambd)
    M=1.0,               # set M <= 0 to auto-initialise from the first ||g||^2
    weight_decay=0.0,
)

The algorithm

Given a mini-batch loss $f_i$ with stochastic subgradient $g_i^t \in \partial f_i(x^t)$, SPS_safe sets

$$\gamma_t = \frac{f_i(x^t) - \ell_i^*}{\max\{\|g_i^t\|^2,\ M\}}, \qquad x^{t+1} = x^t - \gamma_t g_i^t.$$

The single hyper-parameter $M > 0$ replaces both the clip $\gamma_b$ of SPS$_{\max}$ and the oracle values $f_i(x^*)$ required by SPS$^*$ / IMA-SPS.
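As a minimal sketch of the rule above, the following plain-Python helpers evaluate the step size on scalar inputs. They are not the package's implementation: the names `sps_safe_step_size` and `sps_max_step_size` are hypothetical, and the SPS$_{\max}$ rule is written with its constant $c = 1$ and the same lower bound $\ell_i^*$ purely for comparison.

```python
def sps_safe_step_size(loss, grad_sq_norm, ell_star=0.0, M=1.0):
    """gamma_t = (f_i(x^t) - ell_i^*) / max(||g_i^t||^2, M)."""
    if M <= 0:
        raise ValueError("this sketch assumes a positive safeguard M")
    return (loss - ell_star) / max(grad_sq_norm, M)


def sps_max_step_size(loss, grad_sq_norm, ell_star=0.0, gamma_b=1.0):
    """SPS_max with c = 1: min((f_i(x^t) - ell_i^*) / ||g_i^t||^2, gamma_b)."""
    return min((loss - ell_star) / grad_sq_norm, gamma_b)


# Large subgradient: the safeguard is inactive and both rules agree.
print(sps_safe_step_size(2.0, 4.0), sps_max_step_size(2.0, 4.0))

# Nearly vanishing subgradient: SPS_max returns the constant cap gamma_b
# for both loss values, while the safeguarded rule still scales with the loss.
for loss in (2.0, 0.5):
    print(loss, sps_safe_step_size(loss, 1e-8), sps_max_step_size(loss, 1e-8))
```

The same quantity can be written as $\gamma_t = \frac{f_i(x^t) - \ell_i^*}{M}\,\min\{1,\ M / \|g_i^t\|^2\}$, which is one way to read the update as a clipped step; the paper's Proposition 2.1 may state the equivalence in a different form.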

For momentum, IMA_SPS_safe uses the Iterate Moving Average (IMA) form of Stochastic Heavy Ball (Sebbouh et al., 2021), with a safeguarded Polyak step applied to the auxiliary sequence $z^t$:

$$\eta_t = \frac{\big[f_i(x^t) - \ell_i^* + \lambda \langle g_i^t,\ x^t - x^{t-1}\rangle\big]_+}{\max\{\|g_i^t\|^2,\ M\}},$$

$$z^{t+1} = z^t - \eta_t g_i^t, \qquad x^{t+1} = \frac{\lambda x^t + z^{t+1}}{\lambda + 1}.$$

The averaging parameter $\lambda \ge 0$ corresponds to SHB momentum $\beta = \lambda / (1 + \lambda)$.
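The correspondence with heavy-ball momentum can be checked numerically. The sketch below is an illustration in plain Python, not the package's code: it runs the IMA recursion on a toy non-smooth objective, records each subgradient and step size $\eta_t$, and replays them through the heavy-ball recursion $x^{t+1} = x^t - \gamma_t g^t + \beta (x^t - x^{t-1})$ with $\beta = \lambda/(1+\lambda)$ and $\gamma_t = \eta_t/(1+\lambda)$. The objective, the starting point, the initialization $z^0 = x^{-1} = x^0$, and all function names are assumptions made for the example.

```python
def objective(x):
    # Toy convex, non-smooth objective with minimum value 0 at (1.0, -0.5).
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] + 0.5)


def sign(v):
    return (v > 0) - (v < 0)


def subgradient(x):
    # One valid subgradient of the objective (0 is used at the kinks).
    return [1.0 * sign(x[0] - 1.0), 2.0 * sign(x[1] + 0.5)]


def run_ima(x0, lam=1.0, M=1.0, ell_star=0.0, steps=30):
    """IMA recursion with the safeguarded step; returns iterates and (g, eta) pairs."""
    x_prev, x, z = list(x0), list(x0), list(x0)
    xs, replay = [list(x0)], []
    for _ in range(steps):
        g = subgradient(x)
        inner = sum(gi * (xi - pi) for gi, xi, pi in zip(g, x, x_prev))
        numerator = max(0.0, objective(x) - ell_star + lam * inner)
        eta = numerator / max(sum(gi * gi for gi in g), M)
        z = [zi - eta * gi for zi, gi in zip(z, g)]
        x_prev, x = x, [(lam * xi + zi) / (lam + 1.0) for xi, zi in zip(x, z)]
        xs.append(x)
        replay.append((g, eta))
    return xs, replay


def replay_shb(x0, replay, lam=1.0):
    """Heavy-ball recursion fed with the recorded subgradients and step sizes."""
    beta = lam / (1.0 + lam)
    x_prev, x = list(x0), list(x0)
    xs = [list(x0)]
    for g, eta in replay:
        gamma = eta / (1.0 + lam)
        x_next = [xi - gamma * gi + beta * (xi - pi)
                  for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
        xs.append(x)
    return xs


ima_xs, replay = run_ima([3.0, 2.0])
shb_xs = replay_shb([3.0, 2.0], replay)
max_gap = max(abs(a - b) for xa, xb in zip(ima_xs, shb_xs) for a, b in zip(xa, xb))
print("largest coordinate gap between the two trajectories:", max_gap)
```

The two trajectories should differ only by floating-point rounding, since substituting $z^t = (1+\lambda)x^t - \lambda x^{t-1}$ into the $z$-update gives the heavy-ball recursion exactly.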

Theoretical guarantees

| Setting | Method | Rate (Cesàro / last iterate) |
| --- | --- | --- |
| Convex, Lipschitz, non-smooth | `SPS_safe` | $O(T^{-1/2})$ to a neighborhood (Theorem 3.1) |
| Convex, Lipschitz + momentum | `IMA_SPS_safe` | $O(T^{-1/2})$ Cesàro and last-iterate (Theorems 3.4, 3.5) |
| Interpolated ($\sigma^2 = 0$) | both | neighborhood collapses; exact convergence |

All results hold without the interpolation assumption and without oracle access to $f_i(x^*)$ — both of which were required by prior Polyak-type step sizes in the non-smooth setting (Loizou et al., 2021; Garrigos et al., 2023; Gower et al., 2025).

API reference

`SPS_safe(params, ell_star=0.0, M=1.0, weight_decay=0.0)`

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `params` | iterable | (required) | Parameters to optimize. |
| `ell_star` | float | `0.0` | Lower bound $\ell_i^*$ on the mini-batch loss. Typically `0.0` for non-negative losses. |
| `M` | float | `1.0` | Safeguard threshold on $\lVert g_i^t \rVert^2$ in the denominator. |
| `weight_decay` | float | `0.0` | L2 weight-decay coefficient. |

`IMA_SPS_safe(params, ell_star=0.0, lambd=1.0, M=1.0, weight_decay=0.0)`

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `params` | iterable | (required) | Parameters to optimize. |
| `ell_star` | float | `0.0` | Lower bound $\ell_i^*$ on the mini-batch loss. |
| `lambd` | float | `1.0` | IMA averaging parameter $\lambda \ge 0$. Equivalent SHB momentum is $\beta = \lambda / (1 + \lambda)$. |
| `M` | float | `1.0` | Safeguard threshold on $\lVert g_i^t \rVert^2$. Pass a value $\le 0$ to auto-initialise from the first $\lVert g_i^t \rVert^2$. |
| `weight_decay` | float | `0.0` | L2 weight-decay coefficient. |
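The auto-initialisation of `M` described in the table can be pictured with a small sketch. This is only one plausible reading of that behaviour, written in plain Python: the `Safeguard` class and its `denominator` method are hypothetical and are not part of the package, and what the package does when the first squared norm is itself zero is not covered here.

```python
class Safeguard:
    """Hypothetical helper: holds M and fills it in lazily when M <= 0."""

    def __init__(self, M=1.0):
        # None marks "not yet initialised"; a positive M is used as given.
        self.M = M if M > 0 else None

    def denominator(self, grad_sq_norm):
        # On the first call with an unset M, adopt the first ||g||^2 seen.
        if self.M is None:
            self.M = grad_sq_norm
        return max(grad_sq_norm, self.M)


auto = Safeguard(M=0.0)
print(auto.denominator(4.0))    # first call fixes M from the first ||g||^2
print(auto.denominator(0.25))   # later small gradients are floored at M
print(auto.denominator(9.0))    # larger gradients pass through unchanged
```

Under this reading the threshold is fixed once and then behaves exactly like a user-supplied `M` for the rest of training.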

Both classes expose the standard torch.optim.Optimizer interface. The only non-standard requirement is that .step() is called as optimizer.step(loss=loss).

Experiments

The numpy_exps/ directory reproduces the §4.1 plots that empirically verify the convergence guarantees on non-smooth convex benchmarks:

- numpy_exps/loss.py: SVM (hinge loss) and PhaseRetrieval objectives.
- numpy_exps/methods.py: every step-size selection compared in the paper: sgd, sgd_sps_max (Loizou et al., 2021), sgd_sps_plus (Garrigos et al., 2023), sgd_sps_m (SPS_safe), and the IMA variants ima, ima_sps_plus, ima_sps_m, plus the last-iterate variants used in Theorem 3.5.
- numpy_exps/exps_svm.ipynb: SVM figures.
- numpy_exps/exps_phase.ipynb: Phase retrieval figures.
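To give a feel for the hinge-loss setting, here is a self-contained toy run of the safeguarded Polyak step on a four-point, linearly separable SVM problem. It is a sketch under stated assumptions and does not reproduce the repository's code or figures: the data, the function names, the uniform sampling, and the choice of 200 steps are all made up for illustration, and only the standard library is used.

```python
import random

# Hypothetical toy data: four 2-D points, separable by e.g. w = (1, 1).
X = [(2.0, 1.0), (1.0, 2.0), (-1.5, -1.0), (-1.0, -2.5)]
Y = [1.0, 1.0, -1.0, -1.0]


def hinge(w, x, y):
    """Per-sample hinge loss max(0, 1 - y * <w, x>)."""
    return max(0.0, 1.0 - y * (w[0] * x[0] + w[1] * x[1]))


def hinge_subgrad(w, x, y):
    """A subgradient of the hinge loss: -y * x if the margin is violated, else 0."""
    if hinge(w, x, y) > 0.0:
        return (-y * x[0], -y * x[1])
    return (0.0, 0.0)


def mean_loss(w):
    return sum(hinge(w, x, y) for x, y in zip(X, Y)) / len(X)


def train(steps=200, M=1.0, ell_star=0.0, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        i = rng.randrange(len(X))                  # sample one data point
        loss = hinge(w, X[i], Y[i])
        g = hinge_subgrad(w, X[i], Y[i])
        # Safeguarded Polyak step: the max(...) keeps the denominator >= M,
        # so a zero subgradient (satisfied margin) gives gamma = 0, not 0/0.
        gamma = (loss - ell_star) / max(g[0] ** 2 + g[1] ** 2, M)
        w = [w[0] - gamma * g[0], w[1] - gamma * g[1]]
    return w


initial_loss = mean_loss([0.0, 0.0])
w_final = train()
final_loss = mean_loss(w_final)
print("mean hinge loss before:", initial_loss, "after:", final_loss)
```

On this data every sampled point with a satisfied margin produces a zero loss and a zero subgradient, which is exactly the case where an unsafeguarded Polyak ratio would be undefined.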

For the deep-learning experiments (ResNet-20 / ResNet-32 on CIFAR-10 / CIFAR-100), the full hyper-parameter protocol is described in §4.2 and Appendix C.1 of the paper. The training script is intentionally not shipped — SPS_safe and IMA_SPS_safe are drop-in torch.optim.Optimizers and integrate with any standard PyTorch training loop (see Quick start).

The full paper (theory + extended experiments) is checked into this repository as paper.md.

Citation

If you use this code or build on the method, please cite:

@inproceedings{oikonomou2026safeguarded,
  title = {Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients},
  author = {Oikonomou, Dimitris and Loizou, Nicolas},
  booktitle = {ICML},
  year = {2026},
}

License

Released under the MIT License.

Release history

Version 1.0.0, released May 31, 2026.
