Optimized CUDA implementation of differentiable SSIM for PyTorch
fussim
Fast CUDA SSIM for PyTorch. Based on MrNeRF/optimized-fused-ssim.
~6x faster than pytorch-msssim (6.4x in the RTX 4090 benchmark under Performance) | FP16/AMP support | Drop-in replacement
Installation
pip install fussim
Pre-built wheels for PyTorch 2.5-2.9 and CUDA 11.8-12.8:
pip install fussim --extra-index-url https://opsiclear.github.io/fussim/whl/
Build from source
Requires the CUDA Toolkit and a C++ compiler.
git clone https://github.com/OpsiClear/fussim.git
cd fussim
pip install .
Quick Start
import torch
from fussim import fused_ssim
img1 = torch.rand(1, 3, 256, 256, device="cuda", requires_grad=True)
img2 = torch.rand(1, 3, 256, 256, device="cuda")
# Compute SSIM
ssim_value = fused_ssim(img1, img2)
# Use as loss
loss = 1.0 - fused_ssim(img1, img2)
loss.backward()
API
fused_ssim
from fussim import fused_ssim
fused_ssim(img1, img2, padding="same", train=True, window_size=11) -> Tensor
| Parameter | Type | Default | Description |
|---|---|---|---|
| img1 | Tensor | required | First image (B, C, H, W). Receives gradients. |
| img2 | Tensor | required | Second image (B, C, H, W). |
| padding | str | "same" | "same" or "valid" |
| train | bool | True | Enable gradient computation |
| window_size | int | 11 | Gaussian window: 7, 9, or 11 |
Returns: Scalar mean SSIM value.
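To make the padding option concrete, here is a minimal sketch of the spatial size of the SSIM map that gets averaged. It assumes fussim follows the usual convolution convention: "same" keeps H x W, while "valid" keeps only positions where the full window fits. The helper name ssim_map_size is hypothetical and not part of the fussim API.

```python
def ssim_map_size(height, width, window_size=11, padding="same"):
    """Spatial size of the SSIM map before averaging (illustrative helper, not fussim API)."""
    if window_size not in (7, 9, 11):
        raise ValueError("window_size must be 7, 9, or 11")
    if padding == "same":
        # Borders are padded, so the map matches the input size.
        return height, width
    if padding == "valid":
        # Assumption: the full window must fit inside the image,
        # so each border loses window_size // 2 pixels.
        out_h = height - window_size + 1
        out_w = width - window_size + 1
        if out_h <= 0 or out_w <= 0:
            raise ValueError("image is smaller than the window")
        return out_h, out_w
    raise ValueError('padding must be "same" or "valid"')


same_size = ssim_map_size(256, 256, window_size=11, padding="same")
valid_size = ssim_map_size(256, 256, window_size=11, padding="valid")
print(same_size, valid_size)
```

Under this assumption, "valid" excludes border pixels from the mean, which may matter when comparing scores against other SSIM implementations.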
ssim (pytorch-msssim compatible)
from fussim import ssim
ssim(X, Y, data_range=255, size_average=True, win_size=11, K=(0.01, 0.03), nonnegative_ssim=False) -> Tensor
| Parameter | Type | Default | Description |
|---|---|---|---|
| X | Tensor | required | First image (B, C, H, W). Receives gradients. |
| Y | Tensor | required | Second image (B, C, H, W). |
| data_range | float | 255 | Value range (255 for uint8, 1.0 for normalized) |
| size_average | bool | True | Return scalar mean or per-batch values |
| win_size | int | 11 | Gaussian window: 7, 9, or 11 |
| K | tuple | (0.01, 0.03) | SSIM constants (K1, K2) |
| nonnegative_ssim | bool | False | Clamp negative values to 0 |
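data_range and K interact through the standard SSIM stabilizing constants, C1 = (K1 * L)^2 and C2 = (K2 * L)^2, where L is data_range. The sketch below only evaluates that textbook formula to show why data_range has to match the input scale. ssim_constants is a hypothetical helper, and how fussim applies these constants internally is not shown here.

```python
def ssim_constants(data_range=255.0, K=(0.01, 0.03)):
    """Standard SSIM constants C1, C2 for a given value range (illustrative, not fussim API)."""
    k1, k2 = K
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    return c1, c2


# uint8-style images in [0, 255]
c1_uint8, c2_uint8 = ssim_constants(255.0)
# normalized images in [0, 1]
c1_unit, c2_unit = ssim_constants(1.0)
print(c1_uint8, c2_uint8, c1_unit, c2_unit)
```

Images in [0, 1] evaluated with the default data_range=255 would use constants 255^2 = 65025 times too large, so pass data_range=1.0 for normalized inputs.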
SSIM Module
from fussim import SSIM
module = SSIM(data_range=1.0, size_average=True, win_size=11, K=(0.01, 0.03))
loss = 1 - module(pred, target)
loss.backward()
FP16 / AMP
with torch.autocast(device_type="cuda"):
    ssim_value = fused_ssim(img1, img2)  # Uses FP16 kernel automatically
Performance
RTX 4090, 5×5×1080×1920, 100 iterations:
| Implementation | Forward | Backward | Total | Speedup |
|---|---|---|---|---|
| pytorch_msssim | 28.7 ms | 28.9 ms | 57.5 ms | 1.0x |
| fussim | 4.38 ms | 4.66 ms | 9.04 ms | 6.4x |
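The Speedup column can be reproduced from the timings in the table. The short sketch below recomputes the ratio per phase; the dictionary layout and the speedup helper are illustrative only, and the numbers are copied from the table rather than measured again.

```python
# Timings in milliseconds, copied from the benchmark table.
timings_ms = {
    "pytorch_msssim": {"forward": 28.7, "backward": 28.9, "total": 57.5},
    "fussim": {"forward": 4.38, "backward": 4.66, "total": 9.04},
}


def speedup(phase):
    """Ratio of pytorch_msssim time to fussim time for one phase."""
    return timings_ms["pytorch_msssim"][phase] / timings_ms["fussim"][phase]


speedups = {phase: round(speedup(phase), 1) for phase in ("forward", "backward", "total")}
print(speedups)
```

The forward and backward passes speed up by similar factors, so the total figure is representative of a full training step on this workload.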
Limitations
| Parameter | Constraint | Reason |
|---|---|---|
| win_size | 7, 9, or 11 | CUDA kernel templates |
| win_sigma | 1.5 (fixed) | Hardcoded in kernel |
| win | Not supported | Uses built-in Gaussian |
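For reference, the built-in window described above can be approximated in plain Python. This is a sketch of the standard normalized 1D Gaussian used by SSIM with sigma fixed at 1.5 and the three supported sizes; it is not the CUDA kernel's actual code, and gaussian_window is a hypothetical name. The 2D window is conventionally the outer product of this 1D kernel with itself.

```python
import math

SUPPORTED_WINDOW_SIZES = (7, 9, 11)
SIGMA = 1.5  # fixed value listed in the limitations table


def gaussian_window(window_size=11, sigma=SIGMA):
    """Normalized 1D Gaussian weights (illustrative, not the fussim kernel)."""
    if window_size not in SUPPORTED_WINDOW_SIZES:
        raise ValueError(f"window_size must be one of {SUPPORTED_WINDOW_SIZES}")
    center = window_size // 2
    weights = [
        math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
        for i in range(window_size)
    ]
    total = sum(weights)
    # Normalize so the weights sum to 1.
    return [w / total for w in weights]


window = gaussian_window(11)
print(len(window), sum(window))
```

A check like the one in gaussian_window can also be used to validate win_size up front, before any tensors are moved to the GPU.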
Attribution
| Project | Author |
|---|---|
| optimized-fused-ssim | Janusch Patas |
| fused-ssim | Rahul Goel |
Citation
@software{optimized-fused-ssim,
author = {Janusch Patas},
title = {Optimized Fused-SSIM},
year = {2025},
url = {https://github.com/MrNeRF/optimized-fused-ssim},
}