
# gswarp

Pure-Python NVIDIA Warp backend for 3D Gaussian Splatting

gswarp is a pure-Python NVIDIA Warp backend for 3D Gaussian Splatting, reimplementing the three core CUDA modules used in 3DGS training — the rasterizer, SSIM loss, and KNN initialization. No C++/CUDA compilation required; install via pip and swap out the original CUDA implementations directly.

License: Apache License 2.0. Third-party attributions in NOTICE.




## Three Replacement Modules

| Module | Replaces | Import Path | Notes |
| --- | --- | --- | --- |
| Rasterizer | `diff_gaussian_rasterization` | `gswarp` | Full differentiable Gaussian rasterization + auto-tuning |
| SSIM | `fused_ssim` | `gswarp.fused_ssim` | Separable Gaussian convolution with launch caching |
| KNN | `simple_knn` | `gswarp.knn` | 3-NN via Morton sort + bounding-box pruning |
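For intuition on the KNN module's ordering step: a Morton (Z-order) sort interleaves the bits of quantized x/y/z coordinates so that spatially close points land near each other in a 1-D sort, which is what makes a windowed neighbor search with bounding-box pruning effective. A minimal 30-bit Morton encoder in plain Python (an illustrative sketch, not gswarp's actual kernel):

```python
def _part1by2(x: int) -> int:
    # Spread the low 10 bits of x so two zero bits separate
    # consecutive bits: 0b...b1b0 -> 0b...b1_0_0_b0.
    x &= 0x3FF
    x = (x | (x << 16)) & 0xFF0000FF
    x = (x | (x << 8)) & 0x0F00F00F
    x = (x | (x << 4)) & 0xC30C30C3
    x = (x | (x << 2)) & 0x49249249
    return x

def morton3d(ix: int, iy: int, iz: int) -> int:
    # Interleave three 10-bit quantized coordinates into a 30-bit code.
    return _part1by2(ix) | (_part1by2(iy) << 1) | (_part1by2(iz) << 2)
```

Sorting points by `morton3d(...)` of their quantized coordinates groups spatial neighbors into nearby positions of the sorted array.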

## Requirements

| Component | Minimum |
| --- | --- |
| Python | 3.10+ |
| NVIDIA GPU | Compute capability ≥ 7.0 (Volta) |
| PyTorch | 1.13+ (with CUDA support) |
| NVIDIA Warp | 1.8.0+ |

## Installation

```bash
pip install gswarp
```

This installs warp-lang automatically via package dependencies. If you want to pin the Warp version explicitly, use:

```bash
pip install "warp-lang>=1.12.0" gswarp
```

Or install from source:

```bash
git clone https://github.com/fancifulland2718/gswarp.git
cd gswarp
pip install .
```

No compilation steps are needed after installation. The first call to any Warp kernel triggers JIT compilation (a few seconds); subsequent runs use the cache.


## Replacing CUDA Backends in a 3DGS Project

The examples below follow the gaussian-splatting reference implementation.

### Rasterizer

Original (gaussian_renderer/__init__.py):

```python
from diff_gaussian_rasterization import (
    GaussianRasterizationSettings,
    GaussianRasterizer,
)
```

Replace with:

```python
from gswarp import (
    GaussianRasterizationSettings,
    GaussianRasterizer,
)
```

`GaussianRasterizationSettings` is a `NamedTuple` with the same fields as the original (Warp-specific fields have defaults). `GaussianRasterizer.forward()` returns additional outputs compared to the original:

```python
# Original CUDA:
color, radii = rasterizer(means3D=..., means2D=..., ...)

# gswarp:
color, radii, depth, alpha, proj_2D, conic_2D, conic_2D_inv, \
    gs_per_pixel, weight_per_gs_pixel, x_mu = rasterizer(...)
# Extra outputs can be ignored when only color and radii are needed
```
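If downstream code expects the original two-output contract, one option is a small adapter (a hypothetical helper, not part of gswarp) that names and discards the extra outputs:

```python
def as_original_outputs(outputs):
    """Adapt gswarp's 10-tuple return to the original (color, radii) pair."""
    (color, radii, depth, alpha, proj_2D, conic_2D,
     conic_2D_inv, gs_per_pixel, weight_per_gs_pixel, x_mu) = outputs
    return color, radii

# Usage at an unmodified call site:
# color, radii = as_original_outputs(rasterizer(means3D=..., means2D=..., ...))
```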

Optional runtime configuration (call once before the training loop):

```python
from gswarp import initialize_runtime_tuning, set_binning_sort_mode

# Detect GPU and select optimal block_dim automatically (recommended)
initialize_runtime_tuning(device="cuda:0", verbose=True)

# Choose a sort mode (default warp_depth_stable_tile is usually best)
set_binning_sort_mode("warp_depth_stable_tile")  # recommended for large scenes
# set_binning_sort_mode("warp_radix")            # alternative
# set_binning_sort_mode("torch")                 # fallback
```

### SSIM

Original (train.py):

```python
from fused_ssim import fused_ssim
```

Replace with:

```python
from gswarp.fused_ssim import fused_ssim
```

The function signature is identical:

```python
loss_ssim = fused_ssim(img1, img2, padding="same", train=True)
```
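For reference, SSIM compares local means, variances, and covariance. The standard formula (which fused_ssim evaluates per pixel under a Gaussian window) can be sketched over a single global window in plain Python; this shows only the formula, not the fused kernel:

```python
def global_ssim(x, y, data_range=1.0):
    # Reference SSIM over one window covering the whole image.
    # C1, C2 are the usual stabilizing constants (K1=0.01, K2=0.03).
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx * mx + my * my + C1) * (vx + vy + C2))
```

Identical inputs give exactly 1.0; any mismatch in structure lowers the covariance term and hence the score.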

### KNN

Original (scene/gaussian_model.py):

```python
from simple_knn._C import distCUDA2
```

Replace with:

```python
from gswarp.knn import distCUDA2
```

The function signature is identical:

```python
dist2 = distCUDA2(points)  # points: (N, 3) float32 CUDA tensor
```
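As a sanity check on small inputs, the contract this replaces (assuming simple_knn's behavior: the mean of squared distances to each point's three nearest neighbors) can be reproduced with a brute-force CPU sketch:

```python
def mean_sq_dist_3nn(points):
    # Brute-force reference: for each point, the mean of squared Euclidean
    # distances to its 3 nearest neighbors. O(N^2) -- testing only.
    out = []
    for i, p in enumerate(points):
        d2 = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q))
            for j, q in enumerate(points)
            if j != i
        )
        out.append(sum(d2[:3]) / 3.0)
    return out
```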

### Recommended: Replace All at Once

Add the following near the top of train.py:

```python
try:
    from gswarp import GaussianRasterizationSettings, GaussianRasterizer
    from gswarp.fused_ssim import fused_ssim
    from gswarp.knn import distCUDA2
    GSWARP_AVAILABLE = True
except ImportError:
    GSWARP_AVAILABLE = False
```

Then switch backends at each usage site using the GSWARP_AVAILABLE flag. A reference integration is available in gaussian-splatting/train.py.
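A minimal sketch of such a usage-site dispatch (hypothetical wiring, shown for the KNN call only; the real integration touches several call sites) that still imports cleanly when gswarp is absent:

```python
try:
    from gswarp.knn import distCUDA2
    GSWARP_AVAILABLE = True
except ImportError:
    GSWARP_AVAILABLE = False

def knn_dist2(points):
    # Prefer the Warp backend; otherwise fall back to the original
    # CUDA extension at call time.
    if GSWARP_AVAILABLE:
        return distCUDA2(points)
    from simple_knn._C import distCUDA2 as dist_cuda2
    return dist_cuda2(points)
```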


## Performance

Results from full 30K-step training on 12 standard 3DGS datasets. Hardware: RTX 5090D V2 (sm_120, 24 GiB), Python 3.14, PyTorch 2.11.0+cu130, Warp 1.12.0. All three modules use the Warp backend, with Python-layer overhead optimizations applied.

| Dataset | CUDA (it/s) | Warp (it/s) | Speedup |
| --- | --- | --- | --- |
| chair | 103.6 | 113.1 | ×1.09 |
| drums | 103.0 | 115.3 | ×1.12 |
| ficus | 139.5 | 148.2 | ×1.06 |
| hotdog | 144.5 | 156.5 | ×1.08 |
| lego | 117.5 | 126.5 | ×1.08 |
| materials | 134.0 | 144.9 | ×1.08 |
| mic | 95.4 | 105.1 | ×1.10 |
| ship | 107.0 | 113.2 | ×1.06 |
| train | 55.6 | 58.3 | ×1.05 |
| truck | 39.4 | 40.1 | ×1.02 |
| drjohnson | 30.8 | 32.0 | ×1.04 |
| playroom | 46.9 | 47.5 | ×1.01 |

NeRF Synthetic (8 scenes): average ×1.08 speedup. Tanks & Temples / Deep Blending (4 large scenes): average ×1.03 speedup. The smaller gains on the large scenes (train, truck, drjohnson, playroom) reflect their higher Gaussian counts, which dilute the rasterizer kernel advantage; see the rasterizer documentation for a per-phase breakdown.


## Quality Metrics

Test-set evaluation after 30K training steps:

### NeRF Synthetic (8-scene average)

| Metric | CUDA | Warp | Δ |
| --- | --- | --- | --- |
| PSNR (dB) | 33.31 | 33.33 | +0.02 |
| SSIM | 0.9692 | 0.9693 | +0.0001 |
| LPIPS | 0.0303 | 0.0302 | −0.0001 |

### Tanks & Temples (2-scene average)

| Metric | CUDA | Warp | Δ |
| --- | --- | --- | --- |
| PSNR (dB) | 23.74 | 23.79 | +0.05 |
| SSIM | 0.8512 | 0.8515 | +0.0003 |
| LPIPS | 0.1711 | 0.1707 | −0.0004 |

### Deep Blending (2-scene average)

| Metric | CUDA | Warp | Δ |
| --- | --- | --- | --- |
| PSNR (dB) | 29.77 | 30.01 | +0.24 |
| SSIM | 0.9062 | 0.9063 | +0.0001 |
| LPIPS | 0.2390 | 0.2388 | −0.0002 |

Per-scene PSNR differences are within ±0.25 dB; SSIM differences are < 0.001. The Warp backend produces training quality equivalent to the CUDA baseline across all tested scenes.


## Detailed Documentation

| Document | Contents |
| --- | --- |
| `docs/rasterizer.md` | Architecture, CUDA implementation differences, micro-benchmarks, correctness, known limitations |
| `docs/ssim.md` | SSIM kernel optimizations, performance analysis, correctness |
| `docs/knn.md` | KNN algorithm, Morton sorting, performance analysis |

