
CUDA global sort + CUDA-GL interop + geometry shader quad emitting + hardware rasterization + CUDA-GL interop = fast gaussian splatting


Fast Gaussian Rasterization

  • Can be 5-10x faster than the original software CUDA rasterizer (diff-gaussian-rasterization).
  • Can be 2-3x faster for offline rendering (the bottleneck is copying rendered images around; improvements are being considered).
  • Speedup most visible with high pixel-to-point ratio (large gaussians, small point count, high-res rendering).

https://github.com/dendenxu/fast-gaussian-splatting/assets/43734697/f50afd6f-bbd5-4e18-aca6-a7356a5d3f75

No backward pass is supported yet; we are still thinking of ways to add one. Depth peeling (as used in 4K4D) is too slow. Discussion welcome.

Installation

No CUDA compilation is required.

pip install fast_gauss

Usage

Replace the original import of diff_gaussian_rasterization with fast_gauss.

For example, replace this:

from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer

with this:

from fast_gauss import GaussianRasterizationSettings, GaussianRasterizer

And you're good to go.
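If the same script also has to run on machines where fast_gauss cannot work, one option is to try fast_gauss first and fall back to the original package. The helper below is a hypothetical sketch, not part of either package. It only assumes that both modules expose the GaussianRasterizationSettings and GaussianRasterizer names, as described above, and it checks installation only, not whether the GPU/OpenGL environment can actually run the module.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Sequence


def load_rasterizer_module(
    candidates: Sequence[str] = ("fast_gauss", "diff_gaussian_rasterization"),
) -> ModuleType:
    """Import and return the first installed module in `candidates` (hypothetical helper)."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError(f"none of {list(candidates)} is installed")


# Usage:
# rast = load_rasterizer_module()
# GaussianRasterizationSettings = rast.GaussianRasterizationSettings
# GaussianRasterizer = rast.GaussianRasterizer
```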

Tips

Note: to get the full 5-10x speedup, you need to let fast_gauss's shader write directly to your desired framebuffer.

Currently, we try to automatically detect whether you are managing your own OpenGL context (e.g., running a GUI) by checking for the OpenGL module when fast_gauss is imported.

If such a context is detected, all rendering calls return None, and we write directly to the framebuffer bound at the time of the draw call.

Thus, if you are running in an OpenGL-based GUI environment, the outputs of our rasterizer will be None and require no further processing.

  • TODO: Improve offline rendering performance.
  • TODO: Add a warning to the user if they're performing further processing on the returned values.
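Since the detection happens when fast_gauss is imported, a GUI application should presumably import OpenGL (PyOpenGL) before fast_gauss. The sketch below mimics that kind of check by looking at `sys.modules` and shows a guard for the None outputs. Both function names are hypothetical, and fast_gauss's actual check may work differently.

```python
import sys
from typing import Any, Mapping, Optional, Tuple


def user_gl_context_likely(modules: Mapping[str, Any] = sys.modules) -> bool:
    """Heuristic (assumption): an already-imported PyOpenGL means a user-managed GL context."""
    return "OpenGL" in modules


def postprocess(outputs: Tuple[Optional[Any], Optional[Any]]) -> Optional[Tuple[Any, Any]]:
    """Skip post-processing when the rasterizer drew straight into the bound framebuffer."""
    image, alpha = outputs
    if image is None:
        # GUI mode: the pixels are already in the framebuffer; nothing to copy or save
        return None
    # Offline mode: regular outputs that can be saved, compared, etc.
    return image, alpha
```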

Note: the speedup is mostly visible when the pixel-to-point ratio is high.

That is, the speedup is most visible with large Gaussians and very high-resolution rendering.

The CUDA-based software implementation is more sensitive to resolution, and for some extremely dense point clouds (> 1 million points) it might be faster than fast_gauss.

This is because the rasterization pipeline of modern graphics hardware is not well optimized for small triangles.
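To see which regime a scene falls into, the pixel-to-point ratio can be computed directly as (height × width) / number of Gaussians. The helper below is an illustrative sketch. The 1-million-point cutoff is only the rough figure quoted above, not a measured crossover, so benchmarking both rasterizers on your own data is still the reliable way to decide.

```python
from dataclasses import dataclass

# Rough figure from the note above; not a measured crossover point.
DENSE_POINT_COUNT = 1_000_000


@dataclass
class SplatWorkload:
    height: int      # render height in pixels
    width: int       # render width in pixels
    num_points: int  # number of Gaussians being splatted

    @property
    def pixel_to_point_ratio(self) -> float:
        """Pixels per Gaussian; higher values are where the speedup is most visible."""
        return (self.height * self.width) / self.num_points

    @property
    def very_dense(self) -> bool:
        """True when the cloud is dense enough that the CUDA rasterizer might be faster."""
        return self.num_points > DENSE_POINT_COUNT
```

For example, a 1920x1080 render of 100,000 Gaussians gives about 20.7 pixels per Gaussian, while the same resolution with 2 million Gaussians gives about 1.04.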

Note: for even better performance, it's recommended to pass CPU tensors in GaussianRasterizationSettings to avoid explicit synchronizations.

  • TODO: Add a warning to the user if GPU tensors are detected.
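One possible take on that TODO is sketched below: scan the settings for CUDA tensors and warn. It assumes GaussianRasterizationSettings is a NamedTuple (true in diff_gaussian_rasterization, not confirmed for fast_gauss) and relies on PyTorch's `Tensor.is_cuda` attribute through duck typing, so torch itself is not imported. Calling `.cpu()` right before every draw would itself synchronize, so it is better to build the camera matrices on the CPU in the first place.

```python
import warnings
from typing import Any, List


def find_cuda_fields(settings: Any) -> List[str]:
    """Return names of NamedTuple fields holding CUDA tensors, warning if any are found.

    Hypothetical helper, not part of fast_gauss. Anything with a truthy
    `is_cuda` attribute (e.g. a torch.Tensor on the GPU) is reported.
    """
    cuda_fields = [
        name
        for name, value in settings._asdict().items()
        if getattr(value, "is_cuda", False)
    ]
    if cuda_fields:
        warnings.warn(
            f"GPU tensors in rasterization settings ({', '.join(cuda_fields)}); "
            "CPU tensors avoid explicit synchronizations",
            stacklevel=2,
        )
    return cuda_fields
```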

Note: the second output of GaussianRasterizer is no longer the radii (they are not needed since there is no backward pass) but the alpha values of the rendered image.

The alpha channel content currently seems to be buggy; we will debug it.

  • TODO: Debug alpha channel

TODOs

  • TODO: Apply more of the optimization techniques used by similar shaders, including packing the data into a texture and bit reduction during computation.
  • TODO: Think of ways to implement a backward pass. Discussion welcome!
  • TODO: Compute covariance from scaling and rotation in the shader; it is currently computed on the CUDA side.
  • TODO: Compute SH in the shader; it is currently evaluated on the CUDA side.
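For reference, the math these TODOs would move into the shader is the standard 3DGS formulation: the covariance is Σ = R S Sᵀ Rᵀ, where R is the rotation matrix of the normalized quaternion (stored as w, x, y, z in 3DGS) and S = diag(scale), and the degree-0 spherical-harmonics color is C0 · f_dc + 0.5 with C0 = 1 / (2√π). The pure-Python sketch below illustrates only that math; it is not fast_gauss's CUDA or shader code, and higher SH degrees are omitted.

```python
import math
from typing import List, Sequence

Matrix3 = List[List[float]]
SH_C0 = 0.28209479177387814  # Y_0^0 = 1 / (2 * sqrt(pi))


def quat_to_rotation(q: Sequence[float]) -> Matrix3:
    """Rotation matrix of a quaternion stored as (w, x, y, z); normalized first."""
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_3d(scale: Sequence[float], q: Sequence[float]) -> Matrix3:
    """Sigma = R S S^T R^T with S = diag(scale)."""
    r = quat_to_rotation(q)
    m = [[r[i][j] * scale[j] for j in range(3)] for i in range(3)]  # M = R S
    # Sigma = M M^T
    return [[sum(m[i][j] * m[k][j] for j in range(3)) for k in range(3)] for i in range(3)]


def sh0_to_rgb(f_dc: Sequence[float]) -> List[float]:
    """Degree-0 SH color: C0 * f_dc + 0.5, clamped at zero as in 3DGS."""
    return [max(SH_C0 * c + 0.5, 0.0) for c in f_dc]
```

With the identity quaternion (1, 0, 0, 0) and scale (1, 2, 3), the covariance is diag(1, 4, 9); a 90° rotation about z swaps the first two variances.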

Environment

This project requires an NVIDIA GPU with CUDA-OpenGL interop support. Thus, neither WSL nor macOS is supported.

For offline rendering (the drop-in replacement for the original CUDA rasterizer), a valid EGL environment is also needed, which can be hard to set up on virtualized machines. Potential fix.
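A rough preflight check for the platform constraints above might look like the sketch below. The WSL test (looking for "microsoft" in the kernel release string) is a common heuristic, not something fast_gauss is known to do, and the function is hypothetical. A None result only means the platform is not ruled out; it cannot confirm that CUDA-OpenGL interop or EGL actually work.

```python
import platform
from typing import Optional


def unsupported_platform_reason(
    system: Optional[str] = None, release: Optional[str] = None
) -> Optional[str]:
    """Return why the platform is unsupported, or None if it is not ruled out."""
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release
    if system == "Darwin":
        return "macOS is not supported (requires an NVIDIA GPU with CUDA-OpenGL interop)"
    if system == "Linux" and "microsoft" in release.lower():
        # WSL kernels report release strings such as "5.15.90.1-microsoft-standard-WSL2"
        return "WSL is not supported (no CUDA-OpenGL interop)"
    return None
```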

  • TODO: Test on more platforms.

Credits

Inspired by those insanely fast WebGL-based 3DGS viewers:

Using the algorithm and improvements from:

CUDA-GL interop & EGL environment inspired by:

  • 4K4D, where they (I) used the interop for depth peeling.
  • EasyVolcap, for its collection of utilities, including EGL setup.
  • nvdiffrast, for its EGL context setup and CUDA-GL interop setup.

Citation

@misc{fast_gauss,  
    title = {Fast Gaussian Splatting},
    howpublished = {GitHub},  
    year = {2024},
    url = {https://github.com/dendenxu/fast-gaussian-rasterization}
}

Download files

  • Source distribution: fast_gauss-0.0.5.tar.gz (36.1 kB)
  • Built distribution (Python 3): fast_gauss-0.0.5-py3-none-any.whl (37.3 kB)
