Metal-accelerated 3D Gaussian Splatting for Apple Silicon
msplat
A 3D Gaussian Splatting training engine for Apple Silicon, built entirely on Metal. No external dependencies beyond system frameworks.
The entire training pipeline (projection, sorting, rasterization, SSIM loss, backward pass, Adam optimizer, and densification) runs as fused Metal compute shaders.
The result is a self-contained engine that trains a full-resolution Mip-NeRF 360 scene in ~90 seconds (7K iterations) and renders it at ~350 FPS on an M4 Max.
Python and Swift bindings are provided, as well as a standalone C++ CLI.
Why this exists
The original 3D Gaussian Splatting implementation is CUDA-only. Ports to other frameworks (gsplat, taichi-3dgs, etc.) still depend on PyTorch for autograd, optimizer state, and tensor management. This means ~2GB of framework overhead, Python GIL contention, and no straightforward path to native macOS/iOS integration.
Architecture
core/metal/msplat_metal.metal ← Compute kernels
core/src/ ← C++ training loop, dataset loaders, SSIM eval
core/include/ ← MTensor (lightweight GPU tensor), Model, API headers
python/bindings.cpp ← nanobind Python module
swift/Sources/Msplat/ ← Swift package (via C API bridge)
cli/msplat.cpp ← C++ CLI
Training pipeline (single iteration)
Each training step dispatches all work into one Metal command encoder:
Forward:
project_and_sh_forward ← fused 3D→2D projection + spherical harmonics
prefix_sum + scatter ← gaussian→tile intersection mapping
bitonic_sort_per_tile ← tile-local depth sort + inline data packing
nd_rasterize_forward ← per-pixel alpha compositing (16x16 tiles)
ssim_h_fwd + ssim_v_fwd ← separable 11-tap SSIM + L1 loss
Backward:
ssim_h_bwd + ssim_v_bwd ← separable SSIM gradient
rasterize_backward ← per-pixel backward compositing
project_and_sh_backward ← fused projection + SH VJP
fused_adam (×6 groups) ← optimizer step (means, scales, quats, opacity, SH)
accumulate_grad_stats ← gradient norms for densification
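For reference, the per-parameter update that an Adam step performs can be sketched in plain Python. This is the textbook Adam rule with bias correction, not msplat's kernel: the function name `adam_step` and the default hyperparameters are illustrative assumptions, and the real `fused_adam` kernel may differ in details.

```python
import math

def adam_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a single scalar parameter (step starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** step)             # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Hypothetical usage: three steps with a constant gradient
p, m, v = 0.0, 0.0, 0.0
for t in range(1, 4):
    p, m, v = adam_step(p, grad=1.0, m=m, v=v, step=t)
print(p)
```

On a GPU the same arithmetic would run once per parameter element, with `m` and `v` kept in device buffers next to the parameters.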
Key design decisions
Tile-local bitonic sort instead of global radix sort. Each 16x16 tile independently sorts its gaussians (up to 2048) in threadgroup shared memory. The sort kernel also packs per-gaussian data (xy, opacity, conic, color) inline, eliminating a separate scatter dispatch.
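As an illustration of the sorting network itself, here is a minimal CPU sketch of a bitonic sort in Python. It is not the Metal kernel: `bitonic_sort` is a hypothetical name, the inner `for i` loop stands in for one GPU thread per element, and padding with infinity to a power-of-two length is an assumption about how short tiles could be handled.

```python
def bitonic_sort(keys):
    """Sort a power-of-two-length list ascending using a bitonic network."""
    n = len(keys)
    assert n > 0 and n & (n - 1) == 0, "pad the input to a power of two first"
    a = list(keys)
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:              # compare-exchange distance for this pass
            for i in range(n):     # on a GPU, each i would be one thread
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

# Hypothetical usage: sort five depths by padding to eight entries
depths = [3.2, 0.5, 7.1, 1.4, 2.2]
padded = depths + [float("inf")] * (8 - len(depths))
print(bitonic_sort(padded)[:len(depths)])
```

Every pass has a fixed, data-independent compare-exchange pattern, which is what makes the network a good fit for threads cooperating in threadgroup memory with a barrier between passes.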
GPU-resident densification. The split/clone/cull cycle never leaves the GPU. Classification, growth, and compaction are all compute kernels operating on device buffers. No CPU readback of gradient statistics or gaussian counts.
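One common way to compact a buffer without a CPU round-trip is an exclusive prefix sum over keep-flags followed by a scatter. The Python sketch below shows that pattern sequentially. Whether msplat's compaction kernel works exactly this way is an assumption, and the names `exclusive_scan` and `compact` are illustrative.

```python
def exclusive_scan(flags):
    """Return (offsets, total): offsets[i] is the sum of flags[:i]."""
    offsets, running = [], 0
    for f in flags:
        offsets.append(running)
        running += f
    return offsets, running

def compact(items, keep):
    """Keep items whose flag is set, preserving order, via scan + scatter."""
    offsets, count = exclusive_scan([1 if k else 0 for k in keep])
    out = [None] * count
    for i, item in enumerate(items):   # on a GPU, each i would be one thread
        if keep[i]:
            out[offsets[i]] = item     # every survivor writes to a unique slot
    return out

# Hypothetical usage: cull gaussians whose opacity fell below a threshold
opacities = [0.9, 0.001, 0.4, 0.002, 0.7]
print(compact(list(range(5)), [o > 0.005 for o in opacities]))
```

Because each surviving element computes its own output slot from the scan, the scatter needs no atomics, and the new element count (`count`) can stay in a device buffer for later dispatches.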
Fused kernels. Projection and spherical harmonic evaluation share registers, avoiding a device-memory round-trip for the world-space position. The backward pass recomputes the 3D covariance from scales and quaternions on the fly rather than storing it. The Adam optimizer updates all six parameter groups in fused dispatches.
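Recomputing the covariance is cheap because it is a closed-form function of the scale and rotation: Σ = R S Sᵀ Rᵀ, the standard 3D Gaussian Splatting parameterisation. A minimal Python sketch follows. It assumes a (w, x, y, z) quaternion order and already-activated (linear) scales; both are assumptions rather than confirmed msplat conventions, and the function names are illustrative.

```python
import math

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion given as (w, x, y, z); normalises first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ]

def covariance_3d(scale, quat):
    """Sigma = (R S)(R S)^T, where S = diag(scale)."""
    R = quat_to_rotmat(quat)
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]   # M = R @ S
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Hypothetical usage: an axis-aligned gaussian rotated 90 degrees about z
half = math.pi / 4
print(covariance_3d((1.0, 2.0, 3.0), (math.cos(half), 0.0, 0.0, math.sin(half))))
```

Storing three scales and four quaternion components instead of a covariance matrix trades a few dozen arithmetic operations in the backward pass for less device memory traffic.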
Separable SSIM. The 11x11 Gaussian-weighted SSIM window decomposes into two 1D passes (horizontal then vertical), reducing per-pixel work from 121 to 22 multiply-adds. Forward and backward each take two kernels, using threadgroup shared memory for the intermediate statistics.
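The separability argument can be checked numerically: a 2D Gaussian window is the outer product of a 1D window with itself, so one horizontal pass followed by one vertical pass gives the same result as the full 2D filter. The Python sketch below assumes sigma = 1.5 (the conventional SSIM value, not confirmed for msplat) and filters only the valid region. All function names are illustrative.

```python
import math

def gaussian_1d(size=11, sigma=1.5):
    """Normalised 1D Gaussian window."""
    half = size // 2
    w = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(w)
    return [v / total for v in w]

def filter_2d(img, w):
    """Direct 2D filter with the outer-product window: k*k taps per pixel."""
    k, H, W = len(w), len(img), len(img[0])
    return [[sum(w[dy] * w[dx] * img[y + dy][x + dx]
                 for dy in range(k) for dx in range(k))
             for x in range(W - k + 1)]
            for y in range(H - k + 1)]

def filter_separable(img, w):
    """Horizontal pass then vertical pass: 2*k taps per pixel."""
    k, H, W = len(w), len(img), len(img[0])
    horiz = [[sum(w[dx] * img[y][x + dx] for dx in range(k))
              for x in range(W - k + 1)]
             for y in range(H)]
    return [[sum(w[dy] * horiz[y + dy][x] for dy in range(k))
             for x in range(W - k + 1)]
            for y in range(H - k + 1)]

w = gaussian_1d()
print(len(w) ** 2, "taps direct vs", 2 * len(w), "taps separable")
```

SSIM applies this filter to several per-pixel statistics (means, variances, covariance), so the saving applies to each filtered quantity.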
Depth-chunked rasterization. For tiles with extreme gaussian counts, the forward pass splits into 512-gaussian chunks with a merge kernel that reconstructs absolute transmittance. The backward pass uses precomputed prefix/suffix transmittance to avoid re-traversal.
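The merge step works because front-to-back compositing is associative: each chunk can be composited as if it started with transmittance 1, and the partial results are then combined using the product of the preceding chunks' transmittances. A minimal single-pixel Python sketch with scalar colour follows. The function names are illustrative, and the early termination that real rasterizers apply at low transmittance is omitted.

```python
def composite(colors, alphas):
    """Front-to-back alpha compositing; returns (color, remaining transmittance)."""
    color, T = 0.0, 1.0
    for c, a in zip(colors, alphas):
        color += T * a * c
        T *= 1.0 - a
    return color, T

def composite_chunked(colors, alphas, chunk=512):
    """Composite each chunk independently, then merge the partial results."""
    partials = [composite(colors[s:s + chunk], alphas[s:s + chunk])
                for s in range(0, len(colors), chunk)]
    color, T = 0.0, 1.0            # T = absolute transmittance entering each chunk
    for c_local, t_local in partials:
        color += T * c_local       # rescale the chunk's colour by what reaches it
        T *= t_local
    return color, T

# Hypothetical usage: 1300 depth-sorted gaussians covering one pixel
alphas = [((i * 37) % 97) / 200.0 for i in range(1300)]
colors = [((i * 11) % 23) / 23.0 for i in range(1300)]
print(composite(colors, alphas), composite_chunked(colors, alphas))
```

The per-chunk composites are independent, so they can run in parallel. Only the short merge loop over chunks is sequential.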
Installation & Usage
Python
pip install msplat
import msplat
dataset = msplat.load_dataset("path/to/colmap/", eval_mode=True)
config = msplat.TrainingConfig(iterations=7000, num_downscales=0)
trainer = msplat.GaussianTrainer(dataset, config)
trainer.train(lambda s: print(f"step={s.iteration} splats={s.splat_count:,}"),
callback_every=100)
trainer.export_ply("output.ply")
trainer.save_checkpoint("checkpoint.msplat") # save/resume training
metrics = trainer.evaluate()
print(f"PSNR: {metrics['psnr']:.2f} SSIM: {metrics['ssim']:.3f}")
# Render from arbitrary viewpoints
pose = dataset.camera_pose(0) # (4, 4) cam-to-world matrix
img = trainer.render_from_pose(pose) # numpy (H, W, 3) float32
Supported dataset formats: COLMAP, Nerfstudio, Polycam.
Type stubs (_core.pyi) are included for IDE autocompletion.
CLI
pip install "msplat[cli]"
msplat-train path/to/dataset -n 7000 --eval
Swift
Requires Xcode and CMake (brew install cmake).
// Package.swift
dependencies: [
.package(url: "https://github.com/rayanht/msplat.git", from: "1.1.0")
]
Build the XCFramework (one-time, from repo root):
./scripts/build-xcframework.sh
import Msplat
let dataset = GaussianDataset(path: "path/to/colmap/", downscaleFactor: 4.0)
let trainer = GaussianTrainer(dataset: dataset)
for _ in 0..<1000 {
let stats = trainer.step()
print("step=\(stats.iteration) splats=\(stats.splatCount)")
}
trainer.exportPly(to: "output.ply")
// Render from arbitrary viewpoints
let pose = dataset.cameraPose(at: 0) // [Float] cam-to-world matrix
let img = trainer.renderFromPose(camToWorld: pose)
C++ CLI
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
./build/msplat path/to/dataset -n 7000 --eval
Build from source
git clone https://github.com/rayanht/msplat.git && cd msplat
# Python
pip install -e .
# C++ CLI + static lib
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j
# Swift XCFramework
./scripts/build-xcframework.sh
cd swift && swift build
Requires macOS 14+, Apple Silicon. No external dependencies.
Benchmarks
Mip-NeRF 360 scenes on an M4 Max. msplat runs 7K iterations with no downscales:
msplat-train path/to/scene -n 7000 --num-downscales 0 --eval
| Scene | msplat PSNR | msplat SSIM | msplat wall time | gsplat PSNR | gsplat SSIM | gsplat wall time |
|---|---|---|---|---|---|---|
| bicycle | 23.21 | 0.605 | 82s | 23.71 | 0.668 | ~335s |
| counter | 27.44 | 0.881 | 91s | 27.14 | 0.878 | ~335s |
| garden | 25.76 | 0.786 | 107s | 26.30 | 0.833 | ~335s |
| room | 30.21 | 0.898 | 85s | 29.21 | 0.893 | ~335s |
30K iterations (garden)
msplat-train path/to/garden -n 30000 --num-downscales 0 --eval
| | msplat | gsplat |
|---|---|---|
| PSNR | 27.17 | 27.32 |
| SSIM | 0.854 | 0.865 |
| Gaussians | 3.55M | — |
| Wall time | 1039s | ~2149s |
gsplat numbers from docs.gsplat.studio (TITAN RTX). gsplat wall times are the reported average across all mipnerf360 scenes (per-scene times not published).
License
Apache 2.0