Refactored Python training and inference code for 3D Gaussian Splatting
Feature 3DGS (Packaged Python Version)
This repo is the refactored Python training and inference code for Feature 3DGS.
Building on gaussian-splatting, we reorganised the original code as a standard Python package with a modular Extractor-Decoder architecture, so that foundation models can be swapped without changing the core pipeline.
Each Gaussian point carries a learnable semantic embedding alongside standard 3DGS attributes. A frozen Extractor produces ground-truth feature maps from training images, while a lightweight learnable Decoder maps the rasterised per-point embeddings back to the extractor's feature space. The framework is backbone-agnostic: new foundation models can be plugged in by implementing an Extractor-Decoder pair and registering it.
Features
- Organised as a standard Python package with pip install support
- Modular Extractor-Decoder architecture for plugging in arbitrary foundation models
- Built-in DINOv3 support (ViT and ConvNeXt backbones)
- Auto-registration pattern: add new models with zero changes to core code
- PCA-based feature visualisation for both ground-truth and rendered feature maps
- All training modes from upstream: base, densify, camera, camera-densify
Install
Prerequisites
- PyTorch (>= v2.4 recommended)
- CUDA Toolkit (12.4 recommended; must match the CUDA version of your PyTorch build)
- gsplat
Development Install
pip install --upgrade git+https://github.com/facebookresearch/dinov3@main
pip install --upgrade git+https://github.com/yindaheng98/gaussian-splatting.git@master --no-build-isolation
pip install --target . --upgrade . --no-deps
Download Checkpoints
Request access and download DINOv3 weights to checkpoints/:
checkpoints/
├── dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth
├── dinov3_convnext_large_pretrain_lvd1689m-61fa432d.pth
├── dinov3_convnext_small_pretrain_lvd1689m-296db49d.pth
├── dinov3_convnext_tiny_pretrain_lvd1689m-21b726bb.pth
├── dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth
├── dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth
├── dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth
├── dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth
├── dinov3_vits16_pretrain_lvd1689m-08c60483.pth
├── dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth
└── ...
Command-Line Usage
Visualise Extractor Output
Verify that the extractor produces meaningful features before training:
python -m feature_3dgs.show \
--name dinov3_vitl16 \
-s data/truck -d output/truck-dinov3_vitl16 \
-o checkpoint_dir="'checkpoints'"
Train
python -m feature_3dgs.train \
--name dinov3_vitl16 --embed_dim 32 \
-s data/truck -d output/truck-semantic -i 30000 \
--mode densify
Render
python -m feature_3dgs.render \
--name dinov3_vitl16 --embed_dim 32 \
-s data/truck -d output/truck-semantic -i 30000
Rendered feature maps are PCA-projected to RGB and saved alongside ground-truth feature visualisations.
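The PCA projection mentioned above can be illustrated with a small NumPy sketch. This is not the package's implementation: the helper name pca_to_rgb and the per-channel min-max normalisation are assumptions made for illustration.

```python
import numpy as np


def pca_to_rgb(feature_map: np.ndarray) -> np.ndarray:
    """Project a (D, H, W) feature map onto its top-3 principal components.

    Hypothetical helper for illustration; returns a (3, H, W) array in [0, 1].
    """
    d, h, w = feature_map.shape
    # Treat every pixel as one D-dimensional sample.
    samples = feature_map.reshape(d, h * w).T  # (H*W, D)
    centred = samples - samples.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    projected = centred @ vt[:3].T  # (H*W, 3)
    # Min-max normalise each component so it can be shown as a colour channel.
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    rgb = (projected - lo) / np.maximum(hi - lo, 1e-8)
    return rgb.T.reshape(3, h, w)


rng = np.random.default_rng(0)
features = rng.normal(size=(32, 8, 12))  # random stand-in for a rendered feature map
rgb = pca_to_rgb(features)
print(rgb.shape)
```

To keep colours comparable between a ground-truth map and a rendered map, one option is to fit the projection on one of them and reuse it for the other; whether the package does this is not stated here.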
API Usage
Dataset & Decoder
from feature_3dgs.prepare import prepare_dataset_and_decoder
dataset, decoder = prepare_dataset_and_decoder(
    name="dinov3_vitl16",  # registered extractor-decoder name
    source="data/truck",
    embed_dim=32,
    device="cuda",
)
# dataset is a FeatureCameraDataset; each camera carries a 'feature_map' in custom_data
# decoder is the learnable AbstractFeatureDecoder
Gaussian Model
from feature_3dgs.prepare import prepare_gaussians
gaussians = prepare_gaussians(
    decoder=decoder, sh_degree=3,
    source="data/truck", device="cuda",
)
SemanticGaussianModel extends GaussianModel with _semantic_features (per-point learnable embeddings) and a _decoder. During rendering, the rasteriser splats the semantic features into a 2D feature map, and the decoder transforms it to match the extractor's output space.
Training
from feature_3dgs.prepare import prepare_trainer
trainer = prepare_trainer(gaussians, dataset, mode="densify")
for camera in dataset:
    loss, out = trainer.step(camera)
Inference
import torch
with torch.no_grad():
    for camera in dataset:
        out = gaussians(camera)
        rgb = out["render"]  # (3, H, W)
        feat = out["feature_map"]  # (D, H', W')
Save & Load
gaussians.save_ply("output/point_cloud.ply")
# also saves point_cloud.ply.semantic.pt and point_cloud.ply.decoder.pt
gaussians.load_ply("output/point_cloud.ply")
Design: Extractor & Decoder
The core abstraction decouples what features to distill (Extractor) from how to map rasterised embeddings back (Decoder).
Extractor (AbstractFeatureExtractor)
The extractor is a frozen foundation model that converts training images into dense feature maps. It runs only on the dataset side: each training view is processed once, cached, and served as the ground-truth supervision signal.
Image (C, H, W) ──► Extractor (frozen) ──► Feature Map (D, H', W')
The extractor defines the target feature space (dimension D and spatial resolution H'×W'). It is never updated during training.
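The "process once, cache, serve" behaviour described above can be sketched in plain Python. The class CachedFeatureSource and the toy extractor are hypothetical; they stand in for whatever the dataset does internally and are not part of the package.

```python
from typing import Callable, Dict, List

FeatureMap = List[List[float]]  # stand-in for a (D, H', W') tensor


class CachedFeatureSource:
    """Hypothetical wrapper: runs a frozen extractor once per view and caches the result."""

    def __init__(self, extractor: Callable[[str], FeatureMap]):
        self.extractor = extractor
        self._cache: Dict[str, FeatureMap] = {}

    def __getitem__(self, view_id: str) -> FeatureMap:
        # Only call the (expensive) extractor the first time a view is requested.
        if view_id not in self._cache:
            self._cache[view_id] = self.extractor(view_id)
        return self._cache[view_id]


calls = []


def toy_extractor(view_id: str) -> FeatureMap:
    # Records each invocation so the caching behaviour can be observed.
    calls.append(view_id)
    return [[float(len(view_id))]]


source = CachedFeatureSource(toy_extractor)
for _ in range(3):  # three passes over the same two views
    for view in ("cam_000", "cam_001"):
        source[view]
print(len(calls))
```

However many passes the training loop makes, the extractor runs once per view, which is what allows a large frozen backbone to be used without slowing down each iteration.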
Decoder (AbstractFeatureDecoder)
The decoder is a learnable module that sits inside the rendering pipeline. After the Gaussian rasteriser splats per-point semantic embeddings (dimension embed_dim) into a 2D feature map at image resolution, the decoder transforms this rasterised map to match the extractor's output:
Per-point embeddings ──► Rasteriser ──► Raw Feature Map (embed_dim, H, W)
                                                        │
                                                        ▼
                                               Decoder (learnable)
                                                        │
                                                        ▼
                                         Decoded Feature Map (D, H', W')
The training loss is L1(Decoded Feature Map, Extractor Feature Map). The decoder's role is to bridge the gap between the compact per-point embedding (embed_dim, typically 32) and the extractor's high-dimensional output (D, e.g. 1024 for ViT-L), while also handling any spatial resolution change.
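As a minimal illustration of this loss, the sketch below uses NumPy arrays in place of tensors and assumes a mean-reduced L1; the reduction actually used by the trainer is not stated in this README, and the function name l1_feature_loss is hypothetical.

```python
import numpy as np


def l1_feature_loss(decoded: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute difference between two (D, H', W') feature maps.

    Mean reduction is an assumption; the package may reduce differently.
    """
    if decoded.shape != target.shape:
        # The decoder output must match the extractor output exactly.
        raise ValueError(f"shape mismatch: {decoded.shape} vs {target.shape}")
    return float(np.abs(decoded - target).mean())


target = np.zeros((4, 2, 3))         # toy extractor feature map
decoded = np.full((4, 2, 3), 0.5)    # toy decoded feature map
loss = l1_feature_loss(decoded, target)
print(loss)
```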
Why this split?
- Memory efficiency: Only embed_dim channels are stored per Gaussian and rasterised, not the full D channels. The decoder up-projects after rasterisation.
- Spatial alignment: Foundation models often output at patch resolution (e.g. 1/16 for ViT). The decoder can downsample the rasterised full-resolution map to match, avoiding expensive full-resolution feature supervision.
- Modularity: Swapping the foundation model only requires a new Extractor-Decoder pair. The Gaussian model, trainer, and rendering pipeline remain unchanged.
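A back-of-the-envelope calculation makes the memory argument concrete. The Gaussian count (one million) and float32 storage are assumptions chosen for illustration; embed_dim=32 and D=1024 are the values quoted above.

```python
def semantic_bytes(num_gaussians: int, channels: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one semantic vector per Gaussian (float32 by default)."""
    return num_gaussians * channels * bytes_per_value


num_gaussians = 1_000_000  # assumed scene size, for illustration only
compact = semantic_bytes(num_gaussians, channels=32)   # embed_dim
full = semantic_bytes(num_gaussians, channels=1024)    # D for ViT-L

print(f"embed_dim=32: {compact / 2**20:.1f} MiB")
print(f"D=1024:       {full / 2**20:.1f} MiB")
print(f"ratio:        {full // compact}x")
```

This counts only the stored per-point parameters. Optimiser state and the rasterised per-pixel buffers scale with the channel count as well, so the practical saving is likely larger.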
Extending: Adding a New Foundation Model
The project uses an auto-registration pattern. To add support for a new model (e.g. a hypothetical MyModel), follow the DINOv3 implementation as a reference:
Step 1: Implement the Extractor
Create feature_3dgs/mymodel/extractor.py:
import torch
from feature_3dgs.extractor import AbstractFeatureExtractor


class MyModelExtractor(AbstractFeatureExtractor):
    def __init__(self, model, ...):
        self.model = model
        self.model.eval()

    @torch.no_grad()
    def __call__(self, image: torch.Tensor) -> torch.Tensor:
        # image: (C, H, W) in [0, 1]
        # Return: (D, H', W') feature map
        ...

    def to(self, device) -> 'MyModelExtractor':
        self.model.to(device)
        return self
Step 2: Implement the Decoder
Create feature_3dgs/mymodel/decoder.py:
import torch
import torch.nn as nn
from feature_3dgs.decoder import NoopFeatureDecoder


class MyModelDecoder(NoopFeatureDecoder):
    def __init__(self, in_channels: int, out_channels: int, ...):
        super().__init__(embed_dim=in_channels)
        # Build a small network that maps
        # (in_channels, H, W) -> (out_channels, H', W')
        # to match the extractor's spatial and channel dimensions.
        self.net = nn.Sequential(...)

    def __call__(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (in_channels, H, W), same resolution as input image
        # Return: (out_channels, H', W'), matching extractor output
        return self.net(feature_map.unsqueeze(0)).squeeze(0)

    def to(self, device):
        self.net = self.net.to(device)
        return self

    def load(self, path: str):
        self.net.load_state_dict(torch.load(path, weights_only=True))

    def save(self, path: str):
        torch.save(self.net.state_dict(), path)

    def parameters(self):
        return self.net.parameters()
The key design constraint: the decoder's output spatial size and channel count must exactly match the extractor's output, so that L1 loss can be computed directly.
For example, the DINOv3 ViT extractor outputs at patch resolution (D, H/16, W/16), so DINOv3CNNDecoder uses a Conv2d(kernel_size=16, stride=16) to downsample the full-resolution rasterised map to the same grid.
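Because the kernel size equals the stride, such a convolution is equivalent to applying one linear projection to each non-overlapping 16×16 patch. The NumPy sketch below shows the resulting shapes under that reading. The function name patch_projection, the random weights, and the small output width are illustrative only; this is not the DINOv3CNNDecoder code.

```python
import numpy as np


def patch_projection(feature_map: np.ndarray, weight: np.ndarray, patch: int = 16) -> np.ndarray:
    """Apply a kernel_size == stride convolution (no bias) as a per-patch matmul.

    feature_map: (C_in, H, W), with H and W divisible by `patch` (assumed).
    weight:      (C_out, C_in, patch, patch), same layout as a Conv2d weight.
    returns:     (C_out, H // patch, W // patch)
    """
    c_in, h, w = feature_map.shape
    c_out = weight.shape[0]
    gh, gw = h // patch, w // patch
    # Split H and W into (grid, within-patch) axes, then group each patch together.
    patches = feature_map.reshape(c_in, gh, patch, gw, patch)
    patches = patches.transpose(1, 3, 0, 2, 4).reshape(gh * gw, c_in * patch * patch)
    out = patches @ weight.reshape(c_out, -1).T  # (gh*gw, C_out)
    return out.T.reshape(c_out, gh, gw)


rng = np.random.default_rng(0)
raster = rng.normal(size=(32, 64, 96))        # (embed_dim, H, W)
weight = rng.normal(size=(128, 32, 16, 16))   # small C_out to keep the demo light
decoded = patch_projection(raster, weight)
print(decoded.shape)
```

A 64×96 input yields a 4×6 grid, the same H/16 × W/16 layout a ViT with patch size 16 would produce for that image.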
Step 3: Register via Factory
Create feature_3dgs/mymodel/registry.py:
from feature_3dgs.registry import register_extractor_decoder
from .extractor import MyModelExtractor
from .decoder import MyModelDecoder

FEATURE_DIM = 768  # D of your model's output


def factory(embed_dim: int, **configs):
    extractor = MyModelExtractor(...)
    decoder = MyModelDecoder(
        in_channels=embed_dim,
        out_channels=FEATURE_DIM,
        ...
    )
    return extractor, decoder


register_extractor_decoder("mymodel", factory)
Step 4: Trigger Registration on Import
Create feature_3dgs/mymodel/__init__.py:
from . import registry # triggers register_extractor_decoder() at import time
Then add the import in feature_3dgs/__init__.py:
from . import mymodel # auto-registers "mymodel"
After these steps, the new model is available everywhere:
python -m feature_3dgs.train --name mymodel --embed_dim 32 -s data/truck -d output/truck-mymodel -i 30000
Acknowledgement
This repo is developed based on Feature 3DGS, 3D Gaussian Splatting, and gaussian-splatting (packaged). Many thanks to the authors for open-sourcing their codebases.
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Shijie Zhou, Haoran Chang*, Sicheng Jiang*, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi (* indicates equal contribution)
Abstract: 3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model.
BibTeX
@inproceedings{zhou2024feature,
title={Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields},
author={Zhou, Shijie and Chang, Haoran and Jiang, Sicheng and Fan, Zhiwen and Zhu, Zehao and Xu, Dejia and Chari, Pradyumna and You, Suya and Wang, Zhangyang and Kadambi, Achuta},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={21676--21685},
year={2024}
}