Refactored Python training and inference code for 3D Gaussian Splatting

Feature 3DGS (Packaged Python Version)

This repo is the refactored Python training and inference code for Feature 3DGS. Built on top of gaussian-splatting, we reorganised the original code as a standard Python package with a modular Extractor-Decoder architecture, making it easy to swap foundation models without changing the core pipeline.

Each Gaussian point carries a learnable encoded semantics embedding alongside standard 3DGS attributes. A frozen Extractor produces ground-truth feature maps from training images, while a lightweight learnable Decoder maps the rasterised per-point embeddings back to the extractor's feature space. The decoder's per-point transform can also be applied directly to the stored embeddings, yielding extractor-aligned semantic features without rendering. The framework is backbone-agnostic: new foundation models can be plugged in by implementing an Extractor-Decoder pair and registering it.

Features

  • Organised as a standard Python package with pip install support
  • Modular Extractor-Decoder architecture for plugging in arbitrary foundation models
  • Built-in DINOv3 support (ViT and ConvNeXt backbones)
  • Auto-registration pattern — add new models with zero changes to core code
  • PCA-based feature visualisation for both ground-truth and rendered feature maps
  • All training modes from upstream: base, densify, camera, camera-densify

Install

Prerequisites

Install dinov3 and vggt:

pip install --upgrade git+https://github.com/facebookresearch/dinov3.git@main
pip install --upgrade git+https://github.com/facebookresearch/vggt.git@main
pip install --upgrade Pillow hydra-core omegaconf # deps for vggt
pip install --upgrade git+https://github.com/jytime/LightGlue.git#egg=lightglue # deps for vggt

(Optional) If you have trouble with gaussian-splatting, try to install it from source:

pip install wheel setuptools
pip install --upgrade git+https://github.com/yindaheng98/gaussian-splatting.git@master --no-build-isolation

PyPI Install

pip install --upgrade feature-3dgs

Or build the latest version from source:

pip install wheel setuptools
pip install --upgrade git+https://github.com/yindaheng98/feature-3dgs.git@main --no-build-isolation

Development Install

git clone --recursive https://github.com/yindaheng98/feature-3dgs.git
cd feature-3dgs
pip install --target . --upgrade . --no-deps

Download Checkpoints

DINOv3

Request access and download DINOv3 weights to checkpoints/:

checkpoints/
 ├── dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth
 ├── dinov3_convnext_large_pretrain_lvd1689m-61fa432d.pth
 ├── dinov3_convnext_small_pretrain_lvd1689m-296db49d.pth
 ├── dinov3_convnext_tiny_pretrain_lvd1689m-21b726bb.pth
 ├── dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth
 ├── dinov3_vitb16_pretrain_lvd1689m-73cec8be.pth
 ├── dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth
 ├── dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth
 ├── dinov3_vits16_pretrain_lvd1689m-08c60483.pth
 ├── dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth
 └── ...

VGGT

wget -P checkpoints/ https://huggingface.co/facebook/VGGT-1B-Commercial/resolve/main/vggt_1B_commercial.pt --header="Authorization: Bearer $HF_TOKEN"

Command-Line Usage

Visualise Extractor Output

Verify that the extractor produces meaningful features before training:

python -m feature_3dgs.show \
    --name dinov3_vitl16 \
    -s data/truck -d output/truck-dinov3_vitl16 \
    -o checkpoint_dir="'checkpoints'"

Train

python -m feature_3dgs.train \
    --name dinov3_vitl16 --embed_dim 32 \
    -s data/truck -d output/truck-semantic -i 30000 \
    --mode densify

Render

python -m feature_3dgs.render \
    --name dinov3_vitl16 --embed_dim 32 \
    -s data/truck -d output/truck-semantic -i 30000

Rendered feature maps are PCA-projected to RGB and saved alongside ground-truth feature visualisations.

Interactive Viewer

python -m feature_3dgs.viewer \
    --name dinov3_vitl16 --embed_dim 32 \
    -s data/truck -d output/truck-semantic -i 30000 \
    --port 8080

This opens an interactive viewer (via nerfview) that renders PCA-colourised semantic feature maps in real time under free-viewpoint camera control.

API Usage

Dataset & Decoder

from feature_3dgs.prepare import prepare_dataset_and_decoder

dataset, decoder = prepare_dataset_and_decoder(
    name="dinov3_vitl16",   # registered extractor-decoder name
    source="data/truck",
    embed_dim=32,
    device="cuda",
)
# dataset is a FeatureCameraDataset; each camera carries a 'feature_map' in custom_data
# decoder is the learnable AbstractTrainableDecoder

Gaussian Model

from feature_3dgs.prepare import prepare_gaussians

gaussians = prepare_gaussians(
    decoder=decoder, sh_degree=3,
    source="data/truck", dataset=dataset, device="cuda",
)

SemanticGaussianModel extends GaussianModel with _encoded_semantics (per-point learnable embeddings in a compact latent space) and a _decoder. During rendering, the rasteriser splats the encoded semantics into a 2D feature map, and the decoder transforms it to match the extractor's output space. The output dict contains both feature_map (decoded, extractor-aligned) and feature_map_encoded (raw rasterised).

Training

from feature_3dgs.prepare import prepare_trainer

trainer = prepare_trainer(gaussians, dataset, mode="densify")
for camera in dataset:
    loss, out = trainer.step(camera)

Inference

import torch
with torch.no_grad():
    for camera in dataset:
        out = gaussians(camera)
        rgb = out["render"]                  # (3, H, W)
        feat = out["feature_map"]            # (D, H', W')  decoded, extractor-aligned
        feat_enc = out["feature_map_encoded"] # (embed_dim, H, W)  raw rasterised

    # Per-Gaussian semantic features (no rendering needed)
    semantics = gaussians.get_semantics      # (N, D)  via decoder.decode_features

    # Custom linear projection at full resolution (e.g. PCA visualisation)
    weight, bias = ...  # (C, D) and (C,)
    out = gaussians.forward_projection(camera, weight, bias)
    projected = out["feature_map"]           # (C, H, W)
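
The weight and bias passed to forward_projection above are left open in the snippet. One way to obtain them for a PCA visualisation is sketched below in NumPy, assuming the (N, D) per-Gaussian features have been copied to a CPU array. The helper name pca_projection is hypothetical and not part of feature_3dgs, and the results would still need converting to tensors on the right device before use.

```python
import numpy as np

def pca_projection(features: np.ndarray, n_components: int = 3):
    """Return (weight, bias) so that weight @ x + bias gives the PCA coordinates of x.

    features: (N, D) array, e.g. per-Gaussian semantics copied to the CPU.
    weight:   (n_components, D), rows are the top principal directions.
    bias:     (n_components,), folds the mean-centring into the affine map.
    """
    mean = features.mean(axis=0)
    centred = features - mean
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    weight = vt[:n_components]
    bias = -weight @ mean
    return weight, bias

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 16))      # stand-in for an (N, D) feature matrix
weight, bias = pca_projection(features)
coords = features @ weight.T + bias        # (N, 3) PCA coordinates
```

The projected values are centred on zero, so they typically need rescaling to [0, 1] before being shown as RGB.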

Save & Load

gaussians.save_ply("output/point_cloud.ply")
# also saves point_cloud.ply.semantic.pt and point_cloud.ply.decoder.pt

gaussians.load_ply("output/point_cloud.ply")
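
Because a checkpoint consists of three files, it can help to verify that all of them are present before calling load_ply. The sketch below assumes the two sidecar files sit next to the .ply file and follow the naming in the comment above. Both helper names are hypothetical and not part of feature_3dgs.

```python
from pathlib import Path

def checkpoint_files(ply_path: str) -> list[Path]:
    """List the .ply file plus the two sidecar files named in the comment above."""
    ply = Path(ply_path)
    return [
        ply,
        ply.with_name(ply.name + ".semantic.pt"),
        ply.with_name(ply.name + ".decoder.pt"),
    ]

def missing_files(ply_path: str) -> list[Path]:
    """Return the expected files that do not exist on disk."""
    return [p for p in checkpoint_files(ply_path) if not p.exists()]
```

For instance, missing_files("output/point_cloud.ply") returns an empty list only when all three files exist.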

Design: Extractor & Decoder

The core abstraction decouples what features to distill (Extractor) from how to map rasterised embeddings back (Decoder).

Extractor (AbstractFeatureExtractor)

The extractor is a frozen foundation model that converts training images into dense feature maps. It runs only on the dataset side — each training view is processed once, cached, and served as the ground-truth supervision signal.

Image (C, H, W)  ──► Extractor (frozen) ──► Feature Map (D, H', W')

The extractor defines the target feature space (dimension D and spatial resolution H'×W'). It is never updated during training.
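
The "processed once, cached" behaviour can be pictured as a thin memoising wrapper around the frozen extractor. The sketch below is illustrative only: CachedExtractor, view_id, and the toy extractor are hypothetical names, and the package's own dataset class may cache features differently.

```python
from typing import Callable, Dict, Hashable

class CachedExtractor:
    """Hypothetical wrapper: run a frozen extractor once per view and reuse the result."""

    def __init__(self, extractor: Callable):
        self.extractor = extractor
        self.cache: Dict[Hashable, object] = {}

    def __call__(self, view_id: Hashable, image):
        # Only the first request for a view pays the cost of the forward pass.
        if view_id not in self.cache:
            self.cache[view_id] = self.extractor(image)
        return self.cache[view_id]

calls = []

def toy_extractor(image):
    # Stand-in for a foundation model; records each call so caching is observable.
    calls.append(image)
    return [2 * v for v in image]

cached = CachedExtractor(toy_extractor)
first = cached("view_0", [1, 2, 3])
second = cached("view_0", [1, 2, 3])   # served from the cache, extractor not re-run
```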

Decoder (AbstractSemanticDecoder)

The decoder is a learnable module with three core operations (defined on AbstractSemanticDecoder):

Method                                           | Signature                      | Purpose
decode_features(features)                        | (N, C_in) → (N, C_out)         | Per-point mapping, usable on per-Gaussian encoded semantics directly
decode_feature_map(feature_map)                  | (C_in, H, W) → (C_out, H', W') | Full rendered feature map → extractor output format (channel + spatial)
decode_feature_pixels(feature_map, weight, bias) | (C_in, H, W) → (C_proj, H, W)  | Per-pixel projection: decode_features + optional custom linear, spatial resolution preserved

The trainable subclass AbstractTrainableDecoder adds:

Method                            | Signature | Purpose
init_semantic(gaussians, dataset) | static    | Build the mapping from data (e.g. PCA initialisation)
parameters()                      |           | Return trainable parameters for the optimiser

Encoded semantics ──► Rasteriser ──► Raw Feature Map (embed_dim, H, W)
                                           │
                          ┌────────────────┼────────────────┐
                          ▼                ▼                ▼
                  decode_feature_map  forward_projection (stored as
                          │           (custom linear)    feature_map_encoded)
                          ▼                ▼
              Decoded Feature Map    Projected Map
               (D, H', W')          (C, H, W)

The default decode_feature_map applies decode_features per pixel (no spatial change). Subclasses may override it with reparameterized implementations for memory efficiency — e.g. the DINOv3 decoder reparameterizes a linear mapping followed by patch-level average pooling into a single F.conv2d call, avoiding a large intermediate tensor. Similarly, decode_feature_pixels reparameterizes two sequential linear layers into one combined projection.
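
Folding two sequential linear layers into one rests on a simple identity: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The NumPy sketch below checks this numerically. The function name fuse_linear and the sizes are illustrative assumptions, not the package's implementation.

```python
import numpy as np

def fuse_linear(w1, b1, w2, b2):
    """Fold y = w2 @ (w1 @ x + b1) + b2 into a single y = w @ x + b."""
    return w2 @ w1, w2 @ b1 + b2

rng = np.random.default_rng(0)
embed_dim, feat_dim, proj_dim = 8, 64, 3      # illustrative sizes only
w1, b1 = rng.normal(size=(feat_dim, embed_dim)), rng.normal(size=feat_dim)
w2, b2 = rng.normal(size=(proj_dim, feat_dim)), rng.normal(size=proj_dim)
x = rng.normal(size=embed_dim)

w, b = fuse_linear(w1, b1, w2, b2)
two_step = w2 @ (w1 @ x + b1) + b2            # goes through the feat_dim-wide intermediate
one_step = w @ x + b                          # projects embed_dim -> proj_dim directly
```

The fused weight has shape (proj_dim, embed_dim), so a per-pixel projection never has to materialise the feat_dim-channel intermediate.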

The training loss is L1(Decoded Feature Map, Extractor Feature Map). The decoder's role is to bridge the gap between the compact per-point embedding (embed_dim, typically 32) and the extractor's high-dimensional output (D, e.g. 1024 for ViT-L), while also handling any spatial resolution change.

Why this split?

  1. Memory efficiency: Only embed_dim channels are stored per Gaussian and rasterised, not the full D channels. The decoder projects up to D channels after rasterisation.
  2. Spatial alignment: Foundation models often output at patch resolution (e.g. 1/16 for ViT). The decoder can downsample the rasterised full-resolution map to match, avoiding expensive full-resolution feature supervision.
  3. Direct feature access: decode_features can be applied directly to per-Gaussian encoded semantics (via get_semantics), producing extractor-aligned features without rendering.
  4. Modularity: Swapping the foundation model only requires a new Extractor-Decoder pair. The Gaussian model, trainer, and rendering pipeline remain unchanged.
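
To put a rough number on the memory argument in point 1, the arithmetic below compares per-Gaussian storage for embed_dim = 32 against D = 1024. It assumes float32 values and a scene of one million Gaussians, both chosen purely for illustration. It counts only the stored embeddings, not gradients, optimiser state, or rasterisation buffers.

```python
def embedding_bytes(num_points: int, channels: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one float32 vector of `channels` values per Gaussian."""
    return num_points * channels * bytes_per_value

num_points = 1_000_000                         # assumed scene size, for illustration only
compact = embedding_bytes(num_points, 32)      # embed_dim = 32
full = embedding_bytes(num_points, 1024)       # D = 1024 (ViT-L)
print(f"compact: {compact / 1e6:.0f} MB, full: {full / 1e6:.0f} MB, ratio: {full // compact}x")
# prints "compact: 128 MB, full: 4096 MB, ratio: 32x"
```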

Extending: Adding a New Foundation Model

The project uses an auto-registration pattern. To add support for a new model (e.g. a hypothetical MyModel), follow the DINOv3 implementation as a reference:

Step 1: Implement the Extractor

Create feature_3dgs/mymodel/extractor.py:

import torch
from feature_3dgs.extractor import AbstractFeatureExtractor

class MyModelExtractor(AbstractFeatureExtractor):
    def __init__(self, model, ...):
        self.model = model
        self.model.eval()

    @torch.no_grad()
    def __call__(self, image: torch.Tensor) -> torch.Tensor:
        # image: (C, H, W) in [0, 1]
        # Return: (D, H', W') feature map
        ...

    def to(self, device) -> 'MyModelExtractor':
        self.model.to(device)
        return self

Step 2: Implement the Decoder

Create feature_3dgs/mymodel/decoder.py. The built-in LinearDecoder already provides a trainable nn.Linear, PCA-based initialisation, persistence, and reparameterized per-pixel operations. Subclass it and optionally override decode_feature_map / encode_feature_map if the extractor outputs at a different spatial resolution:

import torch
import torch.nn.functional as F
from feature_3dgs.decoder import LinearDecoder

class MyModelDecoder(LinearDecoder):
    def __init__(self, in_channels: int, out_channels: int, patch_size: int):
        super().__init__(in_channels, out_channels)
        self.patch_size = patch_size

    def decode_feature_map(self, feature_map: torch.Tensor) -> torch.Tensor:
        # Optional override for fused / memory-efficient implementation.
        # Default (inherited): applies decode_features per pixel (no spatial change).
        # Override to add spatial downsampling when the extractor outputs at
        # a lower resolution than the rasteriser (e.g. patch-level).
        ...

    def encode_feature_map(self, feature_map: torch.Tensor, camera) -> torch.Tensor:
        # Inverse of decode_feature_map. Override if decode_feature_map is overridden.
        ...

If your extractor outputs at the same resolution as the rasteriser, you can use LinearDecoder directly without subclassing.

The key design constraint: decode_feature_map's output spatial size and channel count must exactly match the extractor's output, so that L1 loss can be computed directly.

For example, the DINOv3 ViT extractor outputs at patch resolution (D, H/P, W/P). DINOv3LinearAvgDecoder reparameterizes a trainable nn.Linear with patch-level average pooling into a single F.conv2d call (kernel derived from linear weights, stride = patch size), avoiding the large (D, H, W) intermediate tensor entirely.
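
The fusion works because a per-pixel linear map and patch-level average pooling are both linear, so they commute. The NumPy sketch below demonstrates that identity. It is not the package's F.conv2d implementation, the function names are hypothetical, and H and W are assumed to be divisible by the patch size. A strided convolution whose P×P kernel holds weight / P² at every position computes the same result in a single call.

```python
import numpy as np

def avg_pool(x, patch):
    """Average non-overlapping patch x patch blocks of a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // patch, patch, w // patch, patch).mean(axis=(2, 4))

def linear_then_pool(x, weight, bias, patch):
    # Naive order: materialises the full (C_out, H, W) map before pooling.
    full = np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]
    return avg_pool(full, patch)

def pool_then_linear(x, weight, bias, patch):
    # Reordered: pool the C_in-channel map first, then project only H/P x W/P pixels.
    pooled = avg_pool(x, patch)
    return np.einsum("oc,chw->ohw", weight, pooled) + bias[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32, 48))               # (embed_dim, H, W), illustrative sizes
weight, bias = rng.normal(size=(64, 8)), rng.normal(size=64)
naive = linear_then_pool(x, weight, bias, patch=16)
fused = pool_then_linear(x, weight, bias, patch=16)
```

Both paths give a (64, 2, 3) map here, but only the naive one allocates the (64, 32, 48) intermediate.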

Step 3: Register via Factory

Create feature_3dgs/mymodel/registry.py:

from feature_3dgs.registry import register_extractor_decoder
from .extractor import MyModelExtractor
from .decoder import MyModelDecoder

FEATURE_DIM = 768  # D of your model's output

def factory(embed_dim: int, **configs):
    extractor = MyModelExtractor(...)
    decoder = MyModelDecoder(
        in_channels=embed_dim,
        out_channels=FEATURE_DIM,
        ...
    )
    return extractor, decoder

register_extractor_decoder("mymodel", factory)

Step 4: Trigger Registration on Import

Create feature_3dgs/mymodel/__init__.py:

from . import registry  # triggers register_extractor_decoder() at import time

Then add the import in feature_3dgs/__init__.py:

from . import mymodel  # auto-registers "mymodel"

After these steps, the new model is available everywhere:

python -m feature_3dgs.train --name mymodel --embed_dim 32 -s data/truck -d output/truck-mymodel -i 30000

Acknowledgement

This repo builds on Feature 3DGS, 3D Gaussian Splatting, and gaussian-splatting (packaged). Many thanks to the authors for open-sourcing their codebases.

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Shijie Zhou, Haoran Chang*, Sicheng Jiang*, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi (* indicates equal contribution)

Abstract: 3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model.

BibTeX

@inproceedings{zhou2024feature,
  title={Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields},
  author={Zhou, Shijie and Chang, Haoran and Jiang, Sicheng and Fan, Zhiwen and Zhu, Zehao and Xu, Dejia and Chari, Pradyumna and You, Suya and Wang, Zhangyang and Kadambi, Achuta},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21676--21685},
  year={2024}
}
