[ICLR 2026] StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
Overview
StreamSplat is a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting representations in an online manner.
- Feed-forward inference: No per-scene optimization required
- Camera-free: Works directly with uncalibrated monocular videos
- Dynamic scene modeling: Handles both static and dynamic scene elements through polynomial motion modeling
- Probabilistic Gaussian prediction: Models Gaussian positions with truncated Gaussian distributions for robustness
- Two-stage training: Stage 1 trains the static encoder, Stage 2 trains the dynamic decoder
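As a toy illustration of what polynomial motion modeling means here, each Gaussian's center can be written as mu(t) = mu0 + sum_k c_k * t^k and evaluated per timestamp. The degree, coefficient layout, and function names below are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def gaussian_center(mu0: np.ndarray, coeffs: np.ndarray, t: float) -> np.ndarray:
    """Evaluate polynomial motion mu(t) = mu0 + sum_k coeffs[k] * t**(k+1).

    mu0:    (3,) canonical Gaussian center
    coeffs: (K, 3) per-Gaussian motion coefficients (K = polynomial degree)
    t:      normalized timestamp in [0, 1]
    """
    powers = t ** np.arange(1, coeffs.shape[0] + 1)  # t, t^2, ..., t^K
    return mu0 + powers @ coeffs

# A static Gaussian is simply one with all-zero motion coefficients.
mu0 = np.zeros(3)
coeffs = np.array([[1.0, 0.0, 0.0],   # linear term: drift along x
                   [0.0, 1.0, 0.0]])  # quadratic term: drift along y
print(gaussian_center(mu0, coeffs, 0.5))  # -> [0.5, 0.25, 0.0]
```

One decoder pass can thus predict a small, fixed number of coefficients per Gaussian and cover an arbitrary time window at render time.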
Videos
https://github.com/user-attachments/assets/d72b75de-e07a-4d81-a23f-f85e99c9bd05
https://github.com/user-attachments/assets/d975b425-1e4c-4dc9-8398-0a1a43d50e67
https://github.com/user-attachments/assets/d2a064f5-f4d7-46c4-8a3a-28fcb8679df5
https://github.com/user-attachments/assets/466a222d-f3b5-447a-84c0-a47538071c05
Environment Setup
- Create conda environment:
conda env create -f environment.yml
conda activate StreamSplat
- Build the differentiable Gaussian rasterizer:
cd submodules/diff-gaussian-rasterization-orth
pip install .
- Download pretrained depth model:
Download the Depth Anything V2 checkpoint and place it in the checkpoints/ directory:
mkdir -p checkpoints
# Download depth_anything_v2_vitl.pth from https://github.com/DepthAnything/Depth-Anything-V2
# Place it in checkpoints/depth_anything_v2_vitl.pth
Dataset Preparation
StreamSplat supports training on multiple datasets. All datasets require depth maps pre-computed with Depth Anything V2.
Supported Datasets
| Dataset | Type | Description |
|---|---|---|
| RealEstate10K | Static | Real estate videos |
| CO3Dv2 | Static | Object-centric multi-view |
| DAVIS | Dynamic | High-quality video segmentation benchmark |
| YouTube-VOS | Dynamic | Large-scale video object segmentation dataset |
Preprocessing Depth Maps
Use the provided script to preprocess depth maps for DAVIS (similar scripts can be adapted for other datasets):
python preprocess_depth_davis.py --root_path /path/to/davis
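For adapting the preprocessing to another dataset, the core loop is just running Depth Anything V2 over every frame and storing the result. The sketch below assumes the standard DAVIS directory layout (`JPEGImages/480p/<sequence>/*.jpg`), the model API from the Depth Anything V2 repository, and a uint16-PNG output format; none of these are guaranteed to match what `preprocess_depth_davis.py` actually does:

```python
from pathlib import Path

import numpy as np


def depth_to_uint16(depth: np.ndarray) -> np.ndarray:
    """Normalize a float depth map to the full uint16 range for lossless PNG storage."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)
    return np.round(d * 65535.0).astype(np.uint16)


def preprocess(root: Path) -> None:
    # Heavy imports kept local so depth_to_uint16 stays usable without torch installed.
    import cv2
    import torch
    from depth_anything_v2.dpt import DepthAnythingV2

    model = DepthAnythingV2(encoder="vitl", features=256,
                            out_channels=[256, 512, 1024, 1024])
    model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vitl.pth",
                                     map_location="cpu"))
    model = model.eval().cuda()

    for frame in sorted((root / "JPEGImages" / "480p").rglob("*.jpg")):
        depth = model.infer_image(cv2.imread(str(frame)))  # (H, W) float32
        out = root / "Depths" / frame.parent.name / f"{frame.stem}.png"
        out.parent.mkdir(parents=True, exist_ok=True)
        cv2.imwrite(str(out), depth_to_uint16(depth))

# preprocess(Path("/path/to/davis"))
```

Storing normalized uint16 PNGs keeps files small while preserving relative depth; if the training code expects raw float depths, save `.npy` arrays instead.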
Configure Dataset Paths
Edit configs/options.py and configs/options_decoder.py to set dataset paths:
root_path_re10k: str = "/path/to/re10k"
root_path_co3d: str = "/path/to/co3d"
root_path_davis: str = "/path/to/davis"
root_path_vos: str = "/path/to/youtube-vos"
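The field syntax above suggests these options are plain dataclass attributes. As an alternative to editing the file in place, paths could be overridden programmatically; the `Options` class below is a hypothetical stand-in for the real config classes, which live in configs/options.py and configs/options_decoder.py:

```python
from dataclasses import dataclass, replace


# Hypothetical mirror of the fields shown above.
@dataclass
class Options:
    root_path_re10k: str = "/path/to/re10k"
    root_path_co3d: str = "/path/to/co3d"
    root_path_davis: str = "/path/to/davis"
    root_path_vos: str = "/path/to/youtube-vos"


# replace() returns a copy with only the named fields changed.
opt = replace(Options(), root_path_davis="/data/DAVIS")
print(opt.root_path_davis)  # -> /data/DAVIS
```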
Training
Configure Accelerate
Create an accelerate config file (or use the provided acc_configs/gpu8.yaml):
accelerate config
Stage 1: Train Static Encoder
Train the static encoder on combined datasets:
accelerate launch --config_file acc_configs/gpu8.yaml train.py combined \
--workspace /path/to/workspace/encoder_exp
Stage 2: Train Dynamic Decoder
After Stage 1 completes, train the dynamic decoder with the frozen encoder:
accelerate launch --config_file acc_configs/gpu8.yaml train_decoder.py combined \
--workspace /path/to/workspace/decoder_exp \
--encoder_path /path/to/workspace/encoder_exp/model.safetensors
Monitoring Training
Training progress is logged to Weights & Biases. Set up wandb before training:
wandb login
Checkpoints are saved every 10 epochs and every 30 minutes to checkpoint_latest/.
Inference
Download our pretrained checkpoint from Google Drive and place it in the checkpoints/ directory:
python splat_inference.py \
--resume checkpoints/streamsplat.safetensors \
--input_frames_path=/path/to/rgb_frames \
--input_depths_path=/path/to/depth_maps
Citation
If you find this work useful, please cite:
@article{wu2025streamsplat,
  title={StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams},
  author={Zike Wu and Qi Yan and Xuanyu Yi and Lele Wang and Renjie Liao},
  journal={arXiv preprint arXiv:2506.08862},
  year={2025}
}
Acknowledgments
This project builds upon several excellent works:
- 3D Gaussian Splatting for the differentiable rasterization
- diff-gaussian-rasterization for the depth & alpha rendering
- DINOv2 for vision features
- Depth Anything V2 for monocular depth estimation
- Gamba and MVGamba for the codebase and training framework
- Nutworld for orthographic rasterization
- edm for data augmentation