LVSM - Pytorch (wip)
Implementation of LVSM, a SOTA Large View Synthesis Model with Minimal 3D Inductive Bias, from Adobe Research
We will focus only on the Decoder-only architecture in this repository.
This paper lines up with another from ICLR 2025
Install
$ pip install lvsm-pytorch
Usage
import torch
from lvsm_pytorch import LVSM

# input views: (batch, views, channels, height, width)
rays = torch.randn(2, 4, 6, 256, 256)
images = torch.randn(2, 4, 3, 256, 256)

# single target view per batch element: (batch, channels, height, width)
target_rays = torch.randn(2, 6, 256, 256)
target_images = torch.randn(2, 3, 256, 256)

model = LVSM(
    dim = 512,
    max_image_size = 256,
    patch_size = 32,
    depth = 2,
)

# passing target_images returns the training loss
loss = model(
    input_images = images,
    input_rays = rays,
    target_rays = target_rays,
    target_images = target_images
)

loss.backward()

# after much training
# omitting target_images returns the predicted target view

pred_images = model(
    input_images = images,
    input_rays = rays,
    target_rays = target_rays,
) # (2, 3, 256, 256)

assert pred_images.shape == target_images.shape
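The configuration above uses `max_image_size = 256` with `patch_size = 32`, so each view splits into (256 / 32)² = 64 patches. The following is a minimal sketch of that arithmetic, assuming the image and its ray map are concatenated along the channel dimension and cut into non-overlapping patches. The helper name `patchify` is hypothetical, and this is not necessarily how `lvsm_pytorch` tokenizes its inputs internally.

```python
import torch

def patchify(x: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split (batch, channels, height, width) into flattened non-overlapping patches.

    Returns (batch, num_patches, channels * patch_size * patch_size).
    Hypothetical helper for illustration only.
    """
    b, c, h, w = x.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must divide evenly into patches"
    gh, gw = h // patch_size, w // patch_size

    x = x.reshape(b, c, gh, patch_size, gw, patch_size)
    x = x.permute(0, 2, 4, 1, 3, 5)  # (b, gh, gw, c, p, p)
    return x.reshape(b, gh * gw, c * patch_size * patch_size)

images = torch.randn(2, 3, 256, 256)  # one view per batch element
rays = torch.randn(2, 6, 256, 256)    # matching per-pixel rays

# assumption: image and ray channels are joined before patching
tokens = patchify(torch.cat((images, rays), dim = 1), patch_size = 32)
print(tokens.shape)  # torch.Size([2, 64, 9216])
```

Each of the 64 tokens here carries 9 × 32 × 32 = 9216 values; a linear projection to `dim` would typically follow before the transformer layers.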
Or from the raw camera intrinsics / extrinsics (please submit an issue or pull request if you see an error; I am new to view synthesis and out of my depth here)
import torch
from lvsm_pytorch.lvsm import LVSM, CameraWrapper

# random placeholders with the expected shapes, not valid camera matrices

input_intrinsic_rotation = torch.randn(2, 4, 3, 3)
input_extrinsic_rotation = torch.randn(2, 4, 3, 3)
input_translation = torch.randn(2, 4, 3)
input_uniform_points = torch.randn(2, 4, 3, 256, 256)

target_intrinsic_rotation = torch.randn(2, 3, 3)
target_extrinsic_rotation = torch.randn(2, 3, 3)
target_translation = torch.randn(2, 3)
target_uniform_points = torch.randn(2, 3, 256, 256)

# 4-channel images, matching channels = 4 below
images = torch.randn(2, 4, 4, 256, 256)
target_images = torch.randn(2, 4, 256, 256)

lvsm = LVSM(
    dim = 512,
    max_image_size = 256,
    patch_size = 32,
    channels = 4,
    depth = 2,
)

model = CameraWrapper(lvsm)

loss = model(
    input_intrinsic_rotation = input_intrinsic_rotation,
    input_extrinsic_rotation = input_extrinsic_rotation,
    input_translation = input_translation,
    input_uniform_points = input_uniform_points,
    target_intrinsic_rotation = target_intrinsic_rotation,
    target_extrinsic_rotation = target_extrinsic_rotation,
    target_translation = target_translation,
    target_uniform_points = target_uniform_points,
    input_images = images,
    target_images = target_images,
)

loss.backward()
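For intuition on how a 6-channel ray map relates to camera parameters, here is a minimal sketch of one common construction: per-pixel Plücker coordinates, made of a unit direction `d` and a moment `o × d`. The function name `plucker_rays`, the world-to-camera convention `x_cam = R @ x_world + t`, the pixel-center sampling, and the channel order are all assumptions for illustration; `CameraWrapper` may follow different conventions.

```python
import torch

def plucker_rays(K, R, t, height, width):
    """Per-pixel Plücker rays for one pinhole camera (hypothetical helper).

    K: (3, 3) intrinsics, R: (3, 3) rotation, t: (3,) translation,
    assuming world-to-camera extrinsics x_cam = R @ x_world + t.
    Returns (6, height, width): unit direction d, then moment o x d.
    """
    # pixel centers in homogeneous image coordinates, shape (height, width, 3)
    ys, xs = torch.meshgrid(
        torch.arange(height, dtype = K.dtype) + 0.5,
        torch.arange(width, dtype = K.dtype) + 0.5,
        indexing = "ij",
    )
    pixels = torch.stack((xs, ys, torch.ones_like(xs)), dim = -1)

    # back-project to camera-space directions, then rotate into world space
    dirs_cam = pixels @ torch.linalg.inv(K).T
    dirs_world = dirs_cam @ R  # row-vector form of R.T @ d
    dirs_world = dirs_world / dirs_world.norm(dim = -1, keepdim = True)

    # camera center in world coordinates: o = -R^T t
    origin = -(R.T @ t)
    moment = torch.cross(origin.expand_as(dirs_world), dirs_world, dim = -1)

    return torch.cat((dirs_world, moment), dim = -1).permute(2, 0, 1)

K = torch.tensor([[300.0, 0.0, 128.0], [0.0, 300.0, 128.0], [0.0, 0.0, 1.0]])
R = torch.eye(3)
t = torch.zeros(3)

rays = plucker_rays(K, R, t, 256, 256)
print(rays.shape)  # torch.Size([6, 256, 256])
```

Stacking the result over views and batch elements would give tensors shaped like `input_rays` and `target_rays` in the first usage example.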
Citations
@inproceedings{Jin2024LVSMAL,
    title   = {LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias},
    author  = {Haian Jin and Hanwen Jiang and Hao Tan and Kai Zhang and Sai Bi and Tianyuan Zhang and Fujun Luan and Noah Snavely and Zexiang Xu},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:273507016}
}

@article{Zhang2024CamerasAR,
    title   = {Cameras as Rays: Pose Estimation via Ray Diffusion},
    author  = {Jason Y. Zhang and Amy Lin and Moneish Kumar and Tzu-Hsuan Yang and Deva Ramanan and Shubham Tulsiani},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2402.14817},
    url     = {https://api.semanticscholar.org/CorpusID:267782978}
}