NVIDIA Video Pose Engine

ViPE: Video Pose Engine for Geometric 3D Perception

TL;DR: ViPE is a useful open-source spatial AI tool for annotating camera poses and dense depth maps from raw videos!

Contributors: NVIDIA (Spatial Intelligence Lab, Dynamic Vision Lab, NVIDIA Isaac, NVIDIA Research).

Full Abstract: Accurate 3D geometric perception is an important prerequisite for a wide range of spatial AI systems. While state-of-the-art methods depend on large-scale training data, acquiring consistent and precise 3D annotations from in-the-wild videos remains a key challenge. In this work, we introduce ViPE, a handy and versatile video processing engine designed to bridge this gap. ViPE efficiently estimates camera intrinsics, camera motion, and dense, near-metric depth maps from unconstrained raw videos. It is robust to diverse scenarios, including dynamic selfie videos, cinematic shots, or dashcams, and supports various camera models such as pinhole, wide-angle, and 360° panoramas. We use ViPE to annotate a large-scale collection of videos. This collection includes around 100K real-world internet videos, 1M high-quality AI-generated videos, and 2K panoramic videos, totaling approximately 96M frames -- all annotated with accurate camera poses and dense depth maps. We open source ViPE and the annotated dataset with the hope to accelerate the development of spatial AI systems.

Technical Whitepaper, Project Page, Dataset

Installation

Installing From PyPI

pip install nvidia-vipe

This installs the vipe Python package and the vipe CLI. If a compatible binary wheel is available for your platform, pip uses it directly and skips the local CUDA build. Otherwise, pip builds ViPE's native CUDA extensions during installation, which requires a CUDA-enabled PyTorch build and an available CUDA toolkit with nvcc in the environment.
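Before falling back to a source build, it can help to confirm that the prerequisites named above are present. The following is a minimal sketch, not part of ViPE: the function name check_build_prereqs is hypothetical, and the checks are only heuristics (nvcc found on PATH, PyTorch importable and reporting a CUDA build version).

```python
import importlib.util
import shutil


def check_build_prereqs() -> dict[str, bool]:
    """Heuristically report whether a local CUDA build looks possible."""
    report = {
        # nvcc on PATH suggests that a CUDA toolkit is installed.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # Checks only that a torch package can be found, without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            # torch.version.cuda is None for CPU-only PyTorch builds.
            report["torch_cuda_build"] = torch.version.cuda is not None
        except ImportError:
            report["torch_installed"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_build_prereqs().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If any entry is reported as missing and no binary wheel matches your platform, the pip installation is likely to fail during the extension build.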

Installing From Source

To keep native and Python dependencies separate, we use conda for the CUDA/native toolchain and uv for the local Python environment in .venv.

# Create a conda environment for uv, CUDA, and native build dependencies.
conda env create -f envs/cu128.yml
conda activate cu128

# Create .venv, install Python runtime dependencies, and build the package.
uv sync

For development, include the dev dependency group:

conda activate cu128
uv sync --dev

uv run --dev pre-commit install
uv run --dev ruff format .
uv run --dev ruff check .
uv run --dev mypy

Usage

Using the ViPE CLI

Once the Python package is installed, you can use the vipe CLI to process raw videos in MP4 format.

# Replace YOUR_VIDEO.mp4 with the path to your video. We provide sample videos in assets/examples.
uv run vipe infer YOUR_VIDEO.mp4
# Additional options:
#   --output: Output directory (default: vipe_results)
#   --visualize: Enable visualization of intermediate and final results (default: false)
#   --pipeline: Pipeline configuration to use (default: default)

Currently, we support the following pipeline configurations:

default: The default pipeline for pinhole cameras.
lyra: Configuration for results in the Lyra paper.
dav3: Uses the Depth-Anything-V3 model for depth estimation.
no_vda: An alternative for when running video-depth-anything is too memory-consuming. It produces depth maps that are less temporally stable but empirically more 3D-consistent.
wide_angle: For videos with wide-angle or fisheye distortion.
panorama: For 360° videos. Add pipeline.post.depth_align_model=dap (MIT) or pipeline.post.depth_align_model=unik3d (CC-BY-NC 4.0) to enable depth estimation for panoramas.
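The CLI options and pipeline names listed above can be combined programmatically, for example to process a folder of videos one at a time. The sketch below only assembles the command lines; build_infer_command is a hypothetical helper, and it assumes that --output and --pipeline take a separate value argument and that --visualize is a boolean switch, which the option list suggests but does not confirm.

```python
import shlex
from pathlib import Path

# Pipeline names as listed in this README.
KNOWN_PIPELINES = ("default", "lyra", "dav3", "no_vda", "wide_angle", "panorama")


def build_infer_command(video, output="vipe_results", pipeline="default", visualize=False):
    """Assemble a `vipe infer` command line as a list of arguments."""
    if pipeline not in KNOWN_PIPELINES:
        raise ValueError(f"unknown pipeline {pipeline!r}; expected one of {KNOWN_PIPELINES}")
    cmd = ["vipe", "infer", str(video), "--output", str(output), "--pipeline", pipeline]
    if visualize:
        # Assumption: --visualize is a switch that takes no value.
        cmd.append("--visualize")
    return cmd


if __name__ == "__main__":
    # Print one command per sample video; pass a list to subprocess.run to execute it.
    for video in sorted(Path("assets/examples").glob("*.mp4")):
        print(shlex.join(build_infer_command(video, pipeline="no_vda")))
```

Building the command as a list avoids shell-quoting problems when video paths contain spaces.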

To visualize the results that ViPE produces (powered by viser), run:

uv run vipe visualize vipe_results/
# Please modify the above vipe_results/ path to the output directory of your choice.

Using the `run.py` script

The run.py script is a more flexible way to run ViPE. Compared to the CLI, the script supports running on multiple videos at once and allows more fine-grained control over the pipeline through Hydra configs. It also provides an example of using vipe as a library in your own project.

Example usages:

# Running the full pipeline.
uv run python run.py pipeline=default streams=raw_mp4_stream streams.base_path=YOUR_VIDEO_OR_DIR_PATH

# Running the pose-only pipeline without depth estimation.
uv run python run.py pipeline=default streams=raw_mp4_stream streams.base_path=YOUR_VIDEO_OR_DIR_PATH pipeline.post.depth_align_model=null
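Both examples above differ only in their Hydra key=value overrides, so the argument list can be generated from a dictionary. This is a sketch under assumptions: hydra_overrides and build_run_command are hypothetical helpers, only the override keys shown in the examples are used, and values containing spaces or special characters would need Hydra-specific quoting that is not handled here.

```python
def hydra_overrides(options):
    """Render a flat dict as Hydra key=value overrides."""
    rendered = []
    for key, value in options.items():
        if value is None:
            text = "null"  # Hydra spells a missing value as null
        elif isinstance(value, bool):
            text = str(value).lower()  # true / false
        else:
            text = str(value)
        rendered.append(f"{key}={text}")
    return rendered


def build_run_command(base_path, pipeline="default", pose_only=False):
    """Assemble a run.py invocation using the overrides shown in this README."""
    options = {
        "pipeline": pipeline,
        "streams": "raw_mp4_stream",
        "streams.base_path": base_path,
    }
    if pose_only:
        # Disabling the depth alignment model gives the pose-only pipeline.
        options["pipeline.post.depth_align_model"] = None
    return ["python", "run.py", *hydra_overrides(options)]


if __name__ == "__main__":
    print(" ".join(build_run_command("YOUR_VIDEO_OR_DIR_PATH", pose_only=True)))
```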

Converting to COLMAP format

You can use the following script to convert the ViPE results to COLMAP format. For example:

uv run python scripts/vipe_to_colmap.py vipe_results/ --sequence dog_example

This unprojects the dense depth maps to create the 3D point cloud. Alternatively, for a more lightweight and 3D-consistent point cloud, add the --use_slam_map flag to the above command. This requires running the full pipeline with pipeline.output.save_slam_map=true so that the additional information is saved.
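To illustrate what unprojecting a depth map means, here is a minimal pure-Python sketch for a pinhole camera. It is not the implementation used by scripts/vipe_to_colmap.py. It assumes that depth values are z-distances along the optical axis, that integer pixel coordinates (u, v) share the convention of the principal point (cx, cy), and that the output is in camera space; moving the points to world space would additionally require the camera pose.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of z values) into camera-space 3D points.

    For a pinhole camera, pixel (u, v) with depth z maps to
    x = (u - cx) / fx * z,  y = (v - cy) / fy * z.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not z > 0:  # skip zero, negative, or NaN depth
                continue
            points.append(((u - cx) / fx * z, (v - cy) / fy * z, z))
    return points


if __name__ == "__main__":
    toy_depth = [[2.0, 0.0], [2.0, 4.0]]
    for point in unproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(point)
```

Every valid pixel becomes one point, which is why the dense point cloud is much heavier than a sparse SLAM map. In practice the same arithmetic would be vectorized with an array library rather than written as Python loops.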

Downloading the Dataset

Together with ViPE, we release a large-scale dataset containing ~1M high-quality videos with accurate camera poses and dense depth maps. The specifications of each dataset are listed below:

Dataset Name	# Videos	# Frames	Hugging Face Link	License	Prefix
Dynpose-100K++	99,501	15.8M	Link	CC-BY-NC 4.0	`dpsp`
Wild-SDG-1M	966,448	78.2M	Link	CC-BY-NC 4.0	`wsdg`
Web360	2,114	212K	Link	CC-BY 4.0	`w360`

You can download the datasets using the following utility script:

# Replace YOUR_PREFIX with the prefix of the dataset to be downloaded (see prefix column in the table above)
# You can also use more specific prefixes, e.g. wsdg-003e2c86 to download a specific shard of the dataset.
uv run python scripts/download_dataset.py --prefix YOUR_PREFIX --output_base YOUR_OUTPUT_DIR --rgb --depth

Note that the depth component is very large, so expect a long download time. For the RGB component of the Dynpose-100K++ dataset, we retrieve the RGB frames directly from YouTube. To use this feature, first run pip install yt_dlp ffmpeg-python. Please refer to the original Dynpose-100K dataset for alternative approaches to retrieve the videos.
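Because the licenses differ between datasets, it can be useful to check which dataset a prefix belongs to before downloading. The sketch below encodes the table above; resolve_prefix is a hypothetical helper, and it assumes that a shard prefix is the dataset prefix followed by a hyphen and a shard identifier, as in the wsdg-003e2c86 example.

```python
# Dataset prefix -> (dataset name, license), copied from the table above.
DATASETS = {
    "dpsp": ("Dynpose-100K++", "CC-BY-NC 4.0"),
    "wsdg": ("Wild-SDG-1M", "CC-BY-NC 4.0"),
    "w360": ("Web360", "CC-BY 4.0"),
}


def resolve_prefix(prefix):
    """Return (dataset name, license) for a dataset or shard prefix."""
    base = prefix.split("-", 1)[0]
    if base not in DATASETS:
        raise ValueError(f"unknown prefix {prefix!r}; expected one of {sorted(DATASETS)}")
    return DATASETS[base]


if __name__ == "__main__":
    name, license_name = resolve_prefix("wsdg-003e2c86")
    print(f"{name} ({license_name})")
```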

The dataset itself can be visualized using the same visualization script:

uv run vipe visualize YOUR_OUTPUT_DIR

Acknowledgments

ViPE is built on top of many great open-source research projects and codebases. Some of these include (not exhaustive):

Please refer to the THIRD_PARTY_LICENSES.md for a full list of projects and their licenses.

We thank Aigul Dzhumamuratova, Viktor Kuznetsov, Soha Pouya, and Ming-Yu Liu for useful discussions, and Vishal Kulkarni for release support.

TODO

Initial code released under Apache 2.0 license.
Full dataset uploaded to Hugging Face for download.
Add instructions to run inference on wide-angle and 360° videos.
Add instructions for benchmarking.

Citation

If you find ViPE useful in your research or application, please consider citing the following whitepaper:

@inproceedings{huang2025vipe,
    title={ViPE: Video Pose Engine for 3D Geometric Perception},
    author={Huang, Jiahui and Zhou, Qunjie and Rabeti, Hesam and Korovko, Aleksandr and Ling, Huan and Ren, Xuanchi and Shen, Tianchang and Gao, Jun and Slepichev, Dmitry and Lin, Chen-Hsuan and Ren, Jiawei and Xie, Kevin and Biswas, Joydeep and Leal-Taixe, Laura and Fidler, Sanja},
    booktitle={NVIDIA Research Whitepapers arXiv:2508.10934},
    year={2025}
}

License

This project will download and install additional third-party models and software. Note that these models and software are not distributed by NVIDIA. Review the license terms of these models and projects before use. This source code, except for the Unik3D part (which is under the CC BY-NC-SA 4.0 license), is released under the Apache 2.0 License.

Project details

Release history

1.1.0: May 22, 2026
1.0.0: May 15, 2026
0.1.2: May 15, 2026
0.1.1 (this version): May 14, 2026

Download files

Source Distribution

nvidia_vipe-0.1.1.tar.gz (540.8 kB)

Uploaded May 14, 2026 Source

Built Distribution

nvidia_vipe-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl (33.1 MB)

Uploaded May 14, 2026 CPython 3.10, manylinux: glibc 2.34+ x86-64

File details

Details for the file nvidia_vipe-0.1.1.tar.gz.

File metadata

Download URL: nvidia_vipe-0.1.1.tar.gz
Upload date: May 14, 2026
Size: 540.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for nvidia_vipe-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`9ecfcad6e48d9b0a490423a92bd6a7040ccfc7a8a0e0034a02706366096c57e4`
MD5	`eeeb96093b6806b84e345dfeddf75052`
BLAKE2b-256	`51e1988abd802cf9b447ac8c5fe3466d0fb2ab1bf46d1d6b883274dc3bad6e45`

File details

Details for the file nvidia_vipe-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

Download URL: nvidia_vipe-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl
Upload date: May 14, 2026
Size: 33.1 MB
Tags: CPython 3.10, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for nvidia_vipe-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`46b2d3a58ae2b64fab940f2ddf6f7a569f920a7882d2ded08ab014752799440a`
MD5	`ef38970539e6f14912af2abe4c131c8b`
BLAKE2b-256	`173ddce5345c0a184136b81df662e10d24517bf22ba43bbfd8e5a35165e90fba`
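The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's hashlib; the helper names are hypothetical, and it assumes the files were downloaded into the current directory under their original names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "nvidia_vipe-0.1.1.tar.gz":
        "9ecfcad6e48d9b0a490423a92bd6a7040ccfc7a8a0e0034a02706366096c57e4",
    "nvidia_vipe-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl":
        "46b2d3a58ae2b64fab940f2ddf6f7a569f920a7882d2ded08ab014752799440a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the digest listed for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        if Path(name).exists():
            print(name, "ok" if verify(name) else "MISMATCH")
```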
