Depth Anything V2: Robust Monocular Depth Estimation

Depth Anything V2

Lihe Yang¹ · Bingyi Kang²† · Zilong Huang²
Zhen Zhao · Xiaogang Xu · Jiashi Feng² · Hengshuang Zhao¹*

¹HKU   ²TikTok
†project lead *corresponding author

Paper PDF · Project Page · Benchmark

This work presents Depth Anything V2. It significantly outperforms V1 in fine-grained detail and robustness. Compared with Stable Diffusion (SD)-based models, it offers faster inference, fewer parameters, and higher depth accuracy.

News

  • 2025-01-22: Video Depth Anything has been released. It generates consistent depth maps for super-long videos (e.g., over 5 minutes).
  • 2024-12-22: Prompt Depth Anything has been released. It supports 4K-resolution metric depth estimation when low-resolution LiDAR is used to prompt the Depth Anything models.
  • 2024-07-06: Depth Anything V2 is supported in Transformers. See the instructions for convenient usage.
  • 2024-06-25: Depth Anything is integrated into Apple Core ML Models. See the instructions (V1, V2) for usage.
  • 2024-06-22: We release smaller metric depth models based on Depth-Anything-V2-Small and Base.
  • 2024-06-20: Our repository and project page were flagged by GitHub and hidden from the public for 6 days. Sorry for the inconvenience.
  • 2024-06-14: Paper, project page, code, models, demo, and benchmark are all released.

Pre-trained Models

We provide four models of varying scales for robust relative depth estimation:

Model                      Params    Checkpoint
Depth-Anything-V2-Small    24.8M     Download
Depth-Anything-V2-Base     97.5M     Download
Depth-Anything-V2-Large    335.3M    Download
Depth-Anything-V2-Giant    1.3B      Coming soon

Installation

From PyPI

You can install Depth Anything V2 directly from PyPI:

pip install jkp-depth-anything-v2
# Or with uv
uv pip install jkp-depth-anything-v2

From Source

git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -e .
# Or with uv
uv pip install -e .

Usage

Preparation

git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt

Download the checkpoints listed in the Pre-trained Models table and put them in the checkpoints directory.
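
The loading code below expects files named checkpoints/depth_anything_v2_{encoder}.pth. As a minimal sketch, a check like the following can catch a missing or misnamed download before torch.load fails. find_checkpoint is a hypothetical helper, not part of the package, and it assumes the naming pattern used in that loading code.

```python
from pathlib import Path

# Encoder keys used in model_configs; the vitg (Giant) checkpoint is listed as "Coming soon".
ENCODERS = ("vits", "vitb", "vitl", "vitg")


def find_checkpoint(encoder: str, ckpt_dir: str = "checkpoints") -> Path:
    """Hypothetical helper: locate depth_anything_v2_{encoder}.pth inside ckpt_dir."""
    if encoder not in ENCODERS:
        raise ValueError(f"unknown encoder {encoder!r}; expected one of {ENCODERS}")
    path = Path(ckpt_dir) / f"depth_anything_v2_{encoder}.pth"
    if not path.is_file():
        # Give a clearer hint for the checkpoint that has not been published yet.
        note = " (the Giant checkpoint is not yet released)" if encoder == "vitg" else ""
        raise FileNotFoundError(f"checkpoint not found: {path}{note}")
    return path
```

With the checkpoint in place, find_checkpoint("vitl") returns Path("checkpoints/depth_anything_v2_vitl.pth"), which can be passed to torch.load.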

Use our models

import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
}

encoder = 'vitl' # or 'vits', 'vitb', 'vitg'

model = DepthAnythingV2(**model_configs[encoder])
model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))
model = model.to(DEVICE).eval()

raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img) # HxW raw depth map in numpy
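
infer_image returns a raw HxW NumPy array of relative depth, not values in fixed units, so it usually needs rescaling before it can be saved as an image. A minimal sketch, assuming NumPy and a hypothetical helper named depth_to_uint8, min-max normalizes the map to the 0-255 range:

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Hypothetical helper: min-max normalize an HxW depth map to uint8 (0-255)."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Constant map: avoid dividing by zero and return an all-black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min) * 255.0
    return scaled.round().astype(np.uint8)
```

With the snippet above, cv2.imwrite('depth.png', depth_to_uint8(depth)) would save a grayscale visualization. Per-image min-max scaling is a display choice: the same gray level can correspond to different raw values in different images.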

If you do not want to clone this repository, you can also load our models through Transformers. Below is a simple code snippet. Please refer to the official page for more details.

  • Note 1: Make sure you can connect to Hugging Face and have installed the latest Transformers.
  • Note 2: Because we upsample with OpenCV while the Hugging Face implementation uses Pillow, predictions may differ slightly. We therefore recommend loading our models as shown above.
from transformers import pipeline
from PIL import Image

pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
image = Image.open('your/image/path')
depth = pipe(image)["depth"]

Running script on images

python run.py \
  --encoder <vits | vitb | vitl | vitg> \
  --img-path <path> --outdir <outdir> \
  [--input-size <size>] [--pred-only] [--grayscale]

Options:

  • --img-path: Either 1) a directory containing the images to process, 2) a single image, or 3) a text file listing image paths.
  • --input-size (optional): Input size for model inference (default: 518). Increase it for finer-grained results.
  • --pred-only (optional): Save only the predicted depth map, without the raw image.
  • --grayscale (optional): Save the depth map in grayscale, without applying a color palette.

For example:

python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
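
The --img-path option accepts three kinds of input. The sketch below shows one way such a path could be expanded into a list of image files. collect_image_paths is a hypothetical helper, not part of run.py; it assumes a text file holds one path per line, and the real script may filter or order files differently.

```python
import glob
import os


def collect_image_paths(img_path: str) -> list[str]:
    """Hypothetical helper mirroring the three --img-path modes described above."""
    if os.path.isfile(img_path):
        if img_path.endswith(".txt"):
            # Text file: assumed to list one image path per line; blank lines are skipped.
            with open(img_path) as f:
                return [line.strip() for line in f if line.strip()]
        # Any other file is treated as a single image.
        return [img_path]
    # Directory: collect every regular file beneath it, recursively, in sorted order.
    pattern = os.path.join(img_path, "**", "*")
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

For the example command above, collect_image_paths("assets/examples") would return every file under that directory.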

Running script on videos

python run_video.py \
  --encoder <vits | vitb | vitl | vitg> \
  --video-path assets/examples_video --outdir video_depth_vis \
  [--input-size <size>] [--pred-only] [--grayscale]

Larger models have better temporal consistency on videos.
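
Both scripts run inference at --input-size 518 by default. DINOv2 encoders split images into 14-pixel patches, and 518 = 37 × 14. As an assumption about the preprocessing (check the repository's transform code for the exact behavior), a common scheme scales the shorter image side to the input size, keeps the aspect ratio, and rounds each side to a multiple of 14. The hypothetical helper network_input_size estimates the resulting tensor size under that assumption:

```python
import math


def network_input_size(height: int, width: int, input_size: int = 518,
                       patch: int = 14) -> tuple[int, int]:
    """Hypothetical helper: estimate the (H, W) passed to the network.

    Assumes the shorter side is scaled to input_size with the aspect ratio kept,
    then each side is rounded to a multiple of the patch size, rounding up
    instead if plain rounding would fall below input_size.
    """
    scale = input_size / min(height, width)

    def snap(side: float) -> int:
        snapped = round(side / patch) * patch
        if snapped < input_size:
            snapped = math.ceil(side / patch) * patch
        return int(snapped)

    return snap(height * scale), snap(width * scale)


print(network_input_size(1080, 1920))  # (518, 924) for a 1920x1080 frame
```

Under this assumption, raising --input-size grows both sides proportionally, which is why larger values yield finer detail at the cost of more computation.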

Gradio demo

To run our Gradio demo locally:

python app.py

You can also try our online demo.

Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this issue). In V1, we unintentionally used features from the last four layers of DINOv2 for decoding. In V2, we use intermediate features instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
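
To make the difference concrete, here is a sketch of two ways to choose four encoder layers for the DPT decoder, using 0-based block indices: taking the last four blocks (the V1 behavior described above) versus spreading four taps evenly through the network. The helper names are hypothetical, and the evenly spaced rule is an assumption for illustration; the released code may hard-code different indices for each encoder.

```python
def last_n_layers(depth: int, n: int = 4) -> list[int]:
    """0-based indices of the final n transformer blocks (V1-style selection)."""
    return list(range(depth - n, depth))


def evenly_spaced_layers(depth: int, n: int = 4) -> list[int]:
    """0-based indices of n blocks spread evenly through the encoder, ending at
    the last block (an assumed 'intermediate features' scheme)."""
    step = depth // n
    return [step * (k + 1) - 1 for k in range(n)]


# A 12-block encoder such as ViT-S or ViT-B.
print(last_n_layers(12))         # [8, 9, 10, 11]
print(evenly_spaced_layers(12))  # [2, 5, 8, 11]
```

The first choice feeds the decoder four nearly identical high-level feature maps; the second mixes shallower and deeper features, which is the common DPT practice the note refers to.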

Fine-tuned to Metric Depth Estimation

Please refer to metric depth estimation.

DA-2K Evaluation Benchmark

Please refer to DA-2K benchmark.

Community Support

We sincerely appreciate all the community support for our Depth Anything series. Thank you very much!

Acknowledgement

We are sincerely grateful to the awesome Hugging Face team (@Pedro Cuenca, @Niels Rogge, @Merve Noyan, @Amy Roberts, et al.) for their huge efforts in supporting our models in Transformers and Apple Core ML.

We also thank the DINOv2 team for contributing such impressive models to our community.

LICENSE

Depth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.

Citation

If you find this project useful, please consider citing:

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}

Download files

Source Distribution

jkp_depth_anything_v2-0.1.1.tar.gz (26.2 kB)

Built Distribution

jkp_depth_anything_v2-0.1.1-py3-none-any.whl (28.0 kB)

Hashes for jkp_depth_anything_v2-0.1.1.tar.gz
Algorithm Hash digest
SHA256 deffac7e1f127fa45ba6f6588a00c194b95a69822285e268fc7ed6fbb3049f16
MD5 598f0e7db12712eb8525b0d0816f5e69
BLAKE2b-256 33b0e7ae33e91fdba9e000efd7ad57ba732a9a1e8c820e079b8443b89e3cc5a8

Hashes for jkp_depth_anything_v2-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 864150e8f51c8b27ac96908922a36937868c63adb13c0a2e6ca47b44e2bf29f3
MD5 c81bcecc7458c72c91703fdf3c9fd167
BLAKE2b-256 be1d564a34d7ec694bbc3d286b94dd8416491e9a780397ab7d137b9fbb3be39a
