depth_estimation
A unified, Transformers-style Python library for monocular depth estimation
Inference · Video & Streaming · Visualization · Fine-Tuning · Evaluation · Dataset Loading
depth_estimation is a model-definition framework for monocular depth estimation. It provides a single, consistent API across 12 model families and 28 variants — so you can swap models, compare them, and fine-tune them without rewriting your pipeline.
It covers the full workflow end-to-end: run inference with one line, stream depth from video, visualize results, evaluate on standard benchmarks, and fine-tune on custom depth data — all with the same library.
Installation
pip install depth-estimation
See docs/dependencies.md for optional extras (CUDA, MPS, etc.).
Quickstart
The pipeline API is the fastest way to get a depth map from any image:
from depth_estimation import pipeline
pipe = pipeline("depth-estimation", model="depth-anything-v2-vitb")
result = pipe("image.jpg")
depth_map = result.depth # np.ndarray, float32, (H, W)
colored = result.colored_depth # np.ndarray, uint8, (H, W, 3)
For full control over each step — preprocessing, forward pass, postprocessing — use Auto Classes:
from depth_estimation import AutoDepthModel, AutoProcessor
import torch
model = AutoDepthModel.from_pretrained("zoedepth")
processor = AutoProcessor.from_pretrained("zoedepth")
inputs = processor("image.jpg")
with torch.no_grad():
    depth = model(inputs["pixel_values"])
result = processor.postprocess(depth, inputs["original_sizes"])
Or from the command line:
depth-estimate predict image.jpg --model depth-anything-v2-vitb
Why use depth_estimation?
1. One API, every model. Switch from Depth Anything to DepthPro to MoGe by changing a single string (see the sketch after this list). Preprocessing, postprocessing, and output format are identical across all models.
2. The full depth workflow in one place. Most libraries stop at inference. This one covers training, evaluation on standard benchmarks, and dataset loading — so you don't have to stitch together separate tools.
3. Modular, single-file model design. Each model lives in one self-contained file. No hidden abstractions. If you need to understand or modify a model, there's exactly one place to look. New models self-register — AutoDepthModel and pipeline() resolve them automatically.
4. Designed for research. Trainable models with backbone freeze schedules, proper batch-level metric accumulation (no mean-of-means), and a compare() function that prints a formatted comparison table across models.
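As a quick illustration of point 1, the loop below drives three models through an identical call site; the identifier strings are the ones used elsewhere in this README:

from depth_estimation import pipeline

# Same call site for every model; only the identifier string changes.
for name in ["depth-anything-v2-vits", "depth-anything-v2-vitb", "zoedepth"]:
    pipe = pipeline("depth-estimation", model=name)
    result = pipe("image.jpg")
    print(name, result.depth.shape, result.depth.dtype)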
Supported Models
12 model families · 28 variants — see docs/models.md for the full list.
All models support inference and CLI. The Trainable column indicates fine-tuning support via DepthTrainer.
| Family | Variants | Depth type | Trainable |
|---|---|---|---|
| Depth Anything v1 | vits / vitb / vitl | Relative | ✅ |
| Depth Anything v2 | vits / vitb / vitl | Relative | ✅ |
| Depth Anything v3 | small / base / large / giant / mono / metric | Relative + Metric | ✅ |
| Depth Anything v3 Nested | nested-giant-large | Relative | ✅ |
| ZoeDepth | nyu / kitti | Metric | ❌ |
| MiDaS | dpt-large / dpt-hybrid / beit-large | Relative | ✅ |
| Apple DepthPro | — | Metric | ✅ |
| Pixel-Perfect Depth | — | Relative | ❌ |
| Marigold-DC | — | Relative (depth completion) | ❌ |
| MoGe | v1 vitl / v2 vitl / v2 vitb / v2 vits (+ normal variants) | Metric | ❌ |
| OmniVGGT | vitl | Metric | ✅ |
| VGGT | standard / commercial | Metric | ✅ |
What can you do?
Inference — single image, batch, or video
# Single image
result = pipe("image.jpg")
# Batch
results = pipe(["img1.jpg", "img2.jpg"], batch_size=2)
# CLI — batch predict
depth-estimate predict "images/*.jpg" --model depth-anything-v2-vitb --output-dir results/
Video & Streaming — frame-by-frame depth from video, webcam, or image sequences
from depth_estimation import pipeline
pipe = pipeline("depth-estimation", model="depth-anything-v2-vitb")
# Stream a video file — yields DepthOutput per frame
for result in pipe.stream("video.mp4", temporal_smoothing=0.5):
depth = result.depth # (H, W) float32
colored = result.colored_depth # (H, W, 3) uint8
print(result.metadata["frame_index"])
# Webcam stream
for result in pipe.stream(0):  # device index
    ...
# Frame glob (sorted alphabetically)
for result in pipe.stream("frames/*.png"):
...
# Write output video to disk
pipe.process_video(
    "input.mp4",
    "output_depth.mp4",
    colormap="inferno",
    side_by_side=True,        # RGB | depth composite
    temporal_smoothing=0.5,
)
# CLI — video prediction
depth-estimate predict video.mp4 --model depth-anything-v2-vitb --output depth_video.mp4
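A temporal_smoothing value between 0 and 1 suggests exponential blending across consecutive frames. The snippet below is a standalone sketch of that general idea, not the library's internal implementation:

# Sketch only: exponential moving average over consecutive depth frames.
def ema_smooth(prev_depth, new_depth, alpha=0.5):
    if prev_depth is None:
        return new_depth
    return alpha * prev_depth + (1.0 - alpha) * new_depth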
See docs/video.md.
Visualization — depth maps, comparisons, overlays, 3D animations, error maps
from depth_estimation.viz import (
    show_depth, compare_depths, overlay_depth,
    create_anaglyph, animate_3d, plot_error_map,
)
# Display a depth result
show_depth(result, colormap="Spectral_r", title="Depth Anything V2")
# Side-by-side comparison of multiple models
compare_depths([result_v2, result_pro], labels=["DA V2", "DepthPro"], save="compare.png")
# Blend depth over RGB image
overlay = overlay_depth(image, result.depth, alpha=0.5, colormap="inferno")
# Red-cyan anaglyph stereo image
anaglyph = create_anaglyph(image, result.depth, baseline=0.065)
# Rotating 3D surface animation
animate_3d(image, result.depth, "rotation.gif", frames=60)
# Per-pixel error heatmap (requires ground truth)
plot_error_map(pred_depth, gt_depth, metric="abs_rel", save="errors.png")
See docs/viz.md.
Evaluation — standard benchmarks, custom predictions
from depth_estimation.evaluation import evaluate, compare, Evaluator
# Single model on NYU Depth V2
results = evaluate("depth-anything-v2-vitb", "nyu_depth_v2", split="test")
# Compare multiple models — prints table with best values marked (*)
compare(["depth-anything-v2-vits", "depth-anything-v2-vitb"], dataset="nyu_depth_v2")
# Accumulate metrics over your own dataloader
ev = Evaluator()
for pred, gt, mask in dataloader:
    ev.update(pred, gt, mask)
final = ev.compute() # abs_rel, sq_rel, rmse, rmse_log, delta1/2/3
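Accumulating in the Evaluator and computing once at the end matters because averaging per-batch means skews the result when batch sizes differ. A small standalone illustration:

import numpy as np

# Per-image abs_rel values from two unequally sized batches
batch1 = np.array([0.10, 0.20, 0.30])  # 3 images
batch2 = np.array([0.50])              # 1 image

mean_of_means = np.mean([batch1.mean(), batch2.mean()])                     # 0.35, over-weights batch2
accumulated = (batch1.sum() + batch2.sum()) / (len(batch1) + len(batch2))  # 0.275, each image counts once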
See docs/evaluation.md.
Fine-Tuning — any trainable model, any depth dataset
from depth_estimation import DepthAnythingV2Model, DepthTrainer, DepthTrainingArguments, load_dataset
from depth_estimation.data.transforms import get_train_transforms, get_val_transforms
model = DepthAnythingV2Model.from_pretrained("depth-anything-v2-vits", for_training=True)
train_ds = load_dataset("nyu_depth_v2", split="train", transform=get_train_transforms(518))
val_ds = load_dataset("nyu_depth_v2", split="test", transform=get_val_transforms(518))
args = DepthTrainingArguments(output_dir="./checkpoints", num_epochs=25, batch_size=8,
                              freeze_backbone_epochs=5, mixed_precision=True)
DepthTrainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds).train()
Any torch.utils.data.Dataset returning pixel_values / depth_map / valid_mask works directly — no subclassing needed. See docs/training.md.
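For instance, a minimal compatible dataset might look like the sketch below; the .npy file layout and value conventions are assumptions for illustration, not a required format:

import numpy as np
import torch
from torch.utils.data import Dataset

class NpyDepthDataset(Dataset):
    """Hypothetical example: paired RGB and depth arrays stored as .npy files."""

    def __init__(self, image_paths, depth_paths):
        self.image_paths = list(image_paths)
        self.depth_paths = list(depth_paths)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = np.load(self.image_paths[idx]).astype(np.float32)  # (H, W, 3), values in [0, 1]
        depth = np.load(self.depth_paths[idx]).astype(np.float32)  # (H, W), meters
        return {
            "pixel_values": torch.from_numpy(image).permute(2, 0, 1),  # (3, H, W)
            "depth_map": torch.from_numpy(depth),                      # (H, W)
            "valid_mask": torch.from_numpy(depth > 0),                 # mask out missing pixels
        }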
Dataset Loading — standard benchmarks, custom folders
from depth_estimation import load_dataset
ds = load_dataset("nyu_depth_v2", split="test") # auto-downloads ~2.8 GB
ds = load_dataset("diode", split="val", scene_type="indoors") # auto-downloads ~2.6 GB
ds = load_dataset("kitti_eigen", split="test", root="/data/kitti") # local path
ds = load_dataset("folder", image_dir="rgb/", depth_dir="depth/") # any folder
See docs/data.md.
Adding a New Model
- Create src/depth_estimation/models/your_model/
- Add configuration_your_model.py (inherit BaseDepthConfig)
- Add modeling_your_model.py (inherit BaseDepthModel, single file)
- Add __init__.py with MODEL_REGISTRY.register(...)
AutoDepthModel, AutoProcessor, and pipeline() resolve the new model automatically. See docs/adding_a_model.md for a step-by-step guide.
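As a rough sketch of that last step (the import path and registration arguments here are assumptions; docs/adding_a_model.md has the real signature):

# src/depth_estimation/models/your_model/__init__.py  (illustrative sketch)
from depth_estimation.models import MODEL_REGISTRY  # assumed import path
from .configuration_your_model import YourModelConfig
from .modeling_your_model import YourModel

# Registering makes "your-model" resolvable by AutoDepthModel and pipeline().
MODEL_REGISTRY.register("your-model", model_cls=YourModel, config_cls=YourModelConfig)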
Acknowledgments
This library builds upon the work of 12 research teams — see docs/models.md#citations for the full list.
License
MIT