
DeFM: Learning Foundation Representations from Depth for Robotics

Python 3.8+ | PyTorch 2.4+ | uv-managed | Hugging Face | License: Apache 2.0 | arXiv | Webpage | GitHub


DeFM (Depth Foundation Model) is a vision backbone trained on 60M depth images via self-distillation. It is engineered for robotic perception, providing metric-aware representations that excel in sim-to-real transfer and cross-sensor generalization.

TL;DR - A DINO-style encoder, but for depth image inputs.

🌟 Key Features

  • Large-Scale Pretraining: We pretrain on our curated dataset of 60M depth images using self-distillation.
  • Semantic Awareness: DeFM learns not only robust geometric priors but also semantically rich features from depth images alone.
  • Metric-Aware Normalization: Our novel three-channel input normalization preserves metric depth across multiple scales.
  • Compact, Efficient Models: We distill DeFM ViT-L/14 into a family of smaller, efficient CNNs, the smallest with about 3M parameters, for robot policy learning.
  • Robotics-Proven: Our encoder has been shown to be effective across diverse robotic tasks, such as navigation, manipulation, and locomotion, without task-specific fine-tuning.
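The self-distillation mentioned above belongs to the DINO family of methods. As a rough illustration only, the NumPy sketch below shows the core DINO-style step: a student is trained to match the centered, temperature-sharpened output distribution of a teacher whose weights are an exponential moving average (EMA) of the student's. It is a minimal sketch under simplifying assumptions (one batch of views, toy logits, DINO-like default temperatures and momenta) and omits multi-crop augmentation and the additional loss terms used in DINOv2-style training. DeFM's actual objective is defined in the paper and code, and all helper names here are hypothetical, not part of the `defm` package.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Numerically stable temperature-scaled log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def self_distillation_loss(student_logits, teacher_logits, center,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy H(teacher, student), averaged over the batch.

    The teacher output is centered (a running mean is subtracted) and
    sharpened (low temperature); together these help avoid collapse.
    """
    teacher_probs = np.exp(log_softmax(teacher_logits - center, teacher_temp))
    student_log_probs = log_softmax(student_logits, student_temp)
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())


def update_center(center, teacher_logits, momentum=0.9):
    """EMA of the batch-mean teacher output, used for centering."""
    return momentum * center + (1.0 - momentum) * teacher_logits.mean(axis=0)


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track an exponential moving average of the student's."""
    return {k: momentum * teacher_params[k] + (1.0 - momentum) * student_params[k]
            for k in teacher_params}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_logits = rng.normal(size=(8, 16))  # 8 views, 16 prototype logits
    student_logits = rng.normal(size=(8, 16))
    center = np.zeros(16)
    print("loss:", self_distillation_loss(student_logits, teacher_logits, center))
    center = update_center(center, teacher_logits)
```

Only the student receives gradients in this scheme; the teacher is updated solely through `ema_update`, which is what makes it a slowly moving distillation target.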

🛠️ Installation

To use DeFM as a backbone in your own projects without cloning this repository, ensure you have the following prerequisites installed:

pip install torch torchvision numpy huggingface_hub omegaconf

Then jump directly to Quick Start. Otherwise, follow the instructions below to install the DeFM package, which is needed for the notebooks and for local development.
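A quick way to confirm that the prerequisites listed above are importable in the active environment is the standard-library check below. It is a convenience sketch, not part of DeFM, and the `missing_packages` helper is hypothetical.

```python
import importlib.util

# Import names of the prerequisites from the pip command above.
PREREQUISITES = ["torch", "torchvision", "numpy", "huggingface_hub", "omegaconf"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PREREQUISITES)
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All DeFM prerequisites found.")
```

`find_spec` only locates a package without importing it, so the check is fast and has no side effects.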

Using uv (Fastest)

# Create and activate environment
uv venv --python 3.10
source .venv/bin/activate

# Install dependencies and DeFM in editable mode
uv pip install -e .

Using standard pip

pip install -e .

🚀 Quick Start

1. Loading the Model

Load the model via PyTorch Hub (torch.hub) for easy integration:

import torch

# Load the 307M-parameter ViT-L/14 foundation model
model = torch.hub.load('leggedrobotics/defm:main', 'defm_vit_l14', pretrained=True)
model.eval().to("cuda")

2. Preprocessing

DeFM requires depth maps to be processed into our metric-aware 3-channel format.

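from defm import preprocess_depth_image

# metric_depth: your depth map in meters (NumPy array, torch tensor, or PIL image)
normalized_depth = preprocess_depth_image(metric_depth, target_size=518, patch_size=14)
# Move the input to the same device as the model
normalized_depth = normalized_depth.to("cuda")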
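`preprocess_depth_image` is the supported way to build DeFM's three-channel input; its exact normalization is defined in the `defm` package and the paper, not here. Purely to illustrate the idea of a metric-aware, multi-scale encoding, the NumPy sketch below maps a depth map in meters to three channels that each saturate at a different assumed range, so near-field detail and far-range structure are both kept and absolute depth stays recoverable from any unsaturated channel. The scale values, the invalid-pixel handling, and both helper names are hypothetical; do not feed this output to a pretrained DeFM model. The sketch also shows why image sides are tied to `patch_size`: a ViT with 14-pixel patches needs sides that are multiples of 14 (for example 518 = 37 × 14).

```python
import numpy as np

# Hypothetical saturation ranges in meters; NOT DeFM's actual parameters.
SCALES_M = (2.0, 10.0, 50.0)


def encode_depth_multiscale(depth_m, scales=SCALES_M) -> np.ndarray:
    """Map an (H, W) metric depth map to a (3, H, W) float32 array in [0, 1].

    Channel k is depth / scales[k], clipped to [0, 1]: the short-range channel
    resolves near-field detail, the long-range channel keeps far structure,
    and metric depth equals scales[k] * channel_k for any unsaturated channel.
    """
    depth = np.asarray(depth_m, dtype=np.float32)
    # Treat NaN/inf returns (common for missing sensor data) as invalid, i.e. 0.
    depth = np.nan_to_num(depth, nan=0.0, posinf=0.0, neginf=0.0)
    depth = np.clip(depth, 0.0, None)
    return np.stack([np.clip(depth / s, 0.0, 1.0) for s in scales], axis=0)


def center_crop_to_patch_multiple(image: np.ndarray, patch_size: int = 14) -> np.ndarray:
    """Center-crop the last two axes so both are multiples of patch_size."""
    h, w = image.shape[-2:]
    new_h, new_w = (h // patch_size) * patch_size, (w // patch_size) * patch_size
    top, left = (h - new_h) // 2, (w - new_w) // 2
    return image[..., top:top + new_h, left:left + new_w]


if __name__ == "__main__":
    depth = np.random.default_rng(0).uniform(0.3, 30.0, size=(480, 640))  # fake depth, meters
    x = center_crop_to_patch_multiple(encode_depth_multiscale(depth))
    print(x.shape)  # (3, 476, 630)
```

Linear channels with different ranges are only one possible design; a single min-max normalized channel would instead discard absolute scale, which is the failure mode a metric-aware encoding is meant to avoid.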

3. Inference

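with torch.no_grad():
    # n=1 returns the output of the last transformer block;
    # reshape=True arranges the patch tokens as a (B, C, H', W') feature map
    output = model.get_intermediate_layers(
        normalized_depth, n=1, reshape=True, return_class_token=True)

spatial_tokens = output[0][0]  # (B, C, H', W'), with H' = H / 14 and W' = W / 14
class_token = output[0][1]  # (B, C)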
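To make the output shapes concrete: assuming a 518 × 518 input and `patch_size=14`, the token grid is 518 / 14 = 37 per side, i.e. 37 × 37 = 1369 patch tokens, and `C` is the embedding width of the backbone (1024 for a standard ViT-L). The NumPy sketch below mirrors how `reshape=True` turns a `(B, N, C)` token sequence into a `(B, C, H', W')` map, and shows one common way to build a single global descriptor for downstream heads: concatenating the class token with average-pooled patch tokens, a choice familiar from DINOv2-style linear evaluation. The helper names are hypothetical; with real outputs you would apply the same operations to the tensors above (e.g. after `.cpu().numpy()`).

```python
import numpy as np


def token_grid(image_h: int, image_w: int, patch_size: int = 14) -> tuple:
    """Patch-token grid (H', W') for a ViT with square patches."""
    if image_h % patch_size or image_w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    return image_h // patch_size, image_w // patch_size


def tokens_to_feature_map(patch_tokens: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """(B, N, C) row-major patch tokens -> (B, C, H', W') feature map."""
    b, n, c = patch_tokens.shape
    if n != grid_h * grid_w:
        raise ValueError(f"expected {grid_h * grid_w} tokens, got {n}")
    return patch_tokens.reshape(b, grid_h, grid_w, c).transpose(0, 3, 1, 2)


def global_descriptor(feature_map: np.ndarray, class_token: np.ndarray) -> np.ndarray:
    """Concatenate the class token with average-pooled spatial tokens -> (B, 2C)."""
    pooled = feature_map.mean(axis=(2, 3))
    return np.concatenate([class_token, pooled], axis=1)


if __name__ == "__main__":
    gh, gw = token_grid(518, 518)  # 37 x 37 = 1369 patch tokens
    tokens = np.random.default_rng(0).normal(size=(2, gh * gw, 1024)).astype(np.float32)
    cls = np.zeros((2, 1024), dtype=np.float32)  # stand-in for class_token
    fmap = tokens_to_feature_map(tokens, gh, gw)  # (2, 1024, 37, 37)
    desc = global_descriptor(fmap, cls)  # (2, 2048)
    print(fmap.shape, desc.shape)
```

The spatial map suits dense heads such as segmentation, while the pooled descriptor is a compact input for classification or policy networks.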

📂 Project Structure

  • defm/: Main package containing model factory, architectures, and utils.
  • notebooks/: Demo notebooks for semantic PCA visualization, plus inference scripts for the CNN and ViT models.
  • classification/: Scripts and packages to reproduce the classification results [TODO]
  • segmentation/: Scripts and packages to reproduce the segmentation results [TODO]

📊 Model Zoo

The following table gives an overview of the DeFM model family: parameter counts, inference latency at 224×224 resolution on training (RTX 4090) and deployment (Jetson Orin) hardware, and performance on the ImageNet-1k-Depth benchmark.

| Model | Params (M) | RTX 4090 (ms) | Jetson Orin (ms) | Top-5 KNN (%) | Linear Probe (%) | Checkpoint |
|---|---|---|---|---|---|---|
| DeFM ViT-L/14 | 307.0 | 624.91 | 72.82 | 84.79 | 71.72 | Download |
| DeFM ViT-S/14 | 22.1 | 63.76 | 11.92 | 78.06 | 61.54 | Download |
| DeFM ResNet-50 | 26.2 | 69.39 | 17.79 | 77.63 | 61.54 | Download |
| DeFM ResNet-34 | 21.8 | 33.08 | 13.54 | 72.72 | 54.39 | Download |
| DeFM ResNet-18 | 11.7 | 21.06 | 8.67 | 69.69 | 50.58 | Download |
| DeFM EfficientNet-B6 | 28.98 | 150.98 | 54.11 | 77.81 | 59.23 | Download |
| DeFM EfficientNet-B4 | 14.16 | 86.51 | 39.67 | 74.74 | 54.73 | Download |
| DeFM EfficientNet-B2 | 4.95 | 46.12 | 28.37 | 71.51 | 50.32 | Download |
| DeFM EfficientNet-B0 | 3.01 | 29.39 | 21.04 | 67.98 | 46.17 | Download |
| DeFM RegNetY-1.6GF | 12.4 | 44.25 | 41.82 | 76.21 | 57.28 | Download |
| DeFM RegNetY-800MF | 6.3 | 25.21 | 24.16 | 74.91 | 57.03 | Download |
| DeFM RegNetY-400MF | 4.1 | 17.27 | 25.17 | 72.87 | 50.51 | Download |
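When choosing a backbone for an on-robot policy, the table can serve as a simple lookup. The sketch below copies the rows above verbatim and picks the model with the highest top-5 KNN accuracy whose reported Jetson Orin latency fits a given budget. Treat it as a starting heuristic under stated assumptions: the latencies were reported at 224×224, and actual timings depend on resolution, precision, batch size, and runtime. The `ZooEntry` type and `pick_backbone` helper are hypothetical.

```python
from typing import List, NamedTuple, Optional


class ZooEntry(NamedTuple):
    name: str
    params_m: float      # parameters, millions
    rtx4090_ms: float    # reported latency at 224x224
    orin_ms: float       # reported latency at 224x224
    knn_top5: float      # ImageNet-1k-Depth top-5 KNN accuracy (%)
    linear_probe: float  # ImageNet-1k-Depth linear probe accuracy (%)


# Rows copied from the Model Zoo table above.
MODEL_ZOO: List[ZooEntry] = [
    ZooEntry("DeFM ViT-L/14", 307.0, 624.91, 72.82, 84.79, 71.72),
    ZooEntry("DeFM ViT-S/14", 22.1, 63.76, 11.92, 78.06, 61.54),
    ZooEntry("DeFM ResNet-50", 26.2, 69.39, 17.79, 77.63, 61.54),
    ZooEntry("DeFM ResNet-34", 21.8, 33.08, 13.54, 72.72, 54.39),
    ZooEntry("DeFM ResNet-18", 11.7, 21.06, 8.67, 69.69, 50.58),
    ZooEntry("DeFM EfficientNet-B6", 28.98, 150.98, 54.11, 77.81, 59.23),
    ZooEntry("DeFM EfficientNet-B4", 14.16, 86.51, 39.67, 74.74, 54.73),
    ZooEntry("DeFM EfficientNet-B2", 4.95, 46.12, 28.37, 71.51, 50.32),
    ZooEntry("DeFM EfficientNet-B0", 3.01, 29.39, 21.04, 67.98, 46.17),
    ZooEntry("DeFM RegNetY-1.6GF", 12.4, 44.25, 41.82, 76.21, 57.28),
    ZooEntry("DeFM RegNetY-800MF", 6.3, 25.21, 24.16, 74.91, 57.03),
    ZooEntry("DeFM RegNetY-400MF", 4.1, 17.27, 25.17, 72.87, 50.51),
]


def pick_backbone(max_orin_ms: float, zoo: List[ZooEntry] = MODEL_ZOO) -> Optional[ZooEntry]:
    """Return the most accurate (top-5 KNN) model within the Jetson Orin budget, or None."""
    candidates = [m for m in zoo if m.orin_ms <= max_orin_ms]
    return max(candidates, key=lambda m: m.knn_top5, default=None)


if __name__ == "__main__":
    choice = pick_backbone(20.0)
    print(choice.name if choice else "no model fits the budget")
```

With a 20 ms budget this selects DeFM ViT-S/14 (11.92 ms, 78.06% top-5 KNN); swapping `knn_top5` for `linear_probe` in the key ranks by the other benchmark.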

📖 Citation

If you find DeFM useful for your research, please cite our paper:

@misc{patel2026defm,
      title={DeFM: Learning Foundation Representations from Depth for Robotics}, 
      author={Manthan Patel and Jonas Frey and Mayank Mittal and Fan Yang and Alexander Hansson and Amir Bar and Cesar Cadena and Marco Hutter},
      year={2026},
      eprint={2601.18923},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2601.18923}, 
}

Contribution Guidelines

We use the following tools for maintaining code quality:

  • pre-commit: Runs a list of formatters and linters over the codebase.
  • ruff: An extremely fast Python linter and code formatter, written in Rust.

Please check here for instructions to set these up. To run the hooks over the entire repository, execute the following commands in the terminal:

# for installation (only once)
pre-commit install
# for running
pre-commit run --all-files

Acknowledgement

We would like to thank the researchers and developers of the DINOv2 repository for their excellent open-source work, which served as the foundation for our implementation.

This work was supported as part of the Swiss AI Initiative by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID a144 on Alps. This work was also supported by the Luxembourg National Research Fund (Ref. 18990533), and the Swiss National Science Foundation (SNSF) through projects No. 200021E_229503 and No. 227617.


Download files


Source Distribution

defm-1.0.1.tar.gz (34.9 kB, source)

Built Distribution


defm-1.0.1-py3-none-any.whl (42.9 kB, Python 3)

File details

Details for the file defm-1.0.1.tar.gz.

File metadata

  • Download URL: defm-1.0.1.tar.gz
  • Upload date:
  • Size: 34.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for defm-1.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 023f8637c5d1a384318811e891a56fcb5dcd3fec05b7e00b70ea093a67617803 |
| MD5 | 4c5e8040333f631872194c6c7d3967e1 |
| BLAKE2b-256 | 5d94d520781b7dad56eaa41e9d969e01ffa0c2142b302b9ca07a2c38d60dc7e7 |


File details

Details for the file defm-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: defm-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 42.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for defm-1.0.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cfebcbd226584bda4f9e45d311b1b24897d4e6cfc6d61b51eb7887960d8308d |
| MD5 | 86c4ff3bae842cc7d30e21f1fbf790cf |
| BLAKE2b-256 | ac131621ea5fc99d2b19fdbdec0ca31883ba651d7f777178dca2d4238367f8ad |
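To check a downloaded release file against the published SHA256 digests, the standard-library sketch below hashes the file in chunks and compares the result. The digests are copied from the hash listings above; the helper names are hypothetical and not part of the `defm` package.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for the defm 1.0.1 release files.
EXPECTED_SHA256 = {
    "defm-1.0.1.tar.gz": "023f8637c5d1a384318811e891a56fcb5dcd3fec05b7e00b70ea093a67617803",
    "defm-1.0.1-py3-none-any.whl": "3cfebcbd226584bda4f9e45d311b1b24897d4e6cfc6d61b51eb7887960d8308d",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release_file(path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, "OK" if verify_release_file(p) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode: put a line such as `defm==1.0.1 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt` (every dependency must then be pinned with a hash as well).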

