Skip to main content

Depth map evaluation toolkit with comprehensive metrics

Project description

euler-eval

A comprehensive evaluation toolkit for comparing predicted depth maps, RGB images, and camera ray direction maps against ground truth, powered by euler_loading for flexible dataset loading.

Features

  • Depth metrics: PSNR, SSIM, LPIPS, FID, KID, AbsRel, RMSE, Scale-Invariant Log Error, Normal Consistency, Depth Edge F1
  • RGB metrics: PSNR, SSIM, LPIPS, FID, SCE (Structural Chromatic Error), Edge F1, Tail Errors (p95/p99), High-Frequency Energy Ratio, Depth-Binned Photometric Error
  • Rays metrics: ρ_A (AUC of angular accuracy curve), Angular Error statistics and threshold percentages
  • Benchmark binning: Optional depth-range benchmark that subdivides metrics into log-scaled near/mid/far bins
  • Sanity checking: Automatic validation of metric results against configurable thresholds, with detailed warning reports
  • Sky masking: Optional exclusion of sky regions from metrics using GT segmentation
  • Flexible dataset loading: Automatic loader resolution via euler_loading and ds-crawler index metadata
  • Per-file and aggregate results: Outputs both per-image metrics and dataset-level aggregates to JSON, saved per-modality
  • euler_train integration: Optional experiment logging via euler_train

Installation

Requires Python 3.9+.

uv pip install "euler-eval @ git+https://github.com/d-rothen/euler-parser.git"

# with euler_train logging support
uv pip install "euler-eval[logging] @ git+https://github.com/d-rothen/euler-parser.git"

# with clean-fid RGB FID backend support
uv pip install "euler-eval[fid] @ git+https://github.com/d-rothen/euler-parser.git"

Or install in editable mode:

pip install -e .

Dependencies

Core:

Optional:

Usage

The package provides a depth-eval console script:

depth-eval <config> [options]

It also provides a cache warmup helper for offline environments:

euler-eval.init

Or run directly:

python main.py <config> [options]

Before running on offline compute nodes, you can warm caches on a machine with network access:

HF_HOME=/shared/cache/hf \
TORCH_HOME=/shared/cache/hf/torch \
CLEANFID_CACHE_DIR=/shared/cache/clean-fid \
euler-eval.init

This pre-downloads:

  • torchvision AlexNet weights
  • torchvision Inception v3 weights
  • LPIPS AlexNet weights
  • the clean-fid inception checkpoint, if clean-fid is installed

Positional arguments

Argument Description
config Path to a JSON configuration file (see Configuration)

Options

Flag Type Default Description
--device {auto,cuda,cpu} auto Compute device (auto prefers CUDA when available)
--batch-size int 16 Batch size for metrics that support batching
--num-workers int 4 Number of data loading workers
--verbose, -v flag off Enable verbose output
--skip-depth flag off Skip depth evaluation
--skip-rgb flag off Skip RGB evaluation
--skip-rays flag off Skip rays (spherical direction map) evaluation
--mask-sky flag off Mask sky regions from metrics using GT segmentation
--no-sanity-check flag off Disable sanity checking of metric configurations
--metrics-config str auto-detect Path to metrics_config.json for sanity checking
--depth-alignment {none,auto_affine,affine} auto_affine Depth calibration mode; outputs are emitted in semantic native/metric spaces and depth aliases the canonical branch
--rgb-fid-backend {builtin,clean-fid} builtin RGB FID backend; clean-fid requires optional dependency
--benchmark-depth-range float float none Depth range [MIN, MAX] in meters for benchmark evaluation; computes depth and RGB metrics for pixels within this range, subdivided into log-scaled near/mid/far bins (additive to regular metrics)

Examples

# Evaluate with default settings (auto-selects CUDA when available)
depth-eval config.json --batch-size 32

# Evaluate with sky masking enabled (requires gt.segmentation in config)
depth-eval config.json --mask-sky -v

# Skip RGB evaluation, only evaluate depth
depth-eval config.json --skip-rgb

# Disable sanity checking
depth-eval config.json --no-sanity-check

# Disable depth alignment
depth-eval config.json --depth-alignment none

# Force affine scale+shift alignment on all depth predictions
depth-eval config.json --depth-alignment affine

# Use clean-fid for RGB FID computation
depth-eval config.json --rgb-fid-backend clean-fid

# Benchmark depth and RGB metrics within a depth range (near/mid/far bins)
depth-eval config.json --benchmark-depth-range 0.5 80.0

# Evaluate dense depth predictions against sparse pointcloud GT
depth-eval example_sparse_depth_config.json --skip-rgb --skip-rays

# Skip rays evaluation
depth-eval config.json --skip-rays

Configuration

config.json

Defines GT modalities, prediction datasets to evaluate, and optional euler_train logging. See example_config.json. For sparse pointcloud depth GT, see example_sparse_depth_config.json.

{
  "euler_train": {
    "dir": "runs/my_project"
  },
  "gt": {
    "rgb":          { "path": "/data/gt/rgb" },
    "depth":        { "path": "/data/gt/depth" },
    "rays":         { "path": "/data/gt/rays" },
    "segmentation": { "path": "/data/gt/segmentation" },
    "calibration":  { "path": "/data/gt/calibration" }
  },
  "datasets": [
    {
      "name": "model_a",
      "rgb":   { "path": "/data/model_a/rgb" },
      "depth": { "path": "/data/model_a/depth" },
      "rays":  { "path": "/data/model_a/rays" },
      "output_file": "/path/to/output/model_a_eval.json"
    },
    {
      "name": "model_b_depth_only",
      "depth": { "path": "/data/model_b/depth" }
    },
    {
      "name": "model_c_rgb_only",
      "rgb": { "path": "/data/model_c/rgb" }
    }
  ]
}

Each modality entry can optionally include a split field to select a specific split from the dataset (e.g. { "path": "/data/gt/depth", "split": "test" }).

For sparse pointcloud depth evaluation, use gt.sparse_depth instead of gt.depth. The prediction still uses datasets[].depth because the model output is a dense depth map. The evaluator projects the sparse GT point cloud into the prediction image plane using gt.intrinsics and gt.camera_extrinsics, then computes pointwise depth metrics only at projected valid pixels.

GT section

Field Required Description
gt.rgb.path no* Path to GT RGB dataset
gt.depth.path no* Path to GT depth dataset
gt.sparse_depth.path no* Path to sparse pointcloud GT dataset, e.g. sparse_depth with (N,C) points whose first columns are x,y,z in meters
gt.rays.path no* Path to GT ray direction map dataset (for rays evaluation)
gt.segmentation.path no Path to GT segmentation (needed for --mask-sky)
gt.calibration.path no Path to calibration data (camera intrinsics matrices)
gt.intrinsics.path required with gt.sparse_depth Path to camera intrinsics matrices for pointcloud projection
gt.camera_extrinsics.path required with gt.sparse_depth Path to source-to-camera extrinsics, e.g. MUSES lidar2rgb, for pointcloud projection
gt.name no Display name for ground truth (default: "GT")

* At least one of gt.rgb.path, gt.depth.path, gt.sparse_depth.path, or gt.rays.path is required.

Prediction datasets

Each entry in datasets can include rgb, depth, rays, or any combination:

Field Required Description
name yes Display name for this prediction dataset
rgb.path no* Path to predicted RGB dataset
depth.path no* Path to predicted dense depth dataset; also used when evaluating against sparse pointcloud GT
rays.path no* Path to predicted ray direction map dataset
output_file no Custom output path for results JSON (default: eval.json inside the first available modality path)

* At least one of rgb.path, depth.path, or rays.path is required.

euler_train section (optional)

When present, evaluation results are logged to an euler_train run. Requires the euler-train package to be installed (pip install euler-eval[logging]).

Field Required Description
euler_train.dir yes Project directory (creates a new run) or full path to an existing run directory (resumes it)

euler_train auto-detects whether the path is a run directory by checking for meta.json. When resuming an existing run, the run is detached after evaluation (the run remains active for further use). When a new run is created, it is finished upon completion.

Loader resolution

Loaders are resolved automatically by euler_loading from each dataset directory's ds-crawler index metadata. The index's euler_loading.loader and euler_loading.function fields determine which loader module and function to use (e.g. "vkitti2" maps to euler_loading.loaders.gpu.vkitti2).

No manual loader selection is required. Each dataset directory declares its own loader through its ds-crawler configuration.

Dataset metadata (e.g. radial_depth, rgb_range, sparse point columns, and coordinate units) is read automatically from each dataset's ds-crawler metadata via get_modality_metadata(). Depth and point-cloud coordinates are assumed to already be in meters.

Dataset Metadata

Each dataset directory must contain ds-crawler metadata artifacts, typically generated under .ds_crawler/:

dataset-root/
  .ds_crawler/
    dataset-head.json
    ds-crawler.json
    index.json

GT and prediction datasets are matched by hierarchy path and file ID through MultiModalDataset.

metrics_config.json

Controls sanity check thresholds. See metrics_config.json for all available options. When --metrics-config is not specified, the tool auto-detects metrics_config.json at the project root. If not found, built-in defaults are used.

Metrics

Depth metrics

Metric Key Description
PSNR depth.image_quality.psnr Peak Signal-to-Noise Ratio (dB), using max depth as dynamic range
SSIM depth.image_quality.ssim Structural Similarity Index
LPIPS depth.image_quality.lpips Learned Perceptual Image Patch Similarity
FID depth.image_quality.fid Fréchet Inception Distance (dataset-level distribution metric)
KID depth.image_quality.kid_mean, kid_std Kernel Inception Distance (mean and std)
Standard depth metrics depth.standard.{image_mean,image_median,pixel_pool}.* Monocular-depth metrics with explicit reducers: absrel, sqrel, mae, rmse, rmse_log, log10, silog, delta1, delta2, delta3
AbsRel depth.depth_metrics.absrel Absolute Relative Error (|pred-gt|/gt), reported as median and p90
RMSE depth.depth_metrics.rmse Root Mean Square Error, reported as median and p90
SILog depth.depth_metrics.silog Scale-Invariant Log Error, reported as mean, median, and p90
Normal Consistency depth.geometric_metrics.normal_consistency Surface normal angular error (degrees) via finite differences; includes mean, median, and percent below 11.25°/22.5°/30°
Depth Edge F1 depth.geometric_metrics.depth_edge_f1 Edge detection precision/recall/F1 for depth discontinuities

Sparse Depth Metrics

Sparse pointcloud GT evaluation reports only metrics that remain meaningful at isolated projected points. Dense image-quality and geometric metrics such as SSIM, LPIPS, FID, normals, and edge F1 are intentionally skipped.

Metric Key Description
Standard depth metrics sparse_depth.standard.{image_mean,image_median,pixel_pool}.* Same monocular-depth reducers as dense depth, computed only at projected sparse GT pixels
AbsRel sparse_depth.depth_metrics.absrel Absolute Relative Error (|pred-gt|/gt), reported as median and p90 over projected sparse pixels
RMSE sparse_depth.depth_metrics.rmse Root Mean Square Error, reported as median and p90 over projected sparse pixels
SILog sparse_depth.depth_metrics.silog Scale-Invariant Log Error, reported as mean, median, and p90 over projected sparse pixels

RGB metrics

Metric Key Description
PSNR rgb.image_quality.psnr Peak Signal-to-Noise Ratio (dB)
SSIM rgb.image_quality.ssim Structural Similarity Index
SCE rgb.image_quality.sce Structural Chromatic Error
LPIPS rgb.image_quality.lpips Learned Perceptual Image Patch Similarity
FID rgb.image_quality.fid Fréchet Inception Distance (dataset-level distribution metric)
Edge F1 rgb.edge_f1 Edge preservation precision/recall/F1
Tail Errors rgb.tail_errors 95th and 99th percentile per-pixel errors
High-Frequency Energy rgb.high_frequency HF energy preservation ratio (pred vs GT) and relative difference
Depth-Binned Photometric Error rgb.depth_binned_photometric MAE/MSE in near/mid/far depth bins (requires GT depth)

Rays metrics

Metric Key Description
ρ_A rays.rho_a.mean, rho_a.median Area Under the angular accuracy Curve — fraction of pixels with angular error ≤ threshold, integrated from 0 to a FoV-dependent threshold (S.FoV: 15°, L.FoV: 20°, Pano: 30°)
Angular Error rays.angular_error.mean_angle, median_angle Per-pixel angular error between predicted and GT camera ray directions (degrees)
Angular Error Thresholds rays.angular_error.percent_below_* Percentage of pixels with angular error below 5°, 10°, 15°, 20°, 30°

Output

Results are saved as JSON per modality per prediction dataset (one file for depth or sparse depth, one for RGB, one for rays). Default path: eval.json inside each modality's dataset path, unless overridden by output_file in the config.

For RGB FID, two backends are available:

  • builtin: in-process Inception-based implementation in this repository.
  • clean-fid: delegates folder-vs-folder FID computation to clean-fid. This backend requires installing the optional fid extra and is recommended when you need scores closer to standard published FID numbers.

When --rgb-fid-backend clean-fid is used, euler-eval will honor CLEANFID_CACHE_DIR if set:

  • If CLEANFID_CACHE_DIR/inception-2015-12-05.pt exists, it is staged into the location clean-fid expects before evaluation.
  • If it does not exist and the machine is online, euler-eval asks clean-fid to download it into CLEANFID_CACHE_DIR.
  • Without CLEANFID_CACHE_DIR, clean-fid falls back to its own default local path handling.

Output structure

{
  "depth_native": { "...": "native model depth space, if diagnostically meaningful" },
  "depth_metric": { "...": "metric depth space, if available" },
  "depth": {
    "...": "canonical alias of depth_metric when present, else depth_native"
  },
  "sparse_depth": {
    "...": "canonical sparse-depth alias when gt.sparse_depth is used"
  },
  "rgb": {
    "...": "..."
  },
  "per_file_metrics": {
    "children": {
      "scene_01": {
        "children": {
          "camera_0": {
            "files": [
              {
                "id": "frame_0001",
                "metrics": {
                  "depth": { "...": "canonical alias" },
                  "depth_native": { "...": "native, when emitted" },
                  "depth_metric": { "...": "metric, when emitted" },
                  "sparse_depth": { "...": "canonical sparse-depth alias" },
                  "rgb": { "...": "..." }
                }
              }
            ]
          }
        }
      }
    }
  }
}

For depth outputs:

  • depth_native: the model's native depth space after spatial/radial preprocessing, emitted only when it is diagnostically distinct.
  • depth_metric: the comparable metric-depth branch. This is either the native prediction itself or the calibrated scale-shift result.
  • depth: canonical alias of depth_metric when available, otherwise depth_native.
  • standard: explicit monocular-depth metrics with three reducers: image_mean, image_median, and pixel_pool.

For sparse depth outputs, the same native/metric/canonical pattern is emitted under sparse_depth_native, sparse_depth_metric, and sparse_depth, but only standard and depth_metrics categories are present.

Previous single-depth structure (kept under depth) is:

{
  "depth": {
    "image_quality": {
      "psnr": 28.5,
      "ssim": 0.92,
      "lpips": 0.08,
      "fid": 12.3,
      "kid_mean": 0.005,
      "kid_std": 0.002
    },
    "standard": {
      "image_mean": {
        "absrel": 0.08,
        "sqrel": 0.04,
        "mae": 0.62,
        "rmse": 1.20,
        "rmse_log": 0.11,
        "log10": 0.04,
        "silog": 0.08,
        "delta1": 0.91,
        "delta2": 0.97,
        "delta3": 0.99
      },
      "image_median": {
        "absrel": 0.07,
        "sqrel": 0.03,
        "mae": 0.58,
        "rmse": 1.10,
        "rmse_log": 0.10,
        "log10": 0.04,
        "silog": 0.07,
        "delta1": 0.92,
        "delta2": 0.98,
        "delta3": 0.99
      },
      "pixel_pool": {
        "absrel": 0.08,
        "sqrel": 0.04,
        "mae": 0.61,
        "rmse": 1.18,
        "rmse_log": 0.11,
        "log10": 0.04,
        "silog": 0.08,
        "delta1": 0.91,
        "delta2": 0.97,
        "delta3": 0.99
      }
    },
    "depth_metrics": {
      "absrel": { "median": 0.05, "p90": 0.12 },
      "rmse":   { "median": 1.2, "p90": 3.1 },
      "silog":  { "mean": 0.08, "median": 0.06, "p90": 0.15 }
    },
    "geometric_metrics": {
      "normal_consistency": {
        "mean_angle": 12.3,
        "median_angle": 9.8,
        "percent_below_11_25": 55.2,
        "percent_below_22_5": 82.1,
        "percent_below_30": 91.5
      },
      "depth_edge_f1": {
        "precision": 0.72,
        "recall": 0.68,
        "f1": 0.70
      }
    },
    "dataset_info": {
      "num_pairs": 500,
      "gt_name": "GT",
      "pred_name": "model_a"
    }
  },
  "rgb": { "...": "unchanged" }
}

Sanity check report

When sanity checking is enabled (the default), a sanity_check_report.json is saved to the current working directory containing warnings grouped by metric type.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

euler_eval-2.7.0.tar.gz (117.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

euler_eval-2.7.0-py3-none-any.whl (100.9 kB view details)

Uploaded Python 3

File details

Details for the file euler_eval-2.7.0.tar.gz.

File metadata

  • Download URL: euler_eval-2.7.0.tar.gz
  • Upload date:
  • Size: 117.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.12 {"installer":{"name":"uv","version":"0.11.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for euler_eval-2.7.0.tar.gz
Algorithm Hash digest
SHA256 98a527bddba47adbcb8a714a13d7e5e5da96e391d0303c3ed0d6ec07bb35d8ae
MD5 5b09a109e2378149c9e855682804a3d2
BLAKE2b-256 f1240f7580f8882375541265d49b055a680cdb24a35585d18b6f90d0aa3c1922

See more details on using hashes here.

File details

Details for the file euler_eval-2.7.0-py3-none-any.whl.

File metadata

  • Download URL: euler_eval-2.7.0-py3-none-any.whl
  • Upload date:
  • Size: 100.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.12 {"installer":{"name":"uv","version":"0.11.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for euler_eval-2.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b3992b5b05db66a16950f68b91542aa725a73ef70776a9e4ec4fbf4eacc1b87d
MD5 2579995de88d53eb73159f8d5a599c4a
BLAKE2b-256 1c4fbf52e7a8303e86c41875d74a00a0faee9e334ea7cdd97ca384551a819747

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page