The honest, geometrically correct, DICOM-native evaluation harness for medical image segmentation.
segauge
DICOM-native evaluation for medical image segmentation: surface-mesh distance metrics, per-lesion detection, fairness slices, and a confidence interval on every number.
segauge computes Dice, IoU, Hausdorff distance (HD), HD95, average symmetric surface distance (ASSD), Normalized Surface Dice (NSD), and per-lesion detection F1 for 3D medical image segmentation, and puts a bootstrap confidence interval on every number. It reads NIfTI, DICOM-SEG, RTSTRUCT, and NumPy directly, so you can evaluate the output of nnU-Net, MONAI, or TotalSegmentator without a lossy conversion.
It exists because the metrics you report should be the metrics that are true, and today they often aren't.
```shell
pip install segauge
```

```python
import segauge as sg

result = sg.evaluate([
    sg.Case("patient_001", pred="pred.nii.gz", gt="gt.nii.gz", metadata={"scanner": "siemens"}),
    sg.Case("patient_002", pred="pred2.dcm", gt="gt2.dcm", metadata={"scanner": "ge"}),
])
print(result.summary())               # every metric with a 95% CI
print(result.by_subgroup("scanner"))  # where does the model quietly fail?
result.to_html("report.html")         # one self-contained report
```

Or from the command line:

```shell
segauge eval --pred preds/ --gt labels/ --metadata cases.csv --report report.html
```
Why not just use MONAI or Metrics Reloaded?
segauge bundles four things no incumbent provides together: a confidence interval on every metric, per-lesion detection, subgroup/fairness slicing, and native DICOM-SEG/RTSTRUCT input.
It also computes distance metrics on a surface mesh rather than the voxel grid: it extracts the object surface with marching cubes at true voxel spacing and integrates over the surface, following the MeshMetrics method (Podobnik & Vrtovec, 2025). In our own benchmark (benchmarks/mesh_vs_grid.py) this measurably reduces HD95 error under anisotropic, thick-slice spacing, where grid rasterization hurts most; for mean distance (ASSD) on smooth shapes the two methods are comparable. A formal validation suite against the MeshMetrics reference is in progress; we report what the benchmark shows, not more.
| Feature | segauge | MONAI | Metrics Reloaded | seg-metrics | DeepMind surface-distance |
|---|---|---|---|---|---|
| Surface-mesh distances (vs voxel grid) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Confidence intervals | ✅ | ❌ | ❌ | ❌ | ❌ |
| Per-lesion detection F1 | ✅ | partial | ✅ | ❌ | ❌ |
| Subgroup / fairness slicing | ✅ | ❌ | ❌ | ❌ | ❌ |
| DICOM-SEG / RTSTRUCT native | ✅ | partial | ❌ | ❌ | ❌ |
| pip install | ✅ | ✅ | ❌ | ✅ | ✅ |
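To illustrate what integrating a distance over the surface can mean in practice, here is a minimal sketch of an area-weighted percentile, the kind of statistic a mesh-based HD95 could be built on. It is not segauge's implementation: the function name `weighted_percentile`, the per-triangle inputs, and the simple step-function definition of the percentile are all assumptions made for illustration, and the MeshMetrics definition may differ in detail.

```python
import numpy as np

def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q percent.

    Hypothetical helper: `values` would be per-triangle distances to the
    other surface, `weights` the corresponding triangle areas.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)                 # sort distances ascending
    values, weights = values[order], weights[order]
    cdf = np.cumsum(weights) / weights.sum()   # cumulative share of surface area
    idx = np.searchsorted(cdf, q / 100.0, side="left")
    return float(values[min(idx, len(values) - 1)])

# Three ordinary triangles at 1 mm and one tiny sliver at 10 mm.
distances = [1.0, 1.0, 1.0, 10.0]
areas = [1.0, 1.0, 1.0, 0.1]

hd95_area_weighted = weighted_percentile(distances, areas, 95)      # 1.0
hd95_unweighted = weighted_percentile(distances, [1, 1, 1, 1], 95)  # 10.0
print(hd95_area_weighted, hd95_unweighted)
```

With equal weights, the sliver alone decides the 95th percentile; once each distance is weighted by the surface area it represents, the sliver covers only about 3% of the surface and no longer does.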
Metrics
- Overlap: Dice (DSC), IoU (Jaccard) — exact, integer-counted.
- Surface distance: Hausdorff distance (HD), HD95, average symmetric surface distance (ASSD), mean average surface distance (MASD), Normalized Surface Dice (NSD); all mesh-based, spacing-aware, and area-weighted.
- Per-lesion detection: precision, recall, F1 via connected-component matching, so you can answer "did it find the tumor?" not just "how much voxel overlap?"
- Every aggregate carries a deterministic bootstrap confidence interval.
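The per-lesion idea in the list above can be sketched as follows. This is a minimal illustration, not segauge's code: the helper names `label_components` and `lesion_detection` are hypothetical, connectivity is assumed to be face-only (6-connected), and a lesion is assumed to count as detected on any voxel overlap. The matching rule segauge actually uses may be stricter.

```python
from collections import deque
import numpy as np

# 6-connectivity: voxels are neighbours only if they share a face.
_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_components(mask):
    """Label face-connected components of a 3D boolean mask; returns (labels, count)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # already reached from an earlier seed
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            z, y, x = queue.popleft()
            for dz, dy, dx in _OFFSETS:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = count
                    queue.append(nb)
    return labels, count

def lesion_detection(pred, gt):
    """Per-lesion precision, recall, F1 under an ASSUMED any-overlap matching rule."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    pred_labels, n_pred = label_components(pred)
    gt_labels, n_gt = label_components(gt)
    overlap = pred & gt
    hit_gt = np.unique(gt_labels[overlap]).size      # ground-truth lesions that were found
    hit_pred = np.unique(pred_labels[overlap]).size  # predicted lesions that touch ground truth
    precision = hit_pred / n_pred if n_pred else 0.0
    recall = hit_gt / n_gt if n_gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic volume: two true lesions; the prediction finds one and adds a false positive.
gt = np.zeros((10, 10, 10), dtype=bool)
gt[1:3, 1:3, 1:3] = True
gt[6:8, 6:8, 6:8] = True
pred = np.zeros_like(gt)
pred[2:4, 2:4, 2:4] = True   # overlaps the first lesion
pred[0:2, 7:9, 0:2] = True   # overlaps nothing
detection = lesion_detection(pred, gt)
print(detection)
```

Here one of two true lesions is found and one of two predicted lesions is real, so precision, recall, and F1 are all 0.5, even though the voxel-level Dice would look very different.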
Inputs
NIfTI (.nii, .nii.gz), DICOM-SEG, RTSTRUCT, in-memory NumPy arrays, and .npy files. Voxel spacing is read from the file header and used for the distance metrics, so a segmentation from a clinical pipeline is evaluated as-is.
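Why the header spacing matters can be shown with a small worked example. The sketch below is independent of segauge: `physical_distance` is a hypothetical helper, the axis order (z, y, x) is an assumption, and the spacings are made-up values typical of isotropic versus thick-slice acquisitions.

```python
import numpy as np

def physical_distance(idx_a, idx_b, spacing):
    """Euclidean distance in mm between two voxel indices, given per-axis spacing in mm."""
    delta = (np.asarray(idx_a) - np.asarray(idx_b)) * np.asarray(spacing, dtype=float)
    return float(np.linalg.norm(delta))

# Two boundary voxels that are 2 slices and 3 in-plane voxels apart.
a, b = (10, 40, 40), (12, 40, 43)

iso = physical_distance(a, b, spacing=(1.0, 1.0, 1.0))    # about 3.61 mm
aniso = physical_distance(a, b, spacing=(5.0, 0.8, 0.8))  # about 10.28 mm
print(f"isotropic: {iso:.2f} mm, thick-slice: {aniso:.2f} mm")
```

The same pair of voxel indices is almost three times farther apart in the thick-slice volume, so a distance metric computed in voxel units, or with the wrong spacing, reports a different number than the physical one.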
FAQ
How do I compute HD95 correctly in Python? segauge.surface_metrics(pred, gt, spacing) returns HD, HD95, ASSD, MASD, and NSD computed on the surface mesh, not the voxel grid.
Can it evaluate DICOM-SEG / RTSTRUCT directly? Yes. Install the DICOM extra with pip install segauge[dicom], then pass a DICOM-SEG .dcm file as pred or gt, or load an RTSTRUCT with segauge.load_rtstruct(series_dir, rtstruct, roi_name).
Does it work with nnU-Net / TotalSegmentator / MONAI outputs? Yes. Point it at the NIfTI (or DICOM) they produce.
Why a confidence interval? A Dice of 0.85 on 12 cases is not the same claim as 0.85 on 1200. segauge makes the difference visible.
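The point about sample size can be made concrete with a percentile bootstrap. This is a minimal sketch, not segauge's implementation: `bootstrap_ci` is a hypothetical name, the percentile method and the 2000 resamples are assumptions, and the per-case Dice scores are synthetic numbers drawn only to demonstrate the effect.

```python
import numpy as np

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-case scores.

    A fixed seed makes the interval reproducible from run to run.
    """
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample cases with replacement, n_boot times, and take each resample's mean.
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(scores.mean()), float(lo), float(hi)

# Synthetic per-case Dice scores centred near 0.85 (illustrative only).
data_rng = np.random.default_rng(42)
small = np.clip(data_rng.normal(0.85, 0.08, size=12), 0.0, 1.0)
large = np.clip(data_rng.normal(0.85, 0.08, size=1200), 0.0, 1.0)

mean_s, lo_s, hi_s = bootstrap_ci(small)
mean_l, lo_l, hi_l = bootstrap_ci(large)
print(f"n=12:   {mean_s:.3f} [{lo_s:.3f}, {hi_s:.3f}]")
print(f"n=1200: {mean_l:.3f} [{lo_l:.3f}, {hi_l:.3f}]")
```

Both samples have a similar mean, but the interval for 12 cases is several times wider than the one for 1200, which is exactly the difference a bare point estimate hides.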
2D images? Overlap and detection metrics work in any dimension; surface-distance metrics are 3D in v0.1 (2D is planned for v0.2).
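Overlap metrics are dimension-agnostic because they only count voxels. A minimal sketch of integer-counted Dice and IoU, assuming binary masks of equal shape; `dice_iou` is a hypothetical helper, and returning 1.0 when both masks are empty is a convention chosen here, not necessarily the one segauge uses.

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice and IoU from exact integer voxel counts; works for any number of dimensions."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    inter = int(np.count_nonzero(pred & gt))
    n_pred = int(np.count_nonzero(pred))
    n_gt = int(np.count_nonzero(gt))
    union = n_pred + n_gt - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2 * inter / (n_pred + n_gt), inter / union

# 2D example: two 2x2 squares shifted by one column.
pred2d = np.zeros((4, 4), dtype=bool)
gt2d = np.zeros((4, 4), dtype=bool)
pred2d[0:2, 0:2] = True
gt2d[0:2, 1:3] = True
dice2d, iou2d = dice_iou(pred2d, gt2d)

# The same function applied to a 3D stack of those slices.
dice3d, iou3d = dice_iou(np.stack([pred2d] * 3), np.stack([gt2d] * 3))
print(dice2d, iou2d, dice3d, iou3d)
```

In both cases half of each mask overlaps, so Dice is 0.5 and IoU is 1/3.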
Status
Pre-release (0.1.0.dev), built in the open. segauge is an evaluation tool for developers and researchers. It is not a medical device and produces no diagnosis.
References
- Maier-Hein, Reinke et al. Metrics Reloaded. Nature Methods (2024). https://www.nature.com/articles/s41592-023-02151-z
- Podobnik & Vrtovec. MeshMetrics. arXiv:2509.05670 (2025). https://arxiv.org/abs/2509.05670
License
Apache-2.0.