
MESS – Multi-domain Evaluation of Semantic Segmentation

This is the official toolkit for the MESS benchmark from the NeurIPS 2023 paper "What a MESS: Multi-domain Evaluation of Zero-shot Semantic Segmentation". Please visit our website or paper for more details.

The MESS benchmark enables a holistic evaluation of semantic segmentation models across a variety of domains and datasets. It comprises 22 datasets covering domains such as medicine, engineering, earth monitoring, biology, and agriculture. We designed this toolkit to make evaluating new model architectures straightforward, and we invite others to propose new ideas and datasets for future versions.

The website includes a leaderboard with all evaluated models and links to their implementations.

Usage

To test a new model architecture, install the benchmark with pip install mess-benchmark, and follow the steps in DATASETS.md for downloading and preparing the datasets. You can register all datasets by running import mess.datasets. See GettingStarted.md for more details.
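
As a minimal sketch of this workflow (assuming the datasets have been prepared following DATASETS.md, that detectron2 is installed, and that the toolkit registers its datasets through detectron2's dataset catalog), the registered splits can be inspected as follows; the dataset name in the last line is a placeholder:

# Minimal sketch: register the MESS datasets and list what is available.
import mess.datasets  # registration happens as a side effect of this import
from detectron2.data import DatasetCatalog, MetadataCatalog

# Print all registered dataset splits; the MESS splits appear alongside any
# other datasets registered in your environment.
for name in sorted(DatasetCatalog.list()):
    print(name)

# Class names and other metadata can be inspected via MetadataCatalog, e.g.
# (replace <dataset_name> with one of the names printed above):
# print(MetadataCatalog.get("<dataset_name>").stuff_classes)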

Zero-shot semantic segmentation

The current version of the MESS benchmark focuses on zero-shot semantic segmentation, and the toolkit is ready to use for this setting.

Few-shot and many-shot semantic segmentation

Few-shot and many-shot semantic segmentation are not yet supported by the toolkit, but can easily be added based on the provided preprocessing scripts. Most datasets provide a train/val split that can be used for few-shot or supervised training. CHASE DB1 and CryoNuSeg do not provide train data themselves but rely on similar datasets for training (DRIVE and STARE for CHASE DB1, and MoNuSeg for CryoNuSeg). BDD100K, Dark Zurich, iSAID, and UAVid are evaluated using their official validation splits. Hence, supervised training on these datasets may require splitting the official train set into separate train and dev splits.
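
As an illustration, the following sketch (a hypothetical helper, not part of the toolkit) derives such a train/dev split from a registered train set, assuming detectron2-style dataset dicts as returned by DatasetCatalog.get():

import random
from detectron2.data import DatasetCatalog

def train_dev_split(dataset_name, dev_fraction=0.1, seed=0):
    # Shuffle the registered samples deterministically and hold out a dev set.
    samples = list(DatasetCatalog.get(dataset_name))
    random.Random(seed).shuffle(samples)
    n_dev = max(1, int(len(samples) * dev_fraction))
    return samples[n_dev:], samples[:n_dev]

# Example (the split name is illustrative; use a name listed by DatasetCatalog.list()):
# train_samples, dev_samples = train_dev_split("<train_split_name>")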

The DRAM dataset only provides an unlabelled train set and would require a style transfer applied to Pascal VOC to obtain labelled training data. The WorldFloods train set requires approximately 300 GB of disk space, which may not be feasible for some users. We therefore propose excluding DRAM and WorldFloods from the few-shot and many-shot settings to simplify the evaluation; we refer to this reduced benchmark as MESS-20.

License

This code is released under the MIT License. The evaluated datasets are released under their respective licenses; see DATASETS.md for details. Most datasets are limited to non-commercial use and require a citation; the corresponding BibTeX entries are provided in datasets.bib.

Acknowledgement

We would like to acknowledge the work of the dataset providers, especially the careful collection and annotation of the datasets. Thank you for making the datasets publicly available! See DATASETS.md for more details and links to the datasets. We further thank the authors of the evaluated models for their work and for providing the model weights.

Citation

Please cite our paper if you use the MESS benchmark, and send us your results to be included in the leaderboard.

@article{MESSBenchmark2023,
  title={{What a MESS: Multi-Domain Evaluation of Zero-shot Semantic Segmentation}},
  author={Blumenstiel, Benedikt and Jakubik, Johannes and Kühne, Hilde and Vössing, Michael},
  journal={Advances in Neural Information Processing Systems},
  year={2023}
}

Project details


Release history

This version

0.2

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mess_benchmark-0.2.tar.gz (85.1 kB)

Uploaded Source

Built Distribution

mess_benchmark-0.2-py3-none-any.whl (110.7 kB)

Uploaded Python 3

File details

Details for the file mess_benchmark-0.2.tar.gz.

File metadata

  • Download URL: mess_benchmark-0.2.tar.gz
  • Upload date:
  • Size: 85.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for mess_benchmark-0.2.tar.gz
  • SHA256: 0d36f8a13ba34a6f3aae03802b0a1af96203ccc97adaae50c1b8e2a7843f1a52
  • MD5: dcdb223bf402b1dd634bd38dddbcf98a
  • BLAKE2b-256: 610a2e3fe608e3a6b486d40d820f7ce9a51c2a71615f051265dd4315f75b003f

See more details on using hashes here.
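
As a quick sanity check, the following sketch verifies a downloaded archive against the SHA256 digest listed above using only the Python standard library (the file path in the last line is a placeholder):

import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Stream the file in chunks so large archives do not need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "0d36f8a13ba34a6f3aae03802b0a1af96203ccc97adaae50c1b8e2a7843f1a52"
# print(sha256_of("mess_benchmark-0.2.tar.gz") == expected)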

File details

Details for the file mess_benchmark-0.2-py3-none-any.whl.

File metadata

  • Download URL: mess_benchmark-0.2-py3-none-any.whl
  • Upload date:
  • Size: 110.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for mess_benchmark-0.2-py3-none-any.whl
  • SHA256: 1fe485da305a033505f726236b8429fac8d6319b3e268b81d885637ef82fb2cb
  • MD5: e25e453c3378d696c70779f8583f63df
  • BLAKE2b-256: d07ba4fcb25d612913cfae703d14301113bf517be8fc83a72c0a66a0a6505902

See more details on using hashes here.
