
[Robust-Minisets] A collection of low resolution robustness and generalization benchmarks for image classification


Robust-Minisets


We introduce Robust-Minisets, a collection of robustness benchmark datasets for low-resolution image classification, based on well-established benchmarks such as CIFAR, Tiny ImageNet, EuroSAT and the MedMNIST collection. We port existing robustness and generalization benchmarks (ImageNet-C, -R, -A and v2) to the small dataset domain, introducing novel benchmarks to comprehensively evaluate the robustness and generalization capabilities of image classification models on low-resolution datasets. This results in an extensive collection consisting of already existing test sets (e.g. CIFAR-10.1 and Tiny ImageNet-C) as well as the novel benchmarks EuroSAT-C, MedMNIST-C, and Tiny ImageNet-A, -R and -v2 introduced in our ICPR 2024 paper "GenFormer - Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets".

Sven Oehri, Nikolas Ebert, Ahmed Abdullah, Didier Stricker & Oliver Wasenmüller
CeMOS - Research and Transfer Center, University of Applied Sciences Mannheim

 
 

Code Structure

  • robust_minisets/:
    • dataset.py: PyTorch datasets and dataloaders of Robust-Minisets.
    • info.py: Dataset information dict for each subset of Robust-Minisets.
  • examples/:
    • getting_started.ipynb: To explore the Robust-Minisets dataset collection with jupyter notebook. It is ONLY intended for a quick exploration, i.e., it does not provide full training and evaluation functionalities.
    • getting_started_without_PyTorch.ipynb: This notebook provides snippets about how to use Robust-Minisets data (the .npz files) without PyTorch.
  • setup.py: To install robust_minisets as a module

Installation and Requirements

Set up the required environment and install robust-minisets as a standard Python package from PyPI:

pip install -r requirements.txt
pip install robust-minisets

Or install from source:

pip install -r requirements.txt
pip install --upgrade git+https://github.com/CeMOS-IS/Robust-Minisets.git

Check whether you have installed the latest code version:

>>> import robust_minisets
>>> print(robust_minisets.__version__)

The code requires only a common Python environment for machine learning. It was tested with

  • Python 3 (>=3.8)
  • torch, torchvision, numpy, Pillow, scikit-learn, scikit-image, tqdm, fire

Higher (or lower) versions should also work (perhaps with minor modifications).

Quick Start

To load a standard test set from files that have already been downloaded:

>>> from robust_minisets import TinyImageNetR
>>> test_dataset = TinyImageNetR(split="test")

To enable automatic downloading, set download=True:

>>> from robust_minisets import BreastMNISTC
>>> val_dataset = BreastMNISTC(split="val", download=True)

Certain datasets (Tiny ImageNet, EuroSAT) are implemented as training datasets as well:

>>> from robust_minisets import EuroSAT
>>> train_dataset = EuroSAT(split="train", download=True)

If you use PyTorch...

  • Great! Our code is designed to work with PyTorch.

  • Explore the Robust-Minisets dataset with jupyter notebook (getting_started.ipynb), and train basic neural networks in PyTorch.

If you do not use PyTorch...

  • Although our code is tested with PyTorch, you are free to parse the data files with your own code (without PyTorch or even without Python!), as they are standard NumPy serialization files. It is simple to create a dataset without PyTorch.
  • Go to getting_started_without_PyTorch.ipynb, which provides snippets showing how to use Robust-Minisets data (the .npz files) without PyTorch.
  • Change the superclass of the Robust-Minisets dataset classes from torch.utils.data.Dataset to collections.abc.Sequence (the old alias collections.Sequence was removed in Python 3.10) to get a standard dataset without PyTorch. Check dataset_without_pytorch.py for more details.
  • You still have most functionality of our Robust-Minisets code ;)
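
The superclass swap described above can be sketched roughly as follows. This is a minimal illustration and not the code from dataset_without_pytorch.py: the class name NpzSplit and the synthetic demo file are hypothetical, and only the key names (test_images, test_labels) come from the dataset description in this README.

```python
import os
import tempfile
from collections.abc import Sequence

import numpy as np


class NpzSplit(Sequence):
    """Hypothetical PyTorch-free view of one split stored in an .npz file."""

    def __init__(self, npz_path, split="test"):
        with np.load(npz_path) as data:
            # Keys follow the "<split>_images" / "<split>_labels" naming.
            self.images = data[f"{split}_images"]
            self.labels = data[f"{split}_labels"]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # Integer index only; labels are stored with shape N x 1.
        return self.images[index], int(self.labels[index][0])


# Demonstration on a tiny synthetic file (random pixels, not real data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
rng = np.random.default_rng(0)
np.savez(
    demo_path,
    test_images=rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8),
    test_labels=np.array([[0], [1], [2], [1]]),
)

dataset = NpzSplit(demo_path, split="test")
image, label = dataset[1]
print(len(dataset), image.shape, label)
```

Because collections.abc.Sequence only needs __len__ and __getitem__, iteration and membership tests come for free, which is usually enough for a simple evaluation loop.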

Dataset

Please download the dataset(s) via Zenodo. You could also use our code to download automatically by setting download=True in dataset.py.

The Robust-Minisets collection contains several (mostly) test datasets. Each dataset (e.g., tiny-imagenet-r.npz) is comprised of up to 6 keys: train_images, train_labels, val_images, val_labels, test_images and test_labels.

  • train_images / val_images / test_images: N × W × H × 3. N denotes the number of samples, W and H denote the width and height.
  • train_labels / val_labels / test_labels: N × 1. N denotes the number of samples.
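
Since a file may contain fewer than all six keys, it can help to check which splits are present before loading them. The sketch below uses only numpy; the helper name available_splits is hypothetical, and the in-memory file is a synthetic stand-in for a test-only dataset, not real data.

```python
import io

import numpy as np


def available_splits(npz_file):
    """Return {split: (images_shape, labels_shape)} for the splits in the file."""
    summary = {}
    with np.load(npz_file) as data:
        for split in ("train", "val", "test"):
            images_key, labels_key = f"{split}_images", f"{split}_labels"
            if images_key in data.files and labels_key in data.files:
                summary[split] = (data[images_key].shape, data[labels_key].shape)
    return summary


# Synthetic stand-in for a test-only file (all zeros, not real data).
buffer = io.BytesIO()
np.savez(
    buffer,
    test_images=np.zeros((5, 64, 64, 3), dtype=np.uint8),
    test_labels=np.zeros((5, 1), dtype=np.int64),
)
buffer.seek(0)

splits = available_splits(buffer)
print(splits)
```

The same function accepts a path to a downloaded file such as tiny-imagenet-r.npz in place of the in-memory buffer.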

The following is a brief overview of the datasets in Robust-Minisets:

  • CIFAR-10.1
  • CIFAR-10-C
  • CIFAR-100-C
  • EuroSAT
  • EuroSAT-C
  • MedMNIST-C
    • BreastMNIST-C
    • BloodMNIST-C
    • DermaMNIST-C
    • OCTMNIST-C
    • OrganAMNIST-C
    • OrganCMNIST-C
    • OrganSMNIST-C
    • PathMNIST-C
    • PneumoniaMNIST-C
    • TissueMNIST-C
  • Tiny ImageNet
  • Tiny ImageNet-A
  • Tiny ImageNet-C
  • Tiny ImageNet-R
  • Tiny ImageNet-v2

Here we provide a detailed summary of all datasets in the Robust-Minisets collection.

Corruption Details

In this section we provide details about the structure of the corrupted (-C) datasets in the Robust-Minisets collection. If you are interested in a detailed evaluation per corruption and/or severity level, note that the images in all of these datasets follow the same structure:

  • Each dataset is of shape (N · C · S) × W × H × 3, where N denotes the number of test samples, C denotes the number of corruptions, and S denotes the number of severity levels (S=5).
  • The images are ordered corruption by corruption and, for each corruption, from severity level 1 to 5.
  • The order of corruptions for each dataset and split can be found here or via the info attribute of each dataset (e.g. TinyImageNetR.info["corruption_dict"])
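
The ordering described above translates into simple index arithmetic. The following sketch is not part of the package; the function name corruption_slice is hypothetical, and it assumes that each (corruption, severity) block holds the N test samples contiguously.

```python
def corruption_slice(num_samples, num_severities, corruption_index, severity):
    """Slice selecting one (corruption, severity) block of a -C image array.

    Assumes corruption-major order, then severity 1..S, with each block
    holding num_samples images contiguously (an assumption, see lead-in).
    """
    if not 1 <= severity <= num_severities:
        raise ValueError("severity must be between 1 and num_severities")
    # Position of the block counted over all (corruption, severity) pairs.
    block_index = corruption_index * num_severities + (severity - 1)
    start = block_index * num_samples
    return slice(start, start + num_samples)


# Example: N=100 test images, S=5; third corruption (index 2) at severity 4.
selected = corruption_slice(
    num_samples=100, num_severities=5, corruption_index=2, severity=4
)
print(selected.start, selected.stop)  # prints 1300 1400
```

The returned slice can be applied to both the image and label arrays of a -C dataset to evaluate a single corruption at a single severity level.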

Command Line Tools

  • List all available datasets:

      python -m robust_minisets available
    
  • Download all available datasets:

      python -m robust_minisets download
    
  • Delete all downloaded npz from root:

      python -m robust_minisets clean
    
  • Print the dataset details given a dataset flag:

      python -m robust_minisets info --flag=<dataset_flag>
    
  • Save the dataset as standard figure and csv files, which could be used for AutoML tools, e.g., Google AutoML Vision:

      python -m robust_minisets save --flag=<dataset_flag> --folder=tmp/ --postfix=png --download=True
    

    By default, download=False.

License

The code is under Apache-2.0 License.

The publication licenses of the datasets can be found within the info dictionary via robust_minisets.INFO[<dataset_flag>].

Acknowledgements

This research was partly funded by the Albert and Anneliese Konanz Foundation, the German Research Foundation under grant INST874/9-1 and the Federal Ministry of Education and Research Germany in the project M2Aind-DeepLearning (13FH8I08IA).

Citing

If you find this work useful, please consider citing us:

@inproceedings{oehri2024genformer,
    title = {GenFormer – Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets},
    author = {Oehri, Sven and Ebert, Nikolas and Abdullah, Ahmed and Stricker, Didier and Wasenm{\"u}ller, Oliver},
    booktitle = {International Conference on Pattern Recognition (ICPR)},
    year = {2024},
}

DISCLAIMER: Robust-Minisets is based on a wide range of existing datasets and benchmarks. Thus, please also cite the source data paper(s) of the Robust-Minisets subset(s) you use.

Release versions

  • v1.0.0: Robust-Minisets beta release.
