Skip to main content

A collection of Image Classification and Object Detection task datasets conforming to the MAITE protocol.

Project description

MAITE Datasets

MAITE Datasets are a collection of public datasets wrapped in a MAITE compliant format.

Installation

To install and use maite-datasets you can use pip:

pip install maite-datasets

For status bar indicators when downloading, you can include the extra tqdm when installing:

pip install maite-datasets[tqdm]

Available Downloadable Datasets

Task Dataset Description
Classification CIFAR10 CIFAR10 dataset.
Classification MNIST A dataset of hand-written digits.
Classification Ships A dataset that focuses on identifying ships from satellite images.
Detection AntiUAVDetection A UAV detection dataset in natural images with varying backgrounds.
Detection MILCO A side-scan sonar dataset focused on mine-like object detection.
Detection Seadrone A UAV dataset focused on open water object detection.
Detection VOCDetection Pascal VOC dataset.

Usage

Here is an example of how to import MNIST for usage with your workflow.

>>> from maite_datasets.image_classification import MNIST

>>> mnist = MNIST(root="data", download=True)
>>> print(mnist)
MNIST Dataset
-------------
    Corruption: None
    Transforms: []
    Image_set: train
    Metadata: {'id': 'MNIST_train', 'index2label': {0: 'zero', 1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five', 6: 'six', 7: 'seven', 8: 'eight', 9: 'nine'}, 'split': 'train'}
    Path: /home/user/maite-datasets/data/mnist
    Size: 60000

>>> print("tuple("+", ".join([str(type(t)) for t in mnist[0]])+")")
tuple(<class 'numpy.ndarray'>, <class 'numpy.ndarray'>, <class 'dict'>)

Dataset Wrappers

Wrappers provide a way to convert datasets to allow usage of tools within specific backend frameworks.

Torchvision

TorchvisionWrapper is a convenience class that wraps any of the datasets and provides the capability to apply torchvision transforms to the dataset.

NOTE: TorchvisionWrapper requires torch and torchvision to be installed.

>>> from maite_datasets.object_detection import MILCO

>>> milco = MILCO(root="data", download=True)
>>> print(milco)
MILCO Dataset
-------------
    Transforms: []
    Image Set: train
    Metadata: {'id': 'MILCO_train', 'index2label': {0: 'MILCO', 1: 'NOMBO'}, 'split': 'train'}
    Path: /home/user/maite-datasets/data/milco
    Size: 261

>>> print(f"type={milco[0][0].__class__.__name__}, shape={milco[0][0].shape}")
type=ndarray, shape=(3, 1024, 1024)

>>> print(milco[0][1].boxes[0])
[ 75. 217. 130. 247.]

>>> from maite_datasets.wrappers import TorchvisionWrapper
>>> from torchvision.transforms.v2 import Resize

>>> milco_torch = TorchvisionWrapper(milco, transforms=Resize(224))
>>> print(milco_torch)
Torchvision Wrapped MILCO Dataset
---------------------------
    Transforms: Resize(size=[224], interpolation=InterpolationMode.BILINEAR, antialias=True)

MILCO Dataset
-------------
    Transforms: []
    Image Set: train
    Metadata: {'id': 'MILCO_train', 'index2label': {0: 'MILCO', 1: 'NOMBO'}, 'split': 'train'}
    Path: /home/user/maite-datasets/data/milco
    Size: 261

>>> print(f"type={milco_torch[0][0].__class__.__name__}, shape={milco_torch[0][0].shape}")
type=Image, shape=torch.Size([3, 224, 224])

>>> print(milco_torch[0][1].boxes[0])
tensor([16.4062, 47.4688, 28.4375, 54.0312], dtype=torch.float64)

Dataset Adapters

Adapters provide a way to read in datasets from other popular formats.

Huggingface

Hugging face datasets can be adapted into MAITE compliant format using the from_huggingface adapter.

>>> from datasets import load_dataset
>>> from maite_datasets.adapters import from_huggingface

>>> cppe5 = load_dataset("cppe-5")
>>> m_cppe5 = from_huggingface(cppe5["train"])
>>> print(m_cppe5)
HFObjectDetection Dataset
-------------------------
    Source: Dataset({
    features: ['image_id', 'image', 'width', 'height', 'objects'],
    num_rows: 1000
})
    Metadata: {'id': 'cppe-5', 'index2label': {0: 'Coverall', 1: 'Face_Shield', 2: 'Gloves', 3: 'Goggles', 4: 'Mask'}, 'description': '', 'citation': '', 'homepage': '', 'license': '', 'features': {'image_id': Value('int64'), 'image': Image(mode=None, decode=True), 'width': Value('int32'), 'height': Value('int32'), 'objects': {'id': List(Value('int64')), 'area': List(Value('int64')), 'bbox': List(List(Value('float32'), length=4)), 'category': List(ClassLabel(names=['Coverall', 'Face_Shield', 'Gloves', 'Goggles', 'Mask']))}}, 'post_processed': None, 'supervised_keys': None, 'builder_name': 'parquet', 'dataset_name': 'cppe-5', 'config_name': 'default', 'version': 0.0.0, 'splits': {'train': SplitInfo(name='train', num_bytes=240478590, num_examples=1000, shard_lengths=None, dataset_name='cppe-5'), 'test': SplitInfo(name='test', num_bytes=4172706, num_examples=29, shard_lengths=None, dataset_name='cppe-5')}, 'download_checksums': {'hf://datasets/cppe-5@66f6a5efd474e35bd7cb94bf15dea27d4c6ad3f8/data/train-00000-of-00001.parquet': {'num_bytes': 237015519, 'checksum': None}, 'hf://datasets/cppe-5@66f6a5efd474e35bd7cb94bf15dea27d4c6ad3f8/data/test-00000-of-00001.parquet': {'num_bytes': 4137134, 'checksum': None}}, 'download_size': 241152653, 'post_processing_size': None, 'dataset_size': 244651296, 'size_in_bytes': 485803949}

>>> image = m_cppe5[0][0]
>>> print(f"type={image.__class__.__name__}, shape={image.shape}")
type=ndarray, shape=(3, 663, 943)

>>> target = m_cppe5[0][1]
>>> print(f"box={target.boxes[0]}, label={target.labels[0]}")
box=[302.0, 109.0, 73.0, 52.0], label=4

>>> print(m_cppe5[0][2])
{'id': [114, 115, 116, 117], 'image_id': 15, 'width': 943, 'height': 663, 'area': [3796, 1596, 152768, 81002]}

Additional Information

For more information on the MAITE protocol, check out their documentation.

Acknowledgement

CDAO Funding Acknowledgement

This material is based upon work supported by the Chief Digital and Artificial Intelligence Office under Contract No. W519TC-23-9-2033. The views and conclusions contained herein are those of the author(s) and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

maite_datasets-0.0.12.tar.gz (92.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

maite_datasets-0.0.12-py3-none-any.whl (107.6 kB view details)

Uploaded Python 3

File details

Details for the file maite_datasets-0.0.12.tar.gz.

File metadata

  • Download URL: maite_datasets-0.0.12.tar.gz
  • Upload date:
  • Size: 92.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for maite_datasets-0.0.12.tar.gz
Algorithm Hash digest
SHA256 3801ae8e91a13bba745cb57f58c297363eb154a87b30634dc9b952ed85bd1ede
MD5 b45bda6b11b1de0026fac0c9db4beeb5
BLAKE2b-256 1bf300b4b2647e7a56d4a6bc64c2d76f140341221c92b8726313921c394de19a

See more details on using hashes here.

File details

Details for the file maite_datasets-0.0.12-py3-none-any.whl.

File metadata

  • Download URL: maite_datasets-0.0.12-py3-none-any.whl
  • Upload date:
  • Size: 107.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for maite_datasets-0.0.12-py3-none-any.whl
Algorithm Hash digest
SHA256 b84132be6213586dd1bae3680d9781b1f9f59ca3431c0980fa163e40b0d8dcd0
MD5 5e5416874dbc411d32c8fb485c348b1b
BLAKE2b-256 2de4f778df572b759c86676a0cfb16fe99849e5c01b744e3336338ee6a58a451

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page