
The mini-imagenet dataset transformed to fit the classical classification task, or kept in its original format for meta-learning tasks.

Project description

mini-ImageNet Logo

The project Machine Learning CLassiFication (MLclf)


The purpose of this project is to:

  1. transform the mini-imagenet dataset, which was originally created for few-shot learning, into a format that fits the classical classification task. You can also use this package to download and obtain the raw mini-imagenet data (for few-shot learning tasks).
  2. transform the tiny-imagenet dataset into a format that fits the classical classification task and is easier to use than the original raw format (it can be fed directly to a PyTorch DataLoader).

The original mini-imagenet dataset includes 100 classes in total, but because it is intended for meta-learning / few-shot learning, the train/validation/test splits contain different classes: 64/16/20 classes respectively.

The original tiny-imagenet dataset includes 200 classes in total, and the train/validation/test splits all contain the same classes. They have 100,000/10,000/10,000 images respectively; for example, the training split has 500 images per class.

To make the mini/tiny-imagenet datasets meet the format requirements of the classical classification task, MLclf applies a transformation (recombination and re-splitting) to the original mini/tiny-imagenet data.

The transformed mini-imagenet dataset is divided into train, validation and test sets, each of which includes all 100 classes. Each image is 84x84 pixels with 3 channels.

The transformed tiny-imagenet dataset is divided into train, validation and test sets, each of which includes all 200 classes. Each image is 64x64 pixels with 3 channels.

Note: the provider of the tiny-imagenet dataset does not publish the labels of the test split, so the original raw test data has no labels.
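
As a rough illustration of how the ratio arguments used later (ratio_train, ratio_val) translate into split sizes for mini-imagenet, assuming its well-known 100 classes x 600 images per class and a simple proportional split (this is not MLclf's internal code):

# Rough illustration only (not MLclf internals): split sizes for mini-imagenet,
# assuming 100 classes x 600 images = 60,000 images pooled before re-splitting.
total_images = 100 * 600
ratio_train, ratio_val = 0.6, 0.2
n_train = int(total_images * ratio_train)   # 36000
n_val = int(total_images * ratio_val)       # 12000
n_test = total_images - n_train - n_val     # 12000
print(n_train, n_val, n_test)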

The MLclf package can be found at: https://github.com/tiger2017/MLclf or at: https://pypi.org/project/MLclf/

You are welcome to open an issue in the MLclf repository on GitHub; more dataset-loading functions will be added based on the issues raised.

The mini-imagenet source data can also be accessed from: https://deepai.org/dataset/imagenet (there is no need to download it manually if you use MLclf).

Summary

Requirements

  • Python 3.x
  • numpy
  • torchvision

Installation

How to install the MLclf package:

pip install MLclf
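
A quick sanity check that the installation succeeded (nothing MLclf-specific is assumed beyond a successful import):

# Run in Python after installing; a clean import means the package is available.
from MLclf import MLclf
print(MLclf)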

Usage

How to use this package for mini-imagenet:

from MLclf import MLclf
import torch
import torchvision.transforms as transforms

# Download the original mini-imagenet data:
MLclf.miniimagenet_download(Download=True) # Only needs to be run once, the first time you download the mini-imagenet dataset.

# Transform the original data into the format that fits the task for classification:
# Note: If you want to keep the data in the same format as used for meta-learning or few-shot learning (the original format), just set ratio_train=0.64, ratio_val=0.16, shuffle=False.

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# The transform argument is an optional keyword. You can also set transform=None or simply omit it if you do not want the data standardized and only want it normalized to [0, 1].
train_dataset, validation_dataset, test_dataset = MLclf.miniimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2, seed_value=None, shuffle=True, transform=transform, save_clf_data=True)

# The dataset can be converted to a DataLoader via torch:

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128, shuffle=True, num_workers=0)


# You can check the mapping between labels and label_marks of the image data:
# (Note: the mapping is available only after MLclf.miniimagenet_clf_dataset has been called; otherwise None is returned.)

labels_to_marks = MLclf.labels_to_marks['mini-imagenet']
marks_to_labels = MLclf.marks_to_labels['mini-imagenet']
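
For example, a minimal sketch of pulling one batch from the DataLoader and mapping the numeric marks back to label names (this assumes each batch is an (images, marks) pair and that marks_to_labels is keyed by the integer marks, which may differ from the actual return format):

# Sketch: fetch one batch and translate the integer marks into label names.
# Assumes batches are (images, marks) and marks_to_labels maps int -> label name.
images, marks = next(iter(train_loader))
print(images.shape)  # e.g. torch.Size([128, 3, 84, 84])
print([marks_to_labels[int(m)] for m in marks[:5]])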

You can also obtain the raw data of mini-imagenet from the downloaded pkl files:

from MLclf import MLclf

# The raw data of mini-imagenet can also be obtained via the function below:

data_raw_train, data_raw_val, data_raw_test = MLclf.miniimagenet_data_raw()
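
The structure of these raw objects is not documented above, so the sketch below just inspects them without assuming a particular layout:

# Inspect the raw objects without assuming their exact structure.
for name, obj in [("train", data_raw_train), ("val", data_raw_val), ("test", data_raw_test)]:
    print(name, type(obj))
    if isinstance(obj, dict):
        print("  keys:", list(obj.keys()))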

How to use this package for tiny-imagenet for the traditional classification task (similar to mini-imagenet):

from MLclf import MLclf
import torch
import torchvision.transforms as transforms

MLclf.tinyimagenet_download(Download=True) # Only needs to be run once, the first time you download the tiny-imagenet dataset.
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_dataset, validation_dataset, test_dataset = MLclf.tinyimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2,
                                                                                     seed_value=None, shuffle=True,
                                                                                     transform=transform,
                                                                                     save_clf_data=True,
                                                                                     few_shot=False)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=5, shuffle=True, num_workers=0)

# You can check the mapping between labels and label_marks of the image data:
# (Note: the mapping is available only after MLclf.tinyimagenet_clf_dataset has been called; otherwise None is returned.)

labels_to_marks = MLclf.labels_to_marks['tiny-imagenet']
marks_to_labels = MLclf.marks_to_labels['tiny-imagenet']


data_raw_train, data_raw_val, data_raw_test = MLclf.tinyimagenet_data_raw()

If you want to use tiny-imagenet for the few-shot learning task, just set few_shot=True, for example:

train_dataset, validation_dataset, test_dataset = MLclf.tinyimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2,
                                                                                     seed_value=None, shuffle=True,
                                                                                     transform=transform,
                                                                                     save_clf_data=True,
                                                                                     few_shot=True)
# Only the original training dataset is used as the whole dataset for the few-shot learning task, so there are 200 classes in total.
# In this few-shot example, 120 classes form the training set, 40 the validation set and 40 the test set, with 500 images per class.
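
If you then want to build few-shot episodes yourself from one of these class-disjoint splits, a generic N-way K-shot sampling sketch might look like the following (illustrative only; it assumes the returned datasets are indexable and yield (image, mark) pairs, which is not guaranteed by the MLclf API):

import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, n_query=15):
    # Group sample indices by class mark (assumes dataset[i] -> (image, mark)).
    # For brevity this scans the whole dataset; in practice cache the grouping.
    by_class = defaultdict(list)
    for i in range(len(dataset)):
        _, mark = dataset[i]
        by_class[int(mark)].append(i)
    episode_classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for c in episode_classes:
        idx = random.sample(by_class[c], k_shot + n_query)
        support += [(dataset[i][0], c) for i in idx[:k_shot]]
        query += [(dataset[i][0], c) for i in idx[k_shot:]]
    return support, query

support_set, query_set = sample_episode(train_dataset, n_way=5, k_shot=5)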



Download files

Download the file for your platform.

Source Distribution

MLclf-0.2.14.tar.gz (9.7 kB)

Uploaded Source

Built Distribution


MLclf-0.2.14-py3-none-any.whl (10.1 kB)

Uploaded Python 3

File details

Details for the file MLclf-0.2.14.tar.gz.

File metadata

  • Download URL: MLclf-0.2.14.tar.gz
  • Upload date:
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.11

File hashes

Hashes for MLclf-0.2.14.tar.gz

  • SHA256: e4f5a740b8018a18c9f23c282916eeb9e085c7f8b1d02874f68674db563cbe1e
  • MD5: 4396eb3d6c0e13fd18640214cc369d2f
  • BLAKE2b-256: e2bbcdf2c6167408e049f0a674292aae2f4e5ccadc85ce4217bab3974ce5415e


File details

Details for the file MLclf-0.2.14-py3-none-any.whl.

File metadata

  • Download URL: MLclf-0.2.14-py3-none-any.whl
  • Upload date:
  • Size: 10.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.11

File hashes

Hashes for MLclf-0.2.14-py3-none-any.whl

  • SHA256: d53e010b7d33aa65017d8f59e4373122f4b9eb1a39a4dd667d3930482834013f
  • MD5: c4f4942dfc21a7d982453f301637ac05
  • BLAKE2b-256: e155105f55cc120a88f4314fcbf894d56b328dc6a1c6475db1b521076030162d

