A collection of regression datasets, featuring PyTorch-like dataset classes.
🎛️ Regression Datasets
This repository contains a collection of regression datasets. I have unified their data formats so that they are easier to read and process, and I have included dataset classes that follow the PyTorch `Dataset` structure, which let users download and load the datasets automatically. All datasets are distributed under permissive licenses that allow their use for research purposes.
1. Usage
This repository contains datasets for vision, audio, and text. The corresponding files are located in the `vision/`, `audio/`, and `text/` folders, respectively.
Each folder contains multiple `[dataset].py` files. You can import these files directly to download and load the datasets automatically. Alternatively, you can copy the desired `[dataset].py` file into your project and load the dataset locally.
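For readers copying a `[dataset].py` file into their own project, it may help to see the general shape of such a class. The sketch below is not taken from this repository: `CSVRegressionDataset` and its `<root>/<split>.csv` file layout (a header row `text,label`) are hypothetical, and downloading is left out. It follows the map-style protocol that subclasses of PyTorch's `torch.utils.data.Dataset` implement (`__len__` and `__getitem__`), but does not import torch so that it runs on its own.

```python
import csv
import os
from typing import List, Tuple


class CSVRegressionDataset:
    """Hypothetical map-style dataset following the PyTorch Dataset protocol.

    Assumes a file ``<root>/<split>.csv`` with a header row ``text,label``.
    This layout is illustrative and is NOT the repository's actual format;
    downloading is omitted.
    """

    def __init__(self, root: str, split: str = "train") -> None:
        path = os.path.join(root, f"{split}.csv")
        self.samples: List[Tuple[str, float]] = []
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # Store the regression target as a float
                self.samples.append((row["text"], float(row["label"])))

    def __len__(self) -> int:
        # Number of samples; samplers and DataLoader rely on this
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[str, float]:
        # List indexing raises IndexError past the end, which also lets a
        # plain ``for text, label in dataset`` loop terminate
        return self.samples[index]
```

Because `__getitem__` indexes a Python list, it raises `IndexError` after the last sample, so a plain `for` loop like the examples below stops on its own. An object with this interface should also be usable as a map-style dataset with `torch.utils.data.DataLoader`.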
📸 Example Usage of Vision Datasets
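```python
from vision.utkface import UTKFace

# Load the training split; download=True lets the class fetch the data into ./data
utkface_trainset = UTKFace(root="./data", split="train", download=True)
for image, label in utkface_trainset:
    # Each sample is an (image, label) pair
    ...
```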
🎧 Example Usage of Audio Datasets
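```python
from audio.vcc2018 import VCC2018

# Load the training split; download=True lets the class fetch the data into ./data
vcc2018_trainset = VCC2018(root="./data", split="train", download=True)
for audio, sample_rate, label in vcc2018_trainset:
    # Each sample is an (audio, sample rate, label) triple
    ...
```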
📝 Example Usage of Text Datasets
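```python
from text.amazon_review import Amazon_Review

# Load the training split; download=True lets the class fetch the data into ./data
amazon_review_trainset = Amazon_Review(root="./data", split="train", download=True)
for texts, label in amazon_review_trainset:
    # texts is a 3-tuple: the original text (ori) and two augmented versions
    (ori, aug_0, aug_1) = texts
    ...
```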
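The text datasets yield each sample as a 3-tuple of texts plus a label, which the loop above unpacks as `ori`, `aug_0`, and `aug_1`. When batching, it is often convenient to regroup a list of such samples into one list per view. The helper below, `collate_text_batch`, is hypothetical and not part of this repository; it assumes only the sample structure shown in the loop above and makes no assumption about the label type.

```python
from typing import Any, List, Sequence, Tuple

# One text sample as unpacked in the loop above: ((ori, aug_0, aug_1), label)
TextSample = Tuple[Tuple[str, str, str], Any]


def collate_text_batch(
    batch: Sequence[TextSample],
) -> Tuple[List[str], List[str], List[str], List[Any]]:
    """Hypothetical helper: regroup a list of samples into one list per view."""
    originals: List[str] = []
    augs_0: List[str] = []
    augs_1: List[str] = []
    labels: List[Any] = []
    for (ori, aug_0, aug_1), label in batch:
        originals.append(ori)
        augs_0.append(aug_0)
        augs_1.append(aug_1)
        labels.append(label)
    return originals, augs_0, augs_1, labels
```

Since `torch.utils.data.DataLoader` passes its `collate_fn` a list of samples, a function like this could be supplied as `collate_fn=collate_text_batch`; the labels then come back as a plain list rather than a tensor.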
2. Datasets
For datasets that do not provide a predefined train-test split, I randomly sample 80% of the data for training and reserve the remaining 20% for testing. Details for each dataset are provided below.
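A minimal sketch of such an 80/20 random split over sample indices is shown below. The function name `random_split_indices`, the fixed seed, and the rounding rule (flooring `n * train_fraction`) are assumptions for illustration; the repository's actual seed and procedure are not stated here.

```python
import random
from typing import List, Tuple


def random_split_indices(
    n: int, train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into train and test parts.

    The seed and the rounding rule (floor of n * train_fraction) are
    assumptions, not the repository's documented procedure.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so the split is reproducible
    n_train = int(n * train_fraction)     # int() floors for positive values
    return indices[:n_train], indices[n_train:]
```

For n = 23,705 (the UTKFace total, 18,964 + 4,741) this rounding yields 18,964 training and 4,741 test indices, and for n = 20,580 (the VCC2018 total) it yields 16,464 and 4,116, consistent with the counts in the tables below. For `Dataset` objects, `torch.utils.data.random_split` offers a comparable split.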
📸 Vision
Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
UTKFace | 18,964 | - | 4,741 | [1, 116] |
🎧 Audio
Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
BVCC | 4,974 | 1,066 | 1,066 | [1, 5] |
VCC2018 | 16,464 | - | 4,116 | [1, 5] |
📝 Text
Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
Amazon Review | 250,000 | 25,000 | 650,000 | [0, 4] |
Yelp Review | 250,000 | 25,000 | 50,000 | [0, 4] |
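The target scales differ widely across datasets, from [1, 116] for UTKFace to [1, 5] for BVCC and VCC2018. When comparing errors across datasets or training a model with a bounded output, one option is min-max scaling with the ranges listed in the tables above. The sketch below assumes those ranges are the actual label bounds; the names `TARGET_RANGES`, `to_unit_interval`, and `from_unit_interval` are hypothetical and not part of this repository.

```python
from typing import Dict, Tuple

# Target ranges (min, max) copied from the tables above
TARGET_RANGES: Dict[str, Tuple[float, float]] = {
    "UTKFace": (1, 116),
    "BVCC": (1, 5),
    "VCC2018": (1, 5),
    "Amazon Review": (0, 4),
    "Yelp Review": (0, 4),
}


def to_unit_interval(label: float, dataset: str) -> float:
    """Min-max scale a target from its dataset's range to [0, 1]."""
    low, high = TARGET_RANGES[dataset]
    return (label - low) / (high - low)


def from_unit_interval(value: float, dataset: str) -> float:
    """Invert to_unit_interval, e.g. to map predictions back to the original scale."""
    low, high = TARGET_RANGES[dataset]
    return low + value * (high - low)
```

Scaled errors are then comparable across datasets, and predictions can be mapped back with `from_unit_interval` before reporting them on the original scale.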
3. License
Distributed under the MIT License. See LICENSE for more information.
4. Contact
- Pin-Yen Huang (pyhuang97@gmail.com)
5. Acknowledgments