
A collection of regression datasets, featuring PyTorch-like dataset classes.


🎛️ Regression Datasets

📋 Table of Contents
  1. Usage
  2. Datasets
  3. License
  4. Contact
  5. Acknowledgments

This repository contains a collection of regression datasets converted to a unified data format, which makes them easier to read and process. It also includes dataset classes that follow the PyTorch Dataset structure and can download and load each dataset automatically. All datasets are distributed under permissive licenses that allow their use for research.
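For readers new to the PyTorch Dataset structure: a map-style dataset only needs __len__ and __getitem__. The sketch below is a hypothetical, minimal instance of that pattern, not the regsets implementation. The class name CSVRegressionDataset and the <root>/<split>.csv layout with input and target columns are assumptions made for illustration.

```python
import csv
import os


class CSVRegressionDataset:
    """Hypothetical map-style dataset following the PyTorch Dataset protocol.

    Assumes <root>/<split>.csv with an `input` column and a numeric
    `target` column. This layout is illustrative, not the regsets format.
    """

    SPLITS = ("train", "dev", "test")

    def __init__(self, root, split="train", transform=None):
        if split not in self.SPLITS:
            raise ValueError(f"split must be one of {self.SPLITS}, got {split!r}")
        self.transform = transform
        with open(os.path.join(root, f"{split}.csv"), newline="") as f:
            # Each CSV row becomes an (input, target) pair; targets are parsed as floats.
            self.samples = [
                (row["input"], float(row["target"])) for row in csv.DictReader(f)
            ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        x, y = self.samples[index]  # list indexing raises IndexError past the end
        if self.transform is not None:
            x = self.transform(x)
        return x, y
```

Because __getitem__ raises IndexError after the last sample, a plain for x, y in dataset: loop works, as in the usage examples below. Subclassing torch.utils.data.Dataset is optional for this sketch.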

1. Usage

This repository contains datasets for vision, audio, and text. The corresponding files are located in the vision/, audio/, and text/ folders, respectively.

Each folder contains one or more [dataset].py files. You can import these modules directly to download and load the datasets automatically, or copy the desired [dataset].py file into your own project and load the dataset locally.
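The download=True flag in the examples below follows a common PyTorch-ecosystem pattern (torchvision datasets behave similarly): reuse files already under root, download them only when allowed, and otherwise fail with a clear error. The helper below is a hypothetical sketch of that pattern; the function name fetch_file and its arguments are illustrative and not part of regsets.

```python
import os
import urllib.request


def fetch_file(root, filename, url, download=False):
    """Hypothetical helper: return the local path of `filename` under `root`.

    Downloads from `url` only when the file is missing and download=True.
    """
    path = os.path.join(root, filename)
    if os.path.exists(path):
        return path  # already cached locally: no network access needed
    if not download:
        raise RuntimeError(f"{path} not found; pass download=True to fetch it")
    os.makedirs(root, exist_ok=True)
    urllib.request.urlretrieve(url, path)  # no checksum verification in this sketch
    return path
```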

📸 Example Usage of Vision Datasets

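from vision.utkface import UTKFace

# download=True downloads the dataset into ./data automatically
utkface_trainset = UTKFace(root="./data", split="train", download=True)

for image, label in utkface_trainset:
    # label: the regression target (age, in [1, 116])
    ...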

🎧 Example Usage of Audio Datasets

from audio.vcc2018 import VCC2018

vcc2018_trainset = VCC2018(root="./data", split="train", download=True)

for audio, sample_rate, label in vcc2018_trainset:
    # each sample: the waveform, its sampling rate, and a target score in [1, 5]
    ...

📝 Example Usage of Text Datasets

from text.amazon_review import Amazon_Review

amazon_review_trainset = Amazon_Review(root="./data", split="train", download=True)

for texts, label in amazon_review_trainset:
    # texts holds the original review and two augmented versions of it;
    # label is the target in [0, 4]
    (ori, aug_0, aug_1) = texts
    ...


2. Datasets

For datasets without a predefined train-test split, I randomly sample 80% of the data for training and reserve the remaining 20% for testing. The tables below list the split sizes and target range of each dataset; a dash (-) means the dataset has no dev split.
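A minimal sketch of such an 80/20 random split, using only the standard library. The function name, the fixed seed, and the floor rounding of the training size are assumptions; the repository's exact sampling procedure is not documented here. With floor rounding, the resulting sizes match the UTKFace (23,705 samples in total) and VCC2018 (20,580 samples in total) rows in the tables.

```python
import random


def random_split_indices(n, train_percent=80, seed=0):
    """Shuffle range(n) and split it into (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (seed is arbitrary)
    n_train = n * train_percent // 100  # floor, via integer arithmetic
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = random_split_indices(23705)  # UTKFace: 18,964 + 4,741
print(len(train_idx), len(test_idx))  # 18964 4741
```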

📸 Vision

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| UTKFace | 18,964 | - | 4,741 | [1, 116] |

🎧 Audio

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| BVCC | 4,974 | 1,066 | 1,066 | [1, 5] |
| VCC2018 | 16,464 | - | 4,116 | [1, 5] |

📝 Text

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| Amazon Review | 250,000 | 25,000 | 650,000 | [0, 4] |
| Yelp Review | 250,000 | 25,000 | 50,000 | [0, 4] |
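Because each dataset's targets lie in a fixed range, one optional preprocessing step is to min-max scale targets to [0, 1] for training and map predictions back, clipped to the valid range, for evaluation. The sketch below uses the ranges from the tables above; the TARGET_RANGES dictionary and the function names are hypothetical and not part of regsets.

```python
# Target ranges copied from the tables above (dataset name -> (low, high)).
TARGET_RANGES = {
    "UTKFace": (1, 116),
    "BVCC": (1, 5),
    "VCC2018": (1, 5),
    "Amazon Review": (0, 4),
    "Yelp Review": (0, 4),
}


def normalize(y, dataset):
    """Min-max scale a raw target into [0, 1]."""
    low, high = TARGET_RANGES[dataset]
    return (y - low) / (high - low)


def denormalize(z, dataset):
    """Map a [0, 1]-scale prediction back to the original scale, clipped to the valid range."""
    low, high = TARGET_RANGES[dataset]
    y = low + z * (high - low)
    return min(max(y, low), high)


print(normalize(3, "BVCC"))         # 0.5
print(denormalize(1.2, "UTKFace"))  # 116 (clipped to the upper bound)
```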


3. License

Distributed under the MIT License. See LICENSE for more information.


4. Contact


5. Acknowledgments


Download files

Source distribution: regsets-0.1.1.tar.gz (4.1 kB)

Built distribution: regsets-0.1.1-py3-none-any.whl (4.7 kB, Python 3)
