
A collection of regression datasets with PyTorch-like dataset classes.


🎛️ Regression Datasets

📋 Table of Contents
  1. Installation
  2. Usage
  3. Datasets
  4. License
  5. Contact
  6. Acknowledgments

This repository offers a diverse collection of regression datasets across the vision, audio, and text domains. Its dataset classes follow the PyTorch Dataset structure and can automatically download and load each dataset. All datasets are distributed under permissive licenses that allow their use for research.
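A PyTorch map-style dataset is essentially an object that implements __len__ and __getitem__, returning one sample per index. The sketch below illustrates that protocol with a hypothetical ToyRegressionDataset built on synthetic data. It does not depend on torch and is not regsets' implementation; it only shows why loops such as "for image, label in dataset" work: Python falls back to calling __getitem__ with 0, 1, 2, ... until an IndexError is raised.

```python
import random
from typing import List, Tuple


class ToyRegressionDataset:
    """Hypothetical map-style dataset: each item is an (input, target) pair.

    Mirrors the __len__/__getitem__ protocol of torch.utils.data.Dataset
    without importing torch; it is NOT the regsets implementation.
    """

    def __init__(self, num_samples: int = 10, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Synthetic 1-D inputs with a slightly noisy linear target y = 2x + 1.
        self.inputs: List[float] = [rng.uniform(0.0, 1.0) for _ in range(num_samples)]
        self.targets: List[float] = [
            2.0 * x + 1.0 + rng.gauss(0.0, 0.01) for x in self.inputs
        ]

    def __len__(self) -> int:
        return len(self.inputs)

    def __getitem__(self, index: int) -> Tuple[float, float]:
        # List indexing raises IndexError past the end, which ends iteration.
        return self.inputs[index], self.targets[index]


dataset = ToyRegressionDataset(num_samples=5)
for x, y in dataset:
    print(f"{x:.3f} -> {y:.3f}")
```

Because the class only relies on these two methods, the same object could in principle be handed to torch.utils.data.DataLoader for batching.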

1. Installation

To install the regsets package, you can use pip:

python -m pip install regsets

Alternatively, you can download a specific dataset file (e.g., utkface.py) and include it in your project to load the dataset locally.

2. Usage

Below are examples of how to use the regsets package for loading datasets.

📸 Vision Datasets

from regsets.vision import UTKFace

utkface_trainset = UTKFace(root="./data", split="train", download=True)

for image, label in utkface_trainset:
    ...

🎧 Audio Datasets

from regsets.audio import VCC2018

vcc2018_trainset = VCC2018(root="./data", split="train", download=True)

for audio, sample_rate, label in vcc2018_trainset:
    ...
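Each audio item yields a waveform, its sample rate, and a label, and clip lengths usually differ, so batching them typically needs a custom collate step. The sketch below assumes waveforms are plain lists of floats and that all clips in a batch share one sample rate; the helper name pad_collate and the toy values (including the 16000 Hz rate) are hypothetical, and the real VCC2018 class may return tensors instead. With PyTorch, a tensor-based version of such a function would be passed to DataLoader through its collate_fn argument.

```python
from typing import List, Sequence, Tuple

# (waveform, sample_rate, label); a hypothetical plain-Python stand-in.
Sample = Tuple[List[float], int, float]


def pad_collate(
    batch: Sequence[Sample],
) -> Tuple[List[List[float]], List[int], int, List[float]]:
    """Zero-pad every waveform in the batch to the length of the longest one.

    Returns the padded waveforms, their original lengths, the shared
    sample rate, and the labels.
    """
    if not batch:
        raise ValueError("empty batch")
    sample_rates = {sr for _, sr, _ in batch}
    if len(sample_rates) != 1:
        raise ValueError(f"mixed sample rates in batch: {sorted(sample_rates)}")
    max_len = max(len(wav) for wav, _, _ in batch)
    padded = [list(wav) + [0.0] * (max_len - len(wav)) for wav, _, _ in batch]
    lengths = [len(wav) for wav, _, _ in batch]
    labels = [label for _, _, label in batch]
    return padded, lengths, sample_rates.pop(), labels


batch = [([0.1, 0.2, 0.3], 16000, 3.5), ([0.5], 16000, 2.0)]
padded, lengths, sr, labels = pad_collate(batch)
print(padded)               # [[0.1, 0.2, 0.3], [0.5, 0.0, 0.0]]
print(lengths, sr, labels)  # [3, 1] 16000 [3.5, 2.0]
```

Keeping the original lengths lets a model mask out the zero padding later.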

📝 Text Datasets

from regsets.text import Amazon_Review

amazon_review_trainset = Amazon_Review(root="./data", split="train", download=True)

for texts, label in amazon_review_trainset:
    (ori, aug_0, aug_1) = texts
    ...


3. Datasets

For datasets that do not provide a predefined train-test split, I randomly sample 80% of the data for training and reserve the remaining 20% for testing. Details for each dataset are provided below.
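The README does not state the seed or rounding rule used for the random 80/20 split, so the following is only a sketch under assumptions: a fixed seed, a shuffled index list, and a training size of floor(0.8 × N) computed with integer arithmetic. The helper split_80_20 is hypothetical. Under this rule the resulting sizes agree with the UTKFace (23,705 samples) and VCC2018 (20,580 samples) counts in the tables.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: randomly split items into 80% train / 20% test."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic (floor of 80%) avoids floating-point rounding surprises.
    n_train = len(items) * 8 // 10
    train = [items[i] for i in indices[:n_train]]
    test = [items[i] for i in indices[n_train:]]
    return train, test


# UTKFace: 18,964 + 4,741 = 23,705 samples in total.
train, test = split_80_20(list(range(23_705)))
print(len(train), len(test))  # 18964 4741
```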

📸 Vision Datasets

Dataset | # Train Samples | # Dev Samples | # Test Samples | Target Range
UTKFace | 18,964 | - | 4,741 | [1, 116]

🎧 Audio Datasets

Dataset | # Train Samples | # Dev Samples | # Test Samples | Target Range
BVCC | 4,974 | 1,066 | 1,066 | [1, 5]
VCC2018 | 16,464 | - | 4,116 | [1, 5]

📝 Text Datasets

Dataset | # Train Samples | # Dev Samples | # Test Samples | Target Range
Amazon Review | 250,000 | 25,000 | 650,000 | [0, 4]
Yelp Review | 250,000 | 25,000 | 50,000 | [0, 4]
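The Target Range column gives the minimum and maximum label of each dataset. One common way to use it (not something regsets is documented to do) is min-max scaling of targets to [0, 1] for training and mapping predictions back afterwards. The dictionary keys and helper names below are illustrative; only the ranges come from the tables.

```python
# Target ranges copied from the tables; dictionary keys are illustrative names.
TARGET_RANGES = {
    "UTKFace": (1.0, 116.0),
    "BVCC": (1.0, 5.0),
    "VCC2018": (1.0, 5.0),
    "Amazon Review": (0.0, 4.0),
    "Yelp Review": (0.0, 4.0),
}


def normalize(y: float, low: float, high: float) -> float:
    """Min-max scale a target from [low, high] to [0, 1]."""
    return (y - low) / (high - low)


def denormalize(z: float, low: float, high: float) -> float:
    """Map a value in [0, 1] back to the original target range."""
    return low + z * (high - low)


low, high = TARGET_RANGES["VCC2018"]
print(normalize(3.0, low, high))                    # 0.5
print(denormalize(0.5, *TARGET_RANGES["UTKFace"]))  # 58.5
```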


4. License

Distributed under the MIT License. See LICENSE for more information.


5. Contact


6. Acknowledgments




Download files


Source Distribution

regsets-0.1.2.tar.gz (4.1 kB)

Uploaded Source

Built Distribution

regsets-0.1.2-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file regsets-0.1.2.tar.gz.

File metadata

  • Download URL: regsets-0.1.2.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.17

File hashes

Hashes for regsets-0.1.2.tar.gz
Algorithm | Hash digest
SHA256 | f24a3506162209c86703c2d33341d3e884dba9ce38867177106902739db1afa5
MD5 | e870c0b48750bdd10068759f3ed93aa3
BLAKE2b-256 | 66c21ed64dd95eb0e14a71f099a6c9bc769ad239269bfa31508f79a0018c4a2b


File details

Details for the file regsets-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: regsets-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.17

File hashes

Hashes for regsets-0.1.2-py3-none-any.whl
Algorithm | Hash digest
SHA256 | ec321105867b2e62df65776ede55a051120fff47724e55919b6c4854bc1b2616
MD5 | b9136e2150903ed1982aa95ec5a7917d
BLAKE2b-256 | 7f3aeca2ad7a08d5f2dff1001d290eb20928cf047f6a6f87bcca7f06036386f1

