A collection of regression datasets with PyTorch-like dataset classes.
🎛️ Regression Datasets
This repository offers a diverse collection of regression datasets across vision, audio and text domains. It provides dataset classes that follow the PyTorch Datasets structure, allowing users to automatically download and load these datasets with ease. All datasets come with a permissive license, permitting their use for research purposes.
1. Installation
To install the regsets package, use pip:
python -m pip install regsets
Alternatively, you can download a specific dataset file (e.g., utkface.py) and include it in your project to load the dataset locally.
2. Usage
Below are examples of how to use the regsets package to load datasets.
📸 Vision Datasets
from regsets.vision import UTKFace
utkface_trainset = UTKFace(root="./data", split="train", download=True)
for image, label in utkface_trainset:
    ...
🎧 Audio Datasets
from regsets.audio import VCC2018
vcc2018_trainset = VCC2018(root="./data", split="train", download=True)
for audio, sample_rate, label in vcc2018_trainset:
    ...
📝 Text Datasets
from regsets.text import Amazon_Review
amazon_review_trainset = Amazon_Review(root="./data", split="train", download=True)
for texts, label in amazon_review_trainset:
    (ori, aug_0, aug_1) = texts
    ...
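In each loop above, the regression target is the last element of the yielded tuple. The following is a minimal sketch, assuming that ordering holds, of a helper that summarizes the labels of any such dataset. The name `label_summary` and the stand-in samples are hypothetical and are not part of regsets; the stand-ins only mimic the tuple shapes so the example runs without downloading data.

```python
from statistics import mean


def label_summary(dataset):
    """Return (count, min, max, mean) of the labels in an iterable dataset.

    Assumes each sample is a tuple whose last element is the numeric label,
    as in the usage loops above.
    """
    labels = [float(sample[-1]) for sample in dataset]
    if not labels:
        raise ValueError("dataset is empty")
    return len(labels), min(labels), max(labels), mean(labels)


# Hypothetical stand-in samples that mimic the tuple shapes shown above.
fake_vision = [("image_0", 23), ("image_1", 61)]
fake_audio = [("waveform_0", 16000, 3.5), ("waveform_1", 16000, 2.0)]
fake_text = [(("ori", "aug_0", "aug_1"), 4), (("ori", "aug_0", "aug_1"), 0)]

for name, data in [("vision", fake_vision), ("audio", fake_audio), ("text", fake_text)]:
    print(name, label_summary(data))
```

The same call should work on a real dataset object such as utkface_trainset, provided it is iterable and keeps the label last.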
3. Datasets
For datasets that do not provide a predefined train-test split, I randomly sample 80% of the data for training and reserve the remaining 20% for testing. Details for each dataset are provided below.
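A random 80/20 split of this kind can be sketched as follows. This is an illustration only, not the code used to build the package's splits: the function name `random_split_indices`, the seed, and the choice to round the training size down are all assumptions.

```python
import random


def random_split_indices(num_samples, seed=0):
    """Shuffle the indices 0..num_samples-1 and split them 80/20.

    The seed and the floor rounding are assumptions made for this sketch.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding in the 80% cut.
    num_train = num_samples * 4 // 5
    return indices[:num_train], indices[num_train:]


train_idx, test_idx = random_split_indices(23_705)
print(len(train_idx), len(test_idx))  # 18964 4741
```

With 23,705 samples this yields 18,964 training and 4,741 test indices, the same sizes as the UTKFace row in the vision table.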
📸 Vision Datasets
Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
UTKFace | 18,964 | - | 4,741 | [1, 116] |
🎧 Audio Datasets
Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
BVCC | 4,974 | 1,066 | 1,066 | [1, 5] |
VCC2018 | 16,464 | - | 4,116 | [1, 5] |
📝 Text Datasets
Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
Amazon Review | 250,000 | 25,000 | 65,000 | [0, 4] |
Yelp Review | 250,000 | 25,000 | 50,000 | [0, 4] |
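The target ranges in the tables differ widely between datasets (for example [1, 116] for UTKFace and [0, 4] for the review datasets). One common way to make them comparable is min-max scaling of the labels to [0, 1]. The sketch below is an assumption about how a user might do this; regsets itself is not known to rescale targets, and `scale_target`, `unscale_target`, and `TARGET_RANGES` are hypothetical names.

```python
# Target ranges copied from the tables above.
TARGET_RANGES = {
    "UTKFace": (1, 116),
    "BVCC": (1, 5),
    "VCC2018": (1, 5),
    "Amazon Review": (0, 4),
    "Yelp Review": (0, 4),
}


def scale_target(value, low, high):
    """Map a target from [low, high] to [0, 1] by min-max scaling."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def unscale_target(scaled, low, high):
    """Invert scale_target: map a value in [0, 1] back to [low, high]."""
    return scaled * (high - low) + low


low, high = TARGET_RANGES["BVCC"]
print(scale_target(3, low, high))  # 0.5
```

Predictions made on the scaled targets can be mapped back to the original units with `unscale_target` before computing errors.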
4. License
Distributed under the MIT License. See LICENSE for more information.
5. Contact
- Pin-Yen Huang (pyhuang97@gmail.com)
6. Acknowledgments