Clean datasets for computer vision.
Open, Clean Datasets for Computer Vision
🔥 We use fastdup, a free tool, to clean all datasets shared in this repo.
What?
This repo shares clean versions of publicly available computer vision datasets.
Why?
Even with the success of generative models, data quality remains a largely overlooked issue. Training models with erroneous data reduces model accuracy and wastes time, storage, and computational resources.
How?
In this repo we share clean versions of various computer vision datasets.
The datasets are cleaned using fastdup, a free tool we released.
We hope this effort will also help the community train better models and mitigate various model biases.
Each cleaned image dataset should be free from most, if not all, of the following issues:
- Duplicates.
- Broken images.
- Outliers.
- Low information images (dark/bright/blurry images).
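To make the first item in the list concrete, here is a minimal sketch that groups byte-identical files by their SHA-256 digest. It uses only the Python standard library and is not how fastdup works; it also misses near-duplicates such as resized or re-encoded copies. The helper name `find_exact_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root):
    """Group files under `root` that have identical bytes.

    Hypothetical helper for illustration only; it is not part of fastdup
    or vl_datasets. Returns a list of groups, each a list of paths whose
    contents hash to the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    # Sort the paths so that the order inside each group is deterministic.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("./images")` on a folder of images returns one list per set of identical files; an empty result means no exact duplicates were found.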
Datasets
Here are some of the datasets we are currently working on.
Dataset | Issues |
---|---|
Food-101 | |
Oxford Pets | |
Imagenette | |
Laion 1B | |
Imagenet-21k | |
Imagenet-1k | |
KITTI | |
DeepFashion | |
Places365-standard | |
CelebA-HQ | |
ADE20K | |
COCO | |
Getting Started
Install the vl_datasets package from PyPI.
pip install vl-datasets
Import the clean version of dataset.
from vl_datasets import CleanFood101
Load the dataset into a PyTorch DataLoader.
train_dataset = CleanFood101('./', split='train', exclude_csv='food_101_vl-datasets_analysis.csv', transform=train_transform)
valid_dataset = CleanFood101('./', split='test', exclude_csv='food_101_vl-datasets_analysis.csv', transform=valid_transform)
Now you can use the dataset in a PyTorch training loop. Refer to our sample training notebooks for details.
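The `exclude_csv` argument in the calls above points at a CSV file of images to leave out. The package's internals are not shown here, so the following is only a rough sketch of the general idea: read a list of file names from a CSV and drop them from a list of samples. The `filename` column and both helper names are assumptions made for illustration; the real layout of the analysis CSV may differ.

```python
import csv


def load_excluded(csv_path, column="filename"):
    """Read the set of file names listed for exclusion.

    Assumption: the CSV has a header row containing a `filename` column.
    This is a hypothetical layout, not the documented vl_datasets format.
    """
    with open(csv_path, newline="") as f:
        return {row[column] for row in csv.DictReader(f)}


def filter_samples(samples, excluded):
    """Drop (path, label) samples whose path appears in `excluded`."""
    return [(path, label) for path, label in samples if path not in excluded]
```

A dataset class could apply `filter_samples` once at construction time, so that the training loop never sees the excluded images.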
Disclaimer
You are bound to the usage license of the original dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license. We provide no warranty or guarantee of accuracy or completeness.
File details
Details for the file vl_datasets-0.0.1-py3.10-none-any.whl.
File metadata
- Download URL: vl_datasets-0.0.1-py3.10-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | ec075db92d78513e496c9646789f956d6a9b9510e408062aa103d6613852a5db |
MD5 | 5e12ec6044ec54864d20fcc2a0abb28c |
BLAKE2b-256 | 4d2d768e32c86e5d60cb4b17802ca056ce57ca752b8166ff36d29b5df1621b5f |
File details
Details for the file vl_datasets-0.0.1-py3.9-none-any.whl.
File metadata
- Download URL: vl_datasets-0.0.1-py3.9-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 92e51e0e7f7da9462aa1649bf47f1e3a302a3ed8be703569fd659b303e43c0f5 |
MD5 | c4814aecd64638cfa44bcbf9bd4db257 |
BLAKE2b-256 | ee7cefc9809673c0edcbe59324b144be26dd09b9d89c7bb110c7ecced480444a |
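If you download a wheel by hand, you can compare it against the SHA256 digests listed above. The sketch below is one way to do that with the Python standard library; the helper names `sha256_of` and `matches` are hypothetical and not part of vl_datasets.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns empty bytes at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA-256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("vl_datasets-0.0.1-py3.10-none-any.whl", "ec075db92d78513e496c9646789f956d6a9b9510e408062aa103d6613852a5db")` should return `True` for an unmodified copy of that file.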