Image Classification Dataset Generator
ICGen
Installation
pip install icgen
For a development installation, see CONTRIBUTING.md.
Usage
Sampling Datasets
import icgen

dataset_generator = icgen.ICDatasetGenerator(
    data_path="datasets",
    min_resolution=16,
    max_resolution=512,
    max_log_res_deviation=1,  # Sample only 1 log resolution from the native one
    min_classes=2,
    max_classes=100,
    min_examples_per_class=20,
    max_examples_per_class=100_000,
)
dev_data, test_data, dataset_info = dataset_generator.get_dataset(
    dataset="cifar10", augment=True, download=True
)
The augment parameter controls whether the original dataset is modified.
The generator options only affect sampling when augment=True; the min/max ranges do not filter out datasets.
The data is left at its original resolution so that resizing stays under user control. This avoids, for example, resizing twice, which can hurt performance.
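To make the max_log_res_deviation option more concrete, here is a minimal sketch of what sampling a resolution "within 1 log resolution of the native one" could look like. It assumes a base-2 logarithm and a uniform draw, neither of which is confirmed by this README, and sample_resolution is a hypothetical helper, not part of icgen.

```python
import math
import random


def sample_resolution(native_res, min_res=16, max_res=512, max_log_dev=1.0, rng=None):
    """Draw a resolution within `max_log_dev` log2 steps of `native_res`.

    Hypothetical illustration only; icgen's actual sampling may differ.
    """
    rng = rng or random.Random()
    # Perturb the native resolution uniformly in log2 space (assumed base 2).
    log_res = math.log2(native_res) + rng.uniform(-max_log_dev, max_log_dev)
    # Clamp to the configured range and round to whole pixels.
    return int(round(min(max(2.0 ** log_res, min_res), max_res)))


rng = random.Random(0)
samples = [sample_resolution(32, rng=rng) for _ in range(5)]
print(samples)  # every value lies between 16 and 64 for a native size of 32
```

With a native resolution of 32 and a deviation of 1, the sketch can return anything from half to double the native size, which is then clamped to the min_resolution and max_resolution bounds.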
You can also sample from a list of datasets:
dataset_generator.get_dataset(datasets=["cifar100", "emnist/balanced"], download=True)
We provide some lists of available datasets:
import icgen
icgen.DATASETS_TRAIN
icgen.DATASETS_VAL
icgen.DATASETS_TEST
icgen.DATASETS
or, on the command line, you can get the names with
python -m icgen.dataset_names
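One possible use of these lists is to check requested names before sampling. The sketch below assumes the lists are plain iterables of name strings; split_known is a hypothetical helper and the available list is a stand-in for icgen.DATASETS.

```python
def split_known(requested, available):
    """Partition requested dataset names into (known, unknown) lists."""
    available = set(available)
    known = [name for name in requested if name in available]
    unknown = [name for name in requested if name not in available]
    return known, unknown


# Stand-in list; in practice one would pass icgen.DATASETS instead.
available = ["cifar10", "cifar100", "emnist/balanced"]
known, unknown = split_known(["cifar100", "mnist_typo"], available)
print(known)    # ['cifar100']
print(unknown)  # ['mnist_typo']
```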
Downloading Datasets Before Execution
To download datasets ahead of time you can run
python -m icgen.download --data_path DATA_PATH --datasets D1 D2 D3
or directly download a complete group
python -m icgen.download --data_path DATA_PATH --dataset_group GROUP # all, train, dev, test
Alternatively, you can pass download=True to the dataset_generator.get_dataset function.
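If the download step needs to be scripted, the command lines above can be assembled from Python. This is a sketch only: build_download_command is a hypothetical helper, and requiring exactly one of datasets or dataset_group is an assumption made here, not documented behaviour of icgen.download.

```python
import sys


def build_download_command(data_path, datasets=None, dataset_group=None):
    """Return the argv list for `python -m icgen.download`."""
    # Assumption: the two selection flags are treated as mutually exclusive.
    if (datasets is None) == (dataset_group is None):
        raise ValueError("pass exactly one of datasets or dataset_group")
    command = [sys.executable, "-m", "icgen.download", "--data_path", str(data_path)]
    if datasets is not None:
        command += ["--datasets", *datasets]
    else:
        command += ["--dataset_group", dataset_group]
    return command


command = build_download_command("datasets", datasets=["cifar10", "cifar100"])
print(command[1:])
# To actually run the download: subprocess.run(command, check=True)
```

Returning a list rather than a shell string avoids quoting problems when DATA_PATH contains spaces.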
Reconstructing and Distributing Tasks
In distributed applications it may be necessary to sample datasets on one machine and then use them on another. Similarly, for reproducibility it may be necessary to store exactly which dataset was used. For these cases icgen uses a dataset identifier, which uniquely identifies a dataset.
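As a rough sketch of how an identifier might be passed between machines, it could be written to a file on one side and read back on the other. The structure of the identifier is not described here, so the value below is a placeholder, and the sketch assumes the real identifier is JSON-serializable; save_identifier and load_identifier are hypothetical helpers, not icgen functions.

```python
import json
import tempfile
from pathlib import Path


def save_identifier(identifier, path):
    """Write the identifier to `path` as JSON."""
    Path(path).write_text(json.dumps(identifier, sort_keys=True))


def load_identifier(path):
    """Read an identifier previously written by save_identifier."""
    return json.loads(Path(path).read_text())


# Placeholder value: the real identifier's contents are not shown in this README.
identifier = {"dataset": "cifar10", "placeholder": True}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "dataset_identifier.json"
    save_identifier(identifier, path)
    restored = load_identifier(path)

print(restored == identifier)  # True
```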
License
Download files
File details
Details for the file icgen-0.3.0.tar.gz.
File metadata
- Download URL: icgen-0.3.0.tar.gz
- Upload date:
- Size: 25.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.5 CPython/3.8.3 Linux/5.4.44-1-MANJARO
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1980fc0c358e387889a2acdb87142b2c952d7798ebdd24af24a354fb68e95714 |
| MD5 | aa2d4a3f6b4c3524dc3e31dc73a9c311 |
| BLAKE2b-256 | 4f9af6ae40badf859dc7a66cf070ecc5232cb2bb19d5cb248c7ff7ff8df605fa |
File details
Details for the file icgen-0.3.0-py3-none-any.whl.
File metadata
- Download URL: icgen-0.3.0-py3-none-any.whl
- Upload date:
- Size: 40.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.5 CPython/3.8.3 Linux/5.4.44-1-MANJARO
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07929a5f86894fbc1e3b66fe3c4f1ef84dd45629680fdff4ee37a9a851353fa3 |
| MD5 | 71df5a090abf073c8ce1dfc77630cd1f |
| BLAKE2b-256 | 5ac7e2fc11999617d1419eb7dc2057fed30a2911159ead60ea52564dd9d31225 |