Image Classification Dataset Generator

ICGen

Installation

pip install icgen

For a development installation, see CONTRIBUTING.md.

Usage

Sampling Datasets

import icgen

dataset_generator = icgen.ICDatasetGenerator(
    data_path="datasets",
    min_resolution=16,
    max_resolution=512,
    max_log_res_deviation=1,  # Sample at most 1 log-resolution step away from the native resolution
    min_classes=2,
    max_classes=100,
    min_examples_per_class=20,
    max_examples_per_class=100_000,
)
dev_data, test_data, dataset_info = dataset_generator.get_dataset(
    dataset="cifar10", augment=True, download=True
)

The augment parameter controls whether the original dataset is modified.

The generator options only affect sampling when augment=True. The min and max ranges do not filter datasets.
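As a rough illustration of how a resolution range and max_log_res_deviation could interact, the sketch below samples a resolution in log2 space around a dataset's native resolution and then clips it to the allowed range. This is only one plausible reading of the options, not icgen's actual implementation; the function name sample_resolution and the clipping behaviour are assumptions made for the example.

```python
import math
import random


def sample_resolution(
    native_resolution,
    min_resolution=16,
    max_resolution=512,
    max_log_res_deviation=1,
    rng=None,
):
    """Hypothetical helper: sample a resolution near the native one.

    Assumption (not taken from icgen): the deviation is measured in log2
    units, so a deviation of 1 allows anything from half to double the
    native resolution, and the result is clipped to the min/max range.
    """
    rng = rng or random.Random()
    log_native = math.log2(native_resolution)
    # Draw uniformly in log2 space within the allowed deviation.
    log_res = rng.uniform(
        log_native - max_log_res_deviation,
        log_native + max_log_res_deviation,
    )
    # Clip to the configured resolution range instead of rejecting the dataset.
    log_res = min(max(log_res, math.log2(min_resolution)), math.log2(max_resolution))
    return int(round(2 ** log_res))
```

With the defaults above, a 32-pixel dataset such as cifar10 would be assigned a resolution between 16 and 64 under this reading.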

The data is left at its original resolution, so resizing stays under user control. This avoids, for example, resizing the data twice, which can hurt performance.

You can also sample from a list of datasets:

dataset_generator.get_dataset(datasets=["cifar100", "emnist/balanced"], download=True)

We provide some lists of available datasets:

import icgen
icgen.DATASETS_TRAIN
icgen.DATASETS_VAL
icgen.DATASETS_TEST
icgen.DATASETS

Or, on the command line, you can get the names with:

python -m icgen.dataset_names
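The dataset lists combine naturally with the datasets argument of get_dataset. The sketch below draws several tasks from one list. The helper name sample_tasks is hypothetical, and it assumes that get_dataset returns the same (dev_data, test_data, dataset_info) triple when called with datasets as it does with dataset.

```python
def sample_tasks(dataset_generator, dataset_names, num_tasks, download=False):
    """Hypothetical helper: sample `num_tasks` tasks from a list of dataset names.

    `dataset_generator` is expected to be an icgen.ICDatasetGenerator (or any
    object with a compatible get_dataset method).
    """
    names = list(dataset_names)
    tasks = []
    for _ in range(num_tasks):
        # Assumption: the return value has the same shape as for dataset=...
        dev_data, test_data, dataset_info = dataset_generator.get_dataset(
            datasets=names, download=download
        )
        tasks.append((dev_data, test_data, dataset_info))
    return tasks


# Example usage (requires icgen and locally available datasets):
# tasks = sample_tasks(dataset_generator, icgen.DATASETS_TRAIN, num_tasks=5)
```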

Downloading Datasets Before Execution

To download datasets ahead of time, you can run:

python -m icgen.download --data_path DATA_PATH --datasets D1 D2 D3

Or download a complete group directly:

python -m icgen.download --data_path DATA_PATH --dataset_group GROUP  # all, train, dev, test

Alternatively, you can use the download=True flag of the dataset_generator.get_dataset function.
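If the download step is part of a larger setup script, the two command-line forms can be assembled programmatically. The following is a minimal sketch that only builds the argument list from the flags shown above. The helper name build_download_command is hypothetical, and treating --datasets and --dataset_group as mutually exclusive is an assumption of this sketch rather than documented behaviour.

```python
import sys


def build_download_command(data_path, datasets=None, dataset_group=None):
    """Hypothetical helper: build the argv list for `python -m icgen.download`."""
    # Assumption: exactly one of the two selection flags is passed.
    if (datasets is None) == (dataset_group is None):
        raise ValueError("pass exactly one of `datasets` or `dataset_group`")
    command = [sys.executable, "-m", "icgen.download", "--data_path", str(data_path)]
    if datasets is not None:
        command += ["--datasets", *datasets]
    else:
        command += ["--dataset_group", dataset_group]
    return command


# Example usage (requires icgen to be installed):
# import subprocess
# subprocess.run(build_download_command("datasets", dataset_group="train"), check=True)
```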

Reconstructing and Distributing Tasks

In distributed applications, it may be necessary to sample datasets on one machine and then use them on another. Similarly, for reproducibility it may be necessary to record exactly which dataset was used. For these cases, icgen uses a dataset identifier that uniquely identifies a dataset.
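One way to move an identifier between machines, or to keep it alongside experiment results, is to write it to a small file. The sketch below assumes the identifier is JSON-serializable, which is not confirmed here; the helper names save_identifier and load_identifier are hypothetical, and how the identifier is obtained from icgen and passed back to it is left out.

```python
import json
from pathlib import Path


def save_identifier(identifier, path):
    """Hypothetical helper: store a dataset identifier as JSON.

    Assumption: the identifier consists only of JSON-serializable values.
    """
    Path(path).write_text(json.dumps(identifier, sort_keys=True))


def load_identifier(path):
    """Hypothetical helper: read back an identifier written by save_identifier."""
    return json.loads(Path(path).read_text())
```

Storing the file next to the results of a run keeps a record of which sampled dataset produced them.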

License

MIT
