Utilities to load and use PyTorch datasets stored in MinIO S3 object storage

Project description

Torch Dataset Utilities

The Python library torchdatasetutil provides PyTorch DataLoader classes and utility functions for several imaging datasets. It currently supports sets of images and annotations from CVAT and the COCO dataset. torchdatasetutil keeps dataset data in S3 object storage, so training and testing can run on nodes other than the one where the dataset is stored, using application-defined credentials. It uses PyTorch DataLoader workers to prefetch data for efficient GPU or CPU training and inference.
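The prefetching idea can be illustrated without PyTorch or a real object store. The following is a minimal sketch, not the library's implementation: `FakeObjectStore`, `fetch_sample`, and `prefetching_loader` are hypothetical names, and a standard-library thread pool stands in for the DataLoader workers. It shows how keeping a few downloads in flight overlaps storage latency with consumption of the samples.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket (not pymlutil.s3)."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def get(self, key):
        return self._objects[key]


def fetch_sample(store, key):
    """Download one object and decode it; decoding here is just bytes -> str."""
    return key, store.get(key).decode("utf-8")


def prefetching_loader(store, keys, num_workers=2, prefetch=4):
    """Yield samples in key order while up to `prefetch` fetches run ahead."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        key_iter = iter(keys)
        # Prime the queue with the first few requests.
        for key in key_iter:
            pending.append(pool.submit(fetch_sample, store, key))
            if len(pending) >= prefetch:
                break
        while pending:
            # Block only on the oldest request; newer ones keep downloading.
            sample = pending.popleft().result()
            # Submit one replacement request per consumed sample.
            for key in key_iter:
                pending.append(pool.submit(fetch_sample, store, key))
                break
            yield sample


keys = [f"img/{i}.txt" for i in range(6)]
store = FakeObjectStore({key: f"sample {i}".encode() for i, key in enumerate(keys)})
samples = list(prefetching_loader(store, keys))
```

In a real training loop the consumer would be the model step rather than `list(...)`, and the decode step would turn image bytes into tensors.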

torchdatasetutil takes a pymlutil.s3 object as input and uses it to access the object storage.

Two JSON or YAML dictionaries are loaded from the object storage to identify and process the dataset: the dataset description and the class dictionary. The dataset description is unique to each type of dataset. The class dictionary is common to all datasets and describes data transformation and data augmentation.
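As a rough illustration of how two such dictionaries might be consumed, here is a minimal sketch using only the standard library. The keys and values shown (`type`, `bucket`, `objects`, `trainId`, and so on) are assumptions made up for this example; the real schemas are defined by the library and its notebook. In practice the text would be read from object storage rather than from string literals.

```python
import json

# Hypothetical dataset description; the real schema differs per dataset type.
DATASET_DESCRIPTION_JSON = """
{
  "type": "coco",
  "bucket": "datasets",
  "image_prefix": "coco/train2017/",
  "annotations": "coco/annotations/instances_train2017.json"
}
"""

# Hypothetical class dictionary; field names are assumptions for illustration.
CLASS_DICTIONARY_JSON = """
{
  "background": 0,
  "objects": [
    {"id": 1, "name": "person", "trainId": 1},
    {"id": 18, "name": "dog", "trainId": 2}
  ]
}
"""


def load_dictionaries(description_text, class_text):
    """Parse the dataset description and the class dictionary from JSON text."""
    return json.loads(description_text), json.loads(class_text)


def build_id_map(class_dictionary):
    """Map dataset category ids to contiguous training ids."""
    return {obj["id"]: obj["trainId"] for obj in class_dictionary["objects"]}


description, class_dictionary = load_dictionaries(
    DATASET_DESCRIPTION_JSON, CLASS_DICTIONARY_JSON
)
id_map = build_id_map(class_dictionary)
```

Keeping the class dictionary separate from the dataset description lets the same label mapping be reused across dataset types, which matches the split described above.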

Library structure

See torchdatasetutil.ipynb for library interface and usage

Download files

Source distribution: torchdatasetutil-0.0.26.tar.gz (10.8 kB)

Built distribution: torchdatasetutil-0.0.26-py3-none-any.whl (19.8 kB, Python 3)