
A utility to create a PyTorch DatasetFolder from any .csv or .tsv file with file path and class data.

Project description

make-datasetfolder


Use Case

In PyTorch, torchvision's DatasetFolder and ImageFolder classes provide a convenient interface for computer vision datasets structured as follows:

root/class_x/xxx.ext
root/class_x/xxy.ext
root/class_x/xxz.ext

root/class_y/123.ext
root/class_y/nsdf3.ext
root/class_y/asd932_.ext
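For reference, torchvision's DatasetFolder takes the class labels from the subdirectory names, sorts them alphabetically, and assigns each one an integer index. The stdlib-only sketch below imitates that convention on a tree like the one above. It is a simplified illustration that skips torchvision's extension filtering and image loading, and the function name `index_folder` is hypothetical.

```python
import os
from typing import Dict, List, Tuple


def index_folder(root: str) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Map each class subdirectory of `root` to an index and list (path, index) samples.

    Hypothetical, simplified imitation of the root/<class>/<file> convention:
    classes are the immediate subdirectory names, sorted alphabetically.
    """
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        # Walk recursively so files in nested folders inherit their top-level class.
        for dirpath, _dirnames, filenames in sorted(os.walk(class_dir)):
            for fname in sorted(filenames):
                samples.append((os.path.join(dirpath, fname), class_to_idx[name]))
    return class_to_idx, samples
```

For the tree above, `index_folder("root")` would map class_x to 0 and class_y to 1, because the folder names are sorted before indexing.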

This utility converts any dataset described by a table of file paths and class labels into this layout.

Example

Suppose you have a file dataset.csv of the form:

sample,class,some_feature,another_feature
img-0001.jpg,0,foo,bar
some/relative/directory/img-0002.jpg,1,foo,bar
...

Running make-datasetfolder -p sample -l class dataset.csv output creates a directory named output with the following structure:

output/0/img-0001.jpg
output/1/img-0002.jpg
...
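Conceptually, the conversion reads each row, takes the path and label columns, and places the file at output/<label>/<file name>, which is why some/relative/directory/img-0002.jpg ends up as output/1/img-0002.jpg. The sketch below illustrates that idea; it is not the package's source. It assumes relative paths are resolved against the directory containing the table (the README does not say), it only accepts header names rather than column indexes, and the name `build_datasetfolder` is hypothetical.

```python
import csv
import os
import shutil


def build_datasetfolder(table: str, output: str, path_col: str, label_col: str) -> int:
    """Copy every file listed in `table` into output/<label>/<basename>.

    Hypothetical sketch: relative paths are resolved against the table's own
    directory (an assumption), and the delimiter is chosen from the extension.
    Returns the number of files copied.
    """
    delimiter = "\t" if table.lower().endswith(".tsv") else ","
    base_dir = os.path.dirname(os.path.abspath(table))
    count = 0
    with open(table, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            # os.path.join leaves absolute paths in the table unchanged.
            src = os.path.join(base_dir, row[path_col])
            dst_dir = os.path.join(output, row[label_col])
            os.makedirs(dst_dir, exist_ok=True)
            # Only the file name is kept, so nested source directories are flattened.
            shutil.copy2(src, os.path.join(dst_dir, os.path.basename(src)))
            count += 1
    return count
```

Because only the file name is kept, two rows with the same file name and label would collide; this sketch simply overwrites the earlier copy.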

Passing the -m flag moves the files instead of copying them, which is useful for large datasets that shouldn't be duplicated on disk.
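One way a move/copy switch could be combined with the -t THREADS option from the usage text is a thread pool over shutil calls. The sketch below is an assumption about the design, not the package's implementation; the names `transfer` and `transfer_all` are hypothetical. The default of one worker per CPU core follows the help text.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Optional, Tuple


def transfer(src: str, dst: str, move: bool = False) -> str:
    """Move or copy one file, creating the destination directory if needed."""
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if move:
        # shutil.move renames on the same filesystem; across filesystems it
        # copies the file and then deletes the source.
        return shutil.move(src, dst)
    return shutil.copy2(src, dst)


def transfer_all(pairs: Iterable[Tuple[str, str]], move: bool = False,
                 threads: Optional[int] = None) -> int:
    """Transfer (src, dst) pairs concurrently; threads=None means one per CPU core."""
    workers = threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transfer, src, dst, move) for src, dst in pairs]
        for future in futures:
            future.result()  # re-raises any exception raised in a worker thread
    return len(futures)
```

File copies are dominated by I/O, so threads can overlap disk waits despite Python's global interpreter lock.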

Usage

usage: make-datasetfolder [-h] [-p PATH_COLUMN] [-l LABEL_COLUMN] [-m] [-f]
                          [-t THREADS]
                          input output

positional arguments:
  input                 Path to input .csv or .tsv
  output                Path to output directory.

optional arguments:
  -h, --help            show this help message and exit
  -p PATH_COLUMN, --path-column PATH_COLUMN
                        Column name or index with file paths (default: 0).
  -l LABEL_COLUMN, --label-column LABEL_COLUMN
                        Column name or index with labels (default: 1).
  -m, --move            Move files instead of copying.
  -f, --force           Overwrite output directory if it already exists.
  -t THREADS, --threads THREADS
                        Number of threads to use (default: number of CPU
                        cores)
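The help text above fully specifies the interface, so an equivalent argparse parser can be sketched from it. This is a reconstruction from the usage output, not the package's source. The helper `resolve_column` is hypothetical, and trying the header name before the numeric index is an assumption about how "column name or index" is interpreted.

```python
import argparse
import os
from typing import List, Union


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the CLI described by the usage text (sketch, not the real source)."""
    parser = argparse.ArgumentParser(prog="make-datasetfolder")
    parser.add_argument("input", help="Path to input .csv or .tsv")
    parser.add_argument("output", help="Path to output directory.")
    parser.add_argument("-p", "--path-column", default="0",
                        help="Column name or index with file paths (default: 0).")
    parser.add_argument("-l", "--label-column", default="1",
                        help="Column name or index with labels (default: 1).")
    parser.add_argument("-m", "--move", action="store_true",
                        help="Move files instead of copying.")
    parser.add_argument("-f", "--force", action="store_true",
                        help="Overwrite output directory if it already exists.")
    parser.add_argument("-t", "--threads", type=int, default=os.cpu_count(),
                        help="Number of threads to use (default: number of CPU cores)")
    return parser


def resolve_column(header: List[str], spec: Union[str, int]) -> int:
    """Hypothetical helper: treat `spec` as a header name first, then a 0-based index."""
    spec = str(spec)
    if spec in header:
        return header.index(spec)
    index = int(spec)  # raises ValueError for a name that is not in the header
    if not 0 <= index < len(header):
        raise IndexError(f"column index {index} out of range")
    return index
```

With the example table, `-p sample -l class` and the defaults `-p 0 -l 1` would select the same two columns.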


Download files


Source Distribution

make_datasetfolder-0.0.1.tar.gz (3.3 kB)

Uploaded Source

Built Distribution


make_datasetfolder-0.0.1-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file make_datasetfolder-0.0.1.tar.gz.

File metadata

  • Download URL: make_datasetfolder-0.0.1.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191203 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.4

File hashes

Hashes for make_datasetfolder-0.0.1.tar.gz
Algorithm Hash digest
SHA256 b317ae9e4d2e8f7642d10312863d6b9188e46e4ddc365e0442a5a83f8a211c48
MD5 29b7ceda4c8b157d4ad2cff27207250e
BLAKE2b-256 c872419c380125b3e64e0b8d2b783ddf91eff2802b1a41ac1c6f0e378d2394c6
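A downloaded archive can be checked against the published SHA256 digest with the standard hashlib module. The sketch below hard-codes the source distribution's digest from the table above; the function names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published above for make_datasetfolder-0.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = "b317ae9e4d2e8f7642d10312863d6b9188e46e4ddc365e0442a5a83f8a211c48"


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare a file's digest with the expected hex value (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("make_datasetfolder-0.0.1.tar.gz")` on an intact download should return True. pip's hash-checking mode (a `--hash=sha256:...` entry in a requirements file) performs the same comparison at install time.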


File details

Details for the file make_datasetfolder-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: make_datasetfolder-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0.post20200127 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for make_datasetfolder-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2757a5be9ce4ebaec59db3c73f245e7f06cde22d293964de0a0d16c15f94ffa5
MD5 e2781904c88e1b16e88ed93719c8115d
BLAKE2b-256 6f06d5e1298de89580c2363c18567d1075d4e92cdc5a9d33ffc2d8a5d81418ba

