
Project description

Split folders with files (e.g. images) into train, validation and test (dataset) folders.

The input folder should have the following structure:

input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...

In order to give you this:

output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...
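To experiment with the tool, a toy input tree matching the layout above can be generated in a few lines (the paths and file names here are placeholders):

```python
# Build a throwaway input/ tree with two classes of empty placeholder
# "images" (hypothetical names -- any file type works, the tool is
# type-agnostic).
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
for cls in ("class1", "class2"):
    class_dir = root / "input" / cls
    class_dir.mkdir(parents=True)
    for i in range(3):
        (class_dir / f"img{i}.jpg").write_bytes(b"")  # empty dummy file
```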

This should get you started doing some serious deep learning on your data. Read here why it's a good idea to split your data into three different sets.

  • You can also split into just a training and validation set.
  • The data gets shuffled before it gets split.
  • A seed lets you reproduce the splits.
  • Works on any file types.
  • (Should) work on all operating systems.
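A seeded, reproducible split of this kind is commonly implemented by shuffling the item list deterministically and then slicing it by the given ratios. A minimal sketch (illustrative only, not the library's actual code; `ratio_split` is a hypothetical helper):

```python
import random

def ratio_split(items, ratio=(.8, .1, .1), seed=1337):
    """Shuffle deterministically with the given seed, then slice by ratio.
    A 2-tuple ratio yields only train/val; a 3-tuple adds a test set."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_train = round(ratio[0] * len(items))
    n_val = round(ratio[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:] if len(ratio) == 3 else []
    return train, val, test

train, val, test = ratio_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Because the shuffle is seeded, calling the function twice with the same seed yields identical splits.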

Install

pip install split-folders

Usage

You can use `split_folders` as a Python module or as a Command Line Interface (CLI).

Module

import split_folders

# Split with a ratio. To split into only a training and validation set,
# pass a 2-tuple as `ratio`, e.g. (.8, .2).
split_folders.ratio('input_folder', output="output", seed=1337, ratio=(.8, .1, .1)) # default values

# Split with a fixed number of items per validation/test set, e.g. 100 each.
# To split into only a training and validation set, pass a single number.
split_folders.fixed('input_folder', output="output", seed=1337, fixed=(100, 100)) # default values
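For intuition, the fixed split can be sketched as a pure function: a set number of items is held out for validation (and optionally test), and everything left over becomes the training set. This is an illustration under stated assumptions, not the library's API; `fixed_split` is a hypothetical helper:

```python
import random

def fixed_split(items, fixed=(100, 100), seed=1337):
    """Hold out a fixed number of items for val (and test); the
    remainder forms the training set. Illustrative sketch only."""
    if isinstance(fixed, int):  # a single number -> train/val only
        fixed = (fixed,)
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_holdout = sum(fixed)
    train = items[:len(items) - n_holdout]
    val = items[len(train):len(train) + fixed[0]]
    test = items[len(train) + fixed[0]:] if len(fixed) == 2 else []
    return train, val, test

train, val, test = fixed_split(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```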

CLI

Usage:
    split_folders folder_with_images [--output] [--ratio] [--fixed] [--seed]
Options:
    --output    path to the output folder. defaults to `output`. created if it does not exist.
    --ratio     the ratio to split. e.g. for train/val/test `.8 .1 .1` or for train/val `.8 .2`
    --fixed     set the absolute number of items per validation/test set. The remaining items constitute the training set.
                e.g. for train/val/test `100 100` or for train/val `100`
    --seed      set seed value for shuffling the items. defaults to 1337.
Example:
    split_folders imgs --ratio .8 .1 .1

License

MIT.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

split_folders-0.1.0.tar.gz (3.7 kB)

Uploaded Source

Built Distribution

split_folders-0.1.0-py3-none-any.whl (5.3 kB)

Uploaded Python 3

File details

Details for the file split_folders-0.1.0.tar.gz.

File metadata

  • Download URL: split_folders-0.1.0.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for split_folders-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9ec88fd6a4f965d6739a72951f83f85abb24c92926a7148b72a465d69bce3095
MD5 fba66cfb4cc55cb01b33197279c12c95
BLAKE2b-256 96626946b29e5c99d51a9370a3c75a5716f8626676685129598a4304f62e5ee5

See more details on using hashes here.

File details

Details for the file split_folders-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: split_folders-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for split_folders-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1ef097d4cb15b68d14b85759eaa22e6836d304bc7c5e6aa51d501e4e4835142f
MD5 e83a834cbc1c886f4bc4010ab5696a9d
BLAKE2b-256 e13dc402a2dc38010e34d51227ba5e227e8e73a41cd17ee7b424dd4ae947d7cb

See more details on using hashes here.
