`split-folders`

Split folders with files (e.g. images) into train, validation and test (dataset) folders.

The input folder should have the following format:

input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...

In order to give you this:

output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...
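
To check that a split produced the layout shown above, a small helper can count the files per set and class. This is a minimal sketch and not part of split-folders; the function name `count_split_files` is hypothetical, and it assumes the `output/<set>/<class>/<file>` structure from the diagram.

```python
from collections import Counter
from pathlib import Path


def count_split_files(output_dir):
    """Return {(set_name, class_name): number_of_files} for an output tree.

    Assumes the layout output/<set>/<class>/<file>, e.g. output/train/class1/img1.jpg.
    """
    counts = Counter()
    for path in Path(output_dir).glob("*/*/*"):
        if path.is_file():
            class_name = path.parent.name         # e.g. "class1"
            set_name = path.parent.parent.name    # e.g. "train", "val" or "test"
            counts[(set_name, class_name)] += 1
    return dict(counts)
```

Calling `count_split_files("output")` after a split gives a quick overview of how many files ended up in each set.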

This should get you started to do some serious deep learning on your data. Read here why it's a good idea to split your data into three different sets.

Split files into a training set and a validation set (and optionally a test set).
Works on any file type.
The files get shuffled.
A seed makes splits reproducible.
Allows randomized oversampling for imbalanced datasets.
Optionally group files by prefix.
(Should) work on all operating systems.
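
The idea behind a shuffled, seeded ratio split can be sketched in a few lines of plain Python. This illustrates the concept only and is not the library's implementation; the function name `split_by_ratio` and the rounding behaviour (leftover items go to the last set) are assumptions.

```python
import random


def split_by_ratio(files, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle a copy of `files` with a fixed seed and cut it by `ratio`.

    Illustrative only: items left over after rounding down end up in the last set.
    """
    items = sorted(files)                # sort first so the input order does not matter
    random.Random(seed).shuffle(items)   # the same seed always yields the same order
    n = len(items)
    train_end = int(ratio[0] * n)
    val_end = train_end + int(ratio[1] * n)
    return items[:train_end], items[train_end:val_end], items[val_end:]
```

Because the shuffle uses its own `random.Random(seed)` instance, running the split twice with the same seed returns the same three lists.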

Install

This package is Python only and there are no external dependencies.

pip install split-folders

Optionally, you may install tqdm to get a progress bar when moving files.

pip install split-folders[full]

Usage

You can use split-folders as a Python module or as a Command Line Interface (CLI).

If your dataset is balanced (each class has the same number of samples), choose ratio; otherwise choose fixed. NB: oversampling is turned off by default. Oversampling is only applied to the train folder, since having duplicates in val or test would be considered cheating.
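
To illustrate what randomized oversampling of the training set means, here is a minimal sketch that pads every class up to the size of the largest one by re-drawing from its own files. The function name `oversample_train` is hypothetical, and the exact strategy used by split-folders may differ; the sketch also assumes every class has at least one file.

```python
import random


def oversample_train(train_by_class, seed=1337):
    """Pad every class to the size of the largest class with random duplicates.

    `train_by_class` maps a class name to its list of training files.
    Only training files are touched; val and test stay free of duplicates.
    """
    rng = random.Random(seed)
    target = max(len(files) for files in train_by_class.values())
    balanced = {}
    for name, files in train_by_class.items():
        # Draw with replacement from the class's own files until it reaches `target`.
        extra = [rng.choice(files) for _ in range(target - len(files))]
        balanced[name] = list(files) + extra
    return balanced
```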

Module

import splitfolders

# Split with a ratio.
# To only split into training and validation set, pass a two-value tuple to `ratio`, e.g. `(.8, .2)`.
splitfolders.ratio("input_folder", output="output",
    seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values

# Split val/test with a fixed number of items, e.g. `(100, 100)`, for each set.
# To only split into training and validation set, pass a single number to `fixed`, e.g. `10`.
# Set 3 values, e.g. `(300, 100, 100)`, to limit the number of training values.
splitfolders.fixed("input_folder", output="output",
    seed=1337, fixed=(100, 100), oversample=False, group_prefix=None, move=False) # default values

Occasionally, a single sample comprises more than one file (e.g. a picture (.png) plus its annotation (.txt)). splitfolders can keep such files together by splitting them as equally-sized groups based on their prefix. Set group_prefix to the number of files per group (e.g. 2). Once group_prefix is set, every file must belong to a group.
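
The grouping idea can be sketched as follows. This is not the library's code: the function name `group_by_stem` is hypothetical, and it assumes that "prefix" means the file name without its extension, so that `img1.png` and `img1.txt` form one group.

```python
from collections import defaultdict
from pathlib import Path


def group_by_stem(filenames, group_size=2):
    """Group file names that share a stem and require `group_size` files per group."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[Path(name).stem].append(name)

    # Every file must be part of a complete group, otherwise the split is ambiguous.
    incomplete = {stem: members for stem, members in groups.items()
                  if len(members) != group_size}
    if incomplete:
        raise ValueError(f"files not in complete groups: {incomplete}")
    return [tuple(members) for members in groups.values()]
```

Each returned tuple can then be shuffled and assigned to a set as one unit, which keeps a picture and its annotation in the same folder.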

Set move=True if you want to move the files instead of copying.

CLI

Usage:
    splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] [--move] folder_with_images
Options:
    --output        path to the output folder. defaults to `output`. Gets created if it does not exist.
    --ratio         the ratio to split. e.g. for train/val/test `.8 .1 .1 --` or for train/val `.8 .2 --`.
    --fixed         set the absolute number of items per validation/test set. The remaining items constitute
                    the training set. e.g. for train/val/test `100 100` or for train/val `100`.
                    Set 3 values, e.g. `300 100 100`, to limit the number of training values.
    --seed          set seed value for shuffling the items. defaults to 1337.
    --oversample    enable oversampling of imbalanced datasets, works only with --fixed.
    --group_prefix  split files into equally-sized groups based on their prefix
    --move          move the files instead of copying
Example:
    splitfolders --ratio .8 .1 .1 -- folder_with_images

Because of some Python quirks you have to add -- after the values passed to --ratio.
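
The need for `--` can be reproduced with a small `argparse` parser. Whether split-folders is built on `argparse` is an assumption here; the parser below is a stand-alone demo, not the tool's actual CLI definition. An option that accepts a variable number of values consumes everything up to the next option, so the positional folder argument has to be separated with `--`.

```python
import argparse


def build_parser():
    """Demo parser: one option with a variable number of values, one positional."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--ratio", nargs="+", type=float)
    parser.add_argument("input")
    return parser


# With `--`, the values before it go to --ratio and the rest is positional.
args = build_parser().parse_args(
    ["--ratio", ".8", ".1", ".1", "--", "folder_with_images"]
)
```

Without the `--`, the folder name is treated as one more ratio value and parsing fails.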

Instead of the command splitfolders you can also use split_folders or split-folders.

Development

Install and use poetry.

Contributing

If you have a question, found a bug or want to propose a new feature, have a look at the issues page.

Pull requests are especially welcome when they fix bugs or improve the code quality.

License

MIT

Release history

0.5.1 (this version): Feb 3, 2022
0.5.0: Jan 30, 2022
0.4.3: Nov 1, 2020
0.4.2: Aug 5, 2020
0.4.1: Aug 5, 2020
0.4.0: Aug 4, 2020
0.3.1: Jul 30, 2019
0.2.3: Jul 5, 2019
0.2.2: May 12, 2019
0.2.1: Nov 9, 2018
0.2.0: Oct 18, 2018
0.1.0: Oct 4, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

split_folders-0.5.1.tar.gz (7.9 kB)

Uploaded Feb 3, 2022 Source

Built Distribution

split_folders-0.5.1-py3-none-any.whl (8.4 kB)

Uploaded Feb 3, 2022 Python 3

File details

Details for the file split_folders-0.5.1.tar.gz.

File metadata

Download URL: split_folders-0.5.1.tar.gz
Upload date: Feb 3, 2022
Size: 7.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.12 CPython/3.10.1 Darwin/21.3.0

File hashes

Hashes for split_folders-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`7127a226b90e00fa86cda4451fe015c6f3755bc3d627064adb9c5209fc8280f6`
MD5	`de7808804bdfc0eb5e8a2fb9371ed97f`
BLAKE2b-256	`a74c32d2d49b82ea5baf0ff1a55de88c7fb8a0bf2aab02763c8501b2a51bf55f`


File details

Details for the file split_folders-0.5.1-py3-none-any.whl.

File metadata

Download URL: split_folders-0.5.1-py3-none-any.whl
Upload date: Feb 3, 2022
Size: 8.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.12 CPython/3.10.1 Darwin/21.3.0

File hashes

Hashes for split_folders-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cb010e00f34d247b8e8bbfd6cfe527f871361d8524ed54734924e7efd261801f`
MD5	`5084233ef742b710d616cdb882095f9f`
BLAKE2b-256	`b6d5307d63c03356bad6e141d8718d3f4116f51bd9c4b09e2614ffcee1f3c6fd`
