Lightweight utilities for organizing image datasets: split, merge, label-based sorting, and directory structure creation

folderops

A lightweight Python package for organizing image datasets in machine learning workflows.

It focuses on the most common and repetitive tasks: splitting datasets, merging folders, structuring directories, and organizing data from labels. Everything is designed for direct use inside notebooks and research pipelines with minimal friction.

Why folderops

If you’ve worked with vision datasets, you’ve probably rewritten the same scripts over and over:

splitting train/val/test
merging datasets from different sources
reorganizing files from CSV labels
creating directory structures manually

This package removes that overhead and gives you reliable, reusable utilities.

Features

Split datasets into train / validation / test sets
Merge files from nested directories into a single folder
Organize images into class folders using CSV labels
Create directory structures from lists or nested dictionaries
Supports common image formats:
.jpg, .jpeg, .png, .bmp, .gif, .tif, .tiff, .webp
Works consistently in terminal, VS Code, and Jupyter notebooks
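
As a rough illustration, the format list above amounts to a suffix check on each file name. The sketch below is not taken from the package: the constant `IMAGE_EXTENSIONS`, the helper `filter_images`, and the case-insensitive matching are all assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, List

# The formats listed above; the constant name is hypothetical.
IMAGE_EXTENSIONS = (
    ".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp",
)


def filter_images(names: Iterable[str], extensions=IMAGE_EXTENSIONS) -> List[str]:
    """Keep only names whose final suffix is in `extensions`, ignoring case."""
    allowed = {ext.lower() for ext in extensions}
    return [name for name in names if Path(name).suffix.lower() in allowed]


# Mixed-case and non-image names to exercise the filter.
kept = filter_images(["a.JPG", "b.png", "notes.txt", "c.webp", "archive.tar.gz"])
```

Passing a narrower tuple, such as `(".jpg", ".png")`, restricts the match in the same way the `extensions` argument is used later in this README.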

Installation

pip install folderops

For development, install in editable mode from a clone of the repository:

pip install -e .

Quick Start

from folderops import split_dataset, merge_folders, organize_by_labels, create_structure

split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
)

merge_folders(
    source="dataset/images",
    output="merged_images",
)

organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized_dataset",
)

structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}

create_structure(structure)

API Reference

split_dataset

Split a dataset organized by class folders into train, validation, and test sets.

Expected input structure

source/
    class1/
        img1.jpg
        img2.jpg
    class2/
        img3.jpg

Output structure

output/
    train/
        class1/
        class2/
    val/
        class1/
        class2/
    test/
        class1/
        class2/

Usage

split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
    mode="copy",
    extensions=(".jpg", ".png"),
)

Key behavior

Splits per class, not globally
Shuffles files before splitting
Supports deterministic splits via seed
Supports both copy and move
Validates that ratios sum to 1.0
Displays progress cleanly in both terminal and notebooks
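
The per-class, seeded split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper name `split_class_files` is hypothetical, and so is the rounding rule (train and val counts are truncated, the remainder goes to test).

```python
import random
from typing import Dict, List, Sequence


def split_class_files(
    files: Sequence[str],
    train_ratio: float = 0.7,
    val_ratio: float = 0.15,
    test_ratio: float = 0.15,
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Shuffle one class's files and cut them into train/val/test lists."""
    # Reject ratios that do not sum to 1.0 (with a small float tolerance).
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-6:
        raise ValueError("train_ratio + val_ratio + test_ratio must equal 1.0")

    # Sort first so the result does not depend on directory listing order,
    # then shuffle with a local RNG so the same seed gives the same split.
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


# Each class is split independently, so class proportions carry over.
classes = {
    "class1": [f"class1_{i}.jpg" for i in range(20)],
    "class2": [f"class2_{i}.jpg" for i in range(10)],
}
splits = {name: split_class_files(files) for name, files in classes.items()}
```

Splitting inside each class rather than over the pooled file list keeps a small class from landing entirely in one subset, which is the practical reason for the "per class, not globally" behavior.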

merge_folders

Merge all files from a directory (including subfolders) into a single folder.

Example

source/
    cats/
        a.jpg
    dogs/
        a.jpg
        b.jpg

Result

merged/
    a.jpg
    a_1.jpg
    b.jpg

Usage

merge_folders(
    source="source",
    output="merged",
    mode="copy",
)

Key behavior

Recursively scans all subfolders
Prevents overwriting using automatic renaming
Preserves all files
Supports extension filtering
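
One way to get the no-overwrite behavior above is to probe for a free name before each copy. The sketch below assumes the `_1`, `_2`, ... suffix scheme suggested by the `a_1.jpg` example; the helpers `unique_destination` and `merge_flat` are hypothetical and are not the package's API.

```python
import shutil
import tempfile
from pathlib import Path


def unique_destination(output_dir: Path, filename: str) -> Path:
    """Return a path in output_dir that does not clash with an existing file."""
    candidate = output_dir / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    # Append _1, _2, ... before the extension until the name is free.
    while candidate.exists():
        candidate = output_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def merge_flat(source: Path, output: Path) -> list:
    """Copy every file under source (recursively) into one flat folder."""
    output.mkdir(parents=True, exist_ok=True)
    copied = []
    # Sorting makes the order in which collisions are resolved predictable.
    for path in sorted(source.rglob("*")):
        if path.is_file():
            target = unique_destination(output, path.name)
            shutil.copy2(path, target)
            copied.append(target.name)
    return copied


# Rebuild the cats/dogs example above in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("cats/a.jpg", "dogs/a.jpg", "dogs/b.jpg"):
        file_path = root / "source" / rel
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_bytes(b"")
    merged_names = sorted(merge_flat(root / "source", root / "merged"))
```

The output folder is kept outside the source tree here; writing into a subfolder of the source would make the recursive scan pick up files it has just copied.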

organize_by_labels

Organize images into class folders using a CSV file.

CSV format

path,class
img1.jpg,cats
img2.jpg,dogs

Usage

organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized",
    mode="copy",
)

Result

organized/
    cats/
        img1.jpg
    dogs/
        img2.jpg

Key behavior

Validates that every file exists before transfer
Raises clear errors for missing or invalid entries
Supports custom delimiters
Optional strict extension filtering
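
The validate-before-transfer behavior above could look roughly like this. It is a sketch, not the package's code: the function name `load_and_validate` and the `delimiter` parameter name are assumptions, while the `path` and `class` column names come from the CSV format shown earlier.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Tuple


def load_and_validate(
    label_file: Path, image_dir: Path, delimiter: str = ","
) -> List[Tuple[Path, str]]:
    """Read (path, class) rows and check all of them before any transfer."""
    pairs, problems = [], []
    with label_file.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            rel_path = (row.get("path") or "").strip()
            label = (row.get("class") or "").strip()
            if not rel_path or not label:
                problems.append(f"line {line_no}: missing path or class")
            elif not (image_dir / rel_path).is_file():
                problems.append(f"line {line_no}: file not found: {rel_path}")
            else:
                pairs.append((image_dir / rel_path, label))
    # Fail once, listing every problem, so no partial copy ever happens.
    if problems:
        raise ValueError("invalid label file:\n" + "\n".join(problems))
    return pairs


# Demo: img1.jpg exists, img2.jpg does not, so validation should fail.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    images = root / "images"
    images.mkdir()
    (images / "img1.jpg").write_bytes(b"")
    labels = root / "labels.csv"
    labels.write_text("path,class\nimg1.jpg,cats\nimg2.jpg,dogs\n")
    try:
        load_and_validate(labels, images)
        error_message = ""
    except ValueError as exc:
        error_message = str(exc)
```

Collecting every problem before raising means a bad label file is reported in one pass, and nothing is copied or moved until the whole file has been checked.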

create_structure

Create directory structures from a list or nested dictionary.

List-based usage

paths = ["train/cats", "train/dogs", "val/cats"]
create_structure(paths, root="dataset")

Dictionary-based usage

structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}

create_structure(structure)

Result

dataset/
    train/
    val/
    test/

Key behavior

Accepts both flat lists and nested dictionaries
Automatically creates missing directories
Safe to run multiple times
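
The behavior above maps naturally onto `os.makedirs(..., exist_ok=True)`. The sketch below shows one way to handle both input shapes; the helpers `flatten` and `make_dirs` are hypothetical names and only approximate what `create_structure` is described as doing.

```python
import os
import tempfile
from typing import Dict, Iterable, List, Union

Structure = Union[Iterable[str], Dict[str, dict]]


def flatten(structure: Structure, prefix: str = "") -> List[str]:
    """Turn a nested dict or a flat list into a list of relative paths."""
    if isinstance(structure, dict):
        paths = []
        for name, children in structure.items():
            current = os.path.join(prefix, name)
            paths.append(current)
            paths.extend(flatten(children or {}, current))
        return paths
    return [os.path.join(prefix, item) for item in structure]


def make_dirs(structure: Structure, root: str = ".") -> List[str]:
    """Create every directory; existing ones are left untouched."""
    created = []
    for rel in flatten(structure):
        os.makedirs(os.path.join(root, rel), exist_ok=True)  # no error on re-runs
        created.append(rel)
    return created


with tempfile.TemporaryDirectory() as tmp:
    nested = {"dataset": {"train": {}, "val": {}, "test": {}}}
    first = make_dirs(nested, root=tmp)
    second = make_dirs(nested, root=tmp)  # second run must not fail
    made = sorted(os.listdir(os.path.join(tmp, "dataset")))
    make_dirs(["train/cats", "val/cats"], root=os.path.join(tmp, "flat"))
    flat_ok = os.path.isdir(os.path.join(tmp, "flat", "train", "cats"))
```

Because `exist_ok=True` makes directory creation idempotent, the same call can sit at the top of a notebook and be re-run freely.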

Project Structure

folderops/
├── folderops/
│   ├── __init__.py
│   ├── merger.py
│   ├── organizer.py
│   ├── splitter.py
│   ├── structure.py
│   └── utils.py
├── LICENSE
├── pyproject.toml
└── README.md

Design Principles

Minimal dependencies
Explicit, readable APIs
Safe file operations
Notebook-friendly behavior
Reproducible dataset handling

Build and Publish

Both commands require the build and twine packages (pip install build twine).

python -m build

twine upload dist/*

Requirements

Python 3.8+
tqdm

License

MIT License

Release history

This version

1.0.0

Mar 15, 2026

0.1.2

Mar 15, 2026

0.1.1

Mar 15, 2026

0.1.0

Mar 15, 2026

Download files

Source Distribution

folderops-1.0.0.tar.gz (10.2 kB)

Uploaded Mar 15, 2026 Source

Built Distribution

folderops-1.0.0-py3-none-any.whl (10.6 kB)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file folderops-1.0.0.tar.gz.

File metadata

Download URL: folderops-1.0.0.tar.gz
Upload date: Mar 15, 2026
Size: 10.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.10

File hashes

Hashes for folderops-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`ed225be9c023d5f4e005ff6c5a37b1f9329db5b0397fdb1b33c6f5466a6ab7d8`
MD5	`8b043d3dddd9a78504daf42ba43237ea`
BLAKE2b-256	`f8097bead5440fa06a4ad6c959818d831f95ed04e6d36b9646690a97515e4162`

File details

Details for the file folderops-1.0.0-py3-none-any.whl.

File metadata

Download URL: folderops-1.0.0-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 10.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.10

File hashes

Hashes for folderops-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae2ba16b21b7289711041ac031015808279e4fffb67be611c2aba729839b305f`
MD5	`b269e787f2c390b37637916e616cd3b4`
BLAKE2b-256	`2f8237ad25b8665246bf21946b81cd0e639c8acf1afe54136a1a2d19cb9d88b5`
