Lightweight utilities for organizing image datasets: split, merge, label-based sorting, and directory structure creation
folderops
A lightweight Python package for organizing image datasets in machine learning workflows.
It focuses on the most common and repetitive tasks: splitting datasets, merging folders, structuring directories, and organizing data from labels. Everything is designed for direct use inside notebooks and research pipelines with minimal friction.
Why folderops
If you’ve worked with vision datasets, you’ve probably rewritten the same scripts over and over:
- splitting train/val/test
- merging datasets from different sources
- reorganizing files from CSV labels
- creating directory structures manually
This package removes that overhead and gives you reliable, reusable utilities.
Features
- Split datasets into train / validation / test sets
- Merge files from nested directories into a single folder
- Organize images into class folders using CSV labels
- Create directory structures from lists or nested dictionaries
- Supports common image formats: .jpg, .jpeg, .png, .bmp, .gif, .tif, .tiff, .webp
- Works consistently in terminal, VS Code, and Jupyter notebooks
Installation
pip install folderops
For development, install in editable mode from a local clone of the repository:
pip install -e .
Quick Start
from folderops import split_dataset, merge_folders, organize_by_labels, create_structure
split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
)

merge_folders(
    source="dataset/images",
    output="merged_images",
)

organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized_dataset",
)

structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}
create_structure(structure)
API Reference
split_dataset
Split a dataset organized by class folders into train, validation, and test sets.
Expected input structure
source/
    class1/
        img1.jpg
        img2.jpg
    class2/
        img3.jpg
Output structure
output/
    train/
        class1/
        class2/
    val/
        class1/
        class2/
    test/
        class1/
        class2/
Usage
split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
    mode="copy",
    extensions=(".jpg", ".png"),
)
Key behavior
- Splits per class, not globally
- Shuffles files before splitting
- Supports deterministic splits via seed
- Supports both copy and move
- Validates that ratios sum to 1.0
- Displays progress cleanly in both terminal and notebooks
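The per-class shuffle-and-cut behavior listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: `split_files` is a hypothetical helper, and the rounding rule (truncate the train and val counts, give the remainder to test) is a guess at one reasonable choice.

```python
import math
import random
from typing import Dict, List, Sequence


def split_files(
    files: Sequence[str],
    train_ratio: float = 0.7,
    val_ratio: float = 0.15,
    test_ratio: float = 0.15,
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Hypothetical helper: shuffle one class's files and cut them into three splits."""
    total = train_ratio + val_ratio + test_ratio
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"Ratios must sum to 1.0, got {total}")

    # Sort first so the result does not depend on directory listing order.
    shuffled = sorted(files)
    # A private Random instance keeps the split reproducible for a given seed
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # assumption: remainder goes to test
    }
```

Calling a helper like this once per class folder, rather than once over all files, keeps each class represented in roughly the requested proportions in every split.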
merge_folders
Merge all files from a directory (including subfolders) into a single folder.
Example
source/
    cats/
        a.jpg
    dogs/
        a.jpg
        b.jpg
Result
merged/
    a.jpg
    a_1.jpg
    b.jpg
Usage
merge_folders(
    source="source",
    output="merged",
    mode="copy",
)
Key behavior
- Recursively scans all subfolders
- Prevents overwriting using automatic renaming
- Preserves all files
- Supports extension filtering
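The automatic renaming shown in the example (a second a.jpg becoming a_1.jpg) could be implemented along these lines. This is only a sketch: `unique_destination` and `merge_sketch` are hypothetical names, the `_1`, `_2`, ... suffix scheme is inferred from the example output, and the code assumes the output folder lies outside the source folder.

```python
import shutil
from pathlib import Path


def unique_destination(output_dir: Path, filename: str) -> Path:
    """Hypothetical helper: return a path in output_dir that does not exist yet."""
    candidate = output_dir / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    # Append _1, _2, ... to the stem until the name is free.
    while candidate.exists():
        candidate = output_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def merge_sketch(source, output) -> None:
    """Copy every file under source (recursively) into one flat output folder."""
    source, output = Path(source), Path(output)
    output.mkdir(parents=True, exist_ok=True)
    # Sorting makes the order, and therefore the renaming, deterministic.
    for path in sorted(source.rglob("*")):
        if path.is_file():
            shutil.copy2(path, unique_destination(output, path.name))
```

Checking for an existing name before each copy is what prevents a later file from silently overwriting an earlier one with the same name.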
organize_by_labels
Organize images into class folders using a CSV file.
CSV format
path,class
img1.jpg,cats
img2.jpg,dogs
Usage
organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized",
    mode="copy",
)
Result
organized/
    cats/
        img1.jpg
    dogs/
        img2.jpg
Key behavior
- Validates that every file exists before transfer
- Raises clear errors for missing or invalid entries
- Supports custom delimiters
- Optional strict extension filtering
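One way to read "validates before transfer" is a two-pass approach: check every CSV row first, and copy only if nothing is missing. The sketch below illustrates that idea; `organize_sketch` is a hypothetical function, not the package's code, and it assumes the `path` and `class` column names from the CSV format shown above.

```python
import csv
import shutil
from pathlib import Path


def organize_sketch(image_dir, label_file, output, delimiter=",") -> None:
    """Hypothetical sketch: validate every CSV row first, then copy files."""
    image_dir, output = Path(image_dir), Path(output)

    with open(label_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if not {"path", "class"} <= set(reader.fieldnames or []):
            raise ValueError("CSV must contain 'path' and 'class' columns")
        rows = list(reader)

    # Pass 1: collect missing files without touching the output folder.
    missing = [row["path"] for row in rows if not (image_dir / row["path"]).is_file()]
    if missing:
        raise FileNotFoundError(
            f"{len(missing)} file(s) listed in the CSV were not found, e.g. {missing[:5]}"
        )

    # Pass 2: copy each image into its class folder only after validation passed.
    for row in rows:
        class_dir = output / row["class"]
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image_dir / row["path"], class_dir / Path(row["path"]).name)
```

Validating up front means a typo in the CSV fails fast instead of leaving a half-organized output folder behind.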
create_structure
Create directory structures from a list or nested dictionary.
List-based usage
paths = ["train/cats", "train/dogs", "val/cats"]
create_structure(paths, root="dataset")
Dictionary-based usage
structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}
create_structure(structure)
Result
dataset/
    train/
    val/
    test/
Key behavior
- Accepts both flat lists and nested dictionaries
- Automatically creates missing directories
- Safe to run multiple times
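The behavior above (lists or nested dictionaries, safe to re-run) maps naturally onto `pathlib`. The following is a rough sketch rather than the package's implementation; `create_structure_sketch` is a hypothetical name, and treating an empty dictionary as a leaf folder is an assumption based on the example.

```python
from pathlib import Path


def create_structure_sketch(structure, root=".") -> None:
    """Hypothetical sketch: create folders from a flat list or a nested dict."""
    root = Path(root)
    if isinstance(structure, dict):
        # Each key is a folder name; its value describes the subfolders.
        for name, children in structure.items():
            child = root / name
            child.mkdir(parents=True, exist_ok=True)
            create_structure_sketch(children or {}, child)
    else:
        # A flat list of relative paths such as "train/cats".
        for relative in structure:
            (root / relative).mkdir(parents=True, exist_ok=True)
```

Passing `exist_ok=True` to `Path.mkdir` is what makes repeated calls harmless: existing directories are left untouched instead of raising an error.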
Project Structure
folderops/
├── folderops/
│   ├── __init__.py
│   ├── merger.py
│   ├── organizer.py
│   ├── splitter.py
│   ├── structure.py
│   └── utils.py
├── LICENSE
├── pyproject.toml
└── README.md
Design Principles
- Minimal dependencies
- Explicit, readable APIs
- Safe file operations
- Notebook-friendly behavior
- Reproducible dataset handling
Build and Publish
python -m build
twine upload dist/*
Requirements
- Python 3.8+
- tqdm
License
MIT License