Utility tools for SIMPLE vision dataset format.
simpledataset
Utility tools for simple vision image dataset format.
Features
- See the summary of a dataset
- Convert from/to various dataset formats.
- Web UI to look into a dataset
- Command-line tools to split and concatenate datasets.
- Command-line tools to modify labels.
Install
pip install simpledataset
Usage
# Show summary
dataset_summary <input_dataset>
# For a Classification dataset, extract only the images that have the specified labels.
# For a Detection dataset, extract only the boxes that have the specified labels.
dataset_filter <input_dataset> <output_dataset> [--include_class <class_id> [<class_id> ...]] [--exclude_class <class_id> [<class_id> ...]]
# Update class labels
dataset_map <input_dataset> <output_dataset> --map <src_class_id> <dst_class_id> [--map <src_class_id> <dst_class_id> [--map...]]
dataset_split # NYI (not yet implemented)
# Concatenate multiple datasets into one dataset.
dataset_concat <input_txt_filepath> <input_txt_filepath2> [<input_txt_filepath3> ...] <output_txt_filepath>
dataset_shuffle # NYI (not yet implemented)
dataset_sample # NYI (not yet implemented)
# Re-package images and labels into new zip files.
dataset_pack <input_txt_filepath> <output_filepath> [--images_directory=<images_directory>] [--keep_empty_images]
# Remove labels with no actual data.
dataset_defrag <input_txt_filepath> <output_txt_filepath>
# Draw bounding boxes into images.
dataset_draw <input_txt_filepath> <output_dir>
# Convert from/to other dataset types.
dataset_convert_from {coco|openimages_od|openimages_vr} ... <output_filepath>
dataset_convert_to <input_dataset> {coco|image_classification|object_detection} <output_filepath>
Examples
Please see CONVERT.md for the dataset conversion examples.
Change class ids
For example, to turn MNIST into an odd-or-even classification dataset, use the dataset_map command. In this example, class_id=0 stands for even numbers and class_id=1 for odd numbers. Digits 0 and 1 already have the right class id, so only 2 through 9 need to be mapped.
dataset_map mnist.txt new_dataset.txt --map 2 0 --map 3 1 --map 4 0 --map 5 1 --map 6 0 --map 7 1 --map 8 0 --map 9 1
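The --map arguments above follow a simple rule (digit d maps to d % 2), so they can be generated instead of typed by hand. Here is a minimal Python sketch of that idea. The helper name build_map_args is hypothetical and is not part of simpledataset; only the dataset_map command line itself comes from this README.

```python
def build_map_args(mapping):
    """Build `--map <src> <dst>` arguments, skipping identity mappings."""
    args = []
    for src, dst in sorted(mapping.items()):
        if src != dst:  # 0 -> 0 and 1 -> 1 need no --map entry
            args += ["--map", str(src), str(dst)]
    return args


# Parity mapping for the ten MNIST digits: even -> 0, odd -> 1.
parity = {digit: digit % 2 for digit in range(10)}
command = ["dataset_map", "mnist.txt", "new_dataset.txt"] + build_map_args(parity)
print(" ".join(command))
```

The printed line is the same command as shown above. The list form of command could also be passed to subprocess.run if you prefer to launch the tool from a script.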
Concatenate two datasets
For example, if you have two datasets (mnist_subset and mnist_subset2) and want to combine them, use the dataset_concat command.
dataset_concat mnist_subset/images.txt mnist_subset2/images.txt new_combined.txt
# new_combined.txt has 20 classes at this point. Let's merge them into 10 classes.
dataset_map new_combined.txt new_mapped_10.txt --map 10 0 --map 11 1 --map 12 2 --map 13 3 --map 14 4 --map 15 5 --map 16 6 --map 17 7 --map 18 8 --map 19 9
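To make the effect of that mapping concrete, here is a rough Python sketch of what remapping does to a single line of an Image Classification txt file (the format is described later in this document). It is an illustration only, not the code dataset_map uses: remap_line is a hypothetical helper, and dropping duplicate ids after the merge is an assumption.

```python
def remap_line(line, class_map):
    """Remap the class ids of one '<image_filepath> <labels>' line."""
    image_path, labels = line.rsplit(" ", 1)
    new_ids = []
    for class_id in labels.split(","):
        mapped = class_map.get(int(class_id), int(class_id))
        if mapped not in new_ids:  # assumption: merged ids are listed once
            new_ids.append(mapped)
    return image_path + " " + ",".join(str(i) for i in new_ids)


# Classes 10..19 from the second dataset fold back onto 0..9.
merge_map = {src: src - 10 for src in range(10, 20)}
print(remap_line("b.zip@7.jpg 17", merge_map))  # b.zip@7.jpg 7
print(remap_line("image.png 3,13", merge_map))  # image.png 3
```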
SIMPLE Dataset format
Currently there are 2 dataset formats, Image Classification and Object Detection. Both consist of a single txt file, the image files, and an optional list of label names (labels.txt). In addition, Object Detection datasets have label files that contain the bounding box (bbox) info.
Image Classification
The main txt format is:
<file> ::= <txt_line> ('\n' <txt_line>)*
<txt_line> ::= <image_filepath> ' ' <labels>
<image_filepath> ::= <filepath> | <zip_filepath> '@' <entry_name>
<labels> ::= <class_id> (',' <class_id>)*
Here is an example txt file.
train_images.zip@0.jpg 0
train_images2.zip@1.jpg 1
image.png 0,1
image2.bmp 0,1,2,3
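If you want to read such a file without the bundled tools, the grammar is small enough to parse by hand. The following Python sketch is one possible reader, not the library's own API; parse_classification_txt is a hypothetical name, and skipping blank lines is an assumption made for robustness.

```python
def parse_classification_txt(text):
    """Parse the main txt into a list of (image_filepath, [class_id, ...])."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        # Split on the last space so the labels are always the final field.
        image_path, labels = line.rsplit(" ", 1)
        entries.append((image_path, [int(c) for c in labels.split(",")]))
    return entries


example = """train_images.zip@0.jpg 0
train_images2.zip@1.jpg 1
image.png 0,1
image2.bmp 0,1,2,3"""

for image_path, class_ids in parse_classification_txt(example):
    print(image_path, class_ids)
```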
Object Detection
The main txt format is:
<file> ::= <txt_line> ('\n' <txt_line>)*
<txt_line> ::= <image_filepath> ' ' <label_filepath>
<image_filepath> ::= <filepath> | <zip_filepath> '@' <entry_name>
<label_filepath> ::= <filepath> | <zip_filepath> '@' <entry_name>
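Both path fields may point either at a plain file or at an entry inside a zip file, separated by '@'. As a sketch of how a reader could resolve them, the Python below splits a path and loads its bytes with the standard zipfile module. The function names and the base_dir parameter are hypothetical. The sketch also assumes that paths are relative to base_dir, that '@' never appears in a plain file path, and that the label path contains no spaces.

```python
import zipfile
from pathlib import Path


def split_path(path):
    """Return (zip_filepath, entry_name), or (None, filepath) for plain files."""
    if "@" in path:
        zip_path, entry_name = path.split("@", 1)
        return zip_path, entry_name
    return None, path


def read_bytes(path, base_dir="."):
    """Load the bytes of a plain file or of an entry inside a zip file."""
    zip_path, name = split_path(path)
    if zip_path is None:
        return (Path(base_dir) / name).read_bytes()
    with zipfile.ZipFile(Path(base_dir) / zip_path) as archive:
        return archive.read(name)


def parse_detection_line(line):
    """Split one main txt line into (image_filepath, label_filepath)."""
    image_path, label_path = line.rsplit(" ", 1)
    return image_path, label_path
```

For example, parse_detection_line("images.zip@0.jpg labels.zip@0.txt") yields the two paths, and read_bytes on the second one returns the raw contents of the label file.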
The format of a label file is:
<file> ::= <label_line> ('\n' <label_line>)*
<label_line> ::= <class_id> ' ' <bbox_x_min> ' ' <bbox_y_min> ' ' <bbox_x_max> ' ' <bbox_y_max>
<class_id> ::= <int>
<bbox_x_min> ::= <int> ; 0 <= <bbox_x_min> < <bbox_x_max> <= <image_width>
<bbox_y_min> ::= <int> ; 0 <= <bbox_y_min> < <bbox_y_max> <= <image_height>
<bbox_x_max> ::= <int>
<bbox_y_max> ::= <int>
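The constraints in the grammar comments are easy to check while parsing. Here is a minimal Python sketch of a label file reader that enforces them. Box and parse_label_file are hypothetical names, not part of simpledataset, and the image size check is optional because the label file itself does not carry the image dimensions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    class_id: int
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_label_file(text, image_width=None, image_height=None):
    """Parse a label file into Box tuples, validating the bbox constraints."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields: {line!r}")
        box = Box(*(int(v) for v in fields))
        # 0 <= min < max, as required by the grammar comments.
        if not (0 <= box.x_min < box.x_max and 0 <= box.y_min < box.y_max):
            raise ValueError(f"invalid box: {line!r}")
        # max <= image size, checked only when the size is known.
        if image_width is not None and box.x_max > image_width:
            raise ValueError(f"box exceeds image width: {line!r}")
        if image_height is not None and box.y_max > image_height:
            raise ValueError(f"box exceeds image height: {line!r}")
        boxes.append(box)
    return boxes


print(parse_label_file("0 10 20 110 220\n1 0 0 5 5"))
```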
Visual Relationship
The main txt format is the same as for Object Detection.
The format of a label file is:
<file> ::= <label_line> ('\n' <label_line>)*
<label_line> ::= <subject_id> ' ' <subject_bbox_x_min> ' ' <subject_bbox_y_min> ' ' <subject_bbox_x_max> ' ' <subject_bbox_y_max> ' ' <object_id> ' ' <object_bbox_x_min> ' ' <object_bbox_y_min> ' ' <object_bbox_x_max> ' ' <object_bbox_y_max> ' ' <predicate_id>
<subject_id> ::= <int>
<object_id> ::= <int>
<predicate_id> ::= <int>
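Each label line therefore carries eleven integers: the subject id and its box, the object id and its box, and the predicate id. A small Python sketch of a parser for one such line is shown below. Relationship and parse_relationship_line are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Tuple


class Relationship(NamedTuple):
    subject_id: int
    subject_bbox: Tuple[int, int, int, int]  # x_min, y_min, x_max, y_max
    object_id: int
    object_bbox: Tuple[int, int, int, int]  # x_min, y_min, x_max, y_max
    predicate_id: int


def parse_relationship_line(line):
    """Parse one Visual Relationship label line into a Relationship."""
    values = [int(v) for v in line.split()]
    if len(values) != 11:
        raise ValueError(f"expected 11 integers, got {len(values)}: {line!r}")
    return Relationship(
        subject_id=values[0],
        subject_bbox=tuple(values[1:5]),
        object_id=values[5],
        object_bbox=tuple(values[6:10]),
        predicate_id=values[10],
    )


print(parse_relationship_line("3 0 0 50 60 7 10 10 40 40 2"))
```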
Usage for remote datasets
Not yet implemented (NYI). This tool is intended to let you use datasets stored on Azure Blob Storage and to update a dataset on the storage efficiently.
# To download a dataset from Azure Blob Storage.
dataset_download <url_with_container_sas> <output_dir>
File details
Details for the file simpledataset-0.2.2.tar.gz.
File metadata
- Download URL: simpledataset-0.2.2.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | a227f038f3e3b95bd31a795e46ee98ef80046213ccb9d8580508093885abd194
MD5 | 79ccf5c68d1994375029dc351a4ad4c0
BLAKE2b-256 | 97761678c9ba0846fbeeebaf3d7bc13b0c8f4a5ccd6c13ea3969fafdd0568931
File details
Details for the file simpledataset-0.2.2-py3-none-any.whl.
File metadata
- Download URL: simpledataset-0.2.2-py3-none-any.whl
- Upload date:
- Size: 26.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | bc3a537b0534a52f67c1042b7e9030abdbd51df143ff48f1942868a4287234cc
MD5 | cc179b2f78f195eb5be909c4e6490d6e
BLAKE2b-256 | 89ddc16960a5525cf1e6276dfb25f30081a80d6d48b9ef62f893663430ddbaa5