Tools to merge and remap computer vision datasets
datakit
Python package for YOLO-format dataset operations:
- merge multiple datasets into one
- merge multiple class names into a target class
- remap class IDs
- visualize labeled samples
Install
pip install cv-datakit
CLI Usage
1) Merge datasets
datakit merge /path/ds1 /path/ds2 --out /path/out
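The README does not say how class lists are reconciled when the input datasets disagree. One plausible approach is to take the union of class names in first-seen order and rewrite each dataset's class IDs to match. The sketch below assumes that approach. The function names are hypothetical and are not part of datakit.

```python
"""Hypothetical sketch: reconciling class IDs when merging YOLO-format datasets.

Assumes the merged class list is the union of input names in first-seen order;
this is an illustration, not datakit's implementation.
"""


def unify_class_names(
    name_lists: list[list[str]],
) -> tuple[list[str], list[dict[int, int]]]:
    """Return the merged names and one old-ID -> new-ID map per input dataset."""
    merged: list[str] = []
    position: dict[str, int] = {}
    id_maps: list[dict[int, int]] = []
    for names in name_lists:
        id_map: dict[int, int] = {}
        for old_id, name in enumerate(names):
            if name not in position:  # first time this class name is seen
                position[name] = len(merged)
                merged.append(name)
            id_map[old_id] = position[name]
        id_maps.append(id_map)
    return merged, id_maps


def remap_label_line(line: str, id_map: dict[int, int]) -> str:
    """Rewrite the class ID (first field) of one YOLO label line; coordinates are kept verbatim."""
    class_id, *coords = line.split()
    return " ".join([str(id_map[int(class_id)]), *coords])


if __name__ == "__main__":
    names, maps = unify_class_names([["person", "car"], ["car", "bag"]])
    print(names)  # ['person', 'car', 'bag']
    print(maps)  # [{0: 0, 1: 1}, {0: 1, 1: 2}]
    print(remap_label_line("1 0.5 0.5 0.2 0.3", maps[1]))  # 2 0.5 0.5 0.2 0.3
```

A real merge also has to avoid image and label filename collisions between the input datasets. This sketch ignores that problem.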
2) Merge classes
datakit merge-classes /path/dataset --from Backpack Backpacks --to bag
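Merging class names amounts to rebuilding the names list and producing an old-to-new ID map. In this hypothetical sketch, plan_class_merge is illustrative and not a datakit function. The target name takes the slot of the first class that maps to it, and the remaining IDs are renumbered so they stay contiguous. If the target name already exists, the source classes fold into it. datakit may order or number classes differently.

```python
"""Hypothetical sketch: planning a class-name merge such as Backpack/Backpacks -> bag.

plan_class_merge is illustrative only; datakit may order or number classes differently.
"""


def plan_class_merge(
    names: list[str], sources: list[str], target: str
) -> tuple[list[str], dict[int, int]]:
    """Return (new_names, old_id -> new_id) after folding every name in `sources` into `target`."""
    source_set = set(sources)
    new_names: list[str] = []
    id_map: dict[int, int] = {}
    for old_id, name in enumerate(names):
        label = target if name in source_set else name
        if label not in new_names:  # keep IDs contiguous: each label gets one slot
            new_names.append(label)
        id_map[old_id] = new_names.index(label)
    return new_names, id_map


if __name__ == "__main__":
    new_names, id_map = plan_class_merge(
        ["person", "Backpack", "car", "Backpacks"], ["Backpack", "Backpacks"], "bag"
    )
    print(new_names)  # ['person', 'bag', 'car']
    print(id_map)  # {0: 0, 1: 1, 2: 2, 3: 1}
```

The resulting id_map is an old-to-new mapping that could then be applied to the first field of every label line.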
3) Remap classes
datakit remap /path/dataset --names bag person --map 0:0 1:0 2:1
Remap safety behavior:
- validates that all mapped target IDs are within 0..len(new_names)-1
- pre-scans all label files to ensure every class ID has a mapping before writing
- only writes labels and data.yaml after validation succeeds
- note: writes are not yet atomic across process interruption (power loss/kill)
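The validate-then-write rules in the list above can be sketched as a pure function over in-memory label text. This is a hypothetical illustration and not datakit's code. It assumes labels are held as a mapping from file path to file contents, and that mapping plays the role of the --map pairs. Nothing is produced unless both checks pass.

```python
"""Hypothetical sketch of a validate-then-remap step for YOLO label text (not datakit's code)."""


def remap_labels(
    labels: dict[str, str],
    new_names: list[str],
    mapping: dict[int, int],
) -> dict[str, str]:
    """Return remapped label text per file, or raise ValueError before any output is built."""
    # Check 1: every target ID must fall within 0..len(new_names)-1.
    out_of_range = sorted({t for t in mapping.values() if not 0 <= t < len(new_names)})
    if out_of_range:
        raise ValueError(f"target IDs outside 0..{len(new_names) - 1}: {out_of_range}")

    # Check 2: pre-scan every file so an unmapped class ID is caught up front.
    unmapped = {
        int(line.split()[0])
        for text in labels.values()
        for line in text.splitlines()
        if line.strip() and int(line.split()[0]) not in mapping
    }
    if unmapped:
        raise ValueError(f"class IDs without a mapping: {sorted(unmapped)}")

    # Only after both checks pass, build the rewritten contents.
    remapped: dict[str, str] = {}
    for path, text in labels.items():
        rows = []
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            class_id, *rest = line.split()
            rows.append(" ".join([str(mapping[int(class_id)]), *rest]))
        remapped[path] = "".join(row + "\n" for row in rows)
    return remapped


if __name__ == "__main__":
    # Same mapping as the CLI example: --names bag person --map 0:0 1:0 2:1
    result = remap_labels(
        {"img1.txt": "1 0.4 0.4 0.2 0.2\n2 0.7 0.6 0.1 0.3\n"},
        ["bag", "person"],
        {0: 0, 1: 0, 2: 1},
    )
    print(result["img1.txt"], end="")
    # 0 0.4 0.4 0.2 0.2
    # 1 0.7 0.6 0.1 0.3
```

On the atomicity note: writing each output to a temporary file and then calling os.replace makes each individual file swap atomic on POSIX. A remap that touches many files would still not be atomic as a whole.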
4) Visualize samples
datakit visualize --images-dir /path/dataset/val/images --labels-dir /path/dataset/val/labels --n 12 --seed 1
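Drawing a YOLO detection label requires converting its normalized center/size box (class cx cy w h, each coordinate in 0..1) into pixel corners. Choosing samples reproducibly is one reading of what --n and --seed control. Both helpers below are hypothetical sketches, not datakit's plotting code. They handle detection boxes only, not polygon (segmentation) labels.

```python
import random


def yolo_box_to_pixels(
    line: str, img_w: int, img_h: int
) -> tuple[int, float, float, float, float]:
    """Convert 'class cx cy w h' (normalized) into (class, x1, y1, x2, y2) in pixels."""
    class_id, cx, cy, w, h = line.split()  # detection boxes only; polygon lines fail to unpack
    cx_px, cy_px = float(cx) * img_w, float(cy) * img_h
    w_px, h_px = float(w) * img_w, float(h) * img_h
    return (
        int(class_id),
        cx_px - w_px / 2,
        cy_px - h_px / 2,
        cx_px + w_px / 2,
        cy_px + h_px / 2,
    )


def pick_samples(image_names: list[str], n: int, seed: int) -> list[str]:
    """Choose up to n images; the same seed and the same set of files give the same picks."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return rng.sample(sorted(image_names), k=min(n, len(image_names)))


if __name__ == "__main__":
    print(yolo_box_to_pixels("1 0.5 0.5 0.25 0.5", 640, 480))  # (1, 240.0, 120.0, 400.0, 360.0)
    print(pick_samples([f"img{i}.jpg" for i in range(20)], n=12, seed=1))
```

Sorting before sampling makes the picks independent of directory listing order, which can vary between filesystems.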
Format selection
datakit --format yolo merge /path/ds1 /path/ds2 --out /path/out
Python API
from datakit import merge_datasets, merge_classes, remap_dataset, plot_random_samples
merge_datasets(["/path/ds1", "/path/ds2"], "/path/out")
merge_classes("/path/dataset", ["Backpack", "Backpacks"], "bag")
remap_dataset("/path/dataset", ["bag", "person"], {0: 0, 1: 0, 2: 1})
plot_random_samples("/path/dataset/val/images", "/path/dataset/val/labels", n=12, seed=1)
Extend to new formats
- Implement DatasetFormatHandler in a new module, for example datakit/formats/coco.py.
- Register the handler in datakit/formats/__init__.py.
- Use the same CLI commands with --format coco.
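The README does not show DatasetFormatHandler's methods, so the following is only a hypothetical illustration of the handler-plus-registry pattern the steps describe. FormatHandler, HANDLERS, register, get_handler and CocoHandler are invented names, and the real interface likely differs. The COCO part relies only on the standard COCO annotation layout, where classes live in a "categories" list of objects with "id" and "name".

```python
from typing import Protocol


class FormatHandler(Protocol):
    """Hypothetical stand-in for DatasetFormatHandler; the real interface may differ."""

    name: str

    def class_names(self, annotations: dict) -> list[str]: ...


# Hypothetical registry, analogous to registering handlers in datakit/formats/__init__.py.
HANDLERS: dict[str, FormatHandler] = {}


def register(handler: FormatHandler) -> None:
    """Add a handler under its format name, refusing silent overrides."""
    if handler.name in HANDLERS:
        raise ValueError(f"format {handler.name!r} is already registered")
    HANDLERS[handler.name] = handler


def get_handler(name: str) -> FormatHandler:
    """Resolve a --format value to its handler, failing clearly on unknown names."""
    try:
        return HANDLERS[name]
    except KeyError:
        raise ValueError(f"unknown format {name!r}; known formats: {sorted(HANDLERS)}") from None


class CocoHandler:
    """Example handler: COCO stores classes in a 'categories' list of {'id', 'name'} objects."""

    name = "coco"

    def class_names(self, annotations: dict) -> list[str]:
        categories = sorted(annotations["categories"], key=lambda c: c["id"])
        return [c["name"] for c in categories]


register(CocoHandler())

if __name__ == "__main__":
    coco = {"categories": [{"id": 2, "name": "bag"}, {"id": 1, "name": "person"}]}
    print(get_handler("coco").class_names(coco))  # ['person', 'bag']
```

Keeping a single name-to-handler lookup means the CLI commands themselves need no per-format branches. Only the registry grows when a format is added.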
Download files
Source Distribution
cv_datakit-0.1.2.tar.gz (15.7 kB)
Built Distribution
cv_datakit-0.1.2-py3-none-any.whl (18.4 kB)
File details
Details for the file cv_datakit-0.1.2.tar.gz.
File metadata
- Download URL: cv_datakit-0.1.2.tar.gz
- Upload date:
- Size: 15.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d43ffd32ed0881e993b2f406fa1a78b09a6c7ecc674d9b9ffbc6f4cb245ad2d |
| MD5 | 664d121c6af62c3f129bebbad94e03ff |
| BLAKE2b-256 | 07b74b200bf181b269060e7908d352066c7b5ccf7f3d11604bd4197a905fc3ae |
File details
Details for the file cv_datakit-0.1.2-py3-none-any.whl.
File metadata
- Download URL: cv_datakit-0.1.2-py3-none-any.whl
- Upload date:
- Size: 18.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6b939c8f06f97458574da520852a6f7ca71e560ea35ad108eaa39b38f2a63c38 |
| MD5 | 5bd5f25aae1e56635dbc1e06605cd7c3 |
| BLAKE2b-256 | 399e69099748e460aefc263bd91243ed6eed4009e9d04732cc4e3b4ea713b21d |
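The published SHA256 digests can be checked locally before installing. This sketch uses only Python's standard hashlib module, and the helper names are illustrative. pip can enforce the same check through its hash-checking mode, which is enabled by adding a --hash=sha256:... option to a requirements line.

```python
import hashlib
from pathlib import Path


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare against a published SHA256 digest (hex, case-insensitive)."""
    return file_sha256(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("cv_datakit-0.1.2.tar.gz")  # assumes the sdist was downloaded here
    if sdist.exists():
        ok = matches_published(
            str(sdist),
            "5d43ffd32ed0881e993b2f406fa1a78b09a6c7ecc674d9b9ffbc6f4cb245ad2d",
        )
        print("SHA256 OK" if ok else "SHA256 MISMATCH")
```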