Tools to merge and remap computer vision datasets

datakit

Python package for YOLO-format dataset operations:

  • merge multiple datasets into one
  • merge multiple class names into a target class
  • remap class IDs
  • visualize labeled samples

Install

pip install cv-datakit

CLI Usage

1) Merge datasets

datakit merge /path/ds1 /path/ds2 --out /path/out
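This page does not describe how merge reconciles datasets whose class lists differ. One plausible approach is to unify classes by name and give each source dataset its own old-ID-to-new-ID mapping, then prefix output file names so that images with the same name do not collide. The sketch below assumes that approach. The helper names unify_class_names and output_filename are hypothetical and not part of datakit's API.

```python
def unify_class_names(per_dataset_names: list[list[str]]) -> tuple[list[str], list[dict[int, int]]]:
    """Hypothetical helper: merge class lists by name, keeping first-seen order."""
    unified: list[str] = []
    index: dict[str, int] = {}
    mappings: list[dict[int, int]] = []
    for names in per_dataset_names:
        mapping: dict[int, int] = {}
        for old_id, name in enumerate(names):
            if name not in index:
                # First time this class name appears: append it to the unified list.
                index[name] = len(unified)
                unified.append(name)
            mapping[old_id] = index[name]
        mappings.append(mapping)
    return unified, mappings


def output_filename(dataset_index: int, filename: str) -> str:
    """Hypothetical helper: prefix files so same-named images from different datasets don't collide."""
    return f"ds{dataset_index}_{filename}"


names, mappings = unify_class_names([["person", "car"], ["car", "bag"]])
print(names)     # ['person', 'car', 'bag']
print(mappings)  # [{0: 0, 1: 1}, {0: 1, 1: 2}]
```

Each per-dataset mapping would then be applied to that dataset's label files as they are copied into the output directory.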

2) Merge classes

datakit merge-classes /path/dataset --from Backpack Backpacks --to bag
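Merging classes means renaming the listed classes to the target and then collapsing duplicate names. This yields a shorter names list and an old-ID-to-new-ID mapping for the label files. Below is a minimal sketch. It assumes the target class takes the position of its first occurrence, because datakit's actual ordering is not documented here. merge_class_names is a hypothetical name.

```python
def merge_class_names(names: list[str], sources: list[str], target: str) -> tuple[list[str], dict[int, int]]:
    """Hypothetical helper: rename `sources` to `target`, then collapse duplicate names."""
    source_set = set(sources)
    new_names: list[str] = []
    index: dict[str, int] = {}
    mapping: dict[int, int] = {}
    for old_id, name in enumerate(names):
        new_name = target if name in source_set else name
        if new_name not in index:
            # First occurrence decides the new ID, so the target lands wherever the
            # first merged class (or an existing class already named `target`) was.
            index[new_name] = len(new_names)
            new_names.append(new_name)
        mapping[old_id] = index[new_name]
    return new_names, mapping


new_names, mapping = merge_class_names(["person", "Backpack", "car", "Backpacks"], ["Backpack", "Backpacks"], "bag")
print(new_names)  # ['person', 'bag', 'car']
print(mapping)    # {0: 0, 1: 1, 2: 2, 3: 1}
```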

3) Remap classes

datakit remap /path/dataset --names bag person --map 0:0 1:0 2:1
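In a YOLO label file, each line starts with a class ID followed by normalized coordinates, so remapping only changes the first column. The sketch below shows what --names bag person --map 0:0 1:0 2:1 does to one file. IDs 0 and 1 both become 0 (bag), and ID 2 becomes 1 (person). remap_label_text is a hypothetical helper, not datakit's API.

```python
def remap_label_text(text: str, mapping: dict[int, int]) -> str:
    """Hypothetical helper: rewrite the class ID column of a YOLO label file's contents."""
    out_lines = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Only the first column is a class ID; coordinates pass through unchanged.
        parts[0] = str(mapping[int(parts[0])])
        out_lines.append(" ".join(parts))
    return "\n".join(out_lines) + ("\n" if out_lines else "")


before = "0 0.50 0.40 0.20 0.30\n1 0.10 0.10 0.05 0.05\n2 0.70 0.60 0.10 0.40\n"
after = remap_label_text(before, {0: 0, 1: 0, 2: 1})
print(after, end="")
# 0 0.50 0.40 0.20 0.30
# 0 0.10 0.10 0.05 0.05
# 1 0.70 0.60 0.10 0.40
```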

Remap safety behavior:

  • validates that all mapped target IDs are within 0..len(new_names)-1
  • pre-scans all label files to ensure every class ID has a mapping before writing
  • only writes labels and data.yaml after validation succeeds
  • note: writes are not yet atomic, so if the process is interrupted while writing (for example by a kill signal or power loss), the dataset can be left partially remapped
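The validate-then-write ordering can be sketched as a pure function. It either returns the new contents for every label file or raises before anything is written. This illustrates the pattern and is not datakit's code. The name plan_remap and the in-memory dict of file contents are assumptions.

```python
def plan_remap(label_files: dict[str, str], mapping: dict[int, int], num_new_names: int) -> dict[str, str]:
    """Validate a remap against every label file, then return the new contents.

    Sketch only: label_files maps a file name to its text. Nothing is written here,
    so a caller can write the returned contents only after every check has passed.
    """
    # Check 1: every target ID must be a valid index into the new names list.
    bad_targets = sorted({t for t in mapping.values() if not 0 <= t < num_new_names})
    if bad_targets:
        raise ValueError(f"target IDs outside 0..{num_new_names - 1}: {bad_targets}")

    # Check 2: pre-scan all files for class IDs that have no mapping.
    unmapped = {
        int(line.split()[0])
        for text in label_files.values()
        for line in text.splitlines()
        if line.strip() and int(line.split()[0]) not in mapping
    }
    if unmapped:
        raise ValueError(f"class IDs without a mapping: {sorted(unmapped)}")

    # Only after both checks pass: build the remapped contents of every file.
    result = {}
    for name, text in label_files.items():
        lines = []
        for line in text.splitlines():
            parts = line.split()
            if parts:
                parts[0] = str(mapping[int(parts[0])])
                lines.append(" ".join(parts))
        result[name] = "\n".join(lines) + ("\n" if lines else "")
    return result


labels = {"a.txt": "0 0.5 0.5 0.1 0.1\n", "b.txt": "2 0.5 0.5 0.1 0.1\n"}
print(plan_remap(labels, {0: 0, 1: 0, 2: 1}, num_new_names=2))
# {'a.txt': '0 0.5 0.5 0.1 0.1\n', 'b.txt': '1 0.5 0.5 0.1 0.1\n'}
```

Because validation happens before any file is touched, a failed check leaves the dataset unchanged. The subsequent writes themselves are still not atomic.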

4) Visualize samples

datakit visualize --images-dir /path/dataset/val/images --labels-dir /path/dataset/val/labels --n 12 --seed 1
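Drawing a YOLO box requires converting its normalized center and size values to pixel corners using the image's width and height. A fixed --seed makes the random sample reproducible. The sketch below covers both steps and assumes sampling uses the standard library's random.Random. datakit's plotting internals are not documented here, and the helper names are hypothetical.

```python
import random


def yolo_to_pixel_box(xc: float, yc: float, w: float, h: float, img_w: int, img_h: int) -> tuple[float, float, float, float]:
    """Convert a normalized YOLO box (center x/y, width, height) to pixel corners (x1, y1, x2, y2)."""
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return x1, y1, x2, y2


def pick_samples(image_names: list[str], n: int, seed: int) -> list[str]:
    """Hypothetical helper: choose up to n images reproducibly for a given seed."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    # Sorting first makes the result independent of directory listing order.
    return rng.sample(sorted(image_names), min(n, len(image_names)))


print(yolo_to_pixel_box(0.5, 0.5, 0.25, 0.5, img_w=200, img_h=100))  # (75.0, 25.0, 125.0, 75.0)
```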

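Format selection

The global --format option goes before the subcommand and selects which dataset format handler is used:
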
datakit --format yolo merge /path/ds1 /path/ds2 --out /path/out
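Because --format comes before the subcommand, it acts as a global option. With Python's argparse, this is typically done by defining the option on the top-level parser and the commands as subparsers. The sketch below shows that pattern for the merge command only. It is not datakit's actual parser, and the default value yolo is an assumption, since no default is documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a CLI with a global --format option (not datakit's actual parser)."""
    parser = argparse.ArgumentParser(prog="datakit")
    # Defined on the top-level parser, so it must appear before the subcommand name.
    parser.add_argument("--format", default="yolo", help="dataset format handler (assumed default: yolo)")
    sub = parser.add_subparsers(dest="command", required=True)

    merge = sub.add_parser("merge", help="merge multiple datasets into one")
    merge.add_argument("datasets", nargs="+")
    merge.add_argument("--out", required=True)
    return parser


args = build_parser().parse_args(["--format", "yolo", "merge", "/path/ds1", "/path/ds2", "--out", "/path/out"])
print(args.format, args.command, args.datasets, args.out)
# yolo merge ['/path/ds1', '/path/ds2'] /path/out
```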

Python API

from datakit import merge_datasets, merge_classes, remap_dataset, plot_random_samples

merge_datasets(["/path/ds1", "/path/ds2"], "/path/out")
merge_classes("/path/dataset", ["Backpack", "Backpacks"], "bag")
remap_dataset("/path/dataset", ["bag", "person"], {0: 0, 1: 0, 2: 1})
plot_random_samples("/path/dataset/val/images", "/path/dataset/val/labels", n=12, seed=1)
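The CLI arguments --map 0:0 1:0 2:1 carry the same information as the dict {0: 0, 1: 0, 2: 1} passed to remap_dataset. Below is a minimal sketch of that conversion, for scripts that accept CLI-style pairs and then call the Python API. parse_map_args is a hypothetical helper, not part of datakit.

```python
def parse_map_args(pairs: list[str]) -> dict[int, int]:
    """Hypothetical helper: turn ["0:0", "1:0", "2:1"] into {0: 0, 1: 0, 2: 1}."""
    mapping: dict[int, int] = {}
    for pair in pairs:
        old, sep, new = pair.partition(":")
        if not sep:
            raise ValueError(f"expected OLD:NEW, got {pair!r}")
        old_id, new_id = int(old), int(new)  # int() raises ValueError on non-numeric parts
        if old_id in mapping:
            raise ValueError(f"class ID {old_id} is mapped more than once")
        mapping[old_id] = new_id
    return mapping


mapping = parse_map_args("0:0 1:0 2:1".split())
# The result can then be passed on, e.g.:
# remap_dataset("/path/dataset", ["bag", "person"], mapping)
```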

Extend to new formats

  1. Implement DatasetFormatHandler in a new module, for example datakit/formats/coco.py.
  2. Register the handler in datakit/formats/__init__.py.
  3. Use the same CLI commands with --format coco.
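The implement-then-register steps follow a common plugin-registry pattern. The sketch below illustrates it under stated assumptions. Only the class name DatasetFormatHandler comes from this page. Its read_class_names method, the FORMAT_HANDLERS dict, register_handler, get_handler, and the annotations.json file name are all hypothetical. The example COCO handler reads class names from the categories list of a COCO annotation file, ordered by category id.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class DatasetFormatHandler(ABC):
    """Hypothetical interface; datakit's real abstract methods may differ."""

    name: str  # the value users would pass to --format

    @abstractmethod
    def read_class_names(self, dataset_dir: Path) -> list[str]:
        """Return class names ordered by class ID."""


# Hypothetical registry mapping --format values to handler classes.
FORMAT_HANDLERS: dict[str, type[DatasetFormatHandler]] = {}


def register_handler(cls: type[DatasetFormatHandler]) -> type[DatasetFormatHandler]:
    FORMAT_HANDLERS[cls.name] = cls
    return cls


def get_handler(name: str) -> DatasetFormatHandler:
    if name not in FORMAT_HANDLERS:
        raise ValueError(f"unknown format {name!r}; available: {sorted(FORMAT_HANDLERS)}")
    return FORMAT_HANDLERS[name]()


@register_handler
class CocoHandler(DatasetFormatHandler):
    name = "coco"

    def read_class_names(self, dataset_dir: Path) -> list[str]:
        # COCO stores classes as {"id": ..., "name": ...} entries under "categories".
        data = json.loads((dataset_dir / "annotations.json").read_text())
        return [c["name"] for c in sorted(data["categories"], key=lambda c: c["id"])]
```

With a registry like this, the CLI only needs to call get_handler(args.format). Adding a format then does not require changing any command code.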


Download files

Source Distribution

cv_datakit-0.1.2.tar.gz (15.7 kB)

Built Distribution

cv_datakit-0.1.2-py3-none-any.whl (18.4 kB)

File details

Details for the file cv_datakit-0.1.2.tar.gz.

File metadata

  • Download URL: cv_datakit-0.1.2.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for cv_datakit-0.1.2.tar.gz
Algorithm Hash digest
SHA256 5d43ffd32ed0881e993b2f406fa1a78b09a6c7ecc674d9b9ffbc6f4cb245ad2d
MD5 664d121c6af62c3f129bebbad94e03ff
BLAKE2b-256 07b74b200bf181b269060e7908d352066c7b5ccf7f3d11604bd4197a905fc3ae

File details

Details for the file cv_datakit-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: cv_datakit-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for cv_datakit-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6b939c8f06f97458574da520852a6f7ca71e560ea35ad108eaa39b38f2a63c38
MD5 5bd5f25aae1e56635dbc1e06605cd7c3
BLAKE2b-256 399e69099748e460aefc263bd91243ed6eed4009e9d04732cc4e3b4ea713b21d
