Dataset Management Framework (Datumaro)

A framework and CLI tool to build, transform, and analyze datasets.

VOC dataset                                  ---> Annotation tool
     +                                     /
COCO dataset -----> Datumaro ---> dataset ------> Model training
     +                                     \
CVAT annotations                             ---> Publication, statistics etc.

Features

  • Dataset reading, writing, and conversion in any direction.

    The supported formats and their documentation are listed in the project documentation.

  • Dataset building

    • Merging multiple datasets into one
    • Dataset filtering by custom criteria:
      • remove polygons of a certain class
      • remove images without annotations of a specific class
      • remove occluded annotations from images
      • keep only vertically-oriented images
      • remove small area bounding boxes from annotations
    • Annotation conversions, for instance:
      • polygons to instance masks and vice versa
      • apply a custom colormap for mask annotations
      • rename or remove dataset labels
    • Splitting a dataset into multiple subsets like train, val, and test:
      • random split
      • task-specific splits based on annotations, which keep initial label and attribute distributions
        • for classification task, based on labels
        • for detection task, based on bboxes
        • for re-identification task, based on labels, so that the same ID never appears in both training and test splits
    • Sampling a dataset
      • Analyzes inference results on the given dataset and selects the most informative samples, and as few of them as possible, for annotation
      • Selects the samples that best suit model training
        • sampling with an entropy-based algorithm
  • Dataset quality checking

    • Simple checking for errors
    • Comparison with model inference
    • Merging and comparison of multiple datasets
    • Annotation validation based on the task type (classification, etc.)
  • Dataset comparison

  • Dataset statistics (image mean and std, annotation statistics)

  • Model integration

    • Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MXNet, etc.)
    • Explainable AI (RISE algorithm)
      • RISE for classification
      • RISE for object detection
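
The entropy-based sampling mentioned in the list above can be illustrated with a minimal sketch. This is not Datumaro's implementation: the function names, the item ids, and the input layout (a mapping from item id to class probabilities produced by some model) are assumptions made only for illustration. The idea is that items whose predictions have the highest entropy are the ones the model is least sure about, so they are good candidates for annotation.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k items with the highest prediction entropy.

    `predictions` maps a hypothetical item id to a list of class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical inference results for three images and three classes.
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_002": [0.40, 0.35, 0.25],  # near-uniform prediction, high entropy
    "img_003": [0.70, 0.20, 0.10],
}

selected = select_most_uncertain(predictions, k=2)
print(selected)
```

A real sampler would typically work on model outputs attached to dataset items rather than on a plain dictionary; the ranking step stays the same.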

Check the design document for a full list of features. Check the user manual for usage instructions.
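
The task-specific splits listed under Features keep the initial label distribution in every subset. For the classification case, the idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not Datumaro's own algorithm or API: `stratified_split`, the item ids, and the ratio values are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios, seed=0):
    """Split item ids into subsets while keeping per-label proportions.

    `labels` maps a hypothetical item id to its class label.
    `ratios` maps subset names to fractions that are expected to sum to 1.
    """
    rng = random.Random(seed)

    # Group item ids by label so each label is split independently.
    by_label = defaultdict(list)
    for item_id, label in labels.items():
        by_label[label].append(item_id)

    names = list(ratios)
    subsets = {name: [] for name in names}
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        start = 0
        cumulative = 0.0
        for i, name in enumerate(names):
            cumulative += ratios[name]
            # The last subset takes the remainder so no item is dropped.
            is_last = i == len(names) - 1
            end = len(ids) if is_last else round(cumulative * len(ids))
            subsets[name].extend(ids[start:end])
            start = end
    return subsets


# Hypothetical dataset: 10 "cat" items and 20 "dog" items.
labels = {f"cat_{i}": "cat" for i in range(10)}
labels.update({f"dog_{i}": "dog" for i in range(20)})

split = stratified_split(labels, {"train": 0.5, "val": 0.2, "test": 0.3})
print({name: len(ids) for name, ids in split.items()})
```

Splitting each label group separately is what preserves the class balance: every subset ends up with the same cat-to-dog ratio as the full dataset, up to rounding.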

Contributing

Feel free to open an issue if you think something needs to be changed. You are welcome to participate in development; instructions are available in our contribution guide.

Telemetry data collection note

The OpenVINO™ telemetry library is used to collect basic information about Datumaro usage.

To enable or disable telemetry data collection, see the telemetry guide.

Download files

Download the file for your platform.

Source Distribution

  • datumaro-1.10.0rc0.tar.gz (573.1 kB), Source

Built Distributions

  • datumaro-1.10.0rc0-cp311-cp311-win_amd64.whl (968.3 kB), CPython 3.11, Windows x86-64
  • datumaro-1.10.0rc0-cp311-cp311-musllinux_1_1_x86_64.whl (1.7 MB), CPython 3.11, musllinux: musl 1.1+ x86-64
  • datumaro-1.10.0rc0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB), CPython 3.11, manylinux: glibc 2.17+ x86-64
  • datumaro-1.10.0rc0-cp310-cp310-win_amd64.whl (967.1 kB), CPython 3.10, Windows x86-64
  • datumaro-1.10.0rc0-cp310-cp310-musllinux_1_1_x86_64.whl (1.7 MB), CPython 3.10, musllinux: musl 1.1+ x86-64
  • datumaro-1.10.0rc0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB), CPython 3.10, manylinux: glibc 2.17+ x86-64
  • datumaro-1.10.0rc0-cp39-cp39-win_amd64.whl (967.3 kB), CPython 3.9, Windows x86-64
  • datumaro-1.10.0rc0-cp39-cp39-musllinux_1_1_x86_64.whl (1.7 MB), CPython 3.9, musllinux: musl 1.1+ x86-64
  • datumaro-1.10.0rc0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB), CPython 3.9, manylinux: glibc 2.17+ x86-64
