Dataset Management Framework (Datumaro)

A framework and CLI tool to build, transform, and analyze datasets.

VOC dataset                                  ---> Annotation tool
     +                                     /
COCO dataset -----> Datumaro ---> dataset ------> Model training
     +                                     \
CVAT annotations                             ---> Publication, statistics etc.

Features

  • Dataset reading, writing, and conversion in any direction between supported formats.

    The supported formats, and documentation for each, are listed in the project documentation.

  • Dataset building

    • Merging multiple datasets into one
    • Dataset filtering by custom criteria:
      • remove polygons of a certain class
      • remove images without annotations of a specific class
      • remove occluded annotations from images
      • keep only vertically-oriented images
      • remove small area bounding boxes from annotations
    • Annotation conversions, for instance:
      • polygons to instance masks and vice versa
      • apply a custom colormap for mask annotations
      • rename or remove dataset labels
    • Splitting a dataset into multiple subsets like train, val, and test:
      • random split
      • task-specific splits based on annotations, which keep initial label and attribute distributions
        • for classification task, based on labels
        • for detection task, based on bboxes
        • for re-identification task, based on labels, so that the same ID never appears in both the training and test splits
    • Sampling a dataset
      • analyze inference results for the given dataset and select the ‘best’ samples, and as few of them as possible, for annotation
      • select the samples best suited to model training
        • sampling with an entropy-based algorithm
  • Dataset quality checking

    • Simple checking for errors
    • Comparison with model inference
    • Merging and comparison of multiple datasets
    • Annotation validation based on the task type (classification, etc.)
  • Dataset comparison

  • Dataset statistics (image mean and std, annotation statistics)

  • Model integration

    • Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MXNet, etc.)
    • Explainable AI (RISE algorithm)
      • RISE for classification
      • RISE for object detection
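
To make the entropy-based sampling item above more concrete, here is a minimal sketch in plain Python. It does not use Datumaro's API. The function names, the item ids, and the choice to rank by highest entropy (that is, the most uncertain predictions first) are assumptions made for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k items whose predictions have the highest entropy.

    `predictions` maps a hypothetical item id to a list of class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy model outputs (illustrative values, not real inference results).
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_002": [0.40, 0.35, 0.25],  # uncertain prediction, high entropy
    "img_003": [0.70, 0.20, 0.10],
}
selected = select_most_uncertain(predictions, k=2)
print(selected)
```

Ranking by entropy is one common way to pick samples that a model is least sure about, so that annotation effort goes where it is likely to help most. Other selection rules (lowest entropy, random, or a mix) fit the same structure by changing the sort key.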

Check the design document for a full list of features. Check the user manual for usage instructions.
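
As a rough illustration of the task-specific splits listed under Features, the following sketch splits a classification dataset into train, val, and test subsets while keeping the label distribution of each subset close to the original. It is plain Python and not Datumaro's implementation. The function name `stratified_split`, the item ids, and the ratios are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios, seed=0):
    """Split item ids into subsets while roughly preserving label proportions.

    `labels` maps a hypothetical item id to its class label.
    `ratios` maps a subset name to its fraction (fractions should sum to 1).
    """
    rng = random.Random(seed)

    # Group item ids by label so each label is split independently.
    by_label = defaultdict(list)
    for item_id, label in labels.items():
        by_label[label].append(item_id)

    names = list(ratios)
    subsets = {name: [] for name in names}
    for ids in by_label.values():
        rng.shuffle(ids)
        start = 0
        cumulative = 0.0
        for i, name in enumerate(names):
            cumulative += ratios[name]
            # The last subset takes the remainder so that no item is dropped.
            if i == len(names) - 1:
                end = len(ids)
            else:
                end = round(cumulative * len(ids))
            subsets[name].extend(ids[start:end])
            start = end
    return subsets


# Toy data: 60 "cat" items and 40 "dog" items.
labels = {f"img_{i:03d}": ("cat" if i < 60 else "dog") for i in range(100)}
splits = stratified_split(labels, {"train": 0.5, "val": 0.2, "test": 0.3})
print({name: len(ids) for name, ids in splits.items()})
```

With this toy data, every subset keeps the 60/40 ratio of cats to dogs. A re-identification split would need an extra constraint, grouping by identity first so that no ID lands in both the training and test subsets.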

Contributing

Feel free to open an issue if you think something needs to be changed. You are welcome to participate in development; instructions are available in our contribution guide.

Telemetry data collection note

The OpenVINO™ telemetry library is used to collect basic information about Datumaro usage.

To enable or disable telemetry data collection, see the guide.
