Dataset Management Framework (Datumaro)

A framework and CLI tool to build, transform, and analyze datasets.

VOC dataset                                  ---> Annotation tool
     +                                     /
COCO dataset -----> Datumaro ---> dataset ------> Model training
     +                                     \
CVAT annotations                             ---> Publication, statistics etc.

Features

  • Dataset reading, writing, conversion in any direction.

    Documentation for the supported formats is available in the project documentation.

  • Dataset building

    • Merging multiple datasets into one
    • Dataset filtering by custom criteria:
      • remove polygons of a certain class
      • remove images without annotations of a specific class
      • remove occluded annotations from images
      • keep only vertically-oriented images
      • remove small area bounding boxes from annotations
    • Annotation conversions, for instance:
      • polygons to instance masks and vice versa
      • apply a custom colormap for mask annotations
      • rename or remove dataset labels
    • Splitting a dataset into multiple subsets like train, val, and test:
      • random split
      • task-specific splits based on annotations, which keep initial label and attribute distributions
        • for classification task, based on labels
        • for detection task, based on bboxes
        • for re-identification task, based on labels, so that the same IDs do not appear in both training and test splits
    • Sampling a dataset
      • analyzes inference results for the given dataset and selects the ‘best’ samples, and as few of them as possible, for annotation
      • selects the samples that best suit model training
        • sampling with an entropy-based algorithm
  • Dataset quality checking

    • Simple checking for errors
    • Comparison with model inference
    • Merging and comparison of multiple datasets
    • Annotation validation based on the task type (classification, etc.)
  • Dataset comparison

  • Dataset statistics (image mean and std, annotation statistics)

  • Model integration

    • Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MXNet, etc.)
    • Explainable AI (RISE algorithm)
      • RISE for classification
      • RISE for object detection
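
To make the entropy-based sampling item above more concrete, here is a minimal sketch of the general idea: rank items by the entropy of a model's predicted class probabilities and pick the most uncertain ones for annotation. This is not Datumaro's implementation; the function names and the sample probabilities are hypothetical and exist only for illustration.

```python
import math
from typing import Dict, List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k items whose predictions have the highest entropy."""
    ranked = sorted(
        predictions,
        key=lambda item_id: prediction_entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical per-image class probabilities produced by a classifier.
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_002": [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    "img_003": [0.70, 0.20, 0.10],
}

selected = select_most_uncertain(predictions, k=2)
print(selected)
```

High-entropy items are the ones the model is least sure about, which is why they are common candidates for annotation in this kind of sampling.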

Check the design document for a full list of features. Check the user manual for usage instructions.
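
As an illustration of the label-based split listed under Features, the sketch below splits item ids into subsets while keeping the label proportions roughly equal in each subset. It is a generic stratified split written for this example, not Datumaro's own code; `stratified_split`, the subset names, and the sample labels are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(
    labels: Dict[str, str],
    ratios: List[Tuple[str, float]],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split item ids into named subsets, preserving label proportions."""
    rng = random.Random(seed)

    # Group item ids by label so that each label is split independently.
    by_label = defaultdict(list)
    for item_id, label in sorted(labels.items()):
        by_label[label].append(item_id)

    subsets: Dict[str, List[str]] = {name: [] for name, _ in ratios}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        start = 0
        cumulative = 0.0
        for index, (name, ratio) in enumerate(ratios):
            cumulative += ratio
            # The last subset takes the remainder so that no item is dropped.
            is_last = index == len(ratios) - 1
            end = len(ids) if is_last else round(cumulative * len(ids))
            subsets[name].extend(ids[start:end])
            start = end
    return subsets


# Hypothetical dataset: 10 "cat" items and 10 "dog" items.
labels = {f"cat_{i}": "cat" for i in range(10)}
labels.update({f"dog_{i}": "dog" for i in range(10)})

subsets = stratified_split(labels, [("train", 0.6), ("val", 0.2), ("test", 0.2)])
print({name: len(ids) for name, ids in subsets.items()})
```

Splitting each label group separately is what keeps the class balance of the full dataset in every subset; a plain random split gives no such guarantee on small datasets.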

Contributing

Feel free to open an issue if you think something needs to be changed. You are welcome to participate in development; instructions are available in our contribution guide.

Telemetry data collection note

The OpenVINO™ telemetry library is used to collect basic information about Datumaro usage.

To enable or disable telemetry data collection, please see the guide.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • datumaro-1.10.0.tar.gz (567.7 kB): Source

Built Distributions

  • datumaro-1.10.0-cp312-cp312-win_amd64.whl (983.2 kB): CPython 3.12, Windows x86-64
  • datumaro-1.10.0-cp312-cp312-musllinux_1_1_x86_64.whl (1.7 MB): CPython 3.12, musllinux: musl 1.1+ x86-64
  • datumaro-1.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB): CPython 3.12, manylinux: glibc 2.17+ x86-64
  • datumaro-1.10.0-cp311-cp311-win_amd64.whl (982.1 kB): CPython 3.11, Windows x86-64
  • datumaro-1.10.0-cp311-cp311-musllinux_1_1_x86_64.whl (1.7 MB): CPython 3.11, musllinux: musl 1.1+ x86-64
  • datumaro-1.10.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • datumaro-1.10.0-cp310-cp310-win_amd64.whl (981.1 kB): CPython 3.10, Windows x86-64
  • datumaro-1.10.0-cp310-cp310-musllinux_1_1_x86_64.whl (1.7 MB): CPython 3.10, musllinux: musl 1.1+ x86-64
  • datumaro-1.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • datumaro-1.10.0-cp39-cp39-win_amd64.whl (980.8 kB): CPython 3.9, Windows x86-64
  • datumaro-1.10.0-cp39-cp39-musllinux_1_1_x86_64.whl (1.7 MB): CPython 3.9, musllinux: musl 1.1+ x86-64
  • datumaro-1.10.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64

File details

Details for the file datumaro-1.10.0.tar.gz.

File metadata

  • Download URL: datumaro-1.10.0.tar.gz
  • Upload date:
  • Size: 567.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.9

File hashes

Hashes for datumaro-1.10.0.tar.gz
Algorithm Hash digest
SHA256 408a07fb4c74a2d832d4493c9c8001283cc0f40b02fdddc26b9e05ffb6ee8477
MD5 86a4e00815451af10d369a631f2dba05
BLAKE2b-256 5062a915845b2d650ec2e2dc9f07716872228f898ff4ee7c45a729304295b3b7

See more details on using hashes here.
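
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the helper names and the local file path are assumptions made for this example, while the expected digest is the SHA256 value listed above for datumaro-1.10.0.tar.gz.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for datumaro-1.10.0.tar.gz.
EXPECTED_SHA256 = "408a07fb4c74a2d832d4493c9c8001283cc0f40b02fdddc26b9e05ffb6ee8477"

# Assumed location of the downloaded archive; adjust to your download path.
archive = Path("datumaro-1.10.0.tar.gz")
if archive.exists():
    ok = matches_published_hash(archive, EXPECTED_SHA256)
    print("hash OK" if ok else "hash MISMATCH")
```

pip can perform the same kind of check at install time through its hash-checking mode (`--require-hashes` with a requirements file that pins hashes).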

File details

Details for the file datumaro-1.10.0-cp312-cp312-win_amd64.whl.

File hashes

Hashes for datumaro-1.10.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 a7a3eed3127d6b1ba4acbf376f6dd30a48bc932f25144a946dcf0b74a12c2cea
MD5 f2ac0c67947371b00650da1d9a332b39
BLAKE2b-256 f85495da793fcc48a0824c3d0f5163beac199e0ec8e0d476adfc3c9a9577394d

File details

Details for the file datumaro-1.10.0-cp312-cp312-musllinux_1_1_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp312-cp312-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 f16b4d5b25e3127b7dc921a4d222b1379c989cabb508132ff9263d764f2ecb47
MD5 014a4aeccf9aa5410df96302dde6257a
BLAKE2b-256 8450bc5e733068386acbd4c065296e4b648886c7b75cfb4a5f40189c5d80b270

File details

Details for the file datumaro-1.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 351b05d72caa2cef09c728fe0a1a57ab7be3dd1b5ee4fe4a4c7ff1e30cdfa81d
MD5 eb4525222a7e5af8965d885474e87f77
BLAKE2b-256 c4d6b1e5239397dbf2eb87117d2a1ed308dd5d54c88843d1fa09287618c6971b

File details

Details for the file datumaro-1.10.0-cp311-cp311-win_amd64.whl.

File hashes

Hashes for datumaro-1.10.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 296d3f43b873a9b7eee21ed73381d22610750c17087e8f93020220cb990e30ec
MD5 bd71cdfd83b910b763339b75904c96da
BLAKE2b-256 ccceef0557f2cabcc46ee87b8162a0ed3f3fcf32fcf9227a6d69d6775de122e5

File details

Details for the file datumaro-1.10.0-cp311-cp311-musllinux_1_1_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp311-cp311-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 b78545c0b82ceb5a4d163f539f3385272da49aa210e8171ca8f2b3c9ff78e598
MD5 52da9ce86f7d5269c28d074f96e39486
BLAKE2b-256 50c50198c258789771da60c7753f51fa872064775065b01827295f8e0ac7db9f

File details

Details for the file datumaro-1.10.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 261be33858ae253c5f286430a347fa19aa53091d710f3cae599005b0acf1422b
MD5 90e59f6b03786d927c55da2bf3780bd6
BLAKE2b-256 86be2b3f0a1b0bf1d836739f30140d5a4d9fe88adc065be350be5819dc62b15b

File details

Details for the file datumaro-1.10.0-cp310-cp310-win_amd64.whl.

File hashes

Hashes for datumaro-1.10.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 2823e930c1e4ca2c75046354d6214401498eff5f3c27c63d7a62d84958be56d5
MD5 20f42387af8df5d756b63f9cf4916b23
BLAKE2b-256 8446035835142298a8eacea7bdc19a7b0949a7d5d33a480814666741c39a63ba

File details

Details for the file datumaro-1.10.0-cp310-cp310-musllinux_1_1_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 d2f41647ed54bb09ca306614203ae90f69e65e89e18d7c36ed0f75c69278ee6a
MD5 5e2651c0ffc256fbc94bcac47bf71451
BLAKE2b-256 fca13aa0975be54a51480f83a6f7315e2d6dc31b1b446847faa4cc34589db2f8

File details

Details for the file datumaro-1.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1e7cee5d990cf85c246e33b8dfe21a67298fbb2a8886aa5607d97924087a0b97
MD5 2f4da96dfc6d566a56e107e2a8329580
BLAKE2b-256 46623bb463635ebac05b2603416e7c79e56ac6af7c401056246ab448dc09c8f8

File details

Details for the file datumaro-1.10.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: datumaro-1.10.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 980.8 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.9

File hashes

Hashes for datumaro-1.10.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 2a4cd044f95342813274eed634506cf728b89765602332e04a4f04174953dfc1
MD5 73a07b99ac835d791ba28ecfc880ba61
BLAKE2b-256 238fcddfd36f83d01512171befcec1f0b7feb627c872c5875822d93f6bed819a

File details

Details for the file datumaro-1.10.0-cp39-cp39-musllinux_1_1_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 270fe548382190fa6d08adc2608211774f8c55678d5ba4fc949249e9e698c18a
MD5 e4b40e48355477e3780085dfa6c19a0c
BLAKE2b-256 ef79573b0d793a6ff943fdb4af8a3becc47af8104cb947ae183c363922a1812a

File details

Details for the file datumaro-1.10.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for datumaro-1.10.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4e849950cfa222351d7830e089fcee9ac304e0d798cfd7d17b012136d156d444
MD5 fcc51d146f30d547e115d251018b84b8
BLAKE2b-256 4d2b437d982cf11dc13ef3bc202ecae98213358df5cf309b0960100a308e2aff
