Scoring and metrics app
Description
This app makes data annotation more efficient by improving annotation quality while reducing the time required to produce annotations.
The components of this app are:
- Functions to calculate scores for quality tasks and model predictions.
- Custom nodes that can be added to pipelines to calculate scores when a task's quality items are completed.
Also check this notebook for more information.
Quality Flows
To learn more about each of these quality task types, refer to the main Dataloop documentation.
In general, an annotator will receive an assignment to complete their annotation task. For a given item in a consensus task, each assignment will be cross-compared with every other assignment. In the case of qualification and honeypot tasks, each item will only have one assignment associated with it.
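As an illustration of this comparison scheme (not the app's internal implementation), a consensus item with several assignments produces one comparison per unordered pair of assignments, while a qualification or honeypot item is compared only against the reference:

```python
from itertools import combinations

# Hypothetical assignment identifiers for a single consensus item.
consensus_assignments = ["assignment_a", "assignment_b", "assignment_c"]

# Consensus: every assignment is cross-compared with every other assignment.
consensus_pairs = list(combinations(consensus_assignments, 2))
print(consensus_pairs)
# [('assignment_a', 'assignment_b'), ('assignment_a', 'assignment_c'), ('assignment_b', 'assignment_c')]

# Qualification / honeypot: the item's single assignment is compared
# only against the reference (ground truth) annotations.
qualification_pairs = [("assignment_a", "reference")]
```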
What's Supported?
Supported file types:
- image
- video
Scoring is currently supported for quality tasks with the following annotation types (with geometry score method in parentheses, where applicable):
- classification
- bounding box (IOU)
- polygon (IOU)
- segmentation (IOU)
- point (distance)
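As a rough sketch of the geometry methods listed above (illustrative only, not the library's exact implementation; the point-score normalizer below is an assumption), a bounding box IOU and a distance-based point score could look like this:

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (left, top, right, bottom)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def point_score(point_a, point_b, max_distance=100.0):
    """Distance-based score for two (x, y) points, mapped to the 0-1 range.
    The max_distance normalizer is an assumption for this sketch."""
    dist = ((point_a[0] - point_b[0]) ** 2 + (point_a[1] - point_b[1]) ** 2) ** 0.5
    return max(0.0, 1.0 - dist / max_distance)


print(box_iou((0, 0, 100, 100), (50, 0, 150, 100)))  # ~0.33
print(point_score((10, 10), (40, 50)))               # 0.5
```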
Score Types
During scoring, the following scores will be created for each annotation:
- raw_annotation_scores - for each annotation comparison we have geometry, label and attribute matching scores
- annotation_overall - the mean of each annotation's raw scores
- user_confusion_score - the mean of every annotation overall score, relative to the reference or another assignee
- item_confusion_score - the count of the number of label pairs associated with the assignee's label, relative to the reference's label
- item_overall_score - the mean value of each annotation overall score associated with an item
1) Raw annotation scores:
There are three types of raw scores for annotations: annotation_iou, annotation_label and annotation_attribute.
Which scores are included can be determined by the user; by default, all three scores are calculated, and the default value for each is 1 (which can be modified).
2) Annotation overall
For the annotation_overall score we calculate the mean value of all raw annotation scores per annotation.
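For example, with made-up raw scores of 0.75 (geometry), 1.0 (label) and 0.5 (attribute), the annotation overall score is their mean:

```python
from statistics import mean

# Hypothetical raw scores for a single annotation comparison.
raw_scores = {"annotation_iou": 0.75, "annotation_label": 1.0, "annotation_attribute": 0.5}
annotation_overall = mean(raw_scores.values())
print(annotation_overall)  # 0.75
```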
3) User confusion score
The user_confusion score represents the mean annotation score a given assignee has when their annotations are compared to another set of annotations (either the reference or another assignee).
4) Label confusion score
The label_confusion score represents the count of annotations with a given label from the assignee, relative to each label class in the other set of annotations (either the reference or another assignee).
5) Item overall score
The item_overall score is the mean of the overall annotation scores across all annotations associated with an item.
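Continuing the illustration with hypothetical numbers, both user_confusion and item_overall reduce to means over annotation overall scores, just grouped differently:

```python
from statistics import mean

# Hypothetical annotation overall scores on one item, keyed by the assignee
# that produced each annotation, all compared against the same reference.
scores_by_assignee = {
    "annotator_1": [1.0, 0.75, 0.5],
    "annotator_2": [0.5, 0.25, 0.75],
}

# user_confusion: mean overall score per assignee, relative to the reference.
user_confusion = {assignee: mean(scores) for assignee, scores in scores_by_assignee.items()}
print(user_confusion)  # {'annotator_1': 0.75, 'annotator_2': 0.5}

# item_overall: mean over every annotation overall score on the item.
item_overall = mean(score for scores in scores_by_assignee.values() for score in scores)
print(item_overall)  # 0.625
```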
Any calculated and uploaded scores will replace any previous scores for all items of a given task.
Note about videos: video scores will differ slightly from image scores. Video scores are calculated frame by frame, and each annotation's score is the average of these per-frame scores across all relevant frames for that annotation. Confusion scores are not calculated due to the multi-frame nature of videos. Item overall scores remain an average of all annotations of the video item.
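As a conceptual sketch of that per-frame averaging (hypothetical numbers, not the library's actual code):

```python
from statistics import mean

# Hypothetical per-frame geometry scores for a single video annotation.
frame_scores = {0: 1.0, 1: 0.75, 2: 0.5, 3: 0.75}

# The annotation's score is the average across all frames in which it appears.
annotation_score = mean(frame_scores.values())
print(annotation_score)  # 0.75
```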
Confusion Example
There are generally two kinds of scores: regular scores, and “confusion” scores.
Regular scores show the level of agreement or overlap between two sets of annotations. They use the IDs of the entities being compared for the entityId and relative fields; this applies when comparing annotations or items. The value field will typically be a number between 0 and 1.
There are two types of confusion scores: item label confusion, and user confusion. Item label confusion shows the number of instances in which an assignee’s label corresponds with the ground truth labels.
Ground truth annotations:
import dtlpy as dl
item = dl.items.get(item_id='64c0fc0730b03f27ca3a58db')
Assignee annotations:
item = dl.items.get(item_id='64c0f2e1ec9103d52eaedbe2')
In this example item, the ground truth has 3 annotations each for the cat and dog classes. The assignee, however, labels 1 annotation as cat and 5 as dog. This would result in the following item label confusion scores:
{
  "type": "label_confusion",
  "value": 1,
  "entityId": "cat",
  "context": {
    "relative": "cat",
    "taskId": "<TASK_ID>",
    "itemId": "<ITEM_ID>",
    "datasetId": "<DATASET_ID>"
  }
},
{
  "type": "label_confusion",
  "value": 3,
  "entityId": "dog",
  "context": {
    "relative": "dog",
    "taskId": "<TASK_ID>",
    "itemId": "<ITEM_ID>",
    "datasetId": "<DATASET_ID>"
  }
},
{
  "type": "label_confusion",
  "value": 2,
  "entityId": "dog",
  "context": {
    "relative": "cat",
    "taskId": "<TASK_ID>",
    "itemId": "<ITEM_ID>",
    "datasetId": "<DATASET_ID>"
  }
}
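The counts above can be reproduced conceptually (leaving aside how annotations are geometrically matched, which the app handles internally) by counting ground-truth/assignee label pairs:

```python
from collections import Counter

# Hypothetical matched (ground truth label, assignee label) pairs for the
# six annotations in the example item above.
matched_pairs = [
    ("cat", "cat"),
    ("cat", "dog"),
    ("cat", "dog"),
    ("dog", "dog"),
    ("dog", "dog"),
    ("dog", "dog"),
]

label_confusion = Counter(matched_pairs)
for (gt_label, assignee_label), count in label_confusion.items():
    print(f"relative={gt_label}, entityId={assignee_label}, value={count}")
# relative=cat, entityId=cat, value=1
# relative=cat, entityId=dog, value=2
# relative=dog, entityId=dog, value=3
```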
Python installation
pip install dtlpymetrics
Functions
See this page for details on additional functions.
Contributions, Bugs and Issues - How to Contribute
We welcome anyone to help us improve this app.
Here are detailed instructions to help you report a bug or request a feature.