Project description

Converter from AWS SageMaker Ground Truth Label Job Manifest file to VGG Image Annotator

Context

AWS SageMaker Ground Truth Label Job (GTL)

https://docs.aws.amazon.com/sagemaker/index.html

VGG Image Annotator (VIA)

project: http://www.robots.ox.ac.uk/~vgg/software/via/
code repo: https://gitlab.com/vgg/via/-/tree/master

Why do we need this?

AWS SageMaker Ground Truth labelling jobs are very useful, especially for outsourcing the tedious labelling work needed to prepare machine learning training data. Ground Truth also supports adjustment jobs, where some code or algorithm has already generated bounding boxes for certain objects and the job is used to view and adjust them. However, AWS currently does not offer an easy way to preview all the existing boxes until all the objects have been manually labelled (there is a $0.08/object charge for each labelling operation, so it makes business sense for AWS to encourage that).

VIA offers a very powerful way to view all, several, or selected images with their annotations, and it works on any platform out of the box with almost no installation (just check out the code and open index.html in a browser).

The idea here is to take an already generated AWS SageMaker manifest file and convert it into the annotation JSON file used by VIA.

Usage

usage: gtl2via.py [-h] -i INPUT -l LABEL -o OUTPUT -s S3_BUCKET

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input manifest file for AWS SageMaker GTL job
  -l LABEL, --label LABEL
                        use the label from the manifest file
  -o OUTPUT, --output OUTPUT
                        output json file of VIA
  -s S3_BUCKET, --s3-bucket S3_BUCKET
                        the S3 bucket used by AWS SageMaker (standard AWS partition only, ie, not working for China or US-Gov). It will be replaced to the corresponding https URL
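
The -s option implies a rewrite from an s3:// URI to an https URL. The sketch below illustrates that rewrite for the standard AWS partition, using the URL pattern visible in the example output further down. The function name s3_to_https is hypothetical and is not taken from gtl2via.py; the actual script may do this differently.

```python
def s3_to_https(uri, bucket):
    """Rewrite s3://<bucket>/<key> as https://<bucket>.s3.amazonaws.com/<key>.

    Hypothetical helper: only the standard AWS partition is handled,
    mirroring the limitation stated in the help text.
    """
    prefix = f"s3://{bucket}/"
    if not uri.startswith(prefix):
        raise ValueError(f"{uri!r} is not inside bucket {bucket!r}")
    key = uri[len(prefix):]
    return f"https://{bucket}.s3.amazonaws.com/{key}"


print(s3_to_https("s3://zerobox-public/test/image.png", "zerobox-public"))
# prints https://zerobox-public.s3.amazonaws.com/test/image.png
```

Raising on a URI from another bucket is a design choice of this sketch, so that a wrong -s value fails loudly instead of producing links that do not resolve.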

Example

python gtl2via.py  -i test/defect_boxes.manifest -o test/via.json -l zerobox-quick-detection -s zerobox-public

The input manifest file looks like this:

seki@seki-Surface-Book:~/src/gtl2via$ cat test/defect_boxes.manifest |jq
{
  "source-ref": "s3://zerobox-public/test/2020-08-13/good/Camera0_202008131130183_original.png",
  "zerobox-quick-detection": {
    "annotations": [
      {
        "class_id": 0,
        "width": 519,
        "top": 896,
        "height": 1024,
        "left": 512
      }
    ],
    "image_size": [
      {
        "width": 1080,
        "depth": 3,
        "height": 1920
      }
    ]
  },
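
Ground Truth manifests are JSON Lines files: each line is one complete JSON object, and jq only pretty-prints them as shown above. A minimal sketch of reading such a file follows. The function name read_manifest is hypothetical, and skipping records that lack the requested label is an assumption of this sketch, not confirmed behaviour of gtl2via.py.

```python
import json


def read_manifest(path, label):
    """Yield (source_ref, annotations) for every record that carries `label`.

    Hypothetical helper: assumes one JSON object per line and the
    bounding-box layout shown in the sample manifest.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if label not in record:
                continue  # assumption: skip images without this label
            yield record["source-ref"], record[label]["annotations"]
```

With the example above, it could be called as read_manifest("test/defect_boxes.manifest", "zerobox-quick-detection").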

The output JSON looks like this:

seki@seki-Surface-Book:~/src/gtl2via$ cat test/via.json |jq
{
  "https://zerobox-public.s3.amazonaws.com/test/2020-08-13/good/Camera0_202008131130183_original.png-1": {
    "filename": "https://zerobox-public.s3.amazonaws.com/test/2020-08-13/good/Camera0_202008131130183_original.png",
    "size": -1,
    "regions": [
      {
        "shape_attributes": {
          "name": "rect",
          "x": 512,
          "y": 896,
          "width": 519,
          "height": 1024
        },
        "region_attributes": {}
      }
    ],
    "file_attributes": {}
  },
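
Comparing the two samples, each manifest box maps onto a VIA rect region: left becomes x, top becomes y, and width and height carry over unchanged. The entry key is the file name followed by the size, which is -1 here. The sketch below reproduces that mapping. The function name via_entry is hypothetical and the code is inferred from the samples only; it is not the source of gtl2via.py.

```python
import json


def via_entry(url, annotations, size=-1):
    """Build the (key, entry) pair for one image, inferred from the samples.

    Hypothetical helper: `size` defaults to -1 as in the sample output,
    and class_id is ignored because region_attributes is empty there.
    """
    regions = [
        {
            "shape_attributes": {
                "name": "rect",
                "x": box["left"],
                "y": box["top"],
                "width": box["width"],
                "height": box["height"],
            },
            "region_attributes": {},
        }
        for box in annotations
    ]
    entry = {
        "filename": url,
        "size": size,
        "regions": regions,
        "file_attributes": {},
    }
    return f"{url}{size}", entry


boxes = [{"class_id": 0, "width": 519, "top": 896, "height": 1024, "left": 512}]
key, entry = via_entry("https://zerobox-public.s3.amazonaws.com/test/image.png", boxes)
print(json.dumps({key: entry}, indent=2))
```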

After loading the annotation JSON file into VIA-2, the images are displayed with their bounding boxes drawn on top.

Known Limitations (ToDo)

It only works with S3 in the aws partition (standard regions), not with aws-cn (China regions) or aws-us-gov (AWS GovCloud [US] regions).
It only supports one annotation type: rect.
There is no way yet to choose a box color for the different SageMaker classes.

This version

0.0.1

Aug 13, 2020

Download files

Source Distribution

gtl2via-0.0.1.tar.gz (4.0 kB)

Uploaded Aug 13, 2020 Source

Built Distribution

gtl2via-0.0.1-py3-none-any.whl (6.0 kB)

Uploaded Aug 13, 2020 Python 3

Hashes for gtl2via-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`0e5197536b9dd0d59779029c6035e01ddb9baef61251440236e2facb16b4b882`
MD5	`817c1c4b9c7dc4db1e53955d9c9e7fc8`
BLAKE2b-256	`44d7e16e6476a5fbf1c2170d79da3b748816432bddd0b13a50aab29cbc19be74`

Hashes for gtl2via-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`3ba9336a35a26a699cfbbd2963781d79fe91faeb7cee52d270c00d6afd9af23a`
MD5	`fd9a9fafd19eec69ac7e0b31ed318af3`
BLAKE2b-256	`d7570ea4d201f5832864bed0af1e23e64a5121322a80ac614867c2949365b04a`