Skip to main content

Converter from AWS SageMaker tto VIA

Project description

Converter from AWS SageMaker Ground Truth Label Job Manifest file to VGG Image Annotator

Context

AWS SageMaker Ground Truth Label Job (GLT)

VGG Image Annotator (VIA)

Why do we need this?

The AWS SageMaker Ground Truth labelling jobs are very nice, especially when we want to out source the tedious labelling job to prepare for machine learning trainings. It also supports the job of adjust existing boxes, where some code/algorithm is already generating the bounding boxes for certain objects. In this case, we could use the Ground Truth jobs to view existing boxes and adjust them. However, the AWS system currently do not offer an easy way to preview all the existing boxes until all the objects are manually labelled (there is a $0.08/object charge for each labelling operation and it makes business for AWS to want to encourage that).

VIA offers a very powerful way to view all/multiple/selective images with their annotations and it works on any platform out of the box without much installation (just checkout the code and launch the index.html using broswer).

The idea here is to take a already generated AWS SageMaker manifest file and convert it into the annotation json file used by VIA.

Usage

usage: gtl2via.py [-h] -i INPUT -l LABEL -o OUTPUT -s S3_BUCKET

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input manifest file for AWS SageMaker GTL job
  -l LABEL, --label LABEL
                        use the label from the manifest file
  -o OUTPUT, --output OUTPUT
                        output json file of VIA
  -s S3_BUCKET, --s3-bucket S3_BUCKET
                        the S3 bucket used by AWS SageMaker (standard AWS partition only, ie, not working for China or US-Gov). It will be replaced to the corresponding https URL

Example

python gtl2via.py  -i test/defect_boxes.manifest -o test/via.json -l zerobox-quick-detection -s zerobox-public

The input manifest file looks like this:

seki@seki-Surface-Book:~/src/gtl2via$ cat test/defect_boxes.manifest |jq
{
  "source-ref": "s3://zerobox-public/test/2020-08-13/good/Camera0_202008131130183_original.png",
  "zerobox-quick-detection": {
    "annotations": [
      {
        "class_id": 0,
        "width": 519,
        "top": 896,
        "height": 1024,
        "left": 512
      }
    ],
    "image_size": [
      {
        "width": 1080,
        "depth": 3,
        "height": 1920
      }
    ]
  },

The output json looks like this

seki@seki-Surface-Book:~/src/gtl2via$ cat test/via.json |jq
{
  "https://zerobox-public.s3.amazonaws.com/test/2020-08-13/good/Camera0_202008131130183_original.png-1": {
    "filename": "https://zerobox-public.s3.amazonaws.com/test/2020-08-13/good/Camera0_202008131130183_original.png",
    "size": -1,
    "regions": [
      {
        "shape_attributes": {
          "name": "rect",
          "x": 512,
          "y": 896,
          "width": 519,
          "height": 1024
        },
        "region_attributes": {}
      }
    ],
    "file_attributes": {}
  },

Loading the annoation json file into VIA-2 and we see a result like this image

Known Limitations (ToDo)

  • It only works with S3 in aws (Standard Regions), not for aws-cn (China Regions) or aws-us-gov (AWS GovCloud [US] Regions).
  • It only works with one annotation type: rect
  • No choice of box color yet for different classes in SageMaker

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gtl2via-0.0.1.tar.gz (4.0 kB view hashes)

Uploaded Source

Built Distribution

gtl2via-0.0.1-py3-none-any.whl (6.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page