Synthetic dataset insights.

Dataset Insights

This repo helps users understand their synthetic datasets by exposing the metrics collected when each dataset was created, e.g. object counts and label distributions. The easiest way to use Dataset Insights is to run the Jupyter notebook provided in our Docker image, unitytechnologies/datasetinsights.
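As a rough illustration of what metrics such as object count and label distribution measure, the sketch below tallies them from capture records. It is not the datasetinsights API. The function names are hypothetical, and the sketch assumes each capture is a dict shaped like the Perception package's captures JSON: an "annotations" list whose "values" entries carry a "label_name" field, as 2D bounding-box annotations do.

```python
from collections import Counter


def label_distribution(captures):
    """Count annotated objects per label across all captures.

    Assumes each capture looks like:
    {"annotations": [{"values": [{"label_name": ...}, ...]}, ...]}
    Annotations whose "values" is missing or None (e.g. image-based
    annotations) are skipped.
    """
    counts = Counter()
    for capture in captures:
        for annotation in capture.get("annotations", []):
            for value in annotation.get("values") or []:
                counts[value["label_name"]] += 1
    return counts


def object_count_per_capture(captures):
    """Return the number of annotated objects in each capture, in order."""
    return [
        sum(len(a.get("values") or []) for a in c.get("annotations", []))
        for c in captures
    ]


# Illustrative records only; the label names are made up.
captures = [
    {"annotations": [{"values": [{"label_name": "cereal"}, {"label_name": "soup"}]}]},
    {"annotations": [{"values": [{"label_name": "cereal"}]}]},
]
print(label_distribution(captures))        # Counter({'cereal': 2, 'soup': 1})
print(object_count_per_capture(captures))  # [2, 1]
```

In a real Perception dataset the captures are spread across several JSON files, so they would need to be loaded and concatenated first.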

Requirements

The Dataset Insights notebooks assume that you have already generated a synthetic dataset using the Unity Perception package. To learn how to create a synthetic dataset with Unity, see the Perception documentation.

Running the Dataset Insights Jupyter Notebook Locally

You can run the notebook either by installing our Python package or by using our Docker image.

Running a Notebook Locally Using Docker

Requirements

Docker installed.

Steps

  1. Run the notebook server using Docker:
docker run \
  -p 8888:8888 \
  -v $HOME/data:/data \
  -t unitytechnologies/datasetinsights:latest

This command mounts the directory $HOME/data on your local filesystem to /data inside the container. If you are loading a dataset generated locally by a Unity app, replace this path with the root of your app's persistent data folder.

Example persistent data paths from SynthDet:

  • OSX: ~/Library/Application\ Support/UnityTechnologies/SynthDet
  • Linux: $XDG_CONFIG_HOME/unity3d/UnityTechnologies/SynthDet
  • Windows: %userprofile%\AppData\LocalLow\UnityTechnologies\SynthDet
  2. Go to http://localhost:8888 in a web browser to open the Jupyter interface.
  3. Open and run the example notebook in /datasetinsights/notebooks/, or create your own.
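The Docker steps above can also be scripted. The Python sketch below resolves a Unity app's persistent data folder using the per-OS layouts listed in step 1 and builds the same docker run command as an argument list. The helper names are hypothetical. The Linux fallback to ~/.config when XDG_CONFIG_HOME is unset is an assumption based on the XDG Base Directory default.

```python
import os
import platform
from pathlib import PurePosixPath, PureWindowsPath


def persistent_data_path(company, product, system=None, env=None):
    """Return a Unity app's persistent data folder (hypothetical helper).

    Follows the per-OS layouts listed above. `system` defaults to the
    current platform and `env` to the current environment variables.
    """
    system = system or platform.system()
    env = os.environ if env is None else env
    if system == "Darwin":  # macOS ("OSX" in the list above)
        base = PurePosixPath(env["HOME"], "Library", "Application Support")
        return str(base / company / product)
    if system == "Linux":
        # Assumption: fall back to the XDG default when the variable is unset.
        xdg = env.get("XDG_CONFIG_HOME") or str(PurePosixPath(env["HOME"], ".config"))
        return str(PurePosixPath(xdg, "unity3d", company, product))
    if system == "Windows":
        base = PureWindowsPath(env["USERPROFILE"], "AppData", "LocalLow")
        return str(base / company / product)
    raise ValueError(f"unsupported platform: {system}")


def docker_run_args(host_dir, image="unitytechnologies/datasetinsights:latest"):
    """Build the docker run command above as a list for subprocess.

    host_dir is mounted at /data inside the container, and port 8888
    is published on the host.
    """
    return [
        "docker", "run",
        "-p", "8888:8888",
        "-v", f"{host_dir}:/data",
        "-t", image,
    ]
```

For example, subprocess.run(docker_run_args(persistent_data_path("UnityTechnologies", "SynthDet")), check=True) would start the server with the SynthDet folder mounted at /data. Because subprocess receives a list rather than a shell string, paths that contain spaces, such as Application Support on macOS, need no escaping.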

Running a Dataset Insights Jupyter Notebook via Google Cloud Platform (GCP)

  • To run the notebook on GCP's AI Platform, follow these instructions and use the container image unitytechnologies/datasetinsights:latest.
  • Alternatively, to run the notebook on Kubeflow, follow these steps.

Download Dataset from Unity Simulation

Unity Simulation is a platform for running simulations at large scale. You can use the provided CLI script to download Perception datasets generated in Unity Simulation:

python -m datasetinsights.scripts.usim_download \
  --data-root=$HOME/data \
  --run-execution-id=<run-execution-id> \
  --auth-token=<xxx>

The --auth-token value can be generated using the Unity Simulation CLI. The script downloads the synthetic dataset for the requested --run-execution-id.

If the --include-binary flag is passed, the images are downloaded as well. Depending on the size of the generated dataset, this can take a long time.
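If you download runs from a script rather than a shell, a minimal Python wrapper could look like the sketch below. It uses only the flags documented above (--data-root, --run-execution-id, --auth-token, --include-binary). The helper names are hypothetical. sys.executable is used so the module runs under the same interpreter that has datasetinsights installed.

```python
import subprocess
import sys


def usim_download_cmd(data_root, run_execution_id, auth_token, include_binary=False):
    """Build the usim_download invocation above as an argument list.

    Set include_binary=True to append --include-binary and fetch the images too.
    """
    cmd = [
        sys.executable, "-m", "datasetinsights.scripts.usim_download",
        f"--data-root={data_root}",
        f"--run-execution-id={run_execution_id}",
        f"--auth-token={auth_token}",
    ]
    if include_binary:
        cmd.append("--include-binary")
    return cmd


def usim_download(data_root, run_execution_id, auth_token, include_binary=False):
    """Run the download; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(
        usim_download_cmd(data_root, run_execution_id, auth_token, include_binary),
        check=True,
    )
```

For example, usim_download(os.path.expanduser("~/data"), "<run-execution-id>", os.environ["USIM_AUTH_TOKEN"]) reads the token from an environment variable. USIM_AUTH_TOKEN is a hypothetical name chosen for illustration. Command-line arguments, including the token, may be visible to other users of the machine through the process list.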

Download SynthDet Dataset

Download the SynthDet public datasets, GroceriesReal and Synthetic, from Google Cloud Storage (GCS). You can use the provided CLI script to download a public dataset and reproduce our work.

To download the GroceriesReal dataset:

python -m datasetinsights.scripts.public_download \
  --name=GroceriesReal \
  --data-root=$HOME/data

Download files

Source Distribution

datasetinsights-0.2.0b5.tar.gz (510.7 kB)

Uploaded Source

Built Distribution

datasetinsights-0.2.0b5-py3-none-any.whl (540.4 kB)

Uploaded Python 3

File details

Details for the file datasetinsights-0.2.0b5.tar.gz.

File metadata

  • Download URL: datasetinsights-0.2.0b5.tar.gz
  • Size: 510.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.10 CPython/3.8.5 Linux/5.3.0-1035-azure

File hashes

Hashes for datasetinsights-0.2.0b5.tar.gz
Algorithm     Hash digest
SHA256        cebd80e220401b4a6e54056456474fd8c1add163747fab7e1a6b4271f673b68f
MD5           58bf47678eeb200d84e63426f238f970
BLAKE2b-256   2cc657d8dfec4bb471ac55e7e3bd12ada0b4c244e4ee45e2ed81484b84a48b8f
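To check a downloaded file against the digests above, a short sketch using Python's standard hashlib module might look like this. The helper names are hypothetical. The default expected value is the SHA256 digest listed for datasetinsights-0.2.0b5.tar.gz; pass the wheel's digest instead when verifying the wheel.

```python
import hashlib

# SHA256 digest published above for datasetinsights-0.2.0b5.tar.gz.
SDIST_SHA256 = "cebd80e220401b4a6e54056456474fd8c1add163747fab7e1a6b4271f673b68f"


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256=SDIST_SHA256):
    """Return True if the file matches, otherwise raise ValueError."""
    actual = file_sha256(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"SHA256 mismatch for {path}: got {actual}")
    return True
```

For example, verify_download("datasetinsights-0.2.0b5.tar.gz") checks a source distribution saved in the current directory. pip's hash-checking mode (--require-hashes) performs an equivalent check automatically at install time.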

File details

Details for the file datasetinsights-0.2.0b5-py3-none-any.whl.

File metadata

  • Download URL: datasetinsights-0.2.0b5-py3-none-any.whl
  • Size: 540.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.10 CPython/3.8.5 Linux/5.3.0-1035-azure

File hashes

Hashes for datasetinsights-0.2.0b5-py3-none-any.whl
Algorithm     Hash digest
SHA256        5caa64219d471ddca17a4ff062ca6c2d46a880fa8337899142e5e264d3517b09
MD5           206a5889625d2240b07955a767c1d0cb
BLAKE2b-256   3afc767f12630061437d6c25a22e6befbf90f81d7c0cd0338eed9dcff97f47a9
