
Synthetic dataset insights.


Dataset Insights


Unity Dataset Insights is a Python package for downloading, parsing, and analyzing synthetic datasets generated using the Unity Perception package.


Dataset Insights is published to PyPI. You can install it by running pip install datasetinsights in a supported Python environment.
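After installing, you may want to confirm which version ended up in your environment. The sketch below uses only the standard library; the helper name installed_version is illustrative and is not part of datasetinsights.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "datasetinsights") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version or "datasetinsights is not installed in this environment")
```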

Getting Started

Dataset Statistics

We provide a sample notebook to help you load synthetic datasets generated using the Perception package and visualize dataset statistics. We plan to support other sample Unity projects in the future.

Load Datasets

The Unity Perception package produces datasets that follow the Perception dataset schema. The datasetinsights package also provides convenient Python modules to parse these datasets.

For example, you can load AnnotationDefinitions into a python dictionary by providing the corresponding annotation definition ID:

from datasetinsights.datasets.unity_perception import AnnotationDefinitions

annotation_def = AnnotationDefinitions(data_root=dest, version="my_schema_version")
definition_dict = annotation_def.get_definition(def_id="my_definition_id")

Similarly, for MetricDefinitions:

from datasetinsights.datasets.unity_perception import MetricDefinitions

metric_def = MetricDefinitions(data_root=dest, version="my_schema_version")
definition_dict = metric_def.get_definition(def_id="my_definition_id")
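Conceptually, fetching a definition by ID is a dictionary lookup over the records in a definitions file. The sketch below illustrates that idea on an in-memory list. The record fields (id, name, description) and the helper index_definitions are assumptions made for illustration, not the library's implementation.

```python
from typing import Any, Dict, List


def index_definitions(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build an id -> definition mapping, rejecting duplicate IDs."""
    index: Dict[str, Dict[str, Any]] = {}
    for record in records:
        def_id = record["id"]
        if def_id in index:
            raise ValueError(f"duplicate definition id: {def_id}")
        index[def_id] = record
    return index


# Hypothetical records standing in for a parsed definitions file.
sample_records = [
    {"id": "bbox-2d", "name": "bounding box", "description": "2D boxes"},
    {"id": "semseg", "name": "semantic segmentation", "description": "masks"},
]

definitions = index_definitions(sample_records)
print(definitions["bbox-2d"]["name"])  # bounding box
```

Rejecting duplicate IDs up front keeps a later lookup unambiguous, which matters when several definitions share a similar name.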

The Captures table provides the collection of simulation captures and annotations. You can load these records directly as a Pandas DataFrame:

from datasetinsights.datasets.unity_perception import Captures

captures = Captures(data_root=dest, version="my_schema_version")
captures_df = captures.filter(def_id="my_definition_id")
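To make the idea of filtering by definition ID concrete, here is a minimal pure-Python sketch that flattens nested capture records into one row per matching annotation. The field names (annotations, annotation_definition, values) and the helper filter_captures are assumptions for illustration; the columns of the DataFrame that Captures.filter returns may differ.

```python
from typing import Any, Dict, List


def filter_captures(captures: List[Dict[str, Any]], def_id: str) -> List[Dict[str, Any]]:
    """Return one flat row per annotation whose definition matches def_id."""
    rows = []
    for capture in captures:
        for annotation in capture.get("annotations", []):
            if annotation.get("annotation_definition") == def_id:
                rows.append(
                    {
                        "capture_id": capture["id"],
                        "filename": capture.get("filename"),
                        "annotation_id": annotation["id"],
                        "values": annotation.get("values", []),
                    }
                )
    return rows


# Hypothetical capture records; real datasets carry more fields.
sample_captures = [
    {
        "id": "cap-1",
        "filename": "rgb_1.png",
        "annotations": [
            {"id": "ann-1", "annotation_definition": "bbox-2d",
             "values": [{"label_name": "box"}]},
            {"id": "ann-2", "annotation_definition": "semseg"},
        ],
    },
    {"id": "cap-2", "filename": "rgb_2.png", "annotations": []},
]

rows = filter_captures(sample_captures, "bbox-2d")
print([row["capture_id"] for row in rows])  # ['cap-1']
```

A list of flat dictionaries like rows can be passed to pandas.DataFrame(rows) if you prefer to continue in Pandas.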

The Metrics table can store simulation metrics for a capture or annotation. You can also load these records as a Pandas DataFrame:

from datasetinsights.datasets.unity_perception import Metrics

metrics = Metrics(data_root=dest, version="my_schema_version")
metrics_df = metrics.filter_metrics(def_id="my_definition_id")
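Once metric records are loaded, a common next step is to aggregate them. The sketch below sums per-label counts across captures for one metric definition. The record layout (metric_definition, values, label_name, count) and the helper total_counts_by_label are assumptions for illustration and may not match your dataset.

```python
from collections import Counter
from typing import Any, Dict, List


def total_counts_by_label(metrics: List[Dict[str, Any]], def_id: str) -> Counter:
    """Sum per-label counts over all metric records for one definition."""
    totals: Counter = Counter()
    for record in metrics:
        if record.get("metric_definition") != def_id:
            continue  # skip records that belong to other metric definitions
        for value in record.get("values", []):
            totals[value["label_name"]] += value["count"]
    return totals


# Hypothetical metric records for two captures.
sample_metrics = [
    {"capture_id": "cap-1", "metric_definition": "object-count",
     "values": [{"label_name": "box", "count": 3}, {"label_name": "can", "count": 1}]},
    {"capture_id": "cap-2", "metric_definition": "object-count",
     "values": [{"label_name": "box", "count": 2}]},
    {"capture_id": "cap-2", "metric_definition": "other",
     "values": [{"label_name": "box", "count": 10}]},
]

totals = total_counts_by_label(sample_metrics, "object-count")
print(totals["box"], totals["can"])  # 5 1
```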

Download Datasets

You can download the datasets using the download command:

datasetinsights download --source-uri=<xxx> --output=$HOME/data

The download command supports HTTP(S) and GCS sources.
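One way to picture how a source URI maps to a downloader is a lookup on the URI scheme. The sketch below is an illustration only, not the command's actual dispatch logic; the mapping table and the helper downloader_name_for are hypothetical.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the downloader classes named in this README.
DOWNLOADER_BY_SCHEME = {
    "gs": "GCSDatasetDownloader",
    "http": "HTTPDatasetDownloader",
    "https": "HTTPDatasetDownloader",
}


def downloader_name_for(source_uri: str) -> str:
    """Pick a downloader class name from the scheme of source_uri."""
    scheme = urlparse(source_uri).scheme.lower()
    try:
        return DOWNLOADER_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"unsupported source URI scheme: {scheme!r}") from None


print(downloader_name_for("gs://bucket/folder"))  # GCSDatasetDownloader
```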

Alternatively, you can download a dataset directly from the Python interface.

GCSDatasetDownloader can download a dataset from GCS locations.

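# Module path assumed; it was missing from the original import line.
from datasetinsights.io.downloader import GCSDatasetDownloader

source_uri = "gs://url/to/"  # or "gs://url/to/folder"
dest = "~/data"
downloader = GCSDatasetDownloader()
# Reconstructed from a garbled line; treat the signature as an assumption.
downloader.download(source_uri=source_uri, output=dest)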

HTTPDatasetDownloader can download a dataset from any HTTP(S) URL.

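# Module path and download() call reconstructed from garbled lines; treat both as assumptions.
from datasetinsights.io.downloader import HTTPDatasetDownloader

source_uri = "<xxx>"  # placeholder: the HTTP(S) URL of the dataset
dest = "~/data"
downloader = HTTPDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)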
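The examples above use "~/data" as the destination. Python itself does not expand "~" in a plain string, and this README does not say whether the downloaders do, so you may prefer to expand it yourself before passing it on. A minimal sketch, with the helper name expand_output_dir chosen for illustration:

```python
from pathlib import Path


def expand_output_dir(dest: str) -> Path:
    """Expand a leading '~' to the current user's home directory."""
    return Path(dest).expanduser()


output_dir = expand_output_dir("~/data")
print(output_dir)  # an absolute path under your home directory
```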

Convert Datasets

If you are interested in converting the synthetic dataset to COCO format for annotations that COCO supports, you can run the convert command:

datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Instances

or

datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Keypoints

You will need to provide the 2D bounding box definition ID from the synthetic dataset. The COCO conversion currently supports only 2D bounding box and human keypoint annotations.
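To show what a converted 2D bounding box looks like on the COCO side, here is a minimal sketch that builds one COCO annotation entry. The COCO fields (bbox as [x, y, width, height], area, iscrowd) follow the COCO format; the input field names (x, y, width, height, label_id) and the helper to_coco_annotation are assumptions for illustration, not the converter's implementation.

```python
from typing import Any, Dict


def to_coco_annotation(
    box: Dict[str, Any], annotation_id: int, image_id: int, category_id: int
) -> Dict[str, Any]:
    """Convert one box given as x, y, width, height (pixels) to a COCO annotation."""
    x, y = float(box["x"]), float(box["y"])
    width, height = float(box["width"]), float(box["height"])
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, width, height],  # COCO uses [x_min, y_min, width, height]
        "area": width * height,
        "iscrowd": 0,
    }


# Hypothetical box record from a synthetic dataset.
sample_box = {"label_id": 3, "x": 10.0, "y": 20.0, "width": 30.0, "height": 40.0}
annotation = to_coco_annotation(
    sample_box, annotation_id=1, image_id=7, category_id=sample_box["label_id"]
)
print(annotation["bbox"], annotation["area"])  # [10.0, 20.0, 30.0, 40.0] 1200.0
```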


You can use the pre-built Docker image unitytechnologies/datasetinsights to interact with datasets.


You can find the API documentation on Read the Docs.


Please let us know if you encounter a bug by filing an issue. To learn more about making a contribution to Dataset Insights, please see our Contribution page.


Dataset Insights is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.


If you find this package useful, consider citing it using:

    title={Unity {D}ataset {I}nsights Package},
    author={{Unity Technologies}},

Download files

Source distribution: datasetinsights-1.1.2.tar.gz (1.5 MB)

Built distribution: datasetinsights-1.1.2-py3-none-any.whl (1.5 MB, py3)
