WAI: World AI format unifying various 3D/4D datasets

Reason this release was yanked:

deprecated

Core library for the WorldAI format (WAI)

Core functionality to access and process data in the WorldAI format (WAI). Lightweight and easy to use, with a focus on modularity and extensibility.

Features

  • IO functions for reading and writing WAI data
  • Intersection checks between 2D/3D geometric primitives
  • Batched camera and geometric processing functions (e.g. projection, unprojection, transformations)

Install

We recommend Python 3.12.9. WAI will likely work with other Python versions, but has only been tested on 3.12.9.

Pip install (recommended):

pip install wai-core

Clone and install from source (for development)

git clone https://ghe.oculus-rep.com/mb-research/wai-core.git
cd wai-core
pip install -e ".[dev]"

WAI dataset format

Folder structure

We follow Nerfstudio’s folder structure and extend it with additional (optional) modalities. The general folder and file structure is as follows:

<dataset name>
├── <first scene name>
│   ├── scene_meta.json   OR   scene_meta_distorted.json
│   │
│   ├── images   OR   images_distorted
│   │   ├── <frame_id_1>.[png|jpg]
│   │   :
│   │   └── <frame_id_n>.[png|jpg]
│   │
│   ├── [Optional] <depth> (GT depth, as specified in scene_meta.json)
│   │   ├── <frame_id_1>.exr
│   │   :
│   │   └── <frame_id_n>.exr
│   │
│   ├── [Optional] masks
│   │   ├── <frame_id_1>.png
│   │   :
│   │   └── <frame_id_n>.png
│   │
│   └── [Optional] Any extra modalities, as specified in scene_meta.json
:
└── <last scene name>
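
The layout above can be walked with the standard library alone. The following is a minimal sketch and not part of wai-core: the helper name `list_scenes` is hypothetical, and it assumes a scene is any direct subfolder of the dataset root that contains `scene_meta.json` or `scene_meta_distorted.json`.

```python
from pathlib import Path

# File names taken from the folder structure above.
SCENE_META_NAMES = ("scene_meta.json", "scene_meta_distorted.json")


def list_scenes(dataset_root):
    """Return {scene_name: path_to_scene_meta} for every scene folder found.

    Hypothetical helper: a folder counts as a scene if it directly contains
    one of the scene_meta files; scene_meta.json is checked first.
    """
    scenes = {}
    for scene_dir in sorted(Path(dataset_root).iterdir()):
        if not scene_dir.is_dir():
            continue  # skip stray files in the dataset root
        for name in SCENE_META_NAMES:
            meta_path = scene_dir / name
            if meta_path.is_file():
                scenes[scene_dir.name] = meta_path
                break
    return scenes
```

Calling `list_scenes("<dataset name>")` on a dataset root would then give one entry per scene, which can be passed on to whatever loads the scene_meta file.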

scene_meta format

The scene_meta.json format is an extension of Nerfstudio's transforms.json.

The general structure is:

{
  "scene_name": "00dd871005", // Unique scene name
  "dataset_name": "scannetppv2", // Unique dataset name
  "version": "0.1", // WAI format version
  "last_modified": "2025-02-10T09:43:48.232022", // ISO datetime format
  "shared_intrinsics": true/false, // Same intrinsics for all cameras?

  // Camera model type [PINHOLE, OPENCV, OPENCV_FISHEYE]
  "camera_model": "PINHOLE",

  // Convention for cam2world extrinsics (must be "opencv")
  "camera_convention": "opencv",

  // <camera_coeff_name>: // camera coefficients like fl_x, cx, h,...

  // Per-frame intrinsics and extrinsics parameters
  "frames": <see below>,

  // Scene-level modalities with default mapping like gt_points3D -> pts3d.npy
  "scene_modalities": <see below>,

  // Frame-level modalities with default mapping like pred_depth -> metric3dv2
  "frame_modalities": <see below>,

  // Transform applied on original poses to get poses stored in `frames`
  "_applied_transform": <see below>,

  // All transforms to convert the original poses to poses stored in `frames`
  "_applied_transforms": <see below>,
}

Per-frame intrinsics can also be defined in the frames field, which is a list of dictionaries with the following structure:

{
  "frames": [
    {
      // Unique name to identify a frame
      "frame_name": "<frame_name>",

      // Relative path to frame, required for Nerfstudio compatibility
      "file_path": "<images>/<frame_name>.<ext>",

      // 4x4 cam2world extrinsics in OpenCV convention, as a nested list of rows
      "transform_matrix": [[1, 0, 0, 0], ... [0, 0, 0, 1]],

      // Relative path to frame modality (optional)
      "<modality>_path": "<modality_path>/<frame_name>.<ext>",

      // Additional intrinsics for this frame (optional)
      "camera_model": "PINHOLE",
      // <camera_coeff_name>: // camera coefficients like fl_x, cx, h,...
    },
    ...
  ]
}
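
Since intrinsics may appear both at scene level and per frame, a consumer has to decide which value applies to a given frame. The sketch below assumes that a per-frame value overrides the scene-level one; the helper `resolve_intrinsics` and the key list are illustrative and not taken from wai-core.

```python
# Intrinsics keys that appear in this document; the real format may allow
# more (e.g. distortion coefficients for OPENCV / OPENCV_FISHEYE).
INTRINSICS_KEYS = ("camera_model", "fl_x", "fl_y", "cx", "cy", "w", "h")


def resolve_intrinsics(scene_meta, frame):
    """Merge scene-level intrinsics with per-frame overrides (hypothetical helper)."""
    resolved = {}
    for key in INTRINSICS_KEYS:
        if key in frame:  # assumption: the per-frame value wins
            resolved[key] = frame[key]
        elif key in scene_meta:  # otherwise fall back to the scene-level value
            resolved[key] = scene_meta[key]
    return resolved


scene_meta = {
    "camera_model": "PINHOLE",
    "fl_x": 1072.0, "fl_y": 1068.0,
    "cx": 1504.0, "cy": 1000.0,
    "w": 3008, "h": 2000,
}
frame = {"frame_name": "000000", "fl_x": 1234, "w": 1000}

intrinsics = resolve_intrinsics(scene_meta, frame)
print(intrinsics["fl_x"], intrinsics["fl_y"])  # 1234 1068.0
```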

Example:

{
  "scene_name": "00dd871005", // unique scene name
  "dataset_name": "scannetppv2", // unique dataset name
  "version": "0.1",
  "last_modified": "2025-02-10T09:43:48.232022",
  "camera_model": "PINHOLE", // camera model type [PINHOLE, OPENCV, OPENCV_FISHEYE]
  "camera_convention": "opencv", // camera convention used for cam2world extrinsics (differs from Nerfstudio!)
  "fl_x": 1072.0, // focal length x
  "fl_y": 1068.0, // focal length y
  "cx": 1504.0, // principal point x
  "cy": 1000.0, // principal point y
  "w": 3008, // image width
  "h": 2000, // image height
  "frames": [
    {
      "frame_name": "000000",
      "file_path": "images/000000.png", // required by Nerfstudio
      "transform_matrix": [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]
      ], // required by Nerfstudio
      "image": "images/000000.png", // same as file_path
      "metric3dv2_depth": "metric3dv2/v0/depth/000000.png",
      "metric3dv2_depth_conf": "metric3dv2/v0/depth_confidence/000000.exr",
      "fl_x": 1234, // specific focal length for this frame
      "w": 1000 // specific width for this frame
    },
    ...
  ],
  "scene_modalities": {
    "gt_pts3d": {
      "path": "global_pts3d.npy", // path to a scene-level point cloud
      "format": "numpy"
    }
  },
  "frame_modalities": {
    "pred_depth": {
      "frame_key": "metric3dv2_depth", // default mapping of pred_depth to frame modality
      "format": "depth"
    },
    "image": {
      "frame_key": "image", // default mapping of image to frame modality
      "format": "image"
    },
    "depth_confidence": {
      "frame_key": "metric3dv2_depth_conf",
      "format": "scalar"
    }
  },
  "_applied_transformation": [ // e.g. the transformation from OpenGL to OpenCV
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0]
  ],
  "_applied_transformations": {
    "opengl2opencv": [ // e.g. applied from OpenGL to OpenCV before
      [1.0, 0.0, 0.0, 0.0],
      [0.0, -1.0, 0.0, 0.0],
      [0.0, 0.0, -1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]
    ]
  }
}
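
In the example above, `frame_modalities` maps a generic modality name such as `pred_depth` to a `frame_key`, and each frame stores a relative path under that key. A minimal sketch of this two-step lookup follows; `resolve_frame_modality` is a hypothetical helper written for illustration, not a wai-core function.

```python
def resolve_frame_modality(scene_meta, frame, modality):
    """Look up (relative_path, format) of a frame-level modality.

    Hypothetical helper: follows frame_modalities[modality]["frame_key"]
    into the frame dictionary. Raises KeyError if either lookup fails.
    """
    entry = scene_meta["frame_modalities"][modality]
    return frame[entry["frame_key"]], entry["format"]


# Trimmed-down version of the example scene_meta above.
scene_meta = {
    "frame_modalities": {
        "pred_depth": {"frame_key": "metric3dv2_depth", "format": "depth"},
        "image": {"frame_key": "image", "format": "image"},
    },
    "frames": [
        {
            "frame_name": "000000",
            "image": "images/000000.png",
            "metric3dv2_depth": "metric3dv2/v0/depth/000000.png",
        }
    ],
}

path, fmt = resolve_frame_modality(scene_meta, scene_meta["frames"][0], "pred_depth")
print(path, fmt)  # metric3dv2/v0/depth/000000.png depth
```

Keeping the indirection in `frame_modalities` means the default source of `pred_depth` can be switched to another frame key without touching the individual frames.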

Standard formats (WIP!)

We provide a set of default read-/write functions for the following data types:

  • image: images, read and written with Pillow
  • depth: depth images stored as EXR files using OpenCV
  • normals: normal images stored as EXR files using OpenCV
  • scalar: scalar images stored as EXR files using OpenCV
  • readable: JSON/YAML files, read and written with orjson/yaml
  • numpy: stored (optionally compressed) NumPy arrays
  • binary: binary masks stored as 1-bit or 8-bit images
  • ptz: compressed PyTorch files
  • mmap: memory-mapped NumPy arrays for fast random access
  • scene_meta: our default scene-level info file (see scene_meta format)

All these formats can be conveniently loaded and stored using load_data and store_data:

from wai import load_data, store_data

# Replace every "<...>" placeholder below with a real path.

# --- load modalities ---
scene_meta = load_data("<path_to_scene_meta.json>", "scene_meta")
info = load_data("<path_to_readable.json>")
image = load_data("<path_to_image.png>")  # suffix resolves format to "image"
depth = load_data("<path_to_depth_image.exr>", "depth")
confidences = load_data("<path_to_image.exr>", "scalar")
normals = load_data("<path_to_image.exr>", "normals")
mask = load_data("<path_to_mask.png>", "binary")
array = load_data("<path_to_image.npy>", "numpy")
data = load_data("<path_to_ptz.ptz>")  # suffix resolves format to "ptz"
# ...

# --- store modalities ---
store_data("<target_path.json>", scene_meta, "scene_meta")
store_data("<target_path.json>", info)  # default for json: "readable"
store_data("<target_path.png>", image)  # default for png: "image"
store_data("<target_path.exr>", depth, "depth")
store_data("<target_path.exr>", confidences, "scalar")
store_data("<target_path.exr>", normals, "normals")
store_data("<target_path.png>", mask, "binary")
store_data("<target_path.npy>", array, "numpy")
store_data("<target_path.ptz>", data)  # suffix resolves format to "ptz"
# ...

Implementation details can be found in utils.io.
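
The comments in the snippet above indicate that the format argument can be omitted when the file suffix identifies it. How wai-core implements this is not shown here; the following is only a sketch of one way such a suffix-based default could work. The table `DEFAULT_FORMATS` and the function `infer_format` are hypothetical, and only the json, png and ptz entries are backed by the comments above.

```python
from pathlib import Path

# Assumed suffix -> format defaults. Only ".json", ".png" and ".ptz" follow
# from the comments in the snippet above; the other entries are guesses.
DEFAULT_FORMATS = {
    ".json": "readable",
    ".yaml": "readable",
    ".png": "image",
    ".jpg": "image",
    ".npy": "numpy",
    ".ptz": "ptz",
}


def infer_format(path, format_type=None):
    """Return the explicit format if given, else a default based on the suffix."""
    if format_type is not None:
        return format_type
    suffix = Path(path).suffix.lower()
    try:
        return DEFAULT_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"No default format for suffix {suffix!r}; pass one explicitly"
        ) from None
```

Under this sketch, `.exr` deliberately has no default, because the same suffix is used for the depth, normals and scalar formats and the caller has to say which one is meant.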

Create a simple PyTorch dataset using WAI

We provide a basic dataset class to get you started with training models on WAI datasets. This is a minimal example of how to construct a WAI dataset:

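from box import Box  # provided by the python-box package
from wai import BasicSceneframeDataset

# Dataset configuration: root folder of a WAI dataset and the
# frame-level modalities to load for every item.
cfg = Box(
    {
        "root": "/fsx/xrtech/data/scannetppv2",
        "frame_modalities": ["image", "pred_depth"],
    }
)
dataset = BasicSceneframeDataset(cfg)

# Fetch the first item and inspect which keys it provides.
item = dataset[0]
print(f"Fetched an item from the WAI dataset with the following keys: {item.keys()}")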
