
GLUE3D: General Language Understanding Evaluation for 3D Point Clouds

Giorgio Mariani, Alessandro Raganato, Simone Melzi, Gabriella Pasi

Official implementation of GLUE3D: General Language Understanding Evaluation for 3D Point Clouds.

GLUE3D is a Q&A benchmark for evaluating the object-understanding capabilities of 3D-LLMs. It is built around 128 richly textured surfaces spanning creatures, objects, architecture, and transport. Each surface is provided as a 50k-point RGB point cloud, an 8k-point RGB point cloud, a 512 × 512 RGB rendering, and five RGB-D multiviews. These multiple representations enable direct, like-for-like evaluation across modalities.

GLUE3D consists of three Q&A task types: binary question answering, multiple-choice question answering, and open-ended captioning. This diverse set of tasks enables a more robust and comprehensive assessment of multimodal understanding in 3D-LLMs.


Installation

To evaluate your question-answering model on GLUE3D, we offer a PyPI package that can be easily installed with the command:

pip install glue3d

You can install glue3d from source if you want the latest changes in the library or are interested in contributing. However, the latest version may not be stable. Feel free to open an issue if you encounter an error.

git clone https://github.com/giorgio-mariani/GLUE3D.git
cd GLUE3D

pip install -e .

Answer generation

To evaluate your model, first you need to generate your 3D-LLM's answers for the desired GLUE3D task. You can do so in two main ways:

  1. Using the dataset loader (load_GLUE3D_benchmark) with your own model and code.
  2. Using the built-in AnswerGenerator interface with generate_GLUE3D_answers. This option is preferable if your model follows the Hugging Face causal-generation procedure (e.g., LlavaLlamaForCausalLM).
Option 1: Using load_GLUE3D_benchmark

The GLUE3D benchmark data can be (down)loaded using:

import pandas as pd
from glue3d.data import load_GLUE3D_benchmark

dataset = load_GLUE3D_benchmark(
    dataset_name="GLUE3D-points-8k", # or "GLUE3D-images", "GLUE3D-multiview", "GLUE3D-points"
    qa_task="binary_task",           # or "multiplechoice_task", "captioning_task"
    cache_dir=None,                  # Optional; defaults to './cache' or $GLUE3D_CACHE_DIR
)

This procedure loads into memory and prepares the necessary GLUE3D data for the specified Q&A task and data type, automatically downloading to disk any data not yet stored there. The available tasks are binary_task, multiplechoice_task, and captioning_task. Note that the loader uses a local cache directory, which you can customize via the GLUE3D_CACHE_DIR environment variable.
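For instance, the cache location can be redirected before loading (the path below is illustrative):

```python
import os

# Redirect the GLUE3D cache before calling load_GLUE3D_benchmark
# (the path below is illustrative; pick any writable directory).
os.environ["GLUE3D_CACHE_DIR"] = "/data/glue3d_cache"
```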

Once the GLUE3D data is loaded, you can iterate through the dataset to generate answers for each question in the Q&A task:

your_model = ...  # Load your 3D-LLM

model_answers = []
for x in dataset:
    oid = x["object_id"]
    qid = x["question_id"]
    q = x["question"]
    pc = x["data"]  # e.g., (8192 x 6) np.ndarray for "GLUE3D-points-8K"

    answer = your_model.answer_question(pc, q)
    model_answers.append({
        "OBJECT_ID": oid,
        "QUESTION_ID": qid,
        "MODEL_ANSWER": answer,
    })

# Save results
pd.DataFrame.from_records(model_answers).to_csv("qa.csv", index=False)

[!IMPORTANT] Ensure your answers follow the expected format for each task.

  • For the binary_task, the model answer must be a boolean object (either True or False).
  • For the multiplechoice_task, the model answer must be one of A, B, C, D.
  • For the captioning_task, the model answer must be a string.
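If your model replies in free-form text, a small post-processing step can coerce replies into these types. The helpers below are a sketch; the parsing heuristics are assumptions, not part of the glue3d API:

```python
import re

def parse_binary(text: str) -> bool:
    """Map a free-form reply to a bool (heuristic; adapt to your model)."""
    t = text.strip().lower()
    if t.startswith(("yes", "true")):
        return True
    if t.startswith(("no", "false")):
        return False
    raise ValueError(f"Unparseable binary answer: {text!r}")

def parse_choice(text: str) -> str:
    """Extract a standalone A/B/C/D letter from a free-form reply (heuristic)."""
    m = re.search(r"\b([ABCD])\b", text.upper())
    if m is None:
        raise ValueError(f"Unparseable multiple-choice answer: {text!r}")
    return m.group(1)
```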
Option 2: Using the `AnswerGenerator` Interface

If your 3D-LLM inherits from the GeneratorMixin class (e.g., LlavaLlamaForCausalLM), you can use our *HFAnswerGenerator abstract classes to simplify the generation process. The only requirement is to implement the prepare_inputs function, which takes the point cloud (or image) and the question as input and returns the keyword arguments for the GeneratorMixin.generate() method:

import numpy as np
from typing import override
from glue3d import generate_GLUE3D_answers
from glue3d.models.hf import (
    BinaryHFAnswerGenerator,
    MultichoiceHFAnswerGenerator,
    CaptioningHFGenerator
)

# Example custom AnswerGenerator for the binary task
class YourAnswerGenerator(BinaryHFAnswerGenerator): # <- Swap with MultichoiceHFAnswerGenerator
    def __init__(self, your_model, tokenizer):      #   or CaptioningHFGenerator for other tasks.
        super().__init__(your_model, tokenizer)

    @override
    def prepare_inputs(self, data: np.ndarray, text: str) -> dict:
        ... # Preprocess data (e.g., tokenize text, move tensors to device, apply chat templates)
        return {
            "input_ids": ...,
            "points": ...,
            "do_sample": ...,
            "stopping_criteria": ...,
        }

Once you have your custom implementation, generation can be done simply by calling generate_GLUE3D_answers with your target dataset type and Q&A task:

your_model = ...  # Load your 3D-LLM
tokenizer = ...   # Its tokenizer
answer_gen = YourAnswerGenerator(your_model, tokenizer)

qa_answers = generate_GLUE3D_answers(
    qa_task="binary_task",
    dataset_type="GLUE3D-points-8K",
    answer_generator=answer_gen,
)

# `qa_answers` is returned as a pandas DataFrame
qa_answers.to_csv("qa.csv", index=False)

Q&A evaluation

As a result of the answer-generation step, you should have a .csv file containing the question-answer pairs for a given task. The file (let us call it binary-qa.csv) should have a structure similar to:

OBJECT_ID, QUESTION_ID, MODEL_ANSWER
dc5c798, 0fbac6, True
dc5c798, 556cc4, False
...
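Before running evaluation, a quick sanity check of the answer file can catch format problems early. The snippet below is a sketch; it assumes the column names produced by the generation loop shown earlier:

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("binary-qa.csv"); contents are illustrative.
csv_text = (
    "OBJECT_ID,QUESTION_ID,MODEL_ANSWER\n"
    "dc5c798,0fbac6,True\n"
    "dc5c798,556cc4,False\n"
)
answers = pd.read_csv(io.StringIO(csv_text))

# Required columns and, for the binary task, boolean answers only.
assert {"OBJECT_ID", "QUESTION_ID", "MODEL_ANSWER"} <= set(answers.columns)
assert answers["MODEL_ANSWER"].isin([True, False]).all()
```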

It is then possible to evaluate the answers produced by your model using the glue3d evaluate CLI command:

glue3d evaluate --input-file binary-qa.csv --output-file out.csv --task binary_task

Or equivalently, using Python

from glue3d.evaluate_answers import evaluate_GLUE3D_answers

out = evaluate_GLUE3D_answers("binary_task", "binary-qa.csv")
out.to_csv("out.csv")

For the binary and multiple-choice tasks, the output is a dataframe indicating exact match between the ground-truth answer and the model-provided one. For the captioning task, scores for BLEU, METEOR, ROUGE-L, S-BERT, and SimCSE are provided. All scores are scaled to the range 0-100.
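For example, an overall accuracy can be derived from such an exact-match dataframe. This is a sketch; the EXACT_MATCH column name is an assumption about the output format, not guaranteed by glue3d:

```python
import pandas as pd

# Hypothetical per-question exact-match results (column name assumed).
out = pd.DataFrame({
    "OBJECT_ID": ["dc5c798", "dc5c798", "a1b2c3d"],
    "EXACT_MATCH": [True, False, True],
})

# Mean exact match, scaled to 0-100 like the other benchmark scores.
accuracy = 100.0 * out["EXACT_MATCH"].mean()
print(f"Binary accuracy: {accuracy:.1f}")
```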

[!NOTE] For the captioning task, it is also possible to switch the evaluator to use qwen3-30B-A3B as a judge. To do so, use the command:

glue3d evaluate --input-file captions.csv --output-file out.csv --task captioning_task --evaluator qwen_3_30B_A3B
