Last released Mar 19, 2026
Benchmark utilities and environments for evaluating multimodal LLMs' proactiveness.