Last released Apr 18, 2026
A lightweight framework for benchmarking multimodal AI agents with parallel execution, prompt variation, and automated evaluation.