Last released Apr 21, 2025
A framework for comparing responses from different large language models.