Last released Mar 15, 2026
A comprehensive framework for evaluating large language models (LLMs), with built-in support for bias, toxicity, and relevancy metrics, custom evaluations, conversational test cases, release tracking, and token counting.