Last released May 7, 2026
LLM evaluation harness with custom metrics, LLM-as-judge, and regression tracking