Last released Mar 29, 2026
A toolkit for LLM-as-a-judge evaluation and arena benchmarks.