Last released Feb 17, 2026
A lightweight tool for generating annotated eval datasets and running LLM-as-judge evaluations