Last released Feb 26, 2026
A strict experimental harness for reproducible, statistically valid model evaluation.
Supported by