Last released Jun 14, 2026
Decision-grade statistics for AI evals: paired comparisons, cluster-aware uncertainty, and power analysis on top of existing eval frameworks.