4 projects
siggate
The vendor-neutral significance gate for AI evals. Most eval tools only show a number moving; siggate tells you, in your CI pipeline, whether the move is statistically real or within noise.
deltagate
Statistical validation for LLM/ML eval comparisons: paired delta CIs, multiple-testing correction, deflated significance, power analysis, and noise diagnostics. Most reported eval deltas are noise — this gates them.
calibstats
Calibration metrics with bootstrap confidence intervals — because a bare ECE is not enough.
agentrel
Reliability and reproducibility statistics for stochastic agent and tool-use evals.
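
To make the "paired delta CI" idea mentioned above concrete, here is a minimal, illustrative sketch of a percentile-bootstrap confidence interval on the per-example score difference between two eval runs. It is not the API of any project listed here: the function names `paired_delta_ci` and `passes_gate`, their parameters, and the gating rule (the interval must exclude zero) are all assumptions made for illustration.

```python
import random
import statistics


def paired_delta_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean per-example delta (B minus A).

    Hypothetical helper for illustration; assumes both runs were scored
    on the same examples, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    deltas = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(deltas)
    # Resample the paired deltas with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(deltas), lo, hi


def passes_gate(scores_a, scores_b, **kwargs):
    """Assumed gating rule: the delta counts only if the CI excludes zero."""
    _, lo, hi = paired_delta_ci(scores_a, scores_b, **kwargs)
    return lo > 0 or hi < 0


if __name__ == "__main__":
    baseline = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
    candidate = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
    mean_delta, lo, hi = paired_delta_ci(baseline, candidate)
    print(f"mean delta = {mean_delta:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
    print("significant:", passes_gate(baseline, candidate))
```

Resampling the paired deltas, rather than each run independently, keeps per-example difficulty from inflating the interval. A real gate would also need the multiple-testing correction and power analysis that the deltagate description mentions; this sketch covers neither.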