Profile of yongzhe2160

siggate

Last released Jun 14, 2026

The vendor-neutral significance gate for AI evals. Most eval tools show a number moving; siggate tells you, inside your continuous-integration (CI) pipeline, whether the move is statistically significant or within noise.

Statistical validation for LLM/ML eval comparisons: paired delta CIs, multiple-testing correction, deflated significance, power analysis, and noise diagnostics. Most reported eval deltas are noise — this gates them.
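To make "paired delta CI" concrete, here is a minimal sketch of a percentile-bootstrap confidence interval on the mean per-example score difference between two models evaluated on the same examples. It uses only the Python standard library and is not taken from any of the packages listed here: the names paired_delta_ci and is_significant, and the defaults for n_boot, alpha, and seed, are hypothetical choices made for illustration.

```python
import random
import statistics


def paired_delta_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(b - a) over paired per-example scores.

    Hypothetical helper for illustration; not an API of any listed package.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    # Pairing: work on per-example differences, not on the two means separately.
    deltas = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(deltas)
    # Resample the differences with replacement and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_boot)
    )
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(deltas), lo, hi


def is_significant(lo, hi):
    """Treat the delta as real only if the interval excludes zero."""
    return lo > 0 or hi < 0
```

A gate built this way would pass or fail a CI job on is_significant(lo, hi) rather than on the raw delta. When several metrics are compared at once, a multiple-testing correction would still be needed on top of this.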

calibstats

Last released Jun 13, 2026

Calibration metrics with bootstrap confidence intervals, because a bare expected calibration error (ECE) point estimate is not enough.
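As an illustration of the idea, the sketch below computes ECE with equal-width confidence bins and wraps it in a percentile bootstrap so the point estimate comes with an interval. It is a standard-library sketch under stated assumptions, not the calibstats API: the function names ece and ece_bootstrap_ci, the 10-bin default, and the resampling settings are all hypothetical.

```python
import random


def ece(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width bins on [0, 1].

    Hypothetical helper for illustration; not an API of any listed package.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need non-empty inputs of equal length")
    # Per bin: [count, sum of confidences, sum of correct labels].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += c
        bins[idx][2] += y
    # sum_b (n_b / n) * |acc_b - conf_b| simplifies to |sum_correct - sum_conf| / n.
    return sum(abs(s_conf - s_corr) / n for cnt, s_conf, s_corr in bins if cnt)


def ece_bootstrap_ci(confidences, correct, n_boot=1000, alpha=0.05,
                     seed=0, n_bins=10):
    """Point ECE plus a percentile-bootstrap interval over resampled examples."""
    rng = random.Random(seed)
    n = len(confidences)
    point = ece(confidences, correct, n_bins)
    stats = []
    for _ in range(n_boot):
        # Resample (confidence, label) pairs together, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            ece([confidences[i] for i in idx], [correct[i] for i in idx], n_bins)
        )
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi
```

Because ECE is non-negative and biased upward on small samples, the percentile interval here is a rough guide rather than a calibrated one; a bias-corrected bootstrap would be a reasonable refinement.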

agentrel

Last released Jun 13, 2026

Reliability and reproducibility statistics for stochastic agent / tool-use evals.

Yongzhe Wang

4 projects

siggate

deltagate

calibstats

agentrel