contradish
Last released
Tells you what's actually wrong with your LLM, not just that something is. Findings layer mines every run for the one specific surprise — root-cause clusters, rigidity vs drift, the cases where your model gave both the right and the wrong answer to the same question. Plus: Judgment Strain (two-sided), the one-command repair loop, and per-case equivalence audit. CAI-Bench v2: 20 domains, 2,160 strain tests.