Profile of michelejoseph

1 project

contradish

Last released May 22, 2026

Tells you what's actually wrong with your LLM, not just that something is. The Findings layer mines every run for the one specific surprise: root-cause clusters, rigidity vs. drift, and the cases where your model gave both the right and the wrong answer to the same question. Also includes Judgment Strain (two-sided), the one-command repair loop, and a per-case equivalence audit. CAI-Bench v2: 20 domains, 2,160 strain tests.