contradish
Close the repair loop: contradish improve runs the benchmark, rewrites your system prompt, re-runs the benchmark, and reports the change in CAI Strain, all in one command. Equivalence is audited per case, so the headline number reflects model failure, not benchmark framing. CAI-Bench v2 covers 20 domains, 2,160 strain tests, and 8 adversarial techniques.