Last released Jun 27, 2026
Interpretable, zero-training refusal-axis prompt detector (u_ref difference-of-means).
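The one-line summary names a difference-of-means refusal axis, u_ref. The sketch below shows what that usually means. It is not this package's API or its exact method. It assumes you already have hidden-state activation vectors for two prompt sets, one the model refused and one it complied with; extracting those from a model is out of scope here. The function names (refusal_axis, midpoint_threshold, is_refusal_like) and the midpoint calibration rule are hypothetical. "Zero-training" fits this picture because u_ref comes from two averages, with no gradient descent, and each score is a single dot product, which keeps it interpretable.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def refusal_axis(refused, complied) -> Vector:
    """u_ref: unit-normalized difference of class means (refused minus complied)."""
    diff = [r - c for r, c in zip(mean_vector(refused), mean_vector(complied))]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no refusal axis")
    return [d / norm for d in diff]


def midpoint_threshold(refused, complied, u_ref) -> float:
    """Hypothetical calibration: halfway between the mean projections of the two sets."""
    r = sum(dot(h, u_ref) for h in refused) / len(refused)
    c = sum(dot(h, u_ref) for h in complied) / len(complied)
    return (r + c) / 2.0


def is_refusal_like(h, u_ref, threshold) -> bool:
    """Flag a prompt whose activation projects past the threshold along u_ref."""
    return dot(h, u_ref) > threshold


if __name__ == "__main__":
    # Toy 2-D "activations" standing in for real hidden states (assumption).
    refused = [[1.0, 2.0], [3.0, 2.0]]
    complied = [[0.0, 0.0], [0.0, 2.0]]
    u_ref = refusal_axis(refused, complied)
    t = midpoint_threshold(refused, complied, u_ref)
    print(is_refusal_like([2.5, 1.5], u_ref, t))  # True
    print(is_refusal_like([0.0, 0.5], u_ref, t))  # False
```

A real detector would probably choose the layer and the threshold on held-out prompts. The midpoint rule appears here only because it is the simplest choice that uses nothing beyond the two class means.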