Last released Apr 26, 2026
SIREN: a lightweight, plug-and-play guard model for LLM harmfulness detection from internal representations.