Last released Apr 18, 2024
REFT: Representation Finetuning for Language Models
Last released Apr 8, 2024
Use Activation Intervention to Interpret the Causal Mechanisms of a Model