Last released Apr 5, 2024
No description provided
Last released Jan 10, 2024
Erasing concepts from neural representations with provable guarantees
Last released Jul 20, 2023
Keeping language models honest by directly eliciting knowledge encoded in their activations