6 projects
eleuther-elk
Keeping language models honest by directly eliciting knowledge encoded in their activations
eai-delphi
Automated Interpretability
bergson
Tracing the memory of neural nets with data attribution
concept-erasure
Erasing concepts from neural representations with provable guarantees
tokengrams
Efficiently computing and storing token n-grams from large corpora
eai-sparsify
Sparsify transformers with SAEs and transcoders