Training and Analyzing Sparse Autoencoders (SAEs)
SAE Lens
SAELens exists to help researchers:
- Train sparse autoencoders.
- Analyse sparse autoencoders and research mechanistic interpretability.
- Generate insights which make it easier to create safe and aligned AI systems.
Please refer to the documentation for information on how to:
- Download and analyse pre-trained sparse autoencoders.
- Train your own sparse autoencoders.
- Generate feature dashboards with the SAE-Vis Library.
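For readers new to the idea, the sketch below shows what a sparse autoencoder computes. It is a generic NumPy illustration, not SAELens's implementation or API. The function names (`init_sae`, `sae_forward`, `sae_loss`), the ReLU encoder, the L1 sparsity penalty, and the default `l1_coeff` are all assumptions chosen for clarity.

```python
import numpy as np


def init_sae(d_in, d_sae, seed=0):
    """Create parameters for a toy SAE (hypothetical helper, not a SAELens function)."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(0.0, 0.1, size=(d_in, d_sae))
    # Start the decoder as the transposed encoder, with unit-norm rows.
    W_dec = W_enc.T.copy()
    W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
    return {
        "W_enc": W_enc,
        "b_enc": np.zeros(d_sae),
        "W_dec": W_dec,
        "b_dec": np.zeros(d_in),
    }


def sae_forward(params, x):
    """Encode activations into non-negative features, then reconstruct them."""
    pre_acts = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    feats = np.maximum(0.0, pre_acts)  # ReLU keeps features non-negative
    recon = feats @ params["W_dec"] + params["b_dec"]
    return feats, recon


def sae_loss(params, x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    feats, recon = sae_forward(params, x)
    mse = np.mean(np.sum((recon - x) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(feats), axis=-1))
    return mse + l1_coeff * l1


# Toy batch standing in for model activations: 8 vectors of width 16.
params = init_sae(d_in=16, d_sae=64)
x = np.random.default_rng(1).normal(size=(8, 16))
feats, recon = sae_forward(params, x)
loss = sae_loss(params, x)
```

Training would minimise `sae_loss` over many batches of real model activations. The documentation describes how SAELens itself does this.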
SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to safeguard humanity from risks posed by artificial intelligence.
This library is maintained by Joseph Bloom and David Chanin.
Tutorials
- Loading and Analysing Pre-Trained Sparse Autoencoders
- Understanding SAE Features with the Logit Lens
- Training a Sparse Autoencoder
Join the Slack!
Feel free to join the Open Source Mechanistic Interpretability Slack for support!
Download files
Source Distribution
sae_lens-3.11.0.tar.gz (74.4 kB)
Built Distribution
sae_lens-3.11.0-py3-none-any.whl (84.4 kB)
Hashes for sae_lens-3.11.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cb8372b5bb27eeae5047e2786e4a5fcc087dfbd73cd8b43fd4e058ec74b125ea
MD5 | f683b45a30370e4dd71ddfedee7a5199
BLAKE2b-256 | c294ac303d2a60506dce6634909e25690c6094efab38e3500df8be3815c7e11f
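A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of_file` and `matches_published_hash` are illustrative, and the local path in the usage comment is an assumption about where the wheel was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the wheel was saved to the current directory):
# matches_published_hash(
#     "sae_lens-3.11.0-py3-none-any.whl",
#     "cb8372b5bb27eeae5047e2786e4a5fcc087dfbd73cd8b43fd4e058ec74b125ea",
# )
```

pip can perform a similar check automatically when a requirements file pins hashes and `--require-hashes` is used.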