Training and Analyzing Sparse Autoencoders (SAEs)
Project description
SAE Lens
SAELens exists to help researchers:
- Train sparse autoencoders.
- Analyse sparse autoencoders / research mechanistic interpretability.
- Generate insights which make it easier to create safe and aligned AI systems.
Please refer to the documentation for information on how to:
- Download and analyse pre-trained sparse autoencoders.
- Train your own sparse autoencoders.
- Generate feature dashboards with the SAE-Vis library.
SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to safeguard humanity from risks posed by artificial intelligence.
This library is maintained by Joseph Bloom and David Chanin.
Loading Pre-trained SAEs
Pre-trained SAEs for various models can be imported via SAE Lens. See this page in the readme for a list of all SAEs.
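As a minimal sketch of what importing a pre-trained SAE can look like, the helper below wraps `SAE.from_pretrained`. It assumes the 3.x API (matching the 3.20.5 release on this page), where the call returns a tuple of the SAE, its config dictionary, and a sparsity tensor; later releases may differ. The helper name `load_pretrained_sae` is hypothetical, and the default `release` and `sae_id` values are illustrative examples that should be checked against the list of available SAEs.

```python
def load_pretrained_sae(release="gpt2-small-res-jb",
                        sae_id="blocks.8.hook_resid_pre",
                        device="cpu"):
    """Hypothetical convenience wrapper around SAE.from_pretrained.

    The default release and sae_id are illustrative; confirm them against
    the list of pre-trained SAEs before relying on them.
    """
    # Imported lazily so this helper can be defined without sae_lens installed.
    from sae_lens import SAE

    # Assumption: in the 3.x releases, from_pretrained returns a 3-tuple of
    # (sae, cfg_dict, sparsity). Check the documentation for your version.
    sae, cfg_dict, sparsity = SAE.from_pretrained(
        release=release,
        sae_id=sae_id,
        device=device,
    )
    return sae, cfg_dict, sparsity


# Example call (downloads weights, so it needs network access and sae_lens):
# sae, cfg_dict, sparsity = load_pretrained_sae(device="cpu")
```

Keeping the import inside the function means the download only happens when the helper is actually called.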
Tutorials
- SAE Lens + Neuronpedia
- Loading and Analysing Pre-Trained Sparse Autoencoders
- Understanding SAE Features with the Logit Lens
- Training a Sparse Autoencoder
Join the Slack!
Feel free to join the Open Source Mechanistic Interpretability Slack for support!
Citation
Please cite the package as follows:
@misc{bloom2024saetrainingcodebase,
   title = {SAELens},
   author = {Joseph Bloom and David Chanin},
   year = {2024},
   howpublished = {\url{https://github.com/jbloomAus/SAELens}},
}
Download files
Hashes for sae_lens-3.20.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b45d37957fd313605342500e89477c041d7c1490e3ec3326c3fef2f5e5421c38
MD5 | 7f2656637100d0f264e1a7abe95d5f84
BLAKE2b-256 | 1ce925a49b9c2653a918b01408f0ed674cb08bfb7a59633567ceac71841aca0f
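The digests above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the Python standard library; the function names `sha256_of_file` and `matches_expected` are hypothetical, and the local path to the wheel is an assumption you would replace with wherever the file was saved.

```python
import hashlib

# SHA256 digest copied from the hash table for sae_lens-3.20.5-py3-none-any.whl.
EXPECTED_SHA256 = "b45d37957fd313605342500e89477c041d7c1490e3ec3326c3fef2f5e5421c38"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Example call (the path is an assumption about where the wheel was saved):
# matches_expected("sae_lens-3.20.5-py3-none-any.whl")
```

Reading in chunks keeps memory use flat regardless of file size.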