In-depth visualizations for SAE features
feature-lens
A research engineering toolkit for understanding how SAE features relate to each other, and to upstream / downstream components.
For projects supported by feature-lens, see projects.
Quickstart
git clone https://github.com/dtch1997/feature-lens
cd feature-lens
pip install -e .
Development
Refer to Setup for how to set up the development environment.
Implementation Details
Techniques for finding relevant feature associations:
- Activation patching (employed in Causal Graphs)
- (Total) attribution patching (employed in Sparse Feature Circuits)
- Direct attribution patching (employed in MLP Transcoders, Attention-out SAEs)
Note: "Total" attribution patching calculates the full effect of one feature on another via gradient backpropagation. "Direct" attribution patching estimates only the direct effect, which can be calculated analytically using matrix multiplication.
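The distinction in the note above can be illustrated on a toy model. The sketch below is not feature-lens code: every name, weight, and the tiny ReLU MLP are made up for illustration, and the gradient is derived by hand rather than by backpropagation so the example needs no dependencies. It assumes an upstream feature that writes a decoder direction into the residual stream, an MLP in between, and a downstream feature that reads the residual stream with an encoder direction.

```python
# Toy illustration only: names and numbers are hypothetical, not feature-lens APIs.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    return [c * a for a in v]

D_UP = [1.0, 0.0, 0.5]        # upstream feature's decoder direction
E_DOWN = [1.0, 1.0, -1.0]     # downstream feature's encoder direction
W_IN = [[1.0, -1.0, 0.0], [0.0, 1.0, 1.0]]      # MLP input weights (2 hidden units)
W_OUT = [[0.5, 0.0], [0.0, 1.0], [1.0, -0.5]]   # MLP output weights
X_BASE = [0.2, 0.1, 0.3]      # residual stream without the upstream feature

def downstream_preact(a_up):
    """Downstream pre-activation when the upstream feature has activation a_up."""
    x = add(X_BASE, scale(a_up, D_UP))
    hidden = [max(0.0, h) for h in matvec(W_IN, x)]
    y = add(x, matvec(W_OUT, hidden))  # residual connection plus MLP output
    return dot(E_DOWN, y)

def direct_effect():
    """Direct path only (skip connection): a single dot product, no gradients."""
    return dot(D_UP, E_DOWN)

def total_gradient(a_up):
    """Gradient of the downstream pre-activation w.r.t. a_up over all paths."""
    x = add(X_BASE, scale(a_up, D_UP))
    pre = matvec(W_IN, x)
    dpre = matvec(W_IN, D_UP)
    dhidden = [g if p > 0 else 0.0 for p, g in zip(pre, dpre)]  # ReLU derivative
    return dot(E_DOWN, add(D_UP, matvec(W_OUT, dhidden)))

def attribution_patch(a_clean, a_patch):
    """First-order estimate of the effect of patching a_clean -> a_patch."""
    return (a_patch - a_clean) * total_gradient(a_clean)

def activation_patch(a_clean, a_patch):
    """Exact effect, obtained by re-running the model with the patched value."""
    return downstream_preact(a_patch) - downstream_preact(a_clean)
```

In this toy, the direct effect is smaller than the total gradient because the MLP path also carries signal. Patching the upstream activation from 1.0 to 0.0 keeps both hidden units active, so the attribution estimate matches activation patching; patching to -1.0 switches the ReLUs off and the two diverge, which is the usual caveat of a first-order approximation.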
Tools to be implemented:
- SAE features. We will use SAEs from SAE-Lens, which are annotated and have Neuronpedia visualizations.
- Feature dashboards. By visualizing the "functional connectome" of SAE features, we may obtain novel insight about what an SAE feature is doing. For each target SAE feature, we can create a dashboard of all relevant upstream / downstream features.
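As a rough sketch of what the data behind a feature dashboard could look like, the snippet below selects the strongest upstream and downstream connections for one target feature. The `Edge` record, the `build_dashboard` helper, the feature labels, and the weights are all hypothetical; the weights stand in for whatever attribution scores one of the techniques above would produce.

```python
from collections import namedtuple

# Hypothetical record for one feature-to-feature connection.
Edge = namedtuple("Edge", ["source", "target", "weight"])

def build_dashboard(edges, target, k=3):
    """Collect the k strongest upstream and downstream edges for one feature."""
    upstream = [e for e in edges if e.target == target]
    downstream = [e for e in edges if e.source == target]

    def strength(edge):
        # Rank by magnitude so strong inhibitory edges are kept too.
        return abs(edge.weight)

    return {
        "feature": target,
        "upstream": sorted(upstream, key=strength, reverse=True)[:k],
        "downstream": sorted(downstream, key=strength, reverse=True)[:k],
    }

# Made-up attribution scores between made-up feature labels.
EDGES = [
    Edge("L0/f12", "L1/f7", 0.80),
    Edge("L0/f3", "L1/f7", -0.95),
    Edge("L0/f40", "L1/f7", 0.10),
    Edge("L1/f7", "L2/f5", 0.60),
    Edge("L1/f7", "L2/f9", -0.20),
]

dashboard = build_dashboard(EDGES, "L1/f7", k=2)
```

Ranking by absolute weight is a design choice here: a strongly negative connection is often as informative as a strongly positive one.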
Ideas under consideration:
- Feature clustering. Features with similar upstream and downstream features could be hypothesized to be performing a similar role. Clustering features based on their connections may reveal novel insight about the general types of "functional role" played by SAE features.
- Linear direction features. A steering vector resembles an SAE feature in that both are directions added to the residual activations, so it could be analyzed with the same tools.
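The feature clustering idea above could be prototyped along these lines. This is only a sketch under stated assumptions: each feature is described by a made-up "connection profile" (its attribution weights to a fixed set of neighbouring features), similarity is cosine similarity, and the grouping is a simple greedy threshold scheme used as a stand-in for a proper clustering algorithm. None of the names come from feature-lens.

```python
import math

def cosine(u, v):
    """Cosine similarity between two connection profiles."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def cluster_by_connections(profiles, threshold=0.9):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters = []
    for name, vec in profiles.items():
        for cluster in clusters:
            if cosine(vec, profiles[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster matched, so this feature starts a new one.
            clusters.append([name])
    return clusters

# Hypothetical profiles: weights to the same four neighbouring features.
PROFILES = {
    "f1": [0.9, 0.1, 0.0, 0.8],
    "f2": [1.0, 0.0, 0.1, 0.7],
    "f3": [0.0, 0.9, 0.8, 0.0],
}

clusters = cluster_by_connections(PROFILES)
```

With these made-up profiles, `f1` and `f2` share most of their connections and end up together, while `f3` forms its own group. The result of a greedy scheme like this depends on iteration order and on the threshold, so it is best treated as a quick exploratory pass.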
Download files
Source Distribution
Built Distribution
File details
Details for the file feature_lens-0.1.0.tar.gz.
File metadata
- Download URL: feature_lens-0.1.0.tar.gz
- Upload date:
- Size: 20.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 10655354c39ad05cca941fe6eb57d50317597eabb39625861c755314657d0cfe
MD5 | 64c3f82c8894075c741d59c724684255
BLAKE2b-256 | bdf8d0fe185f2071310f285bc5acd8d500f08475fcbb960726c39e63908911e6
File details
Details for the file feature_lens-0.1.0-py3-none-any.whl.
File metadata
- Download URL: feature_lens-0.1.0-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 90720a150d5fbf48d002e3069757c69376bb393d692faacc69a6025137fe68ec
MD5 | d9d6d437d878b36a3bb12cbdf1a88f94
BLAKE2b-256 | fa845b543e510118fdb4a92158aab5ee70c9e81a75d2d4d52853d8a12f89c2e1