In-depth visualizations for SAE features

feature-lens

A research engineering toolkit for understanding how SAE features relate to each other, and to upstream / downstream components.

For projects supported by feature-lens, see projects

Quickstart

git clone https://github.com/dtch1997/feature-lens
cd feature-lens
pip install -e .

Development

Refer to Setup for how to set up a development environment.

Implementation Details

Techniques for finding relevant feature associations:

Note: "Total" attribution patching estimates the full effect of one feature on another, including effects mediated by intermediate components, via gradient backpropagation. "Direct" attribution patching estimates only the direct effect, which can be computed analytically with a matrix multiplication.
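
The distinction can be illustrated on a toy model. The sketch below is not the feature-lens API: the function names, the single ReLU MLP layer, and the zero-ablation baseline are all assumptions made for illustration. An upstream feature with activation `a_u` writes along a decoder direction `d_u`, and a downstream feature reads along an encoder direction `e_v`. The direct effect follows only the residual (skip) path, while the total effect also backpropagates through the MLP.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def direct_attribution(a_u: float, d_u: Vector, e_v: Vector) -> float:
    # Direct path only: the upstream decoder direction is read off
    # by the downstream encoder direction, scaled by the activation.
    return a_u * dot(d_u, e_v)


def total_attribution(
    a_u: float, d_u: Vector, e_v: Vector, w_in: Matrix, w_out: Matrix
) -> float:
    # Toy forward pass: target = e_v . (resid + w_out @ relu(w_in @ resid)),
    # where resid = a_u * d_u.
    resid = [a_u * x for x in d_u]
    hidden_pre = matvec(w_in, resid)

    # Manual backward pass from the target to the upstream activation.
    grad_hidden = matvec(transpose(w_out), e_v)
    grad_hidden_pre = [g if h > 0 else 0.0 for g, h in zip(grad_hidden, hidden_pre)]
    grad_mlp = matvec(transpose(w_in), grad_hidden_pre)
    grad_resid = [e + g for e, g in zip(e_v, grad_mlp)]
    grad_a_u = dot(grad_resid, d_u)

    # Attribution patching with an assumed zero baseline:
    # (clean activation - 0) * gradient.
    return a_u * grad_a_u


# Hypothetical toy values (3-dim residual stream, 2 hidden units).
a_u = 2.0
d_u = [1.0, 0.0, 0.0]
e_v = [0.5, 1.0, 0.0]
w_in = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
w_out = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]

direct = direct_attribution(a_u, d_u, e_v)              # 1.0
total = total_attribution(a_u, d_u, e_v, w_in, w_out)   # 5.0
```

In this toy case the gap between the two values (4.0) is the part of the effect routed through the MLP. Across many feature pairs, the direct term reduces to a single matrix product between the decoder and encoder matrices.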

Tools that will be implemented:

  • SAE features. We will use SAEs from SAE-Lens, which are annotated and have Neuronpedia visualizations.
  • Feature dashboards. By visualizing the "functional connectome" of SAE features, we may obtain novel insight about what an SAE feature is doing. For each target SAE feature, we can create a dashboard of all relevant upstream / downstream features.
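
As a rough sketch of what a per-feature dashboard could be built from, the snippet below ranks the strongest upstream and downstream connections of a target feature. It assumes attribution scores are already available as a mapping from (upstream, downstream) feature pairs to floats. The function name `build_dashboard`, the feature identifiers, and the scores are hypothetical and are not taken from feature-lens.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (upstream feature id, downstream feature id)


def build_dashboard(
    attributions: Dict[Edge, float], target: str, k: int = 3
) -> Dict[str, List[Tuple[str, float]]]:
    # Edges that end at the target are upstream connections;
    # edges that start at the target are downstream connections.
    upstream = [(src, s) for (src, dst), s in attributions.items() if dst == target]
    downstream = [(dst, s) for (src, dst), s in attributions.items() if src == target]

    def magnitude(item: Tuple[str, float]) -> float:
        return abs(item[1])

    # Rank by absolute attribution so strong inhibitory links are kept too.
    return {
        "upstream": sorted(upstream, key=magnitude, reverse=True)[:k],
        "downstream": sorted(downstream, key=magnitude, reverse=True)[:k],
    }


# Hypothetical attribution scores between features in adjacent layers.
attributions = {
    ("L0/f12", "L1/f7"): 0.8,
    ("L0/f3", "L1/f7"): -1.2,
    ("L0/f5", "L1/f7"): 0.1,
    ("L1/f7", "L2/f40"): 0.6,
    ("L1/f7", "L2/f2"): -0.05,
    ("L0/f12", "L1/f9"): 0.4,
}

dashboard = build_dashboard(attributions, target="L1/f7", k=2)
# dashboard["upstream"]   -> [("L0/f3", -1.2), ("L0/f12", 0.8)]
# dashboard["downstream"] -> [("L2/f40", 0.6), ("L2/f2", -0.05)]
```

Sorting by absolute value is a design choice: a strongly negative attribution is as informative about a feature's role as a strongly positive one.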

Ideas under consideration.

  • Feature clustering. Features with similar upstream and downstream features could be hypothesized to be performing a similar role. Clustering features based on their connections may reveal novel insight about the general types of "functional role" played by SAE features.
  • Linear direction features. Steering vectors resemble SAE features in that they are linear directions added to the residual activations, so the same analysis could in principle be applied to them.
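
The feature-clustering idea above could be prototyped along these lines. This is only a sketch under stated assumptions: each feature is described by a "connection profile" (its attribution scores to a fixed list of upstream and downstream features), and a simple greedy cosine-similarity grouping stands in for a proper clustering algorithm. The names `cosine` and `cluster_by_connections`, the threshold, and the profiles are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def cluster_by_connections(
    profiles: Dict[str, List[float]], threshold: float = 0.9
) -> List[List[str]]:
    # Greedy grouping: a feature joins the first cluster whose first member
    # has a sufficiently similar connection profile, else starts a new one.
    clusters: List[List[str]] = []
    for name, vec in profiles.items():
        for cluster in clusters:
            if cosine(vec, profiles[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical connection profiles: one attribution score per
# reference upstream / downstream feature.
profiles = {
    "f1": [1.0, 0.0, 0.9, 0.0],
    "f2": [0.9, 0.1, 1.0, 0.0],
    "f3": [0.0, 1.0, 0.0, 0.8],
    "f4": [0.1, 0.9, 0.0, 1.0],
}

clusters = cluster_by_connections(profiles)
# clusters -> [["f1", "f2"], ["f3", "f4"]]
```

The greedy pass depends on iteration order and on the chosen threshold, so on real data a standard method such as k-means or hierarchical clustering would likely be a better fit.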
