In-depth visualizations for SAE features

Project description

feature-lens

A research engineering toolkit for understanding how sparse autoencoder (SAE) features relate to each other and to upstream / downstream components.

For projects supported by feature-lens, see projects

Quickstart

git clone https://github.com/dtch1997/feature-lens
cd feature-lens
pip install -e .

Development

Refer to Setup for how to set up the development environment.

Implementation Details

Techniques for finding relevant feature associations:

Note: "Total" attribution patching calculates the full effect of one feature on another via gradient backpropagation. "Direct" attribution patching estimates only the direct effect, which can be calculated analytically using matrix multiplication.
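To make the distinction concrete, here is a minimal sketch in a toy, fully linear setting. It is not the feature-lens implementation, and every name in it (`direct_effect`, `total_effect`, `dec_up`, `enc_down`, `mixing`) is hypothetical. The assumptions are: an upstream feature writes `act_up * dec_up` into the residual stream, a single intermediate linear layer adds `mixing @ x` back into the residual, and the downstream feature's pre-activation is `enc_down · x`. Because the toy is linear, the gradient is written out by hand; in a real model it would come from autograd backpropagation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [dot(row, v) for row in m]


def direct_effect(act_up, dec_up, enc_down):
    """Direct path only: the upstream decoder direction read by the downstream encoder."""
    return act_up * dot(enc_down, dec_up)


def total_effect(act_up, dec_up, enc_down, mixing):
    """All paths in the toy model: the direct path plus the path through `mixing`.

    The residual after the layer is x + mixing @ x, so
    d(pre_act_down) / d(act_up) = enc_down . (dec_up + mixing @ dec_up).
    Multiplying by act_up gives a gradient-times-activation attribution.
    """
    through_layer = matvec(mixing, dec_up)
    grad = dot(enc_down, [d + t for d, t in zip(dec_up, through_layer)])
    return act_up * grad


# Hypothetical toy values (2-dimensional residual stream).
dec_up = [1.0, 0.0]
enc_down = [0.5, 2.0]
mixing = [[0.0, 0.0], [3.0, 0.0]]  # copies 3x of dimension 0 into dimension 1
act_up = 2.0

direct = direct_effect(act_up, dec_up, enc_down)         # 1.0
total = total_effect(act_up, dec_up, enc_down, mixing)   # 13.0
print(direct, total)
```

In this toy the direct effect misses the contribution routed through the intermediate layer, which is why the two numbers differ.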

Tools which will be implemented.

  • SAE features. We will use SAEs from SAE-Lens, which are annotated and have Neuronpedia visualizations.
  • Feature dashboards. Visualizing the "functional connectome" of SAE features may give novel insight into what a given SAE feature is doing. For each target SAE feature, we can create a dashboard of all relevant upstream / downstream features.
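As a rough illustration of the dashboard idea, the sketch below assumes that attribution scores between a target feature and its neighbours have already been computed and stored in plain dictionaries. The helper names (`top_k_connections`, `build_dashboard`), the feature ids, and the choice of ranking by absolute score are assumptions made for this example, not the feature-lens API.

```python
def top_k_connections(scores, k=3):
    """Return the k (feature_id, score) pairs with the largest absolute score."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def build_dashboard(target, upstream_scores, downstream_scores, k=3):
    """Collect the most relevant upstream and downstream features for one target."""
    return {
        "target": target,
        "upstream": top_k_connections(upstream_scores, k),
        "downstream": top_k_connections(downstream_scores, k),
    }


# Hypothetical feature ids and attribution scores.
dashboard = build_dashboard(
    target="layer5/feature42",
    upstream_scores={"layer3/f7": 0.1, "layer3/f9": -0.9, "layer4/f2": 0.5},
    downstream_scores={"layer6/f1": 0.3, "layer7/f8": -0.05},
    k=2,
)
print(dashboard)
```

Ranking by absolute value keeps strongly inhibitory connections in view alongside excitatory ones; a signed ranking would be an equally reasonable choice.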

Ideas under consideration.

  • Feature clustering. Features with similar upstream and downstream connections plausibly play a similar role. Clustering features by their connections may reveal the general types of "functional role" played by SAE features.
  • Linear direction features. Steering vectors resemble SAE features in that both are directions added to the residual-stream activations, which suggests they could be analyzed in the same way.
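The feature-clustering idea above could be prototyped along the following lines. This is a sketch under stated assumptions, not a planned implementation: each feature is represented by a hypothetical "connection profile" (a vector of attribution scores to a fixed set of upstream / downstream features), similarity is measured with cosine similarity, and the grouping is a simple greedy threshold rule swapped in for brevity. A real analysis would more likely use an established clustering algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def cluster_by_connections(profiles, threshold=0.9):
    """Greedy grouping of features by connection profile.

    A feature joins the first cluster whose first member has cosine
    similarity >= threshold with it; otherwise it starts a new cluster.
    """
    clusters = []
    for name, vec in profiles.items():
        for cluster in clusters:
            if cosine(profiles[cluster[0]], vec) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical connection profiles for three features.
profiles = {
    "f1": [1.0, 0.0, 0.5],
    "f2": [0.9, 0.1, 0.5],
    "f3": [0.0, 1.0, 0.0],
}
clusters = cluster_by_connections(profiles)
print(clusters)  # [['f1', 'f2'], ['f3']]
```

The result of the greedy rule depends on the iteration order and on `threshold`, so it is only suitable for a quick first look.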


Download files

Source Distribution

feature_lens-0.1.0.tar.gz (20.1 kB)

Uploaded Source

Built Distribution

feature_lens-0.1.0-py3-none-any.whl (23.6 kB)

Uploaded Python 3

File details

Details for the file feature_lens-0.1.0.tar.gz.

File metadata

  • Download URL: feature_lens-0.1.0.tar.gz
  • Upload date:
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for feature_lens-0.1.0.tar.gz
Algorithm Hash digest
SHA256 10655354c39ad05cca941fe6eb57d50317597eabb39625861c755314657d0cfe
MD5 64c3f82c8894075c741d59c724684255
BLAKE2b-256 bdf8d0fe185f2071310f285bc5acd8d500f08475fcbb960726c39e63908911e6

File details

Details for the file feature_lens-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: feature_lens-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 23.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for feature_lens-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 90720a150d5fbf48d002e3069757c69376bb393d692faacc69a6025137fe68ec
MD5 d9d6d437d878b36a3bb12cbdf1a88f94
BLAKE2b-256 fa845b543e510118fdb4a92158aab5ee70c9e81a75d2d4d52853d8a12f89c2e1
