
Open-source SAE visualizer, based on Anthropic's published visualizer. Forked / Detached from sae_vis.


SAEDashboard

SAEDashboard is a tool for visualizing and analyzing Sparse Autoencoders (SAEs) in neural networks. This repository is an adaptation and extension of Callum McDougall's SAEVis, providing enhanced functionality for feature visualization and analysis as well as feature dashboard creation at scale.

Overview

This codebase was originally designed to replicate Anthropic's sparse autoencoder visualizations, which you can see here. SAEDashboard primarily provides visualizations of features, including their activations, logits, and correlations--similar to what is shown in the Anthropic link.

Features

  • Customizable dashboards with various plots and data representations for SAE features
  • Support for any SAE in the SAELens library
  • Neuronpedia integration for hosting and comprehensive neuron analysis (note: this requires a Neuronpedia account and is currently only used internally)
  • Ability to handle large datasets and models efficiently

Installation

Install SAEDashboard using pip:

pip install sae-dashboard

Quick Start

Here's a basic example of how to use SAEDashboard with SaeVisRunner:

from sae_lens import SAE
from transformer_lens import HookedTransformer
from sae_dashboard.sae_vis_data import SaeVisConfig
from sae_dashboard.sae_vis_runner import SaeVisRunner

# Load model and SAE
model = HookedTransformer.from_pretrained("gpt2-small", device="cuda", dtype="bfloat16")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.6.hook_resid_pre",
    device="cuda"
)
sae.fold_W_dec_norm()

# Configure visualization
config = SaeVisConfig(
    hook_point=sae.cfg.hook_name,
    features=list(range(256)),
    minibatch_size_features=64,
    minibatch_size_tokens=256,
    device="cuda",
    dtype="bfloat16"
)

# Generate data
# `your_token_dataset` is a placeholder: replace it with the tokenized prompts you want to analyze
data = SaeVisRunner(config).run(encoder=sae, model=model, tokens=your_token_dataset)

# Save feature-centric visualization
from sae_dashboard.data_writing_fns import save_feature_centric_vis
save_feature_centric_vis(sae_vis_data=data, filename="feature_dashboard.html")

For a more detailed tutorial, check out our demo notebook.

Advanced Usage: Neuronpedia Runner

For internal use or advanced analysis, SAEDashboard provides a Neuronpedia runner that generates data compatible with Neuronpedia. Here's a basic example:

from sae_dashboard.neuronpedia.neuronpedia_runner_config import NeuronpediaRunnerConfig
from sae_dashboard.neuronpedia.neuronpedia_runner import NeuronpediaRunner

config = NeuronpediaRunnerConfig(
    sae_set="your_sae_set",
    sae_path="path/to/sae",
    np_set_name="your_neuronpedia_set_name",
    huggingface_dataset_path="dataset/path",
    n_prompts_total=1000,
    n_features_at_a_time=64
)

runner = NeuronpediaRunner(config)
runner.run()

For more options and detailed configuration, refer to the NeuronpediaRunnerConfig class in the code.

Cross-Layer Transcoder (CLT) Support

SAEDashboard now supports visualization of Cross-Layer Transcoders (CLTs), which are a variant of SAEs that process activations across transformer layers. CLT visualization requires the files and configuration described in the following sections.

Required Files

When using a CLT model, you'll need these files in your CLT model directory:

  1. Model weights: A .safetensors or .pt file containing the CLT weights
  2. Configuration: A cfg.json file with the CLT configuration, including:
    • num_features: Number of features in the CLT
    • num_layers: Number of transformer layers
    • d_model: Model dimension
    • activation_fn: Activation function (e.g., "jumprelu", "relu")
    • normalization_method: How inputs are normalized (e.g., "mean_std", "none")
    • tl_input_template: TransformerLens hook template (e.g., "blocks.{}.ln2.hook_normalized"). Note that this will usually differ from the hook name in the model's cfg.json, which is based on NNsight/transformers. You will need to find the corresponding TransformerLens hook name.
  3. Normalization statistics (if normalization_method is "mean_std"): A norm_stats.json file containing the mean and standard deviation of each layer's inputs, computed over the dataset either when the activations were generated or afterwards. The file should have this structure:
    {
      "0": {
        "inputs": {
          "mean": [0.1, -0.2, ...],  // Array of d_model values
          "std": [1.0, 0.9, ...]      // Array of d_model values
        }
      },
      "1": {
        "inputs": {
          "mean": [...],
          "std": [...]
        }
      },
      // ... entries for each layer
    }
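
As a rough illustration of the structure above, the following sketch checks a norm_stats.json-style dictionary and applies mean/std normalization to one activation vector. The helper names (validate_norm_stats, normalize_inputs), the epsilon term, and the (x - mean) / std formula are assumptions made for illustration; they are not taken from SAEDashboard's code, so the library's own loader may behave differently.

```python
import json


def validate_norm_stats(stats: dict, num_layers: int, d_model: int) -> None:
    """Raise ValueError if `stats` does not match the layout described above."""
    for layer in range(num_layers):
        key = str(layer)  # JSON object keys are strings: "0", "1", ...
        if key not in stats:
            raise ValueError(f"missing entry for layer {key}")
        inputs = stats[key].get("inputs", {})
        for field in ("mean", "std"):
            values = inputs.get(field)
            if not isinstance(values, list) or len(values) != d_model:
                raise ValueError(
                    f"layer {key}: '{field}' must be a list of {d_model} values"
                )


def normalize_inputs(x: list[float], layer_stats: dict, eps: float = 1e-6) -> list[float]:
    """Assumed mean_std normalization: (x - mean) / (std + eps), element-wise."""
    mean = layer_stats["inputs"]["mean"]
    std = layer_stats["inputs"]["std"]
    return [(xi - m) / (s + eps) for xi, m, s in zip(x, mean, std)]


# Toy statistics for a 2-layer CLT with d_model = 3 (made-up numbers).
raw = (
    '{"0": {"inputs": {"mean": [0.0, 1.0, 2.0], "std": [1.0, 2.0, 4.0]}},'
    ' "1": {"inputs": {"mean": [0.0, 0.0, 0.0], "std": [1.0, 1.0, 1.0]}}}'
)
stats = json.loads(raw)
validate_norm_stats(stats, num_layers=2, d_model=3)
normalized = normalize_inputs([1.0, 3.0, 6.0], stats["0"])
print(normalized)
```

Running a check like this before a long job can surface a missing layer entry or a wrong-length array early, rather than partway through a run.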
    

Example Usage

from sae_dashboard.neuronpedia.neuronpedia_runner_config import NeuronpediaRunnerConfig
from sae_dashboard.neuronpedia.neuronpedia_runner import NeuronpediaRunner

config = NeuronpediaRunnerConfig(
    sae_set="your_clt_set",
    sae_path="/path/to/clt/model/directory",  # Directory containing the files above
    model_id="gpt2",  # Base model the CLT was trained on
    outputs_dir="clt_outputs",
    huggingface_dataset_path="your/dataset",
    use_clt=True,  # Enable CLT mode
    clt_layer_idx=5,  # Which layer to visualize (0-indexed)
    clt_weights_filename="model.safetensors",  # Optional: specify exact weights file
    n_prompts_total=1000,
    n_features_at_a_time=64
)

runner = NeuronpediaRunner(config)
runner.run()
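
Before launching a run like the one above, it can help to confirm that the CLT directory contains the files listed under Required Files. The sketch below is one way to do that using only the standard library. check_clt_dir is a hypothetical helper (not part of SAEDashboard), and filling tl_input_template with the layer index via str.format is an assumption based on the "{}" placeholder in the example template.

```python
import json
import tempfile
from pathlib import Path


def check_clt_dir(clt_dir: str, layer_idx: int) -> str:
    """Check the directory layout and return the assumed hook name for `layer_idx`."""
    root = Path(clt_dir)
    cfg_path = root / "cfg.json"
    if not cfg_path.is_file():
        raise FileNotFoundError(f"no cfg.json in {root}")
    cfg = json.loads(cfg_path.read_text())

    # At least one weights file (.safetensors or .pt) should be present.
    weights = [p for p in root.iterdir() if p.suffix in {".safetensors", ".pt"}]
    if not weights:
        raise FileNotFoundError(f"no .safetensors or .pt file in {root}")

    # norm_stats.json is only needed for mean_std normalization.
    needs_stats = cfg.get("normalization_method") == "mean_std"
    if needs_stats and not (root / "norm_stats.json").is_file():
        raise FileNotFoundError(f"norm_stats.json is missing in {root}")

    if not 0 <= layer_idx < cfg["num_layers"]:
        raise ValueError(f"layer_idx {layer_idx} out of range for {cfg['num_layers']} layers")

    # Assumption: the template's "{}" is filled with the 0-indexed layer number.
    return cfg["tl_input_template"].format(layer_idx)


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    cfg = {
        "num_features": 1024,
        "num_layers": 12,
        "d_model": 768,
        "activation_fn": "jumprelu",
        "normalization_method": "none",
        "tl_input_template": "blocks.{}.ln2.hook_normalized",
    }
    (Path(tmp) / "cfg.json").write_text(json.dumps(cfg))
    (Path(tmp) / "model.safetensors").write_bytes(b"")  # empty stand-in, not real weights
    hook_name = check_clt_dir(tmp, layer_idx=5)
    print(hook_name)
```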

Notes on CLT Support

  • CLTs must be loaded from local files (HuggingFace Hub loading not yet supported)
  • The --use-clt flag is mutually exclusive with --use-transcoder and --use-skip-transcoder
  • JumpReLU activation functions with learned thresholds are supported
  • The visualization will show features for the specified layer only
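
The mutual exclusivity noted above is the kind of constraint that argparse can enforce directly. The sketch below builds a hypothetical parser that borrows only the three flag names from the note. It is not SAEDashboard's actual command-line interface, whose other arguments and implementation may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical runner CLI (illustration only)")
    # At most one of these mode flags may be given; argparse rejects combinations.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--use-clt", action="store_true")
    mode.add_argument("--use-transcoder", action="store_true")
    mode.add_argument("--use-skip-transcoder", action="store_true")
    return parser


parser = build_parser()
args = parser.parse_args(["--use-clt"])
print(args.use_clt, args.use_transcoder, args.use_skip_transcoder)

try:
    parser.parse_args(["--use-clt", "--use-transcoder"])
except SystemExit:
    # argparse prints a usage error and exits when exclusive flags are combined.
    print("rejected conflicting flags")
```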

Configuration Options

SAEDashboard offers a wide range of configuration options for both SaeVisRunner and NeuronpediaRunner. Key options include:

  • hook_point: The model hook whose activations are analyzed (e.g., "blocks.6.hook_resid_pre")
  • features: List of feature indices to visualize
  • minibatch_size_features: Number of features to process in each batch
  • minibatch_size_tokens: Number of tokens to process in each forward pass
  • device: Computation device (e.g., "cuda", "cpu")
  • dtype: Data type for computations
  • sparsity_threshold: Threshold for feature sparsity (Neuronpedia runner)
  • n_prompts_total: Total number of prompts to analyze
  • use_wandb: Enable logging with Weights & Biases

Refer to SaeVisConfig and NeuronpediaRunnerConfig for full lists of options.
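
To make the minibatch options more concrete, here is a minimal sketch of splitting a feature list into groups of minibatch_size_features. The chunk helper is hypothetical and only illustrates the general idea of batching; it is not necessarily how SaeVisRunner is implemented internally.

```python
from typing import Iterator, Sequence


def chunk(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


features = list(range(256))                  # same feature list as the Quick Start config
feature_batches = list(chunk(features, 64))  # corresponds to minibatch_size_features=64
print(len(feature_batches), [len(b) for b in feature_batches])
```

With the Quick Start values, the 256 features fall into four groups of 64.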

Contributing

This project uses Poetry for dependency management. After cloning the repo, install dependencies with poetry lock && poetry install.

We welcome contributions to SAEDashboard! Please follow these steps:

  1. Fork the repository
  2. Create a new branch for your feature
  3. Implement your changes
  4. Run tests and checks:
    • Use make format to format your code
    • Use make check-ci to run all checks and tests
  5. Submit a pull request

Ensure your code passes all checks, including:

  • Black and Flake8 for formatting and linting
  • Pyright for type-checking
  • Pytest for tests

Citing This Work

To cite SAEDashboard in your research, please use the following BibTeX entry:

@misc{sae_dashboard,
    title  = {{SAE Dashboard}},
    author = {Decode Research},
    howpublished = {\url{https://github.com/jbloomAus/sae-dashboard}},
    year   = {2024}
}

License

SAEDashboard is licensed under the MIT License. See the LICENSE file for details.

Acknowledgment and Citation

This project is based on the work by Callum McDougall. If you use SAEDashboard in your research, please cite the original SAEVis project as well:

@misc{sae_vis,
  title = {{SAE Visualizer}},
  author = {Callum McDougall},
  howpublished = {\url{https://github.com/callummcdougall/sae_vis}},
  year = {2024}
}

Contact

For questions or support, please open an issue on our GitHub repository.
