Open-source SAE visualizer, based on Anthropic's published visualizer.

Project description

Note - I'm still open to accepting PRs on this library, and am very happy for other people to build on it, but I won't be actively maintaining it going forwards since I'll be focusing on my job. The SAELens library will continue to have more development and iteration, and it uses a fork of this repo as well as containing a much larger suite of tools for working with SAEs, so depending on your use case you might find that library preferable!


This codebase was designed to replicate Anthropic's sparse autoencoder visualisations, which you can see here. It provides two views: a feature-centric view (like the one in the link, where we look at one particular feature and see things like which tokens it fires strongest on) and a prompt-centric view (where we look at one particular prompt and see which features fire strongest on it, according to a variety of different metrics).

Install with pip install sae-vis. Link to PyPI page here.
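After installing, it can be useful to confirm which release ended up in the environment, since the version history below spans several 0.2.x and 0.3.x releases. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of sae-vis.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "sae-vis") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not part of the sae-vis API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "sae-vis is not installed")
```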

See here for a demo Colab notebook (all the code to produce it is also in this repo, in the file sae_vis/demos/demo.py, as well as the files containing the created visualizations).

The library supports two types of visualizations:

  1. Feature-centric vis, where you look at a single feature and see e.g. which sequences in a large dataset this feature fires strongest on.
  2. Prompt-centric vis, where you input a custom prompt and see which features score highest on that prompt, according to a variety of possible metrics.
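To make the difference between the two views concrete, here is a toy sketch in plain Python. It does not use the sae-vis API: the data, the function names, and the choice of max and sum as metrics are all assumptions made for illustration. The feature-centric function ranks dataset sequences for one feature, and the prompt-centric function ranks features for one prompt.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Toy data, invented for illustration: each sequence maps to one row per token,
# and each row holds that token's activations on 3 hypothetical SAE features.
DATASET: Dict[str, List[List[float]]] = {
    "the cat sat": [[0.0, 0.1, 0.0], [2.5, 0.0, 0.3], [0.2, 1.1, 0.0]],
    "on the mat": [[0.0, 0.0, 0.9], [0.0, 0.2, 0.0], [1.9, 0.0, 0.4]],
    "hello world": [[0.1, 0.0, 0.0], [0.0, 0.3, 0.2]],
}


def top_sequences_for_feature(feature: int, k: int = 2) -> List[Tuple[str, float]]:
    """Feature-centric: rank dataset sequences by one feature's peak activation."""
    scored = [
        (text, max(row[feature] for row in acts)) for text, acts in DATASET.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def top_features_for_prompt(
    prompt_acts: List[List[float]],
    metric: Callable[[Iterable[float]], float] = max,
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Prompt-centric: rank features on one prompt under a chosen metric."""
    n_features = len(prompt_acts[0])
    scored = [
        (f, metric(row[f] for row in prompt_acts)) for f in range(n_features)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    print(top_sequences_for_feature(0))
    print(top_features_for_prompt(DATASET["on the mat"], metric=sum))
```

Swapping the metric argument changes the ranking criterion without touching the rest of the code, which mirrors the idea of scoring a prompt's features in more than one way.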

Citing this work

To cite this work, you can use this bibtex citation:

@misc{sae_vis,
    title  = {{SAE Visualizer}},
    author = {Callum McDougall},
    howpublished    = {\url{https://github.com/callummcdougall/sae_vis}},
    year   = {2024}
}

Contributing

This project uses Poetry for dependency management. After cloning the repo, install dependencies with poetry install.

This project uses Ruff for formatting and linting, Pyright for type-checking, and Pytest for tests. If you submit a PR, make sure that your code passes all checks. You can run all checks with make check-all.

Version history (recording started at 0.2.9)

  • 0.2.9 - added table for pairwise feature correlations (not just encoder-B correlations)
  • 0.2.10 - fix some anomalous characters
  • 0.2.11 - update PyPI with longer description
  • 0.2.12 - fix height parameter of config, add videos to PyPI description
  • 0.2.13 - add to dependencies, and fix SAELens section
  • 0.2.14 - fix mistake in dependencies
  • 0.2.15 - refactor to support eventual scatterplot-based feature browser, fix ’ HTML
  • 0.2.16 - allow disabling buffer in feature generation, fix demo notebook, fix sae-lens compatibility & type checking
  • 0.2.17 - use main branch of sae-lens
  • 0.2.18 - remove circular dependency with sae-lens
  • 0.2.19 - formatting, error-checking
  • 0.2.20 - fix bugs, remove use of batch_size in config
  • 0.2.21 - formatting
  • 0.3.0 - major refactor which makes several improvements, removing complexity and adding new features:
    • OthelloGPT SAEs with linear probes (input / output space)
    • Attention output SAEs with max DFA visualized
    • Tokens labelled with their (batch, seq) indices as well as the change in correct-token probability on feature ablation, when hovered over
  • 0.3.1 - fix transformerlens dependency
  • 0.3.2 - adjust pyright type-checking
  • 0.3.3 - remove pyright type-checking
  • 0.3.4 - remove pyright type-checking (v2)
  • 0.3.5 - remove pyright type-checking (v3)

Download files

Source Distribution

sae_vis-0.3.5.tar.gz (9.8 MB view details)

Uploaded Source

Built Distribution

sae_vis-0.3.5-py3-none-any.whl (10.6 MB view details)

Uploaded Python 3

File details

Details for the file sae_vis-0.3.5.tar.gz.

File metadata

  • Download URL: sae_vis-0.3.5.tar.gz
  • Upload date:
  • Size: 9.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.4 Windows/11

File hashes

Hashes for sae_vis-0.3.5.tar.gz
Algorithm Hash digest
SHA256 cb23afb53d6c6a538d31c431ae3ecfa1cb670231b442cda89a70d170bed89ca6
MD5 35de58de63c8b5708e39cee4747fda6d
BLAKE2b-256 0b0fa6ed6e90b3ce25926e47977de07400566218029aa327bcb0a3323715fcc2

File details

Details for the file sae_vis-0.3.5-py3-none-any.whl.

File metadata

  • Download URL: sae_vis-0.3.5-py3-none-any.whl
  • Upload date:
  • Size: 10.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.4 Windows/11

File hashes

Hashes for sae_vis-0.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 fec827a438922c31d72dce1d7844f1dfaaf1f93d67bed9011dca8b6fa747730f
MD5 720aff654ed8f5877e37cfc33d163ffa
BLAKE2b-256 6570438543f346ce6d40713e9912b6b65b50ec1394a0f9c2470fb8610464083e
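A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only Python's hashlib; the function names and the assumption that the files keep their published names on disk are illustrative, not part of sae-vis or PyPI tooling.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "sae_vis-0.3.5.tar.gz": (
        "cb23afb53d6c6a538d31c431ae3ecfa1cb670231b442cda89a70d170bed89ca6"
    ),
    "sae_vis-0.3.5-py3-none-any.whl": (
        "fec827a438922c31d72dce1d7844f1dfaaf1f93d67bed9011dca8b6fa747730f"
    ),
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """Return True only if the file name is listed and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        candidate = Path(name)
        if candidate.exists():
            print(name, "OK" if matches_published_hash(candidate) else "MISMATCH")
        else:
            print(name, "not found in the current directory")
```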
