A tool for visualizing and exploring feature activations in neural language models.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Tiny Activation Dashboard

A tiny, easily hackable implementation of a feature dashboard.

Overview

This repository provides a powerful and intuitive tool for visualizing and exploring feature activations in neural language models, with a focus on making complex model interpretability more accessible.

Motivation

There are some other good feature activation dashboard tools out there, but I found them very hard to hack on when I wanted to add support for Crosscoders. This implementation is not as complete as https://github.com/jbloomAus/SAEDashboard or even the simpler https://github.com/callummcdougall/sae_vis, but in my honest non-biased-at-all opinion, this implementation seems easier to hack on?

Key Features

Both the offline and online dashboards include:

- Token-level activation highlighting
- Hover tooltips showing token details
- Responsive design
- Save HTML reports
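To make token-level highlighting concrete, here is a minimal sketch of one way to turn per-token activations into highlighted HTML with hover tooltips. It is an illustration only, not the package's actual rendering code: the `highlight_tokens` name, the red color, and the normalization by the maximum activation are all assumptions.

```python
from html import escape


def highlight_tokens(tokens: list[str], activations: list[float]) -> str:
    """Render tokens as HTML spans whose background opacity tracks activation."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    # Normalize by the largest positive activation so opacity stays in [0, 1].
    max_act = max(max(activations, default=0.0), 0.0)
    spans = []
    for token, act in zip(tokens, activations):
        opacity = max(act, 0.0) / max_act if max_act > 0 else 0.0
        # The title attribute gives a basic hover tooltip in any browser.
        spans.append(
            f'<span title="{escape(token)}: {act:.3f}" '
            f'style="background-color: rgba(255, 0, 0, {opacity:.2f})">'
            f"{escape(token)}</span>"
        )
    return "".join(spans)


html = highlight_tokens(["The", " cat", " <sat>"], [0.0, 2.0, 1.0])
```

Tokens are escaped before being embedded, so a token such as `<sat>` shows up as text instead of being parsed as a tag.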

1. Offline Feature Exploration

- Analyze pre-computed feature activations
- Visualize max activation examples for specific features
- Expandable text views
- Generate interactive HTML reports

from src.feature_centric_dashboards import OfflineFeatureCentricDashboard

# Create dashboard with pre-computed activations.
# max_activation_examples maps each feature index to a list of tuples.
# Each tuple contains:
#   - a float (the max activation value),
#   - a list of strings (the text of the example),
#   - a list of floats (the activation value for each token in the example).
max_activation_examples: dict[int, list[tuple[float, list[str], list[float]]]] = ...

dashboard = OfflineFeatureCentricDashboard(max_activation_examples, tokenizer)
dashboard.display()

# Export to HTML for sharing
feature_to_export = 0
dashboard.export_to_html("feature_analysis.html", feature_to_export)
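The `max_activation_examples` dictionary has to be built before the dashboard is created. The sketch below shows one way to assemble it from per-token activations using only the standard library. The helper name `build_max_activation_examples`, the `samples` layout, and the `top_k` parameter are hypothetical and not part of the package. Sorting examples from highest to lowest activation is also an assumption; the README does not say whether the dashboard expects a particular order.

```python
import heapq


def build_max_activation_examples(
    samples: list[tuple[list[str], list[list[float]]]],
    feature_indices: list[int],
    top_k: int = 2,
) -> dict[int, list[tuple[float, list[str], list[float]]]]:
    """Keep the top_k highest-activating samples for each feature.

    Each sample is (tokens, acts), where acts[t][f] is the activation of
    feature f on token t.
    """
    result = {}
    for f in feature_indices:
        per_sample = []
        for tokens, acts in samples:
            # Slice out this feature's activation on every token.
            token_acts = [row[f] for row in acts]
            per_sample.append((max(token_acts), tokens, token_acts))
        # nlargest returns the examples sorted from highest to lowest max.
        result[f] = heapq.nlargest(top_k, per_sample, key=lambda ex: ex[0])
    return result


# Toy data: three short samples, two features.
samples = [
    (["Hello", " world"], [[0.1, 0.0], [0.9, 0.2]]),
    (["Good", " bye"], [[0.3, 0.7], [0.0, 0.1]]),
    (["Hi"], [[0.5, 0.0]]),
]
examples = build_max_activation_examples(samples, [0, 1], top_k=2)
```

The resulting `examples` dictionary matches the type annotation above, so it could be passed as the first argument to `OfflineFeatureCentricDashboard`.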

2. Online Feature Exploration

The online dashboard allows you to analyze the activations of a model in real-time. This is useful for quickly exploring the activations of a model on your custom prompts.

The online dashboard supports chat_template formatting: include <eot> in your input text to separate chat turns. For example:

What is the capital of France?<eot>The capital of France is Paris.<eot>Good bing

will be interpreted as:

[
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "Good bing"}
]

and formatted using the tokenizer's chat template.
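The conversion from <eot>-separated text to chat messages can be sketched as follows. This is an illustration of the behavior described above, not the package's own parsing code; the `parse_chat` name and the assumption that turns strictly alternate between user and assistant, starting with the user, are mine.

```python
def parse_chat(text: str, separator: str = "<eot>") -> list[dict[str, str]]:
    """Split text on the separator and assign alternating user/assistant roles."""
    roles = ("user", "assistant")
    turns = text.split(separator)
    return [
        {"role": roles[i % 2], "content": turn}
        for i, turn in enumerate(turns)
    ]


messages = parse_chat(
    "What is the capital of France?<eot>The capital of France is Paris.<eot>Good bing"
)
```

With a Hugging Face tokenizer, a list like `messages` can then be rendered with `tokenizer.apply_chat_template(messages, tokenize=False)`.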

Two approaches to build your real-time feature analysis dashboard:

A. Class-based Method

Create a class that subclasses AbstractOnlineFeatureCentricDashboard and implements the get_feature_activation function. This function should take a string and a tuple of feature indices, and return a tensor of shape (seq_len, num_features) containing the activations of the specified features for the input text.

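import torch as th
from src.feature_centric_dashboards import AbstractOnlineFeatureCentricDashboard

class DummyOnlineFeatureCentricDashboard(AbstractOnlineFeatureCentricDashboard):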
    def get_feature_activation(self, text: str, feature_indices: tuple[int, ...]) -> th.Tensor:
        # Custom activation computation logic
        tok_len = len(self.tokenizer.encode(text))
        activations = th.randn((tok_len, len(feature_indices))).exp()
        return activations
    
    # Optional: override generate_model_response to change the model's response generation

online_dashboards = DummyOnlineFeatureCentricDashboard(tokenizer, model)
online_dashboards.display()

B. Function-based Method

If you hate classes for some reason, you can also use the function-based method:

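import torch as th
from src.feature_centric_dashboards import OnlineFeatureCentricDashboard

def get_feature_activation(text, feature_indices):
    # Dummy activations of shape (seq_len, num_features)
    return th.randn((len(tokenizer.encode(text)), len(feature_indices))).exp()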

online_dashboards = OnlineFeatureCentricDashboard(
    get_feature_activation, 
    tokenizer,
    generate_model_response = None,  # Optional: override the model's response generation function
    model = None,  # Optional: pass in a model to use the model's response generation function
    call_with_self = False,  # Whether to call the functions with self as the first argument; defaults to False
)
online_dashboards.display()
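Both methods rely on get_feature_activation returning a tensor of shape (seq_len, num_features), so a shape mismatch is an easy mistake to make when plugging in a real model. The sketch below is an optional guard you could wrap around your own function. The `with_shape_check` helper and its `count_tokens` argument are hypothetical and not part of the package; the check only assumes the returned object has a `.shape` attribute, as torch tensors do.

```python
from typing import Any, Callable


def with_shape_check(
    get_feature_activation: Callable[[str, tuple[int, ...]], Any],
    count_tokens: Callable[[str], int],
) -> Callable[[str, tuple[int, ...]], Any]:
    """Wrap an activation function so a wrong output shape fails loudly."""

    def checked(text: str, feature_indices: tuple[int, ...]) -> Any:
        activations = get_feature_activation(text, feature_indices)
        # Expected contract: one row per token, one column per feature.
        expected = (count_tokens(text), len(feature_indices))
        actual = tuple(activations.shape)
        if actual != expected:
            raise ValueError(f"expected shape {expected}, got {actual}")
        return activations

    return checked
```

For example, `with_shape_check(get_feature_activation, lambda text: len(tokenizer.encode(text)))` returns a function with the same signature, which can be passed to the dashboard in place of the original.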

Specialized Implementations

The package includes several specialized dashboard implementations in dashboard_implementations.py:

CrosscoderOnlineFeatureDashboard

For analyzing features using a crosscoder model that combines base and instruct model activations:

from tiny_dashboard.dashboard_implementations import CrosscoderOnlineFeatureDashboard

base_model, instruct_model, crosscoder = ...
collect_layer = 12

dashboard = CrosscoderOnlineFeatureDashboard(
    base_model=base_model,
    instruct_model=instruct_model,
    crosscoder=crosscoder,
    collect_layer=collect_layer,
    crosscoder_device="cuda"  # optional, use it if the crosscoder is on a different device than the base and instruct models
)
dashboard.display()

Additional specialized implementations can be found in the dashboard_implementations.py file. Feel free to contribute new implementations!

Example Workflow

1. Load a pre-trained language model
2. Compute feature activations
3. Create a dashboard
4. Explore and analyze feature behaviors

Repository Structure

The repository is organized as follows:

- demo.ipynb: A Jupyter notebook containing minimal examples demonstrating how to use both offline and online dashboards
- src/: Main package directory
  - feature_centric_dashboards.py: Core implementation of the dashboard classes (OfflineFeatureCentricDashboard, OnlineFeatureCentricDashboard, and AbstractOnlineFeatureCentricDashboard)
  - dashboard_implementations.py: Collection of specialized dashboard implementations (e.g., CrosscoderOnlineFeatureDashboard)
  - html_utils.py: Utility functions for generating HTML elements using templates
  - utils.py: General utility functions for text processing and HTML sanitization
  - templates/: HTML, CSS, and JavaScript templates
    - HTML templates for different components (base layout, feature sections, examples, etc.)
    - styles.css: CSS styling for the dashboard
    - listeners.js: JavaScript for interactive features (tooltips, expandable text)

Installation

pip install git+https://github.com/butanium/tiny-activation-dashboard.git

Contributing

Contributions are welcome! Please feel free to improve the minimal design and add some usage examples.

Release history

0.7.3

Feb 11, 2025

0.7.2

Jan 29, 2025

0.7.1

Jan 29, 2025

0.7.0

Jan 29, 2025

0.6.1

Jan 29, 2025

0.6.0

Jan 28, 2025

0.5.5

Jan 28, 2025

0.5.4

Jan 23, 2025

0.5.2

Jan 22, 2025

0.5.1

Jan 22, 2025

0.5.0

Jan 22, 2025

0.4.0

Jan 22, 2025

0.3.0

Jan 22, 2025

0.2.1

Nov 25, 2024

0.2.0

Nov 24, 2024

This version

0.1.1

Nov 23, 2024

0.1

Nov 23, 2024

Download files

Source Distribution

tiny_dashboard-0.1.1.tar.gz (17.3 kB)

Uploaded Nov 23, 2024 Source

Built Distribution

tiny_dashboard-0.1.1-py3-none-any.whl (16.4 kB)

Uploaded Nov 23, 2024 Python 3

File details

Details for the file tiny_dashboard-0.1.1.tar.gz.

File metadata

Download URL: tiny_dashboard-0.1.1.tar.gz
Upload date: Nov 23, 2024
Size: 17.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for tiny_dashboard-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`91c278876d39a3c8ac4313a8c98fa16f9af115fb74ed1c6e37b7925e246e4109`
MD5	`be48f54b4f9c4775f48c9ad768eefdb5`
BLAKE2b-256	`b99500868e54bb191bc85b0b6c0db279732bca42556a403f97b486c96cf2f503`

File details

Details for the file tiny_dashboard-0.1.1-py3-none-any.whl.

File metadata

Download URL: tiny_dashboard-0.1.1-py3-none-any.whl
Upload date: Nov 23, 2024
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for tiny_dashboard-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d4df9008ac638bfa2dfefcfbd5c3ea118fff06f44ebd2d1a13d5aecfbe85920`
MD5	`5965343fbd593cda5e021467125ce823`
BLAKE2b-256	`e3d74fac9ddb0203f366d3e8f48e210aea14be3468e73c26e34e845f1731b26f`

tiny-dashboard 0.1.1
