Transformer token flow visualizer

Token-Trace

A tool and UI for constructing prompt-centric views of sparse autoencoder (SAE) feature attributions.

Main functionality:

  • We use this tool to identify which SAE features have the most 'attribution' towards decreasing the model loss.
  • In combination with Neuronpedia, we can identify what each SAE feature represents; this then gives us a rough idea of what computation the model is performing.

This tool is a first step towards discovering information flow between the features and layers of a transformer.

Quickstart

Installation

pip install token-trace

Example Usage

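import pandas as pd

from token_trace import compute_node_attribution

text = "When John and Mary went to the shops, John gave the bag to Mary"

# Arguments are passed positionally, in the order shown originally:
# the model name first, then the prompt text.
df: pd.DataFrame = compute_node_attribution("gpt2", text)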

Each row of df describes one node corresponding to an SAE feature or error term.
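Once you have a DataFrame of nodes, a common first step is to rank them by attribution magnitude. The sketch below uses a small hand-built DataFrame as a stand-in for df. The column names (layer, feature, attribution) and the top_nodes helper are hypothetical and chosen only for illustration; check the columns of the actual df before adapting it.

```python
import pandas as pd

# Toy stand-in for the output of compute_node_attribution.
# These column names are hypothetical, not the package's documented schema.
df = pd.DataFrame(
    {
        "layer": [0, 0, 1, 1, 2],
        "feature": [12, 873, 45, 301, 7],
        "attribution": [0.02, -0.40, 0.31, -0.05, 0.11],
    }
)


def top_nodes(frame: pd.DataFrame, k: int = 3, column: str = "attribution") -> pd.DataFrame:
    """Return the k rows with the largest absolute value in `column`."""
    order = frame[column].abs().sort_values(ascending=False).index
    return frame.loc[order].head(k)


top = top_nodes(df, k=3)
print(top)
```

Ranking by absolute value keeps both strongly positive and strongly negative nodes; whether a positive or a negative value means "decreases the loss" depends on the sign convention of the real output.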

Visualizing SAE attribution statistics in the frontend

We use Streamlit to create a UI. Start the app as follows:

streamlit run app/token_trace_app.py

Methodology

Under the hood, we use attribution patching to estimate the indirect effect of each SAE feature on the loss. The method is adapted heavily from Sparse Feature Circuits.
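To make the idea concrete, here is a minimal sketch of attribution patching on a toy problem. It is not this package's implementation: the linear-readout loss, the zero baseline, and all function names are assumptions made for illustration. The estimate is the first-order term: the gradient of the loss with respect to an activation, multiplied by the change in that activation when it is patched to a baseline value. The sketch compares it with the exact effect obtained by actually patching each feature.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def loss(acts, weights, target):
    """Toy loss: squared error of a linear readout of the feature activations."""
    return (dot(weights, acts) - target) ** 2


def loss_grad(acts, weights, target):
    """Analytic gradient of the toy loss with respect to each activation."""
    residual = dot(weights, acts) - target
    return [2.0 * residual * w for w in weights]


def attribution_patching(acts, baseline, weights, target):
    """First-order estimate of the loss change from patching each feature
    to its baseline value: grad_i * (baseline_i - act_i)."""
    grads = loss_grad(acts, weights, target)
    return [g * (b - a) for g, a, b in zip(grads, acts, baseline)]


def exact_patching(acts, baseline, weights, target):
    """Exact loss change from patching one feature at a time (one loss
    evaluation per feature, which is what the approximation avoids)."""
    clean = loss(acts, weights, target)
    effects = []
    for i, b in enumerate(baseline):
        patched = list(acts)
        patched[i] = b
        effects.append(loss(patched, weights, target) - clean)
    return effects


weights = [0.5, -1.0, 2.0]
acts = [1.0, 0.5, 0.25]
baseline = [0.0, 0.0, 0.0]  # assumption: zero-ablation baseline
target = 1.0

approx = attribution_patching(acts, baseline, weights, target)
exact = exact_patching(acts, baseline, weights, target)
print("approx:", approx)
print("exact: ", exact)
```

In this toy setup a positive value means that ablating the feature raises the loss, so the feature was helping to decrease it. The approximation needs a single gradient computation for all features, at the cost of ignoring higher-order terms; here the two results agree in sign but not in magnitude.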

Development

We use PDM to manage dependencies. Set up a development environment as follows:

pdm install # creates a .venv
source .venv/bin/activate

Once in the virtual environment, make sure to also install the pre-commit hooks:

pre-commit install

Download files

Source Distribution

token_trace-0.3.2.tar.gz (20.3 kB)

Built Distribution

token_trace-0.3.2-py3-none-any.whl (22.1 kB, Python 3)
