
Analytics for LLMs

Project description


Inspectus

Inspectus is a versatile visualization tool for machine learning. It runs smoothly in Jupyter notebooks via an easy-to-use Python API.


Installation

pip install inspectus

Attention Visualization

Inspectus provides visualization tools for attention mechanisms in deep learning models. It offers a set of comprehensive views, making it easier to understand how these models work.

Preview

[Attention visualization preview]

Click a token to select it and deselect all others. Click it again to re-select every token. To toggle the state of a single token without affecting the rest, use shift+click.

Components

Attention Matrix: Visualizes the attention scores between tokens, highlighting how each token focuses on others during processing.

Query Token Heatmap: Shows the sum of attention scores between each query and selected key tokens.

Key Token Heatmap: Shows the sum of attention scores between each key and selected query tokens.

Dimension Heatmap: Shows the sum of attention scores for each item in dimensions (Layers and Heads) normalized over the dimension.
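As a rough illustration of the aggregation the Dimension Heatmap describes, here is a minimal numpy sketch. The `(layers, heads, query, key)` tensor layout and the variable names are assumptions for illustration, not the library's internal format:

```python
import numpy as np

# Hypothetical attention tensor: (layers, heads, query, key)
attn = np.random.rand(2, 4, 3, 3)

# Total attention score contributed by each (layer, head) cell
totals = attn.sum(axis=(2, 3))  # shape: (layers, heads)

# Normalize over each dimension separately
layer_scores = totals.sum(axis=1)
layer_scores = layer_scores / layer_scores.sum()  # one value per layer, sums to 1

head_scores = totals.sum(axis=0)
head_scores = head_scores / head_scores.sum()  # one value per head, sums to 1
```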

Usage

Import the library

import inspectus

Simple usage

# attn: Attention map; a 2-4D tensor or attention maps from Huggingface transformers
inspectus.attention(attn, tokens)

For different query and key tokens

inspectus.attention(attns, query_tokens, key_tokens)
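Since the attention input may be anywhere from 2D to 4D, one way to build a 4D input from per-layer maps is to stack them. This is a sketch; the shapes below are illustrative, not a requirement of the API:

```python
import numpy as np

# Hypothetical: three per-layer attention maps of shape (heads, query, key)
per_layer = [np.random.rand(4, 5, 5) for _ in range(3)]

# Stack into a single (layers, heads, query, key) tensor
attn = np.stack(per_layer)
```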

For detailed API documentation, please refer to the official documentation.

Tutorials

Huggingface model

from transformers import AutoTokenizer, GPT2LMHeadModel, AutoConfig
import torch
import inspectus

# Initialize the tokenizer and model
context_length = 128
tokenizer = AutoTokenizer.from_pretrained("huggingface-course/code-search-net-tokenizer")

config = AutoConfig.from_pretrained(
    "gpt2",
    vocab_size=len(tokenizer),
    n_ctx=context_length,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

model = GPT2LMHeadModel(config)

# Tokenize the input text
text = 'The quick brown fox jumps over the lazy dog'
tokenized = tokenizer(
    text,
    return_tensors='pt',
    return_offsets_mapping=True
)
input_ids = tokenized['input_ids']

tokens = [text[s: e] for s, e in tokenized['offset_mapping'][0]]

with torch.no_grad():
    res = model(input_ids=input_ids.to(model.device), output_attentions=True)

# Visualize the attention maps using the Inspectus library
inspectus.attention(res['attentions'], tokens)

Check out the notebook here: Huggingface Tutorial (Open In Colab)

Custom attention map

import numpy as np
import inspectus

# 2D attention representing attention values between Query and Key tokens
attn = np.random.rand(3, 3)

# Visualize the attention values using the Inspectus library
# The first argument is the attention matrix
# The second argument is the list of query tokens
# The third argument is the list of key tokens
inspectus.attention(attn, ['a', 'b', 'c'], ['d', 'e', 'f'])

Check out the notebook here: Custom attention map tutorial (Open In Colab)

Token Visualization

This tool visualizes metrics associated with tokens. It supports multiple metrics, and the metric used for the visualization can be selected from a dropdown. Along with metrics, arbitrary additional information can be attached to each token.

Preview

[Token visualization preview]

Usage

import inspectus
import numpy as np

inspectus.tokens(['hello', 'world'], np.random.rand(2))

Distribution Plot

The distribution plot shows how a series of data is distributed over time. At each step, the distribution of the data is computed, and up to 5 bands are drawn from 9 basis points: (0, 6.68, 15.87, 30.85, 50.00, 69.15, 84.13, 93.32, 100.00).
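Those basis points correspond to percentiles of the data at each step. A minimal numpy sketch of the idea (using `np.percentile` is an illustration of the concept, not necessarily how the library computes the bands):

```python
import numpy as np

basis_points = [0, 6.68, 15.87, 30.85, 50.00, 69.15, 84.13, 93.32, 100.00]
data = np.random.randn(10_000)

# One level per basis point; adjacent levels bound the shaded bands
levels = np.percentile(data, basis_points)
```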

Preview

[Distribution plot preview]

Usage

import inspectus

inspectus.distribution({'x': list(range(100))})

Use the minimap to zoom in on parts of the plot. To isolate a single series, use the legend at the top right.

For a comprehensive usage guide, please check the notebook here: Distribution Plot Tutorial (Open In Colab)

Sample Use case

This plot can be used to identify outliers in the data. The following notebook demonstrates how to use the distribution plot to spot outliers in the MNIST training loss.

MNIST (Open In Colab)

Setting up for Development

Development Docs

Citing

If you use Inspectus for academic research, please cite the library using the following BibTeX entry.

@misc{inspectus,
 author = {Varuna Jayasiri and Lakshith Nishshanke},
 title = {inspectus: A visualization and analytics tool for large language models},
 year = {2024},
 url = {https://github.com/labmlai/inspectus},
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

inspectus-0.2.0.tar.gz (116.3 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

inspectus-0.2.0-py3-none-any.whl (117.2 kB)

Uploaded Python 3

File details

Details for the file inspectus-0.2.0.tar.gz.

File metadata

  • Download URL: inspectus-0.2.0.tar.gz
  • Upload date:
  • Size: 116.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for inspectus-0.2.0.tar.gz

  • SHA256: 3df62655bffe1d19fa24c8adfe7584665488da2c79475c3b3694486293117a30
  • MD5: b768676d18b34f455ebf42b41e94b8bf
  • BLAKE2b-256: 03baacd5fffcebe025af59871d4045a80341eb9a81ced709779fe5aa593af244

See more details on using hashes here.

File details

Details for the file inspectus-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: inspectus-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 117.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for inspectus-0.2.0-py3-none-any.whl

  • SHA256: 159a6cf1f8883c6059992865d8911879d21c87f47ccafc9fe1044752fd9de979
  • MD5: 2f5778bbb92b3498b0e6731237503736
  • BLAKE2b-256: 18d8347e16cd7743722aeb2b1dfa0edc2bd46bfe0f1188dd03b9d43e88270e44

See more details on using hashes here.
