XLens

A Library for Mechanistic Interpretability of Generative Language Models using JAX. Inspired by TransformerLens.

Overview

XLens is a JAX library for mechanistic interpretability of Transformer language models. The primary goal of mechanistic interpretability is to reverse engineer the algorithms that a model has learned during training, so that researchers and practitioners can understand the inner workings of generative language models.

Features

⚠️ Please Note: Some features are currently in development and may not yet be fully functional. We appreciate your understanding as we work to improve and stabilize the library.

  • Support for Hooked Modules: Interact with and modify internal model components seamlessly.
  • Model Alignment with Hugging Face: Outputs from XLens are consistent with Hugging Face's implementation, making it easier to integrate and compare results.
  • Caching Mechanism: Cache any internal activation for further analysis or manipulation during model inference.
  • Full Type Annotations: Comprehensive type annotations with generics and jaxtyping for better code completion and type checking.
  • Intuitive API: Designed with ease of use in mind, facilitating quick experimentation and exploration.
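
To make the hook and cache ideas above concrete, here is a minimal, dependency-free sketch of the general pattern. It is not XLens code: `hook_point`, `toy_forward`, and the hook names `blocks.0.hook_out` and `blocks.1.hook_out` are hypothetical names chosen for illustration, and plain Python lists stand in for JAX arrays. Each named point in the forward pass optionally applies a user-supplied function to the activation and then records the result in a cache.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Hook = Callable[[Vector], Vector]


def hook_point(
    name: str,
    value: Vector,
    hooks: Optional[Dict[str, Hook]],
    cache: Dict[str, Vector],
) -> Vector:
    """Apply the hook registered under `name` (if any), then cache the result."""
    if hooks and name in hooks:
        value = hooks[name](value)
    cache[name] = value
    return value


def toy_forward(
    x: Vector, hooks: Optional[Dict[str, Hook]] = None
) -> Tuple[Vector, Dict[str, Vector]]:
    """A two-layer toy 'model' with one hypothetical hook point per layer."""
    cache: Dict[str, Vector] = {}
    # Layer 0 doubles every element.
    h = hook_point("blocks.0.hook_out", [2.0 * v for v in x], hooks, cache)
    # Layer 1 adds one to every element.
    y = hook_point("blocks.1.hook_out", [v + 1.0 for v in h], hooks, cache)
    return y, cache


# Plain run: every hook point is cached, nothing is modified.
out, cache = toy_forward([1.0, 2.0])

# Intervention: zero-ablate the output of layer 0 and observe the effect.
ablated, _ = toy_forward(
    [1.0, 2.0],
    hooks={"blocks.0.hook_out": lambda v: [0.0 for _ in v]},
)
```

Passing the hooks in as an argument and returning the cache as a value keeps the forward pass free of hidden state, which is the style that tends to fit JAX's functional transformations. Whether XLens is implemented this way is an assumption, not something stated here.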

Examples

Here are some basic examples to get you started with XLens.

Capturing Activations

from xlens import HookedTransformer
from transformers import AutoTokenizer

# Load a pre-trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model = HookedTransformer.from_pretrained("meta-llama/Llama-3.2-1B")

# Run the model and cache the attention output of the first block
inputs = tokenizer("Hello, world!", return_tensors="jax")
logits, cache = model.run_with_cache(**inputs, hook_names=["blocks.0.hook_attn_out"])
print(cache["blocks.0.hook_attn_out"].shape)  # (1, 5, 2048), i.e. (batch, tokens, d_model)

Supported Models

XLens currently supports the following models:

Feel free to open an issue or pull request if you would like to see support for additional models.
