An implementation of transformers tailored for mechanistic interpretability.

Project description

TransformerLens

A Library for Mechanistic Interpretability of Generative Language Models. Maintained by Bryce Meyer and created by Neel Nanda.

Read the Docs Here

This is a library for doing mechanistic interpretability of GPT-2 Style language models. The goal of mechanistic interpretability is to take a trained model and reverse engineer the algorithms the model learned during training from its weights.

TransformerLens lets you load in 50+ different open source language models and exposes the model's internal activations to you. You can cache any internal activation in the model, and add functions to edit, remove, or replace these activations as the model runs (see the sketch after the Quick Start below).

Quick Start

Install

pip install transformer_lens

Use

import transformer_lens

# Load a model (e.g. GPT-2 Small)
model = transformer_lens.HookedTransformer.from_pretrained("gpt2-small")

# Run the model and get logits and activations
logits, activations = model.run_with_cache("Hello World")
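
The activations object returned by run_with_cache is keyed by hook name, and the same hook points can be intervened on during the forward pass. Below is a minimal sketch of the inspect-and-edit workflow described above, continuing from the snippet just shown (it reuses model and activations); the specific hook names and the choice of head 0 in layer 0 are illustrative, not special.

# Inspect a cached activation: layer-0 attention patterns
print(activations["blocks.0.attn.hook_pattern"].shape)  # [batch, head, query_pos, key_pos]

# Edit an activation as the model runs: zero-ablate head 0 in layer 0
def zero_ablate_head_0(z, hook):
    # z holds the per-head attention outputs, shape [batch, pos, head_index, d_head]
    z[:, :, 0, :] = 0.0
    return z

ablated_logits = model.run_with_hooks(
    "Hello World",
    fwd_hooks=[("blocks.0.attn.hook_z", zero_ablate_head_0)],
)

run_with_hooks returns the logits computed with the edited activations; any hook point the model exposes can be targeted the same way.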

Key Tutorials

Gallery

Research done involving TransformerLens:

User-contributed examples of the library being used in action:

Check out our demos folder for more examples of TransformerLens in practice.

Getting Started in Mechanistic Interpretability

Mechanistic interpretability is a very young and small field, and there are a lot of open problems. This means there's both a lot of low-hanging fruit and a low bar to entry - if you would like to help, please try working on one of them! The standard answer to "why has no one done this yet?" is just that there aren't enough people! Key resources:

Support & Community

Contributing Guide

If you have issues, questions, feature requests, or bug reports, please search the existing issues to check whether yours has already been answered, and if not, please raise an issue!

You're also welcome to join the open source mech interp community on Slack. Please use issues for concrete discussions about the package, and Slack for higher-bandwidth discussions, e.g. supporting important new use cases, or if you want to make substantial contributions to the library and want a maintainer's opinion. We'd also love for you to come and share your projects on the Slack!

❗ HookedSAETransformer Removed

HookedSAETransformer was removed from TransformerLens in version 2.0; its functionality is being moved to SAELens. For more information on this release, please see the accompanying announcement for details on what's new and the future of TransformerLens.

Credits

This library was created by Neel Nanda and is maintained by Bryce Meyer.

The core features of TransformerLens were heavily inspired by the interface to Anthropic's excellent Garcon tool. Credit to Nelson Elhage and Chris Olah for building Garcon and showing the value of good infrastructure for enabling exploratory research!

Creator's Note (Neel Nanda)

I (Neel Nanda) used to work for the Anthropic interpretability team, and I wrote this library because after I left and tried doing independent research, I got extremely frustrated by the state of open source tooling. There's a lot of excellent infrastructure like HuggingFace and DeepSpeed to use or train models, but very little to dig into their internals and reverse engineer how they work. This library tries to solve that, and to make it easy to get into the field even if you don't work at an industry org with real infrastructure! One of the great things about mechanistic interpretability is that you don't need large models or tons of compute. There are lots of important open problems that can be solved with a small model in a Colab notebook!

Citation

Please cite this library as:

@misc{nanda2022transformerlens,
    title = {TransformerLens},
    author = {Neel Nanda and Joseph Bloom},
    year = {2022},
    howpublished = {\url{https://github.com/TransformerLensOrg/TransformerLens}},
}


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

transformer_lens-2.9.0.tar.gz (145.2 kB)


Built Distribution

transformer_lens-2.9.0-py3-none-any.whl (176.9 kB)


File details

Details for the file transformer_lens-2.9.0.tar.gz.

File metadata

  • Download URL: transformer_lens-2.9.0.tar.gz
  • Upload date:
  • Size: 145.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.0 CPython/3.10.12 Linux/6.5.0-1025-azure

File hashes

Hashes for transformer_lens-2.9.0.tar.gz:

  • SHA256: cdbad53a8c9a18d6a4e47bd1ee43d5e5ceac92c0bee44830023087b59e54417f
  • MD5: fad80eac5aa7b4e45b45793e47034345
  • BLAKE2b-256: 9e71e36403ece80632a48c7eeb2ffed10a6288cf458ec9d8d8895fe9dea5bf15
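
To double-check a local download against the digests above, here is a minimal sketch using Python's standard hashlib (assuming the sdist has been downloaded into the current directory):

import hashlib

EXPECTED_SHA256 = "cdbad53a8c9a18d6a4e47bd1ee43d5e5ceac92c0bee44830023087b59e54417f"

# Hash the downloaded archive and compare against the published digest
with open("transformer_lens-2.9.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == EXPECTED_SHA256, "hash mismatch - do not install this file"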


File details

Details for the file transformer_lens-2.9.0-py3-none-any.whl.

File metadata

  • Download URL: transformer_lens-2.9.0-py3-none-any.whl
  • Upload date:
  • Size: 176.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.0 CPython/3.10.12 Linux/6.5.0-1025-azure

File hashes

Hashes for transformer_lens-2.9.0-py3-none-any.whl:

  • SHA256: 6eed8170999f86bd5a7e9b99d5b07544080d8946d20a2a69e17ee2bc398f6ede
  • MD5: e2ffdef0c10f7d55469cdad6db9f4b7e
  • BLAKE2b-256: 4c86ce6646dfcbde4d9459a067a7c7ddb4c5133c6ab34c2ef6376ccccf01a2e6

