
Tools for understanding how transformer predictions are built layer-by-layer


Tuned Lens 🔎



This package provides a simple interface for training and evaluating tuned lenses. A tuned lens allows us to peek at the iterative computations a transformer uses to compute the next token.

What is a Lens?

[Figure: A diagram showing how a translator within the lens allows you to skip intermediate layers.]

A lens into a transformer with n layers allows you to replace the last m layers of the model with an affine transformation, which we call an affine translator; the lens has one translator per layer. Each translator's output is decoded with the model's unembedding, and the translator is trained to minimize the KL divergence between the resulting prediction and the original model's final output distribution. After training, the tuned lens lets you skip these last m layers and see the best prediction a learned affine map can recover from the model's intermediate representation (the residual stream) at layer n - m.
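The decoding and the training objective described above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the tuned-lens API: the names `logit_lens`, `tuned_lens`, and `kl_to_final` are hypothetical, the LayerNorm has no learned gain or bias, and random arrays stand in for real hidden states, a real unembedding matrix, and real model outputs.

```python
import numpy as np

# Hypothetical NumPy sketch of the tuned-lens idea; not the tuned-lens API.


def layer_norm(x, eps=1e-5):
    # Parameter-free LayerNorm over the last axis; a real model's learned
    # gain and bias are omitted for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def logit_lens(h, W_U):
    # Decode residual-stream vectors h (positions x d_model) directly with
    # the unembedding matrix W_U (d_model x vocab).
    return layer_norm(h) @ W_U


def tuned_lens(h, A, b, W_U):
    # Apply the affine translator h -> A h + b, then decode like the logit lens.
    return logit_lens(h @ A.T + b, W_U)


def kl_to_final(final_logits, lens_logits):
    # KL(p_final || q_lens), averaged over positions: the quantity each
    # translator is trained to minimize.
    log_p = log_softmax(final_logits)
    log_q = log_softmax(lens_logits)
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean())


rng = np.random.default_rng(0)
d_model, vocab, positions = 8, 10, 4
h = rng.normal(size=(positions, d_model))           # hidden states at layer n - m
W_U = rng.normal(size=(d_model, vocab))             # stand-in unembedding matrix
final_logits = rng.normal(size=(positions, vocab))  # stand-in final model output

# With the identity translator, the tuned lens reduces to the logit lens.
A, b = np.eye(d_model), np.zeros(d_model)
assert np.allclose(tuned_lens(h, A, b, W_U), logit_lens(h, W_U))
print(kl_to_final(final_logits, tuned_lens(h, A, b, W_U)))
```

Training would then adjust `A` and `b`, for example by gradient descent, to drive this KL value down; the package itself is built on PyTorch rather than NumPy.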

We need to train an affine translator because the representations may be rotated, shifted, or stretched from layer to layer. This training step distinguishes the tuned lens from simpler approaches such as the logit lens, which decodes the residual stream directly with the model's unembedding matrix. We explain this process and its applications in the paper Eliciting Latent Predictions from Transformers with the Tuned Lens.
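To see why a learned affine map helps when representations are rotated, shifted, or stretched, consider a toy case where an intermediate representation is an exact affine distortion of the final one. This hypothetical sketch swaps in an ordinary least-squares fit (`fit_affine` is an illustrative name) in place of the KL objective the package actually trains with; it only shows that an affine translator can undo such a distortion, which decoding the raw intermediate vectors directly cannot.

```python
import numpy as np

# Hypothetical illustration: a least-squares fit stands in for the KL
# training objective the package actually uses.


def fit_affine(h_src, h_tgt):
    # Solve h_tgt ≈ h_src @ A.T + b by least squares; the column of ones
    # carries the bias term b.
    X = np.hstack([h_src, np.ones((h_src.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, h_tgt, rcond=None)
    return coef[:-1].T, coef[-1]


rng = np.random.default_rng(0)
n, d = 200, 8
h_final = rng.normal(size=(n, d))             # toy final-layer representations
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
h_mid = 3.0 * h_final @ Q.T + 1.5             # rotated, stretched, and shifted

A, b = fit_affine(h_mid, h_final)
recovered = h_mid @ A.T + b
print(np.abs(recovered - h_final).max())  # close to zero
```

Because the toy distortion is exactly affine and noise-free, the fit recovers it; real layers are only approximately related this way, which is why the package optimizes the prediction itself rather than matching hidden states.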

Acknowledgments

Originally conceived by Igor Ostrovsky and Stella Biderman at EleutherAI, this library was built as a collaboration between FAR and EleutherAI researchers.

Install Instructions

Installing from PyPI

First, you will need to install the basic prerequisites into a virtual environment:

  • Python 3.9+
  • PyTorch 1.13.0+
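If you want to confirm these prerequisites from Python before installing, a small check like the one below may help. It is a hypothetical helper, not part of tuned-lens: it reads the installed PyTorch version with the standard library's `importlib.metadata` and compares release numbers with a deliberately simplified parser (the `packaging` library's `Version` class handles the full version-string format).

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helper (not part of tuned-lens): checks the minimum
# versions listed above.
MIN_PYTHON = (3, 9)
MIN_TORCH = (1, 13, 0)


def parse_release(text):
    # Simplified parser: "1.13.1+cu117" -> (1, 13, 1), "2.1.0rc1" -> (2, 1, 0).
    release = []
    for piece in text.split("+", 1)[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        release.append(int(match.group()))
        if match.end() != len(piece):  # stop at a pre-release marker such as "rc1"
            break
    # Pad to three components so "1.13" compares equal to (1, 13, 0).
    return tuple(release + [0] * (3 - len(release)))


def check_prerequisites():
    # Returns a list of problems; an empty list means the prerequisites are met.
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.9+ is required")
    try:
        torch_release = parse_release(version("torch"))
    except PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if torch_release < MIN_TORCH:
            problems.append("PyTorch 1.13.0+ is required")
    return problems


if __name__ == "__main__":
    print(check_prerequisites() or "Prerequisites look OK")
```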

Then, you can simply install the package using pip.

pip install tuned-lens

Installing the container

If you prefer to run the training scripts from within a container, you can use the provided Docker container.

docker pull ghcr.io/alignmentresearch/tuned-lens:latest
docker run --rm ghcr.io/alignmentresearch/tuned-lens:latest tuned-lens --help

Contributing

Make sure to install the dev dependencies and install the pre-commit hooks.

$ git clone https://github.com/AlignmentResearch/tuned-lens.git
$ cd tuned-lens
$ pip install -e ".[dev]"
$ pre-commit install

Citation

If you find this library useful, please cite it as:

@article{belrose2023eliciting,
  title={Eliciting Latent Predictions from Transformers with the Tuned Lens},
  author={Belrose, Nora and Furman, Zach and Smith, Logan and Halawi, Danny and McKinney, Lev and Ostrovsky, Igor and Biderman, Stella and Steinhardt, Jacob},
  journal={to appear},
  year={2023}
}

Warning: This package has not reached 1.0. Expect the public interface to change regularly and without major version bumps.


Download files


Source Distribution

tuned-lens-0.0.4.tar.gz (54.4 kB)

Uploaded Source

Built Distribution


tuned_lens-0.0.4-py3-none-any.whl (66.3 kB)

Uploaded Python 3

File details

Details for the file tuned-lens-0.0.4.tar.gz.

File metadata

  • Download URL: tuned-lens-0.0.4.tar.gz
  • Upload date:
  • Size: 54.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.3

File hashes

Hashes for tuned-lens-0.0.4.tar.gz
Algorithm Hash digest
SHA256 f2baeb4a75743288fdc6b735102d3d0ac53ab0e7890208413120f30ed7e1e1ad
MD5 dd44d8fd8003645ac0736afad9d00c6f
BLAKE2b-256 1c93f76bbd5310fd29eb73d2bc3dacca50bf740b0b0f9e9c079739a93a3c01e9


File details

Details for the file tuned_lens-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: tuned_lens-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 66.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.3

File hashes

Hashes for tuned_lens-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 9528a538bc3e7368b8a4583f026af47aba037d6f1d50457a9fcf2c3ec0a624bb
MD5 745984f06e021f476a40b47cdbaf7e64
BLAKE2b-256 6028421d780f7ed5f24f31d8140a7883c8919725b7d3bc1fee21cdd3fa91bf46

