Tuned Lens 🔎
Tools for understanding how transformer predictions are built layer-by-layer
This package provides a simple interface for training and evaluating tuned lenses. A tuned lens allows us to peek at the iterative computations a transformer uses to compute the next token.
A lens into a transformer with $n$ layers allows you to replace the last $m$ layers of the model with an affine transformation (we call these affine translators).
This skips over the last few layers and lets you see the best prediction that can be made from the model's intermediate representations, i.e. the residual stream, at layer $n - m$. Since the representations may be rotated, shifted, or stretched from layer to layer, it's useful to train a separate affine translator for each layer. This training is what differentiates the method from simpler approaches that decode the residual stream directly with the unembedding matrix, i.e. the logit lens. We explain this process and its applications in a forthcoming paper, "Eliciting Latent Predictions from Transformers with the Tuned Lens".
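To make the distinction concrete, here is a minimal sketch of the underlying idea using Hugging Face `transformers` (this is not this package's API, just an illustration): it decodes GPT-2's residual stream at every layer with the final layer norm and unembedding, which is exactly the logit lens, and the comments mark where a tuned lens would insert its learned translator.

```python
# Minimal sketch of the idea, not this package's API: decode GPT-2's residual
# stream at each layer with the unembedding matrix (the "logit lens"). A tuned
# lens would first pass each hidden state through a learned affine translator.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states[i] is the residual stream after layer i (0 = embeddings).
for i, h in enumerate(out.hidden_states):
    # Logit lens: final layer norm + unembedding, applied at the last position.
    # A tuned lens would apply its layer-specific affine map to h first.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    print(i, tok.decode(logits.argmax(-1)))
```

Because the logit lens applies no per-layer correction, its early-layer predictions are often garbled; the tuned lens's trained translators are what make intermediate predictions legible.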
Acknowledgments
Originally conceived by Igor Ostrovsky and Stella Biderman at EleutherAI, this library was built as a collaboration between FAR and EleutherAI researchers.
Warning: This package has not reached 1.0 yet. Expect the public interface to change regularly and without a major version bump.
Install instructions
Installing From Source
First you will need to install the basic prerequisites into a virtual environment:
- Python 3.9+
- PyTorch 1.12.0+
Then you can simply install the package using pip:
git clone https://github.com/AlignmentResearch/tuned-lens
cd tuned-lens
pip install .
Install Using Docker
If you prefer to run the code from within a container, you can use the provided Dockerfile:
git clone https://github.com/AlignmentResearch/tuned-lens
cd tuned-lens
docker build -t tuned-lens-prod --target prod .
Quick Start Guide
Downloading the datasets
wget https://the-eye.eu/public/AI/pile/val.jsonl.zst
unzstd val.jsonl.zst
wget https://the-eye.eu/public/AI/pile/test.jsonl.zst
unzstd test.jsonl.zst
Evaluating a Lens
Once you have a lens file, either by training it yourself or by downloading one, you can run various evaluations on it using the provided evaluation command:
tuned-lens eval gpt2 test.jsonl --lens gpt-2-lens \
--dataset the_pile all \
--split validation \
--output lens_eval_results.json
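The command writes its results to the JSON file given by `--output`. A minimal sketch for inspecting them programmatically — the exact keys depend on the package version, so this makes no assumptions about the schema and just pretty-prints:

```python
# Load and pretty-print the evaluation output produced by `tuned-lens eval`.
# The key names vary by package version, so we don't assume a schema here.
import json

with open("lens_eval_results.json") as f:
    results = json.load(f)

print(json.dumps(results, indent=2))
```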
Training a Lens
This will train a tuned lens on GPT-2 with the default hyperparameters:
tuned-lens train gpt2 val.jsonl \
--dataset the_pile all \
--split validation \
--output gpt-2-lens
Note: This will download the entire validation set of the Pile, which is over 30 GB. If you are doing this within a Docker container, it's recommended to mount external storage to Hugging Face's cache directory.
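For intuition about what training produces, here is a toy sketch of a single translator and its objective, under the assumption (per the paper) that translators are trained to make decoded intermediate logits match the model's final logits. The dimensions, initialization, and optimizer below are illustrative, not the package's exact settings:

```python
# Toy sketch of one affine translator and its training objective; the numbers
# and initialization are assumptions for illustration, not the package's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab = 768, 50257  # GPT-2 small's hidden size and vocabulary size

translator = nn.Linear(d_model, d_model)   # the affine map: W @ h + b
translator.weight.data.copy_(torch.eye(d_model))  # start near the identity
translator.bias.data.zero_()
opt = torch.optim.Adam(translator.parameters(), lr=1e-3)

unembed = nn.Linear(d_model, vocab, bias=False)  # stands in for the frozen unembedding
unembed.weight.requires_grad_(False)

# h: residual stream at some layer; final_logits: the model's actual output.
# Random tensors stand in for real activations in this sketch.
h = torch.randn(8, d_model)
final_logits = torch.randn(8, vocab)

lens_logits = unembed(translator(h))
# KL divergence from the lens's distribution to the model's final distribution.
loss = F.kl_div(
    F.log_softmax(lens_logits, dim=-1),
    F.softmax(final_logits, dim=-1),
    reduction="batchmean",
)
loss.backward()
opt.step()
```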
Contributing
Make sure to install the dev dependencies and the pre-commit hooks:
$ pip install -e ".[dev]"
$ pre-commit install
Citation Information
If you find this library useful, please cite it as:
@article{belrose2023eliciting,
title={Eliciting Latent Predictions from Transformers with the Tuned Lens},
author={Belrose, Nora and Furman, Zach and Smith, Logan and Halawi, Danny and McKinney, Lev and Ostrovsky, Igor and Biderman, Stella and Steinhardt, Jacob},
journal={to appear},
year={2023}
}