Attach custom heads to transformer models.

These details have not been verified by PyPI

Project description

Documentation | Getting Started | Reddit Post with more info

Transformer Heads

This library aims to be an all-round toolkit for attaching, training, saving, and loading new heads for transformer models.
A new head could be:

- A linear probe used to get an understanding of the information processing in a transformer architecture.
- A head to be finetuned jointly with the weights of a pretrained transformer model to perform a completely different kind of task.
  - E.g. a transformer pretrained to do causal language modelling could get a sequence classification head attached and be finetuned to do sentiment classification.
  - Or one could attach a regression head to turn a large language model into a value function for a reinforcement learning problem.

On top of that, attaching multiple heads at once can make multi-task learning easy, making it possible to train very general models.

Installation

Install from PyPI: pip install transformer-heads

Or clone this repo and, from the root of the repository, run: pip install -e .

Usage

Create head configurations

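# Assumed import path; check the documentation for your installed version.
from transformer_heads.config import HeadConfig

head_config = HeadConfig(
    name="imdb_head_3",
    layer_hook=-3,  # Attach at the output of the third-to-last transformer block
    in_size=hidden_size,  # Hidden dimension of the base model
    output_activation="linear",
    pred_for_sequence=True,
    loss_fct="cross_entropy",
    num_outputs=2,
    target="label",  # The name of the ground-truth column in the dataset
)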
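To make the layer_hook index concrete, the following sketch assumes it follows Python's negative-indexing convention over the list of transformer blocks, as the comment in the configuration suggests. The helper resolve_layer_hook is hypothetical and not part of transformer-heads; the library's own resolution logic may differ.

```python
def resolve_layer_hook(layer_hook: int, num_blocks: int) -> int:
    """Map a possibly negative layer_hook to a zero-based block index.

    Hypothetical helper for illustration; assumes Python-style negative indexing.
    """
    index = layer_hook if layer_hook >= 0 else num_blocks + layer_hook
    if not 0 <= index < num_blocks:
        raise IndexError(
            f"layer_hook {layer_hook} is out of range for {num_blocks} blocks"
        )
    return index


# With 12 transformer blocks, -3 selects the block at zero-based index 9.
print(resolve_layer_hook(-3, 12))  # prints 9
```

Under this assumption, layer_hook=-1 would read from the final block, which is where a pretrained language modelling head normally sits.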

Create a model with your head from a pretrained transformer model

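# Assumed import path for load_headed; check the documentation for your installed version.
from transformer_heads import load_headed
from transformers import LlamaForCausalLM

model = load_headed(
    LlamaForCausalLM,
    "meta-llama/Llama-2-7b-hf",
    head_configs=[head_config],  # The head configuration created above
)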

Train your model using, for example, the easy-to-use Hugging Face Trainer interface:

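from transformers import Trainer

trainer = Trainer(
    model,
    args=args,  # A transformers.TrainingArguments instance
    train_dataset=imdb_dataset["train"],
    data_collator=collator,  # Collator that assembles batches for the model and its heads
)
trainer.train()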

For a more in-depth introduction and a fully working example, check the linear probe notebook.

Explanation of approach for training a transformer value function with QLoRA

The Base Model
- The value model builds on a pre-trained base large language model (LLM).
- That is, a transformer model trained with the causal language modelling objective on a large corpus of free-flowing text.
- To solve this task, LLMs have a linear causal language modelling head that projects from the hidden dimension of each token to the number of tokens in the vocabulary.
- The base model is not instruction-tuned or trained with RLHF.
Adding a value head
- The causal language modelling head is removed.
- It is replaced by a value head that projects from the hidden dimension for each token to a one-dimensional value prediction.
- The value head may be linear or a small multilayer perceptron.
- The value head is solving a regression task and is trained via the mean-squared-error loss.
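The value head described in this list can be sketched in plain PyTorch. This is a minimal illustration, not the library's implementation: the random tensors stand in for the base model's hidden states and for the regression targets, and the sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_size = 16  # illustrative; Llama-2-7b uses 4096
batch, seq_len = 2, 5

# Stand-in for the per-token hidden states produced by the base model.
hidden_states = torch.randn(batch, seq_len, hidden_size)
# Stand-in regression targets, one value per token.
targets = torch.randn(batch, seq_len)

# A linear value head: hidden dimension -> one scalar per token.
value_head = nn.Linear(hidden_size, 1)

values = value_head(hidden_states).squeeze(-1)  # shape: (batch, seq_len)
loss = nn.functional.mse_loss(values, targets)
loss.backward()  # gradients flow into the value head's parameters

print(values.shape, loss.item())
```

Swapping nn.Linear for a small nn.Sequential of linear layers and activations would give the multilayer perceptron variant mentioned in the list.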
Preparing for QLoRA training
- QLoRA is used to reduce memory overhead and enable DDP (distributed data parallel) training.
- All weights of the model except the value head are quantized and frozen.
- LoRA weights are trained for all of these frozen weights.
- The value head is still fully trained.
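To give a sense of why this setup reduces memory, the sketch below counts trainable parameters for a single weight matrix under the standard LoRA formulation, where a frozen matrix W of shape (d_out, d_in) receives a trainable low-rank update B A, with B of shape (d_out, r) and A of shape (r, d_in). The dimensions and the rank are illustrative assumptions, not values taken from this library.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Number of parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r LoRA update: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 4096  # illustrative projection size
rank = 16            # illustrative LoRA rank

full = full_param_count(d_out, d_in)
lora = lora_param_count(d_out, d_in, rank)
print(full, lora, f"{lora / full:.2%}")  # prints 16777216 131072 0.78%
```

The frozen matrix itself is additionally stored in quantized form under QLoRA, so both the weights and the optimizer state shrink.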

Joint training of multiple linear probes

_images/multi_linear_probe.svg
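A rough sketch of joint probe training in plain PyTorch follows. It assumes that each probe reads hidden states from a different layer of a frozen base model and that the per-probe losses are simply summed; the random tensors are stand-ins for real hidden states and labels, and none of the names come from transformer-heads.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_size, vocab_size = 16, 50
batch, seq_len, num_layers = 2, 5, 3

# Stand-ins for hidden states captured at several layers of a frozen base model.
hidden_per_layer = [
    torch.randn(batch, seq_len, hidden_size) for _ in range(num_layers)
]
# Stand-in next-token labels shared by all probes.
labels = torch.randint(0, vocab_size, (batch, seq_len))

# One independent linear probe per hooked layer.
probes = nn.ModuleList(
    [nn.Linear(hidden_size, vocab_size) for _ in range(num_layers)]
)
optimizer = torch.optim.SGD(probes.parameters(), lr=0.1)
loss_fct = nn.CrossEntropyLoss()

# Flatten (batch, seq_len) so every token counts as one classification example.
losses = [
    loss_fct(probe(h).reshape(-1, vocab_size), labels.reshape(-1))
    for probe, h in zip(probes, hidden_per_layer)
]
total_loss = torch.stack(losses).sum()  # assumption: per-head losses are summed

optimizer.zero_grad()
total_loss.backward()
optimizer.step()

print([round(l.item(), 3) for l in losses])
```

Because the hidden states carry no gradient here, the probes do not influence each other; a single forward pass through the base model serves all of them.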

Notebooks

This repository contains multiple Jupyter notebooks that serve as tutorials and illustrations of how to do certain things with this library. Here is an overview of which notebook to check out depending on the use case you are interested in.

Linear Probes (understanding the inner workings of transformers)
- Basic example with one probe for causal LM: notebooks/gpt2/linear_probe.ipynb
- Train many probes for causal LM at once: notebooks/gpt2/multi_linear_probe.ipynb
- Train many probes for text classification at once: notebooks/gpt2/text_classification_linear_probe.ipynb
Finetuning on a new type of task (with a new head)
- QLoRA: notebooks/gpt2/text_classification_qlora.ipynb
- Full finetuning: notebooks/gpt2/text_classification_full_finetune.ipynb
Joint multi-task learning
- Many heads doing completely different tasks + QLoRA, all trained at the same time: notebooks/gpt2/joint_multitask_learning.ipynb
Regression with pretrained transformers
- Check the regression heads of this notebook: notebooks/gpt2/joint_multitask_learning.ipynb
Saving and loading
- Notebook: notebooks/gpt2/saving_and_loading.ipynb
- Tests: transformer_heads/tests/test_load_model.py

Joint multi-task training with different types of heads and QLoRA.

_images/example_architecture.svg

More custom loss functions and models

At the time of writing, only a subset of loss functions is supported out of the box. Check transformer_heads/constants.py for up-to-date information.

However, it is not hard to add or use different loss functions and models. You just need to add their respective information to loss_fct_map and model_type_map, both of which can be imported from transformer_heads.constants. To add a loss function, add a mapping from a string to a torch loss module. To add a model, add a mapping from the model type to a 2-tuple consisting of the attribute name of the base model in the model class and the base model class. That may sound confusing, but what it means is just the following:

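from transformer_heads.constants import model_type_map, loss_fct_map
import torch.nn as nn
from transformers import MistralModel

# Register binary cross-entropy under the name "bce"
loss_fct_map["bce"] = nn.BCELoss()
# Register Mistral: the base model lives in the "model" attribute and is a MistralModel
model_type_map["mistral"] = ("model", MistralModel)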

Can my transformer architecture be supported?

One of the basic assumptions of my library is that there is a transformer class, such as Hugging Face's LlamaForCausalLM, that has an attribute pointing to a base model that outputs raw hidden states. If your transformers model is structured in a similar way, adding support may be as easy as adding an entry to model_type_map with the name of the attribute and the class of the base model. You can do that either by importing model_type_map from transformer_heads.constants or by adding the entry directly to constants.py and creating a pull request.
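The assumption can be illustrated with toy classes. In this sketch, ToyBaseModel, ToyForCausalLM, toy_model_type_map, and get_base_model are all hypothetical names invented for illustration; only the (attribute name, base model class) tuple layout is taken from the description above.

```python
class ToyBaseModel:
    """Stand-in for a base model that returns raw hidden states."""

    def __call__(self, token_ids):
        # One 2-dimensional "hidden state" per token, purely illustrative.
        return [[float(t), float(t) * 0.5] for t in token_ids]


class ToyForCausalLM:
    """Stand-in for a wrapper class such as LlamaForCausalLM."""

    def __init__(self):
        self.model = ToyBaseModel()  # attribute pointing at the base model


# Hypothetical registry mirroring the (attribute name, base model class) layout.
toy_model_type_map = {"toy": ("model", ToyBaseModel)}


def get_base_model(wrapper, model_type):
    """Look up the base model on a wrapper via the registered attribute name."""
    attr_name, base_cls = toy_model_type_map[model_type]
    base = getattr(wrapper, attr_name)
    if not isinstance(base, base_cls):
        raise TypeError(f"expected {base_cls.__name__}, got {type(base).__name__}")
    return base


base = get_base_model(ToyForCausalLM(), "toy")
print(base([1, 2]))  # prints [[1.0, 0.5], [2.0, 1.0]]
```

If an architecture keeps its base model under a different attribute name, only the first element of the tuple would need to change.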

Q&A

Is Llama-3 supported? YES! Check here
How do I use my model for inference? Check the notebooks or this issue to get started.

Project details

Release history

Version	Release date
0.2.2	Mar 4, 2025
0.2.1 (this version)	Mar 4, 2025
0.2.0	Jan 23, 2025
0.1.4	Jan 8, 2025
0.1.3	Nov 29, 2024
0.1.1	Sep 12, 2024
0.1.0	Apr 28, 2024
0.0.15	Apr 24, 2024
0.0.14	Apr 15, 2024
0.0.13	Apr 14, 2024
0.0.12	Apr 14, 2024
0.0.11	Apr 14, 2024
0.0.10	Apr 2, 2024
0.0.9	Mar 31, 2024
0.0.8	Mar 31, 2024
0.0.7	Mar 30, 2024
0.0.6	Mar 30, 2024
0.0.5	Mar 29, 2024
0.0.4	Mar 25, 2024
0.0.3	Mar 25, 2024
0.0.2	Mar 25, 2024
0.0.1	Mar 25, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

transformer_heads-0.2.1.tar.gz (274.1 kB view details)

Uploaded Mar 4, 2025 Source

Built Distribution

transformer_heads-0.2.1-py3-none-any.whl (39.8 kB view details)

Uploaded Mar 4, 2025 Python 3

File details

Details for the file transformer_heads-0.2.1.tar.gz.

File metadata

Download URL: transformer_heads-0.2.1.tar.gz
Upload date: Mar 4, 2025
Size: 274.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for transformer_heads-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`b2cc0b3505d43a3a1f98444587a95ca49351508d13d6d2138b081214ca3c8c26`
MD5	`0e5a1572ea00d18fe67632d946288d73`
BLAKE2b-256	`4e3714de1fe4857e6b2c6996c2723ed333038c7e189213edb7dbb27ebae0c1e4`
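A downloaded archive can be checked against the published SHA256 digest with Python's standard library. The sketch below is one way to do this; the helper names are hypothetical, and the local file path in the usage lines is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage: the path is assumed; the digest is the SHA256 value listed above.
archive = Path("transformer_heads-0.2.1.tar.gz")
expected = "b2cc0b3505d43a3a1f98444587a95ca49351508d13d6d2138b081214ca3c8c26"
if archive.exists():
    print(matches_published_hash(archive, expected))
```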

Provenance

The following attestation bundles were made for transformer_heads-0.2.1.tar.gz:

Publisher: publish_to_pypi.yml on center-for-humans-and-machines/transformer-heads

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: transformer_heads-0.2.1.tar.gz
- Subject digest: b2cc0b3505d43a3a1f98444587a95ca49351508d13d6d2138b081214ca3c8c26
- Sigstore transparency entry: 176864195
- Sigstore integration time: Mar 4, 2025
Source repository:
- Permalink: center-for-humans-and-machines/transformer-heads@35d62bade771bef16ae2e08365259cf3306e40e4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/center-for-humans-and-machines
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_to_pypi.yml@35d62bade771bef16ae2e08365259cf3306e40e4
- Trigger Event: workflow_dispatch

File details

Details for the file transformer_heads-0.2.1-py3-none-any.whl.

File metadata

Download URL: transformer_heads-0.2.1-py3-none-any.whl
Upload date: Mar 4, 2025
Size: 39.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for transformer_heads-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6bf7a8cc3b151654c76cc0e46ac5ce1011503fa16473a99d55a5cefc6fa8f5d3`
MD5	`f7d1dc9e22f880c77edfb46077024d49`
BLAKE2b-256	`c3e8afe0c12bcfbd33d7f792917879c678be2439212d2ca7fb8e4c6cc0adf5c8`

Provenance

The following attestation bundles were made for transformer_heads-0.2.1-py3-none-any.whl:

Publisher: publish_to_pypi.yml on center-for-humans-and-machines/transformer-heads

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: transformer_heads-0.2.1-py3-none-any.whl
- Subject digest: 6bf7a8cc3b151654c76cc0e46ac5ce1011503fa16473a99d55a5cefc6fa8f5d3
- Sigstore transparency entry: 176864196
- Sigstore integration time: Mar 4, 2025
Source repository:
- Permalink: center-for-humans-and-machines/transformer-heads@35d62bade771bef16ae2e08365259cf3306e40e4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/center-for-humans-and-machines
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_to_pypi.yml@35d62bade771bef16ae2e08365259cf3306e40e4
- Trigger Event: workflow_dispatch
