
A Transformer-based framework with finetuning as a first-class citizen.


Introduction

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.

SwissArmyTransformer is named after the "Swiss Army knife", meaning that all the models (e.g. BERT, GPT, T5, GLM, CogView, ViT...) share the same backbone code and cater to versatile usages with some extra lightweight mixins.

SwissArmyTransformer is powered by deepspeed-ZeRO and model parallelism, aiming to provide the best practice for pretraining and finetuning large models (100M~20B parameters).

Install

    pip install SwissArmyTransformer

Features

  • Add model-agnostic components, e.g. prefix-tuning, in just ONE line!

    • Prefix-tuning (or P-tuning) improves finetuning by adding trainable parameters in each attention layer. Applying it to a GLM classification model (or any other model) is easy with our library:
        class ClassificationModel(GLMModel):
            def __init__(self, args, transformer=None, **kwargs):
                super().__init__(args, transformer=transformer, **kwargs)
                self.add_mixin('classification_head', MLPHeadMixin(args.hidden_size, 2048, 1))
                # Arm an arbitrary model with Prefix-tuning with this line!
                self.add_mixin('prefix-tuning', PrefixTuningMixin(args.num_layers, args.hidden_size // args.num_attention_heads, args.num_attention_heads, args.prefix_len))
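
      For intuition only, here is a minimal, self-contained PyTorch sketch of the prefix-tuning idea: trainable key/value prefixes are prepended inside attention, and only these new parameters need to be tuned. This is illustrative, not the library's PrefixTuningMixin implementation, and the class name PrefixAttentionSketch is made up.

        import torch
        import torch.nn.functional as F

        # Illustrative only: NOT the library's PrefixTuningMixin.
        class PrefixAttentionSketch(torch.nn.Module):
            def __init__(self, hidden_size, prefix_len):
                super().__init__()
                self.qkv = torch.nn.Linear(hidden_size, 3 * hidden_size)
                # The only new trainable parameters: a per-layer key/value prefix.
                self.prefix_k = torch.nn.Parameter(0.02 * torch.randn(prefix_len, hidden_size))
                self.prefix_v = torch.nn.Parameter(0.02 * torch.randn(prefix_len, hidden_size))

            def forward(self, x):  # x: [batch, seq_len, hidden_size]
                q, k, v = self.qkv(x).chunk(3, dim=-1)
                batch = x.size(0)
                # Prepend the trainable prefixes to the keys and values.
                k = torch.cat([self.prefix_k.expand(batch, -1, -1), k], dim=1)
                v = torch.cat([self.prefix_v.expand(batch, -1, -1), v], dim=1)
                scores = q @ k.transpose(-1, -2) / (x.size(-1) ** 0.5)
                return F.softmax(scores, dim=-1) @ v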
    
    • GPT and other auto-regressive models behave differently during training and inference: at inference time, text is generated token by token, and previous states need to be cached for efficiency. With our library, you only need to implement the training behavior (teacher forcing) and turn it into a cached auto-regressive model by adding a mixin:
        model = GLMModel(args)
        model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
        # Generate a sequence with beam search
        from SwissArmyTransformer.generation.autoregressive_sampling import filling_sequence
        from SwissArmyTransformer.generation.sampling_strategies import BeamSearchStrategy
        output, *mems = filling_sequence(model, input_seq,
                        batch_size=args.batch_size,
                        strategy=BeamSearchStrategy(args.batch_size))
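
      The sketch below shows why the cache matters: after the first step, only the newest token is fed through the model while previous states ("mems") are reused. The model interface assumed here (returning logits plus new mems) is illustrative, not the exact signature used by filling_sequence.

        import torch

        @torch.no_grad()
        def greedy_decode_sketch(model, input_ids, max_new_tokens):
            # input_ids: [batch, seq_len]; `model` is ASSUMED to return
            # (logits, new_mems) when called with mems=... (illustrative API).
            tokens, mems = input_ids, None
            for _ in range(max_new_tokens):
                # After the first step, only the newest token is processed.
                inp = tokens if mems is None else tokens[:, -1:]
                logits, mems = model(inp, mems=mems)
                next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
                tokens = torch.cat([tokens, next_token], dim=1)
            return tokens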
    
  • Build your Transformer-based model with minimal code. We mentioned GLM, which differs from the standard Transformer (called BaseModel) only in its position embedding (and training losses), so we only need to focus on that part when coding.

    The whole definition of GLM is as follows:

    import torch
    # BaseMixin and BaseModel are provided by SwissArmyTransformer
    # (import path assumed; it may differ across versions):
    from SwissArmyTransformer.model.base_model import BaseModel, BaseMixin

    class BlockPositionEmbeddingMixin(BaseMixin):
        # Here define parameters for the mixin
        def __init__(self, max_sequence_length, hidden_size, init_method_std=0.02):
            super(BlockPositionEmbeddingMixin, self).__init__()
            self.max_sequence_length = max_sequence_length
            self.hidden_size = hidden_size
            self.block_position_embeddings = torch.nn.Embedding(max_sequence_length, hidden_size)
            torch.nn.init.normal_(self.block_position_embeddings.weight, mean=0.0, std=init_method_std)
        
        # Here define the method for the mixin
        def position_embedding_forward(self, position_ids, **kwargs):
            position_ids, block_position_ids = position_ids[:, 0], position_ids[:, 1]
            position_embeddings = self.transformer.position_embeddings(position_ids)
            block_position_embeddings = self.block_position_embeddings(block_position_ids)
            return position_embeddings + block_position_embeddings
    
    class GLMModel(BaseModel):
        def __init__(self, args, transformer=None, parallel_output=True):
            super().__init__(args, transformer=transformer, parallel_output=parallel_output)
            self.add_mixin('block_position_embedding', 
                BlockPositionEmbeddingMixin(args.max_sequence_length, args.hidden_size)
            ) # Add the mixin for GLM
    
        # We can also directly define hook functions in the model.
        # E.g., the code below would remove position embeddings:
    
        # def position_embedding_forward(self, position_ids, **kwargs):
        #   return 0 
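
    To make the hook mechanism concrete, here is a tiny standalone sketch (not the library's implementation) of how a backbone can dispatch position_embedding_forward to a registered mixin and otherwise fall back to its own default:

    import torch

    # Standalone sketch of the hook dispatch behind add_mixin (NOT library code).
    class TinyBackboneSketch(torch.nn.Module):
        def __init__(self, vocab_size=100, max_len=64, hidden_size=32):
            super().__init__()
            self.word_embeddings = torch.nn.Embedding(vocab_size, hidden_size)
            self.position_embeddings = torch.nn.Embedding(max_len, hidden_size)
            self.mixins = torch.nn.ModuleDict()

        def add_mixin(self, name, mixin):
            # Give the mixin a handle to the backbone without creating a module cycle.
            object.__setattr__(mixin, 'transformer', self)
            self.mixins[name] = mixin

        def hook(self, name):
            # If any registered mixin defines the hook, use it; otherwise use the default.
            for mixin in self.mixins.values():
                if hasattr(mixin, name):
                    return getattr(mixin, name)
            return getattr(self, 'default_' + name)

        def default_position_embedding_forward(self, position_ids, **kwargs):
            return self.position_embeddings(position_ids)

        def forward(self, input_ids, position_ids):
            hidden = self.word_embeddings(input_ids)
            return hidden + self.hook('position_embedding_forward')(position_ids)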
    
  • Comprehensive support for training. SwissArmyTransformer aims to provide the best practice for pretraining and finetuning: you only need to implement forward_step and create_dataset_function, while hyperparameters let you alter the useful training configurations (a rough sketch of these two functions follows the list below).

    • Extend the training to multiple GPUs or nodes by specifying --num_nodes, --num_gpus and a simple hostfile.
    • DeepSpeed and Model parallelism.
    • Better integration of ZeRO-2 and activation checkpointing.
    • Automatic extension and shuffling of training data, with memmap support.
    • Successfully supports the training of CogView2.
    • Currently the only open-source codebase that supports finetuning T5-10B on GPUs.
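
    As a rough sketch of the workflow (based on the pattern in the library's examples; the training_main import path, the batch layout, and MyDataset are assumptions, not the exact API), a finetuning entry point looks roughly like this:

        import torch
        from SwissArmyTransformer import get_args
        # Import path for training_main is assumed; it may differ across versions.
        from SwissArmyTransformer.training.deepspeed_training import training_main

        def forward_step(data_iterator, model, args, timers):
            # Fetch a batch, run the model, and return (loss, metrics_dict).
            # The batch layout below is purely illustrative.
            tokens, labels, position_ids, attention_mask = next(data_iterator)
            logits, *_ = model(tokens, position_ids, attention_mask)
            loss = torch.nn.functional.cross_entropy(logits[:, 0], labels)
            return loss, {}

        def create_dataset_function(path, args):
            # Build and return a torch Dataset from `path`.
            return MyDataset(path)  # MyDataset is a hypothetical dataset class

        if __name__ == '__main__':
            args = get_args()
            # ClassificationModel as defined in the prefix-tuning example above.
            training_main(args, model_cls=ClassificationModel,
                          forward_step_function=forward_step,
                          create_dataset_function=create_dataset_function)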

Get started

    cd examples/cogview2
    ./scripts/text2image_cogview2.sh

Run GLM

  1. Prepare input.txt. Example: "Welcome! This is the main page of SwissArmyTransformer".
  2. Run the following commands:
    cd examples/glm
    ./scripts/generate_glm.sh config/model_glm_10B_chinese.sh

Output: [CLS]Welcome! This is the main page of SwissArmyTransformer. It is a comprehensive and clear explanation of the technical problems in the transformer. It is also an introduction to the development of the SwissArmy transformers. Welcome to Swiss Army Transforters. This is the main page of Swiss army tranforter. It's a complete and clean explaination of technology problem in the Tranformer, which is an integral part of the army's technological development. It also anintroduction of the developments of the Army technicians. Well, if you have any questions, please feel free to contact the official webs
