Quickly initialize bespoke Transformer models

Project description

The goal of the xfmers library is to provide a simple API for quickly initializing Transformers with specified hyperparameters and features. The library generates standard TF 2.0 Keras models that can be used with Automatic Mixed Precision (AMP), XLA, Horovod, and the tf.distribute APIs.
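As an illustration (this is standard TensorFlow 2.x configuration, not part of xfmers itself), AMP and XLA can be enabled for any Keras model, including the ones xfmers generates:

```python
import tensorflow as tf

# Automatic Mixed Precision: compute in float16, keep variables in float32.
# (In recent TF 2.x; earlier releases used the mixed_precision.experimental API.)
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Enable XLA JIT compilation globally
tf.config.optimizer.set_jit(True)
```

Any model built afterwards picks up the global policy; no xfmers-specific changes are required.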

Included layers/features:

  • Multi-head attention
    • Toggle for Encoder or Decoder (causal) mode
  • Transformer layers
    • Toggle for GPT-2 style Layer Normalization
  • Transformer Stack (Encoder/Decoder)
    • Toggle for weight sharing (ALBERT-like)
  • Embedding Layer
    • Learnable positional embeddings
    • Factorized embedding parameterization (ALBERT-like)

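For intuition, the Encoder/Decoder toggle above comes down to the mask applied to the attention scores. A minimal NumPy sketch (illustrative only; the exact mask conventions inside xfmers may differ, and padding id 0 is an assumption here):

```python
import numpy as np

def causal_mask(size):
    """Lower-triangular mask: position i may only attend to positions <= i."""
    return np.tril(np.ones((size, size), dtype=np.float32))

def padding_mask(token_ids):
    """1 where a real token is present, 0 at padding (assuming pad id 0)."""
    return (np.asarray(token_ids) != 0).astype(np.float32)

print(causal_mask(3))        # row 0 sees only itself; row 2 sees all three
print(padding_mask([5, 9, 0, 0]))  # last two positions are padding
```

In encoder (non-causal) mode every position attends to every other position, so only the padding mask applies; in decoder (causal) mode the triangular mask is combined with it.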

Creating an ALBERT-like Transformer

inputs = tf.keras.Input(shape=(None, ), name="inputs")
padding_mask = layers.PaddingMaskGenerator()(inputs)
embeddings = layers.TokenPosEmbedding(d_vocab=vocab_size, d_model=128, pos_length=512,
                                      d_projection=512)(inputs)
                                      # project embedding from 128 -> 512

# build encoder
encoder_block = layers.TransformerStack(layers=3, ff_units=512, d_model=512, num_heads=8,
                                        causal=False,        # attend pair-wise between all positions
                                        weight_sharing=True, # share weights between all encoder layers
                                        name="EncoderBlock")
enc_outputs = encoder_block({"token_inputs": embeddings,
                             "mask_inputs": padding_mask})

# build decoder (causal)
decoder_block = layers.TransformerStack(layers=3, ff_units=512, d_model=512, num_heads=8,
                                        causal=True,         # cannot attend to "future" positions
                                        weight_sharing=True, # share weights between all decoder layers
                                        name="DecoderBlock")
dec_outputs = decoder_block({"token_inputs": enc_outputs,
                             "mask_inputs": padding_mask})

dec_outputs = tf.keras.layers.LayerNormalization(epsilon=1e-6)(dec_outputs)

# language modelling head
preds = layers.LMHead(vocab_size=vocab_size, name="outputs")(dec_outputs)

# Keras model
model = tf.keras.Model(inputs=inputs, outputs=preds, name="albert_like_transformer")
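The embedding layer in the example maps tokens into a 128-dimensional space before projecting up to the 512-dimensional model width. A short sketch of why this ALBERT-style factorization saves parameters, assuming a hypothetical vocab_size of 30,000 (the value is illustrative, not from the example above):

```python
vocab_size = 30000  # assumed for illustration
d_embed = 128       # narrow embedding dimension
d_model = 512       # Transformer model width

# Standard embedding: one big vocab_size x d_model table
full = vocab_size * d_model

# Factorized: narrow vocab_size x d_embed table plus a d_embed x d_model projection
factorized = vocab_size * d_embed + d_embed * d_model

print(full, factorized)  # 15360000 3905536
```

With these numbers the factorization cuts the embedding parameters by roughly 4x, and the savings grow with vocabulary size since the projection cost is fixed.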

Installing Xfmers

Install from Pip

# coming soon
pip install xfmers

Install from source

# coming soon


  • Core Maintainer: Timothy Liu (tlkh)
  • This is not an official NVIDIA product!
  • The website, its software and all content found on it are provided on an “as is” and “as available” basis. NVIDIA/NVAITC does not give any warranties, whether express or implied, as to the suitability or usability of the website, its software or any of its content. NVIDIA/NVAITC will not be liable for any loss, whether such loss is direct, indirect, special or consequential, suffered by any party as a result of their use of the libraries or content. Any usage of the libraries is done at the user’s own risk and the user will be solely responsible for any damage to any computer system or loss of data that results from such activities.
  • Please open an issue if you encounter problems or have a feature request

Download files


Files for xfmers, version 0.0.2:

  • xfmers-0.0.2-py3-none-any.whl (11.0 kB, Wheel, Python py3)
  • xfmers-0.0.2.tar.gz (6.1 kB, Source)
