A remade version of basic transformers
This file walks through most of the methods directly accessible through the Alpaca class.
Note: This md file was made with help from DeepSeek's chatbot for grammar, wording, and formatting, since it's better at those than I am.
Core Transformer Components
This section covers the core functionality and the computation going on behind the scenes, as well as the specific functions and methods that are accessible through an 'Alpaca' object.
Layout
An Alpaca-Transformer is laid out according to the following structure:
Encoder
Embedding
The Embedding Layer is accessed through the Alpaca.token_embedding() method, which takes two parameters: vocab_size and embedding_dim. The embedding layer is a trainable lookup table that maps discrete token IDs (integers) to continuous vector representations. It is initialized as a matrix of random values with shape (vocab_size x embedding_dim), where each row is one token's embedding. During the forward pass, it retrieves the rows for the input token IDs, so the model can process text as dense vectors. These embeddings are updated during training to capture semantic and syntactic relationships between tokens.
Here is a code-based demonstration of the embedding layer concept:
import torch
# Define the embedding layer
vocab_size = 4 # Number of unique tokens in the vocabulary
embedding_dim = 4 # Dimensionality of the embedding vectors
# Randomly initialize the embedding matrix
embedding_matrix = torch.randn(vocab_size, embedding_dim, requires_grad=True)
# Example input: Token IDs
input_ids = torch.tensor([1, 2, 3, 0]) # Shape: (sequence_length,)
# Forward pass: Retrieve embeddings
output = embedding_matrix[input_ids] # Shape: (sequence_length, embedding_dim)
# Print Aspects
print("Embedding Matrix:")
print(embedding_matrix)
print("\nInput IDs:")
print(input_ids)
print("\nOutput Embeddings:")
print(output)
Here is how you would use it directly from Alpaca:
import torch
from Alpaca import Alpaca
alpaca = Alpaca()
VOCAB_SIZE = 4
EMBEDDING_DIM = 4
embedding_layer = alpaca.token_embedding(VOCAB_SIZE, EMBEDDING_DIM)
input_ids = torch.tensor([1, 2, 3, 0])
output = embedding_layer.forward(input_ids)
print(output)
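To make the "lookup table" description concrete, here is a small pure-Python sketch (no PyTorch, and not Alpaca's own code) showing that picking a row by token ID gives the same result as multiplying a one-hot vector by the embedding matrix. The matrix values are hand-picked for readability; a real layer starts from random values.

```python
# Toy embedding table: 4 tokens, 3 dimensions (hand-picked, hypothetical values).
embedding_matrix = [
    [0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
]

def lookup(token_id):
    """Row lookup: what an embedding layer does in its forward pass."""
    return embedding_matrix[token_id]

def one_hot_matmul(token_id):
    """Same result computed as a one-hot vector times the embedding matrix."""
    vocab_size = len(embedding_matrix)
    dim = len(embedding_matrix[0])
    one_hot = [1.0 if i == token_id else 0.0 for i in range(vocab_size)]
    return [
        sum(one_hot[i] * embedding_matrix[i][j] for i in range(vocab_size))
        for j in range(dim)
    ]

input_ids = [1, 2, 3, 0]
looked_up = [lookup(t) for t in input_ids]
multiplied = [one_hot_matmul(t) for t in input_ids]
print(looked_up == multiplied)
```

The lookup form is preferred in practice because it skips the multiplications by zero.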
Positional Encoding
The Positional Encoding Layer is accessed through the Alpaca.pos_encoding() method. This method takes two parameters: embedding_dim (the embedding dimension) and max_seq_len (the maximum sequence length). Positional encoding is used in Transformer models to provide information about the position of each token in a sequence. Since Transformers don't have a built-in notion of word order (unlike RNNs), positional encodings are added to the token embeddings to give the model a sense of where each token is located relative to the others.
Here is a code-based demonstration of the positional encoding concept using basic PyTorch:
import torch
import math
# Predefined values
embedding_dim = 4 # Embedding dimension
max_seq_len = 10 # Maximum sequence length
# Initialize positional encodings matrix
position_encodings = torch.zeros(max_seq_len, embedding_dim)
# Fill the matrix with sine and cosine values
for pos in range(max_seq_len):
    for i in range(0, embedding_dim, 2):
        # Even index gets the sine; the paired odd index gets the cosine of the same angle
        position_encodings[pos, i] = math.sin(pos / (10000 ** (i / embedding_dim)))
        if i + 1 < embedding_dim:
            position_encodings[pos, i + 1] = math.cos(pos / (10000 ** (i / embedding_dim)))
# Example input: Batch of 2 sequences, each of length 5
input_ids = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 0]])
# Retrieve positional encodings for the input sequence length
seq_len = input_ids.size(1)
output = position_encodings[:seq_len, :].unsqueeze(0).expand(input_ids.size(0), -1, -1)
print("Positional Encodings Matrix:")
print(position_encodings)
print("\nInput IDs:")
print(input_ids)
print("\nOutput Positional Encodings:")
print(output)
Here is how you would use it directly from Alpaca:
import torch
from Alpaca import Alpaca
alpaca = Alpaca()
EMBEDDING_DIM = 4
MAX_SEQ_LEN = 10
pos_encoding_layer = alpaca.pos_encoding(EMBEDDING_DIM, MAX_SEQ_LEN)
input_ids = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 0]])
position_encodings = pos_encoding_layer.forward(input_ids)
print(position_encodings)
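The description says positional encodings are added to the token embeddings, so here is a minimal pure-Python sketch of that addition step. It is an illustration under assumptions, not Alpaca's implementation: the helper names `sinusoidal_encoding` and `add_positions` and the embedding values are made up for this example.

```python
import math

def sinusoidal_encoding(max_seq_len, embedding_dim):
    """Build the (max_seq_len x embedding_dim) table of sine/cosine values."""
    table = [[0.0] * embedding_dim for _ in range(max_seq_len)]
    for pos in range(max_seq_len):
        for i in range(0, embedding_dim, 2):
            angle = pos / (10000 ** (i / embedding_dim))
            table[pos][i] = math.sin(angle)
            if i + 1 < embedding_dim:
                table[pos][i + 1] = math.cos(angle)
    return table

def add_positions(token_embeddings, table):
    """Element-wise sum of each token embedding with the encoding for its position."""
    return [
        [e + p for e, p in zip(embedding, table[pos])]
        for pos, embedding in enumerate(token_embeddings)
    ]

table = sinusoidal_encoding(max_seq_len=10, embedding_dim=4)
# Three identical (hypothetical) token embeddings: without positions they are indistinguishable.
token_embeddings = [[0.5, 0.5, 0.5, 0.5] for _ in range(3)]
encoded = add_positions(token_embeddings, table)
for row in encoded:
    print([round(v, 3) for v in row])
```

After the addition, the three rows differ even though the token embeddings were identical, which is what lets attention layers tell positions apart.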
Encoder Block
The Encoder Block is made of:
- A Multi-Head Self-Attention Layer, which is callable via the Alpaca.multi_self_attention() method. It takes in:
  - d_model: The model's dimensionality.
  - num_heads: The number of attention heads.
  - masked: A boolean flag to indicate whether masking is applied (used in the decoder).
- A Feed-Forward Network (FFN), which is callable via the Alpaca.ffn() method. It takes in:
  - d_model: The model's dimensionality.
  - ff_dim: The dimensionality of the hidden layer in the feed-forward network.
- Layer Normalization and Dropout for stabilization and regularization.
Here is a code-based demonstration of the Encoder Block using basic PyTorch:
import torch
import torch.nn as nn
# Predefined values
d_model = 4 # Model dimensionality
num_heads = 2 # Number of attention heads
ff_dim = 8 # Feed-forward hidden layer dimensionality
seq_len = 5 # Sequence length
batch_size = 2 # Batch size
# Input tensor (batch of 2 sequences, each of length 5)
x = torch.randn(batch_size, seq_len, d_model)
# Multi-Head Self-Attention
d_k = d_model // num_heads # Dimension of each head
# Linear transformations for queries, keys, and values
W_q = nn.Linear(d_model, d_model, bias=False)
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)
W_o = nn.Linear(d_model, d_model, bias=False)
# Compute queries, keys, and values
Q = W_q(x).view(batch_size, seq_len, num_heads, d_k).transpose(1, 2)
K = W_k(x).view(batch_size, seq_len, num_heads, d_k).transpose(1, 2)
V = W_v(x).view(batch_size, seq_len, num_heads, d_k).transpose(1, 2)
# Scaled dot-product attention
scores = (Q @ K.transpose(-2, -1)) / (d_k ** 0.5)
attention = torch.softmax(scores, dim=-1)
attn_output = (attention @ V).transpose(1, 2).reshape(batch_size, seq_len, d_model)
attn_output = W_o(attn_output)
# Add & Norm (Layer Normalization and Dropout)
layer_norm1 = nn.LayerNorm(d_model)
dropout = nn.Dropout(0.1)
attn_output = layer_norm1(x + dropout(attn_output))
# Feed-Forward Network
linear1 = nn.Linear(d_model, ff_dim)
relu = nn.ReLU()
linear2 = nn.Linear(ff_dim, d_model)
ffn_output = linear1(attn_output)
ffn_output = relu(ffn_output)
ffn_output = linear2(ffn_output)
# Add & Norm (Layer Normalization and Dropout)
layer_norm2 = nn.LayerNorm(d_model)
output = layer_norm2(attn_output + dropout(ffn_output))
print("Input Tensor Shape:", x.shape)
print("Output Tensor Shape:", output.shape)
Here is how you would use it directly from Alpaca:
import torch
from Alpaca import Alpaca
alpaca = Alpaca()
D_MODEL = 4
NUM_HEADS = 2
FF_DIM = 8
encoder_block = alpaca.encoder_block(D_MODEL, NUM_HEADS, FF_DIM)
input_tensor = torch.randn(2, 5, D_MODEL) # Batch of 2 sequences, each of length 5
output = encoder_block.forward(input_tensor)
print(output)
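The "Add & Norm" step in the Encoder Block can be easier to follow when written out by hand. The sketch below is a pure-Python illustration for a single token vector. It assumes the usual layer normalization formula (zero mean, unit variance over the feature dimension) and leaves out dropout and the learnable scale and shift; the function names and input values are hypothetical.

```python
import math

def layer_norm(vector, eps=1e-5):
    """Normalize one token's feature vector to zero mean and unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]

def add_and_norm(residual, sublayer_output):
    """Residual connection followed by layer normalization (dropout omitted)."""
    summed = [r + s for r, s in zip(residual, sublayer_output)]
    return layer_norm(summed)

x = [1.0, 2.0, 3.0, 4.0]                  # one token, d_model = 4
sublayer_output = [0.5, -0.5, 0.5, -0.5]  # hypothetical attention or FFN output
normalized = add_and_norm(x, sublayer_output)
print([round(v, 3) for v in normalized])
```

The residual connection keeps the original signal available to later layers, and the normalization keeps activations on a similar scale from block to block.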
Decoder
Embedding
The Decoder's Embedding Layer works exactly the same as the Encoder's Embedding Layer. For details, refer to the Embedding section.
Positional Encoding
The Decoder's Positional Encoding Layer works exactly the same as the Encoder's Positional Encoding Layer. For details, refer to the Positional Encoding section.
Decoder Blocks
The Decoder Block is made of:
- A Multi-Head Self-Attention Layer, which works the same as in the Encoder but with masking applied (the masked flag). For details, refer to the Encoder Block section.
- A Multi-Head Cross-Attention Layer, which is unique to the Decoder.
- A Feed-Forward Network (FFN), which works the same as in the Encoder. For details, refer to the Encoder Block section.
- Layer Normalization and Dropout for stabilization and regularization.
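As a rough illustration of what masking means in the Decoder's self-attention, the pure-Python sketch below applies a causal mask to a small matrix of attention scores. It assumes the common convention that a position may attend only to itself and earlier positions; how Alpaca implements its masked flag internally is not shown in this file, and the scores are made-up values.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Mask out future positions (column > row) before applying softmax."""
    masked = [
        [s if col <= row else float("-inf") for col, s in enumerate(score_row)]
        for row, score_row in enumerate(scores)
    ]
    return [softmax(row) for row in masked]

# Hypothetical raw attention scores for a sequence of 3 tokens.
scores = [
    [0.2, 0.9, 0.4],
    [0.5, 0.1, 0.7],
    [0.3, 0.3, 0.3],
]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
```

Setting masked scores to negative infinity makes their softmax weight exactly zero, so the first token can only attend to itself while the last token sees the whole prefix.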
Multi-Head Cross-Attention
The Multi-Head Cross-Attention Layer is unique to the Decoder. It allows the Decoder to attend to the Encoder's output, enabling the model to incorporate information from the input sequence when generating the output sequence. It works similarly to Multi-Head Self-Attention but uses the Encoder's output for keys (K) and values (V), while the queries (Q) come from the Decoder's input.
Here is a code-based demonstration of Multi-Head Cross-Attention using basic PyTorch:
import torch
import torch.nn as nn
# Predefined values
d_model = 4 # Model dimensionality
num_heads = 2 # Number of attention heads
seq_len = 5 # Sequence length
batch_size = 2 # Batch size
# Input tensors
x = torch.randn(batch_size, seq_len, d_model) # Decoder input
encoder_output = torch.randn(batch_size, seq_len, d_model) # Encoder output
# Dimension of each head
d_k = d_model // num_heads
# Linear transformations for queries, keys, and values
W_q = nn.Linear(d_model, d_model, bias=False) # Query weights
W_k = nn.Linear(d_model, d_model, bias=False) # Key weights
W_v = nn.Linear(d_model, d_model, bias=False) # Value weights
W_o = nn.Linear(d_model, d_model, bias=False) # Output weights
# Compute queries (from Decoder input)
Q = W_q(x).view(batch_size, seq_len, num_heads, d_k).transpose(1, 2)
# Compute keys and values (from Encoder output)
K = W_k(encoder_output).view(batch_size, seq_len, num_heads, d_k).transpose(1, 2)
V = W_v(encoder_output).view(batch_size, seq_len, num_heads, d_k).transpose(1, 2)
# Scaled dot-product attention
scores = (Q @ K.transpose(-2, -1)) / (d_k ** 0.5)
attention = torch.softmax(scores, dim=-1)
# Compute output
output = (attention @ V).transpose(1, 2).reshape(batch_size, seq_len, d_model)
output = W_o(output)
print("Decoder Input Shape:", x.shape)
print("Encoder Output Shape:", encoder_output.shape)
print("Cross-Attention Output Shape:", output.shape)
Here is how you would use it directly from Alpaca:
import torch
from Alpaca import Alpaca
alpaca = Alpaca()
D_MODEL = 4
NUM_HEADS = 2
# Create the Multi-Head Cross-Attention layer
cross_attention_layer = alpaca.multi_cross_attention(D_MODEL, NUM_HEADS)
# Example inputs
decoder_input = torch.randn(2, 5, D_MODEL) # Decoder input
encoder_output = torch.randn(2, 5, D_MODEL) # Encoder output
# Forward pass
output = cross_attention_layer.forward(decoder_input, encoder_output)
print(output)
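The demonstrations above use the same sequence length for the Decoder input and the Encoder output, but cross-attention does not require that. Here is a simplified pure-Python sketch (single head, with the W_q, W_k, W_v, and W_o projections left out) showing that the output length follows the queries, not the keys. The input values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention; learned projections omitted."""
    d_k = len(keys[0])
    output = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output

decoder_states = [[1.0, 0.0], [0.0, 1.0]]              # target length 2
encoder_output = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # source length 3
result = cross_attention(decoder_states, encoder_output, encoder_output)
print(len(result), len(result[0]))
```

Each of the two decoder positions produces one output vector that is a weighted mix of the three encoder vectors.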
Feed-Forward Network
The Decoder's Feed-Forward Network works exactly the same as the Encoder's Feed-Forward Network. For details, refer to the Encoder Block section.
Layer Normalization and Dropout
The Decoder uses Layer Normalization and Dropout in the same way as the Encoder. For details, refer to the Encoder Block section.
## Transformer Implementation
This section covers the practical implementation of the Alpaca Transformer, including how to create a Transformer model, use the Tokenizer, handle datasets, train the model, and perform inference.
---
### Creating an Alpaca Transformer
To create a Transformer model, use the `alpaca.new_transformer()` method. This method initializes and returns a Transformer with the specified parameters.
---
##### Here is how you create a Transformer:
```python
from Alpaca import Alpaca
# Initialize Alpaca
alpaca = Alpaca()
# Define parameters
VOCAB_SIZE = 10000 # Size of the vocabulary
D_MODEL = 512 # Dimensionality of the model
NUM_HEADS = 8 # Number of attention heads
FF_DIM = 2048 # Dimensionality of the feed-forward network
NUM_LAYERS = 6 # Number of encoder/decoder layers
MAX_SEQ_LEN = 128 # Maximum sequence length
# Create the Transformer
transformer = alpaca.new_transformer(VOCAB_SIZE, D_MODEL, NUM_HEADS, FF_DIM, NUM_LAYERS, MAX_SEQ_LEN)
print(transformer)
```
Tokenizer
The Tokenizer is a crucial component for converting text into tokens and vice versa. It supports creating vocabularies, tokenizing text, detokenizing tokens, and saving/loading vocabularies.
Accessing the Tokenizer
The Tokenizer is created automatically when you instantiate the Alpaca class. You can access it using:
tokenizer = alpaca.tokenizer()
Tokenizer Methods
tokenize(text, vocab=None, save_as_file=False, save_file_path='tokens.txt')
- Purpose: Converts input text into tokens using the vocabulary.
- Parameters:
  - text: The input text to tokenize.
  - vocab: Optional. A pre-existing vocabulary to use. If not provided, the Tokenizer will create one.
  - save_as_file: If True, saves the tokens to a file.
  - save_file_path: The path to save the tokens file.
- Returns: A list of tokens.
Example:
text = "Doing work is a lot of work!"
tokens = tokenizer.tokenize(text)
print("Tokens:", tokens)
detokenize(tokenized, vocab=None, include_unknown=False)
- Purpose: Converts tokens back into text.
- Parameters:
  - tokenized: A list of tokens to detokenize.
  - vocab: Optional. A pre-existing vocabulary to use. If not provided, the Tokenizer's current vocabulary is used.
  - include_unknown: If True, includes <unk> for unknown tokens.
- Returns: The detokenized text.
Example:
detokenized_text = tokenizer.detokenize(tokens)
print("Detokenized Text:", detokenized_text)
create_vocab(text, num_merges=5)
- Purpose: Creates a vocabulary from the input text using Byte Pair Encoding (BPE).
- Parameters:
  - text: The input text to create the vocabulary from.
  - num_merges: The number of merge operations to perform.
- Returns: The created vocabulary.
Example:
vocab = tokenizer.create_vocab(text)
print("Vocabulary:", vocab)
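For readers new to Byte Pair Encoding, the following is a minimal pure-Python sketch of the merge loop that num_merges controls. It is a generic textbook-style version written for illustration and is not taken from Alpaca's Tokenizer; the function names, the lowercase and whitespace splitting, and the sample text are all assumptions.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(text, num_merges=5):
    """Return the learned merges and the final symbol vocabulary."""
    words = Counter(tuple(word) for word in text.lower().split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # nothing left to merge
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    vocab = sorted({symbol for symbols in words for symbol in symbols})
    return merges, vocab

merges, vocab = learn_bpe("low lower lowest low low", num_merges=3)
print("Merges:", merges)
print("Vocabulary:", vocab)
```

Each merge joins the most frequent adjacent pair into one symbol, so frequent fragments such as "low" end up as single vocabulary entries while rare endings stay split into characters.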
load_vocab(vocab_path)
- Purpose: Loads a vocabulary from a JSON file.
- Parameters:
vocab_path: The path to the vocabulary JSON file.
- Returns: The loaded vocabulary.
Example:
vocab = tokenizer.load_vocab("vocab.json")
print("Loaded Vocabulary:", vocab)
save_as_file(vocab_save_path='vocab.json', token_save_path='tokens.json')
- Purpose: Saves the current vocabulary and tokens to JSON files.
- Parameters:
  - vocab_save_path: The path to save the vocabulary file.
  - token_save_path: The path to save the tokens file.
Example:
tokenizer.save_as_file("my_vocab.json", "my_tokens.json")
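To show what a JSON save and load round trip looks like, here is a small sketch using only Python's standard json module. The vocabulary layout (token string mapped to an integer ID) is an assumption made for this example; the actual file format written by save_as_file() is not described in this file.

```python
import json
import os
import tempfile

# Hypothetical vocabulary layout: token string -> integer ID.
vocab = {"<unk>": 0, "do": 1, "ing": 2, "work": 3, "!": 4}

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.json")
    # Save: ensure_ascii=False keeps non-ASCII tokens readable in the file.
    with open(vocab_path, "w", encoding="utf-8") as handle:
        json.dump(vocab, handle, ensure_ascii=False, indent=2)
    # Load: JSON object keys come back as strings, values as ints.
    with open(vocab_path, encoding="utf-8") as handle:
        restored = json.load(handle)

print(restored == vocab)
```

One thing to keep in mind with JSON is that object keys are always strings, so a mapping keyed by integer IDs would need its keys converted back after loading.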
Example Workflow:
from Alpaca import Alpaca
# Initialize Alpaca and Tokenizer
alpaca = Alpaca()
tokenizer = alpaca.tokenizer()
# Example text
text = "Doing work is a lot of work!"
# Create vocabulary
vocab = tokenizer.create_vocab(text)
# Tokenize text
tokens = tokenizer.tokenize(text)
print("Tokens:", tokens)
# Detokenize tokens
detokenized_text = tokenizer.detokenize(tokens)
print("Detokenized Text:", detokenized_text)
# Save vocabulary and tokens
tokenizer.save_as_file("vocab.json", "tokens.json")
Creating an Alpaca Dataset
The alpaca.dataset() method creates a dataset from a text file. It tokenizes the text and prepares it for training.
Syntax:
dataset = alpaca.dataset(txt_file, tokenizer=None, vocab=None, max_seq_len=512, merges=5000)
- Parameters:
  - txt_file: The path to the text file.
  - tokenizer: Optional. A Tokenizer object. If not provided, the default Tokenizer is used.
  - vocab: Optional. A pre-existing vocabulary. If not provided, the Tokenizer will create one.
  - max_seq_len: The maximum sequence length for the dataset.
  - merges: The number of merge operations for Byte Pair Encoding (BPE).
- Returns: A dataset object ready for training.
Example:
# Create a dataset from a text file
dataset = alpaca.dataset("my_text_file.txt", max_seq_len=128)
print(dataset)
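One way to picture the role of max_seq_len is as the row length when a long stream of token IDs is cut into fixed-size training examples. The sketch below is only an illustration of that idea: the chunk_tokens helper, the padding ID of 0, and the non-overlapping windows are assumptions, and alpaca.dataset() may prepare its data differently.

```python
def chunk_tokens(token_ids, max_seq_len, pad_id=0):
    """Split a flat list of token IDs into rows of exactly max_seq_len, padding the last row."""
    chunks = []
    for start in range(0, len(token_ids), max_seq_len):
        chunk = token_ids[start:start + max_seq_len]
        chunk = chunk + [pad_id] * (max_seq_len - len(chunk))
        chunks.append(chunk)
    return chunks

token_ids = list(range(1, 11))  # ten hypothetical token IDs: 1..10
chunks = chunk_tokens(token_ids, max_seq_len=4)
for chunk in chunks:
    print(chunk)
```

Rows of equal length are what allow a DataLoader to stack examples into a single batch tensor.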
Training an Alpaca Transformer
The alpaca.train_model() method trains the Transformer model. If no Transformer is provided, it uses the one stored in the Alpaca object.
Syntax:
alpaca.train_model(epochs, train_dl, optimizer=torch.optim.Adam, transformer=None, loss_fn=nn.CrossEntropyLoss, lr=1e-4, validate_data=False, validation_data=None, wandb_tracking=False, lr_scheduler=False)
- Parameters:
  - epochs: The number of training epochs.
  - train_dl: The training DataLoader.
  - optimizer: The optimizer to use (default is torch.optim.Adam).
  - transformer: Optional. A Transformer model. If not provided, the default Transformer in the Alpaca object is used.
  - loss_fn: The loss function (default is nn.CrossEntropyLoss).
  - lr: The learning rate (default is 1e-4).
  - validate_data: If True, performs validation during training.
  - validation_data: Optional. The validation DataLoader.
  - wandb_tracking: If True, enables Weights & Biases tracking.
  - lr_scheduler: If True, enables a learning rate scheduler.
- Returns: The trained Transformer model.
Example:
# Train the model
alpaca.train_model(epochs=10, train_dl=train_dataloader, lr=1e-4, validate_data=True, validation_data=val_dataloader)
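Since the default loss function is nn.CrossEntropyLoss, it may help to see what that loss measures for a single predicted token. The pure-Python sketch below computes the negative log-probability of the correct token under a softmax over the model's scores. The cross_entropy helper and the logits are hypothetical, and the sketch covers one position only, whereas the PyTorch loss averages over all positions by default.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-probability of the target class under softmax(logits)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target_index]

# Hypothetical scores over a 4-token vocabulary for one position.
logits = [2.0, 0.5, 0.1, -1.0]
loss_when_right = cross_entropy(logits, target_index=0)  # highest score is the target
loss_when_wrong = cross_entropy(logits, target_index=3)  # lowest score is the target
print(round(loss_when_right, 4), round(loss_when_wrong, 4))
```

Training lowers this value by pushing the score of the correct next token up relative to the others.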
Creating Predictions Using an Alpaca Transformer
The alpaca.inference() method generates predictions using the Transformer. If no state dictionary is provided, it uses the one stored in the Alpaca object.
Syntax:
output = alpaca.inference(tokens, state_dict=None, detokenize=False, vocab=None)
- Parameters:
  - tokens: The input tokens for inference.
  - state_dict: Optional. A state dictionary for the model. If not provided, the default one in the Alpaca object is used.
  - detokenize: If True, returns the output as text. If False, returns tokens.
  - vocab: Optional. A vocabulary for detokenization. If not provided, the Tokenizer's vocabulary is used.
- Returns: The model's output (either tokens or text).
Example:
# Perform inference
output = alpaca.inference(tokens, detokenize=True)
print("Model Output:", output)
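This file does not say how alpaca.inference() chooses each output token, so the sketch below shows one common strategy, greedy decoding, purely as an illustration. Everything in it is hypothetical: greedy_decode and toy_model are made-up names, and toy_model is a stand-in that returns scores over a 5-token vocabulary in place of a trained Transformer.

```python
def greedy_decode(next_token_scores, prompt, eos_id, max_new_tokens=10):
    """Repeatedly append the highest-scoring next token until EOS or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

def toy_model(tokens):
    """Stand-in for a trained model: always favors (last token + 1) mod 5."""
    favored = (tokens[-1] + 1) % 5
    return [1.0 if i == favored else 0.0 for i in range(5)]

generated = greedy_decode(toy_model, prompt=[1], eos_id=4)
print(generated)
```

The loop feeds the growing sequence back into the model at every step, which is why generation cost grows with output length.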
Summary of Workflow
from torch.utils.data import DataLoader
from Alpaca import Alpaca
# Initialize Alpaca
alpaca = Alpaca()
# Create a Transformer
transformer = alpaca.new_transformer(vocab_size=10000, d_model=512, num_heads=8, ff_dim=2048, num_layers=6, max_seq_len=128)
# Create a dataset
dataset = alpaca.dataset("my_text_file.txt", max_seq_len=128)
# Create a DataLoader (the batch size here is an illustrative value)
BATCH_SIZE = 32
train_dataloader = DataLoader(dataset, batch_size=BATCH_SIZE)
# Train the model (to validate as well, pass validate_data=True and a validation DataLoader as validation_data)
alpaca.train_model(epochs=10, train_dl=train_dataloader, lr=1e-4)
# Tokenize some input text and perform inference
tokens = alpaca.tokenizer().tokenize("Doing work is a lot of work!")
output = alpaca.inference(tokens, detokenize=True)
print("Model Output:", output)