Transformers at zeta scales

Build high-performance, agile, and scalable AI models with modular and reusable building blocks!

Design Principles

  • Fluid Experimentation: Zeta makes it effortless for researchers and industrial AI engineers to rapidly experiment with the latest modules and components, such as MultiGroupedQueryAttention or Unet, and many others!
  • Production-Grade Reliability: Reproducible results with bleeding-edge performance.
  • Modularity: Lego-style building blocks for composing and deploying the best ML models (see the sketch below)!
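
As an illustration of the modular design, here is a minimal sketch that composes Zeta's FeedForward block (documented further down this page) with plain PyTorch layers. The layer choices and dimensions are illustrative assumptions, not a prescribed recipe.

import torch
from torch import nn
from zeta.nn import FeedForward

# Zeta blocks are ordinary nn.Module instances, so they compose directly
# with native PyTorch layers.
block = nn.Sequential(
    nn.LayerNorm(256),
    FeedForward(256, 256, glu=True, dropout=0.1),
)

x = torch.randn(1, 256)
print(block(x).shape)  # expected: torch.Size([1, 256]), assuming the second argument is the output dim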

🤝 Schedule a 1-on-1 Session

Book a 1-on-1 Session with Kye, the Creator, to discuss any issues, provide feedback, or explore how we can improve Zeta for you.

Installation

pip install zetascale

Initiating Your Journey

Creating a model with these building blocks is a breeze. Here's how to quickly set up the renowned Flash Attention module:

import torch
from zeta.nn.attention import FlashAttention

# Query, key, and value tensors: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 4, 6, 8)
k = torch.randn(2, 4, 10, 8)
v = torch.randn(2, 4, 10, 8)

attention = FlashAttention(causal=False, dropout=0.1, flash=True)
output = attention(q, k, v)

print(output.shape)  # torch.Size([2, 4, 6, 8]) -- the output follows the query's shape

RelativePositionBias

  • RelativePositionBias quantizes the distance between two positions into a certain number of buckets and then uses an embedding to get the relative position bias. This mechanism aids in the attention mechanism by providing biases based on relative positions between the query and key, rather than relying solely on their absolute positions.
import torch
from torch import nn

from zeta.nn import RelativePositionBias

# Initialize the RelativePositionBias module
rel_pos_bias = RelativePositionBias()

# Example 1: Compute bias for a single batch
bias_matrix = rel_pos_bias(1, 10, 10)

# Example 2: Utilize in conjunction with an attention mechanism
# NOTE: This is a mock example, and may not represent an actual attention mechanism's complete implementation.
class MockAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.rel_pos_bias = RelativePositionBias()

    def forward(self, queries, keys):
        bias = self.rel_pos_bias(queries.size(0), queries.size(1), keys.size(1))
        # Further computations with bias in the attention mechanism...
        return None  # Placeholder

# Example 3: Modify default configurations
custom_rel_pos_bias = RelativePositionBias(bidirectional=False, num_buckets=64, max_distance=256, n_heads=8)
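
To make the role of the bias concrete, here is a minimal sketch of scaled dot-product attention with a relative position bias added to the attention logits. The bias here is a stand-in tensor with an assumed shape of (heads, query_len, key_len); check the module's documentation for the exact shape RelativePositionBias returns.

import torch
import torch.nn.functional as F

batch, heads, q_len, k_len, head_dim = 2, 8, 10, 10, 64

q = torch.randn(batch, heads, q_len, head_dim)
k = torch.randn(batch, heads, k_len, head_dim)
v = torch.randn(batch, heads, k_len, head_dim)

# Stand-in for the bias produced by RelativePositionBias; the real module
# derives it from bucketed query/key distances.
bias = torch.randn(heads, q_len, k_len)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5  # (batch, heads, q_len, k_len)
scores = scores + bias                                # broadcasts over the batch dimension
attn = F.softmax(scores, dim=-1)
out = attn @ v                                        # (batch, heads, q_len, head_dim)
print(out.shape)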

FeedForward

The FeedForward module performs a feedforward operation on the input tensor x. It is a multi-layer perceptron (MLP) with an optional GLU activation and post-activation LayerNorm.

import torch
from zeta.nn import FeedForward

model = FeedForward(
  256,               # input dimension
  512,               # output dimension
  glu=True,          # gated linear unit in the hidden layer
  post_act_ln=True,  # LayerNorm after the activation
  dropout=0.2
)

x = torch.randn(1, 256)

output = model(x)
print(output.shape)

BitLinear

  • The BitLinear module performs a linear transformation on the input data, followed by quantization and dequantization. The quantization step uses the absmax_quantize function, which quantizes the input tensor based on its absolute maximum value, as described in the BitNet paper.
import torch
from torch import nn
import zeta.quant as qt

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.linear = qt.BitLinear(10, 20)

    def forward(self, x):
        return self.linear(x)

# Initialize the model
model = MyModel()

# Create a random tensor of size (128, 10)
input = torch.randn(128, 10)

# Perform the forward pass
output = model(input)

# Print the size of the output
print(output.size())  # torch.Size([128, 20])
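
For intuition, here is a minimal sketch of absmax (absolute-maximum) quantization itself: values are scaled so the largest magnitude maps to the int8 limit, rounded, and later rescaled back. This is an illustrative stand-in, not zeta's absmax_quantize implementation.

import torch

def absmax_quantize_sketch(x: torch.Tensor, bits: int = 8):
    # Scale so that max|x| maps to the largest representable integer.
    qmax = 2 ** (bits - 1) - 1                         # 127 for int8
    scale = qmax / x.abs().max().clamp(min=1e-8)
    quantized = (x * scale).round().clamp(-qmax, qmax).to(torch.int8)
    dequantized = quantized.float() / scale
    return quantized, dequantized

x = torch.randn(4, 10)
q, dq = absmax_quantize_sketch(x)
print(q.dtype, dq.dtype)      # torch.int8 torch.float32
print((x - dq).abs().max())   # small quantization error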

PalmE

  • This is an implementation of the multi-modal PaLM-E architecture: a decoder-only LLM backbone paired with a ViT image encoder for vision. It is very similar to GPT-4, Kosmos, RT-2, and many other multi-modal model architectures.
import torch
from zeta.structs import (
  AutoregressiveWrapper,
  Decoder,
  Encoder,
  Transformer,
  ViTransformerWrapper,
)


class PalmE(torch.nn.Module):
    """
    PalmE is a transformer architecture that uses a ViT encoder and a transformer decoder.

    Args:

        image_size (int): Size of the image.
        patch_size (int): Size of the patch.
        encoder_dim (int): Dimension of the encoder.
        encoder_depth (int): Depth of the encoder.
        encoder_heads (int): Number of heads in the encoder.
        num_tokens (int): Number of tokens.
        max_seq_len (int): Maximum sequence length.
        decoder_dim (int): Dimension of the decoder.
        decoder_depth (int): Depth of the decoder.
        decoder_heads (int): Number of heads in the decoder.
        alibi_num_heads (int): Number of heads in the alibi attention.
        attn_kv_heads (int): Number of heads in the attention key-value projection.
        use_abs_pos_emb (bool): Whether to use absolute positional embeddings.
        cross_attend (bool): Whether to cross attend in the decoder.
        alibi_pos_bias (bool): Whether to use positional bias in the alibi attention.
        rotary_xpos (bool): Whether to use rotary positional embeddings.
        attn_flash (bool): Whether to use attention flash.
        qk_norm (bool): Whether to normalize the query and key in the attention layer.

    Returns:

            torch.Tensor: The output of the model.

    Usage:

            >>> img = torch.randn(1, 3, 256, 256)
            >>> text = torch.randint(0, 20000, (1, 1024))
            >>> model = PalmE()
            >>> output = model(img, text)
            >>> print(output)

    """

    def __init__(
        self,
        image_size=256,
        patch_size=32,
        encoder_dim=512,
        encoder_depth=6,
        encoder_heads=8,
        num_tokens=20000,
        max_seq_len=1024,
        decoder_dim=512,
        decoder_depth=6,
        decoder_heads=8,
        alibi_num_heads=4,
        attn_kv_heads=2,
        use_abs_pos_emb=False,
        cross_attend=True,
        alibi_pos_bias=True,
        rotary_xpos=True,
        attn_flash=True,
        qk_norm=True,
    ):
        super(PalmE, self).__init__()

        # vit architecture
        self.encoder = ViTransformerWrapper(
            image_size=image_size,
            patch_size=patch_size,
            attn_layers=Encoder(
                dim=encoder_dim, depth=encoder_depth, heads=encoder_heads
            ),
        )

        # palm model architecture
        self.decoder = Transformer(
            num_tokens=num_tokens,
            max_seq_len=max_seq_len,
            use_abs_pos_emb=use_abs_pos_emb,
            attn_layers=Decoder(
                dim=decoder_dim,
                depth=decoder_depth,
                heads=decoder_heads,
                cross_attend=cross_attend,
                alibi_pos_bias=alibi_pos_bias,
                alibi_num_heads=alibi_num_heads,
                rotary_xpos=rotary_xpos,
                attn_kv_heads=attn_kv_heads,
                attn_flash=attn_flash,
                qk_norm=qk_norm,
            ),
        )

        # autoregressive wrapper to enable generation of tokens
        self.decoder = AutoregressiveWrapper(self.decoder)

    def forward(self, img: torch.Tensor, text: torch.Tensor):
        """Forward pass of the model."""
        try:
            encoded = self.encoder(img, return_embeddings=True)
            return self.decoder(text, context=encoded)
        except Exception as error:
            print(f"Failed in forward method: {error}")
            raise

# Usage with random inputs
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

# Initialize the model
model = PalmE()
output = model(img, text)
print(output)

Unet

Unet is a famous convolutional neural network architecture originally developed for biomedical image segmentation and now a common backbone in generative models. The architecture comprises two primary pathways, a downsampling (contracting) path and an upsampling (expanding) path, followed by an output convolution. Its U shape gives the architecture its name, and its symmetric design ensures that context (from downsampling) and localization (from upsampling) are both captured effectively.

import torch
from zeta.nn import Unet  

# Initialize the U-Net model
model = Unet(n_channels=1, n_classes=2)

# Random input tensor with dimensions [batch_size, channels, height, width]
x = torch.randn(1, 1, 572, 572)

# Forward pass through the model
y = model(x)

# Output
print(f"Input shape: {x.shape}")
print(f"Output shape: {y.shape}")

VisionEmbeddings

The VisionEmbedding class is designed for converting images into patch embeddings, making them suitable for processing by transformer-based models. This class plays a crucial role in various computer vision tasks and enables the integration of vision data into transformer architectures!

from zeta.nn import VisionEmbedding
import torch

# Create an instance of VisionEmbedding
vision_embedding = VisionEmbedding(
  img_size=224,
  patch_size=16,
  in_chans=3,
  embed_dim=768,
  contain_mask_token=True,
  prepend_cls_token=True,
)

# Load an example image (3 channels, 224x224)
input_image = torch.rand(1, 3, 224, 224)

# Perform image-to-patch embedding
output = vision_embedding(input_image)

# The output now contains patch embeddings, ready for input to a transformer model
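
As a sanity check on the resulting sequence length, the patch count follows directly from the image and patch sizes; the extra position counted below assumes prepend_cls_token adds a single CLS token, which should be verified against the actual output shape.

img_size, patch_size = 224, 16
num_patches = (img_size // patch_size) ** 2   # (224 // 16)^2 = 14 * 14 = 196 patches

# With prepend_cls_token=True, one CLS position is prepended, giving an
# expected sequence length of 197 (an assumption to verify against the
# actual output shape).
print(num_patches, num_patches + 1)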

Documentation

The full documentation is available at zeta.apac.ai.

Contributing

  • We need your help to build the most reusable, reliable, and high-performance ML framework ever.

  • Check out the project board here!

  • We need help writing tests and documentation!

License

  • MIT
