# OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts
OmegaViT (ΩViT) is a vision transformer architecture that combines multi-query attention, rotary embeddings, state space modeling, and a mixture of experts, designed for strong performance across a range of computer vision tasks. The model can process images of arbitrary resolution while maintaining computational efficiency.
## Key Features
- Flexible Resolution Processing: Handles arbitrary input image sizes through adaptive patch embedding
- Multi-Query Attention (MQA): Shares a single key/value head across all query heads, reducing attention compute and memory while maintaining expressiveness
- Rotary Embeddings: Enables better modeling of relative positions and spatial relationships
- State Space Models (SSM): Integrates efficient sequence modeling every third layer
- Mixture of Experts (MoE): Implements conditional computation for enhanced model capacity (see the routing sketch after this list)
- Comprehensive Logging: Built-in loguru integration for detailed execution tracking
- Shape-Aware Design: Continuous tensor shape tracking for reliable processing
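To make the conditional-computation idea concrete, here is a minimal top-1 routing layer in PyTorch. The `TopOneMoE` name, the top-1 routing rule, and the 4x-wide feed-forward experts are illustrative assumptions, not OmegaViT's actual implementation; tokens routed past an expert's capacity are simply dropped, a common MoE simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopOneMoE(nn.Module):
    """Hypothetical top-1 mixture-of-experts feed-forward layer (not OmegaViT's code)."""

    def __init__(self, dim: int, num_experts: int = 8, expert_capacity: int = 32):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.expert_capacity = expert_capacity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        flat = x.reshape(b * t, d)
        gates = F.softmax(self.router(flat), dim=-1)  # (b*t, num_experts)
        weight, idx = gates.max(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            # Keep at most `expert_capacity` tokens per expert; overflow is dropped.
            sel = (idx == e).nonzero(as_tuple=True)[0][: self.expert_capacity]
            if sel.numel() > 0:
                out[sel] = weight[sel, None] * expert(flat[sel])
        return out.reshape(b, t, d)
```

Each token only pays for one expert's feed-forward pass, so model capacity grows with `num_experts` while per-token compute stays roughly constant.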
## Architecture

```mermaid
flowchart TB
subgraph Input
img[Input Image]
end
subgraph PatchEmbed[Flexible Patch Embedding]
conv[Convolution]
norm1[LayerNorm]
conv --> norm1
end
subgraph TransformerBlocks[Transformer Blocks x12]
subgraph Block1[Block n]
direction TB
mqa[Multi-Query Attention]
ln1[LayerNorm]
moe1[Mixture of Experts]
ln2[LayerNorm]
ln1 --> mqa --> ln2 --> moe1
end
subgraph Block2[Block n+1]
direction TB
mqa2[Multi-Query Attention]
ln3[LayerNorm]
moe2[Mixture of Experts]
ln4[LayerNorm]
ln3 --> mqa2 --> ln4 --> moe2
end
subgraph Block3[Block n+2 SSM]
direction TB
ssm[State Space Model]
ln5[LayerNorm]
moe3[Mixture of Experts]
ln6[LayerNorm]
ln5 --> ssm --> ln6 --> moe3
end
end
subgraph Output
gap[Global Average Pooling]
classifier[Classification Head]
end
img --> PatchEmbed --> TransformerBlocks --> gap --> classifier
```
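In code, the diagram's layer pattern might look like the pre-norm sketch below, where every third block swaps multi-query attention for a state space model. The `Block` and `build_blocks` names, the factory arguments, and the residual connections (which the diagram omits) are assumptions for illustration, not the package's actual modules.

```python
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm block: LayerNorm -> token mixer -> LayerNorm -> MoE, with residuals."""

    def __init__(self, dim: int, mixer: nn.Module, moe: nn.Module):
        super().__init__()
        self.ln1, self.mixer = nn.LayerNorm(dim), mixer  # mixer: MQA or SSM
        self.ln2, self.moe = nn.LayerNorm(dim), moe

    def forward(self, x):
        x = x + self.mixer(self.ln1(x))  # attention/SSM sub-layer
        x = x + self.moe(self.ln2(x))    # mixture-of-experts sub-layer
        return x

def build_blocks(dim, num_layers, make_mqa, make_ssm, make_moe):
    # Blocks n and n+1 use multi-query attention; block n+2 uses an SSM.
    return nn.ModuleList(
        Block(dim, make_ssm(dim) if (i + 1) % 3 == 0 else make_mqa(dim), make_moe(dim))
        for i in range(num_layers)
    )
```

The `make_*` arguments stand in for constructors of the attention, SSM, and MoE modules (for example, the `TopOneMoE` sketch above).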
## Multi-Query Attention Detail

```mermaid
flowchart LR
input[Input Features]
subgraph MQA[Multi-Query Attention]
direction TB
q[Q Linear]
k[K Linear]
v[V Linear]
rotary[Rotary Embeddings]
attn[Attention Weights]
input --> q & k & v
q & k --> rotary
rotary --> attn
attn --> v
end
MQA --> output[Output Features]
```
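The sketch below fleshes out the diagram: all query heads attend against a single shared key/value head, and rotary embeddings rotate Q and K before the attention weights are formed. The simplified 1-D rotary helper over the flattened patch sequence and all class/function names are assumptions, not OmegaViT's actual layer; `F.scaled_dot_product_attention` requires PyTorch 2.0+.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rotary(x: torch.Tensor) -> torch.Tensor:
    """Simplified 1-D rotary embedding: rotate channel pairs by position-dependent angles."""
    seq, hd = x.shape[-2], x.shape[-1]
    pos = torch.arange(seq, device=x.device, dtype=x.dtype)
    inv_freq = 10000 ** (-torch.arange(0, hd, 2, device=x.device, dtype=x.dtype) / hd)
    freqs = pos[:, None] * inv_freq  # (seq, hd/2)
    cos, sin = freqs.cos(), freqs.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class MultiQueryAttention(nn.Module):
    """Hypothetical MQA layer: many query heads, one shared K/V head."""

    def __init__(self, dim: int, num_heads: int = 12):
        super().__init__()
        self.h, self.hd = num_heads, dim // num_heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * self.hd)  # single K/V head instead of num_heads
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q = self.q(x).view(b, t, self.h, self.hd).transpose(1, 2)  # (b, h, t, hd)
        k, v = self.kv(x).split(self.hd, dim=-1)                   # (b, t, hd) each
        k, v = k[:, None], v[:, None]                              # (b, 1, t, hd)
        q, k = apply_rotary(q), apply_rotary(k)
        k = k.expand(b, self.h, t, self.hd)  # share the single K/V head across heads
        v = v.expand(b, self.h, t, self.hd)
        y = F.scaled_dot_product_attention(q, k, v)                # (b, h, t, hd)
        return self.out(y.transpose(1, 2).reshape(b, t, d))
```

Relative to standard multi-head attention, the K and V projections shrink from `dim x dim` to `dim x head_dim`, which is where MQA's compute and memory savings come from.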
## Installation

```bash
pip install omegavit
```
## Quick Start

```python
import torch
from omegavit import create_advanced_vit

# Create model
model = create_advanced_vit(num_classes=1000)

# Example forward pass
batch_size = 8
x = torch.randn(batch_size, 3, 224, 224)
output = model(x)
print(f"Output shape: {output.shape}")  # [8, 1000]
```
## Model Configurations
| Parameter | Default | Description |
|---|---|---|
| hidden_size | 768 | Dimension of transformer layers |
| num_attention_heads | 12 | Number of attention heads |
| num_experts | 8 | Number of expert networks in MoE |
| expert_capacity | 32 | Tokens per expert in MoE |
| num_layers | 12 | Number of transformer blocks |
| patch_size | 16 | Size of image patches |
| ssm_state_size | 16 | Hidden state size in SSM |
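Assuming `create_advanced_vit` accepts the table's parameters as keyword arguments (an unverified assumption; check the package source for the exact signature), a customized model might be configured like this:

```python
# Hypothetical: parameter names are taken from the table above; the
# factory's actual signature may differ.
model = create_advanced_vit(
    num_classes=1000,
    hidden_size=768,          # transformer width
    num_attention_heads=12,   # query heads in MQA
    num_experts=8,            # expert networks per MoE layer
    expert_capacity=32,       # tokens routed to each expert
    num_layers=12,            # transformer blocks
    patch_size=16,            # patch embedding size
    ssm_state_size=16,        # SSM hidden state size
)
```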
## Performance
Note: Benchmarks coming soon
## Citation
If you use OmegaViT in your research, please cite:
```bibtex
@article{omegavit2024,
  title={OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts},
  author={Agora Lab},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```
## Contributing
We welcome contributions! Please see our contributing guidelines for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
Special thanks to the Agora Lab AI team and the open-source community for their valuable contributions and feedback.