# OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts
OmegaViT (ΩViT) is a vision transformer architecture that combines multi-query attention, rotary embeddings, state space modeling, and a mixture of experts, designed for strong performance across a range of computer vision tasks. The model can process images of arbitrary resolution while maintaining computational efficiency.
## Key Features
- Flexible Resolution Processing: Handles arbitrary input image sizes through adaptive patch embedding
- Multi-Query Attention (MQA): Shares a single key/value head across all query heads, reducing attention compute and memory while maintaining expressiveness
- Rotary Embeddings: Enables better modeling of relative positions and spatial relationships
- State Space Models (SSM): Integrates efficient sequence modeling every third layer
- Mixture of Experts (MoE): Implements conditional computation for enhanced model capacity (see the routing sketch after this list)
- Comprehensive Logging: Built-in loguru integration for detailed execution tracking
- Shape-Aware Design: Continuous tensor shape tracking for reliable processing
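To make the conditional-computation idea concrete, here is a minimal top-1 routing layer in PyTorch. The `TopOneMoE` name, the top-1 routing rule, and the 4x-wide feed-forward experts are illustrative assumptions, not OmegaViT's actual implementation; tokens routed past an expert's capacity are simply dropped, a common MoE simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopOneMoE(nn.Module):
    """Hypothetical top-1 mixture-of-experts feed-forward layer (not OmegaViT's code)."""

    def __init__(self, dim: int, num_experts: int = 8, expert_capacity: int = 32):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.expert_capacity = expert_capacity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        flat = x.reshape(b * t, d)
        gates = F.softmax(self.router(flat), dim=-1)  # (b*t, num_experts)
        weight, idx = gates.max(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            # Keep at most `expert_capacity` tokens per expert; overflow is dropped.
            sel = (idx == e).nonzero(as_tuple=True)[0][: self.expert_capacity]
            if sel.numel() > 0:
                out[sel] = weight[sel, None] * expert(flat[sel])
        return out.reshape(b, t, d)
```

Each token only pays for one expert's feed-forward pass, so model capacity grows with `num_experts` while per-token compute stays roughly constant.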
## Architecture

```mermaid
flowchart TB
subgraph Input
img[Input Image]
end
subgraph PatchEmbed[Flexible Patch Embedding]
conv[Convolution]
norm1[LayerNorm]
conv --> norm1
end
subgraph TransformerBlocks[Transformer Blocks x12]
subgraph Block1[Block n]
direction TB
mqa[Multi-Query Attention]
ln1[LayerNorm]
moe1[Mixture of Experts]
ln2[LayerNorm]
ln1 --> mqa --> ln2 --> moe1
end
subgraph Block2[Block n+1]
direction TB
mqa2[Multi-Query Attention]
ln3[LayerNorm]
moe2[Mixture of Experts]
ln4[LayerNorm]
ln3 --> mqa2 --> ln4 --> moe2
end
subgraph Block3[Block n+2 SSM]
direction TB
ssm[State Space Model]
ln5[LayerNorm]
moe3[Mixture of Experts]
ln6[LayerNorm]
ln5 --> ssm --> ln6 --> moe3
end
end
subgraph Output
gap[Global Average Pooling]
classifier[Classification Head]
end
img --> PatchEmbed --> TransformerBlocks --> gap --> classifier
```
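In code, the diagram's layer pattern might look like the pre-norm sketch below, where every third block swaps multi-query attention for a state space model. The `Block` and `build_blocks` names, the factory arguments, and the residual connections (which the diagram omits) are assumptions for illustration, not the package's actual modules.

```python
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm block: LayerNorm -> token mixer -> LayerNorm -> MoE, with residuals."""

    def __init__(self, dim: int, mixer: nn.Module, moe: nn.Module):
        super().__init__()
        self.ln1, self.mixer = nn.LayerNorm(dim), mixer  # mixer: MQA or SSM
        self.ln2, self.moe = nn.LayerNorm(dim), moe

    def forward(self, x):
        x = x + self.mixer(self.ln1(x))  # attention/SSM sub-layer
        x = x + self.moe(self.ln2(x))    # mixture-of-experts sub-layer
        return x

def build_blocks(dim, num_layers, make_mqa, make_ssm, make_moe):
    # Blocks n and n+1 use multi-query attention; block n+2 uses an SSM.
    return nn.ModuleList(
        Block(dim, make_ssm(dim) if (i + 1) % 3 == 0 else make_mqa(dim), make_moe(dim))
        for i in range(num_layers)
    )
```

The `make_*` arguments stand in for constructors of the attention, SSM, and MoE modules (for example, the `TopOneMoE` sketch above).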
## Multi-Query Attention Detail

```mermaid
flowchart LR
input[Input Features]
subgraph MQA[Multi-Query Attention]
direction TB
q[Q Linear]
k[K Linear]
v[V Linear]
rotary[Rotary Embeddings]
attn[Attention Weights]
input --> q & k & v
q & k --> rotary
rotary --> attn
attn --> v
end
MQA --> output[Output Features]
```
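The sketch below fleshes out the diagram: all query heads attend against a single shared key/value head, and rotary embeddings rotate Q and K before the attention weights are formed. The simplified 1-D rotary helper over the flattened patch sequence and all class/function names are assumptions, not OmegaViT's actual layer; `F.scaled_dot_product_attention` requires PyTorch 2.0+.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rotary(x: torch.Tensor) -> torch.Tensor:
    """Simplified 1-D rotary embedding: rotate channel pairs by position-dependent angles."""
    seq, hd = x.shape[-2], x.shape[-1]
    pos = torch.arange(seq, device=x.device, dtype=x.dtype)
    inv_freq = 10000 ** (-torch.arange(0, hd, 2, device=x.device, dtype=x.dtype) / hd)
    freqs = pos[:, None] * inv_freq  # (seq, hd/2)
    cos, sin = freqs.cos(), freqs.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class MultiQueryAttention(nn.Module):
    """Hypothetical MQA layer: many query heads, one shared K/V head."""

    def __init__(self, dim: int, num_heads: int = 12):
        super().__init__()
        self.h, self.hd = num_heads, dim // num_heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * self.hd)  # single K/V head instead of num_heads
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q = self.q(x).view(b, t, self.h, self.hd).transpose(1, 2)  # (b, h, t, hd)
        k, v = self.kv(x).split(self.hd, dim=-1)                   # (b, t, hd) each
        k, v = k[:, None], v[:, None]                              # (b, 1, t, hd)
        q, k = apply_rotary(q), apply_rotary(k)
        k = k.expand(b, self.h, t, self.hd)  # share the single K/V head across heads
        v = v.expand(b, self.h, t, self.hd)
        y = F.scaled_dot_product_attention(q, k, v)                # (b, h, t, hd)
        return self.out(y.transpose(1, 2).reshape(b, t, d))
```

Relative to standard multi-head attention, the K and V projections shrink from `dim x dim` to `dim x head_dim`, which is where MQA's compute and memory savings come from.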
## Installation

```bash
pip install omegavit
```
## Quick Start

```python
import torch
from omegavit import create_advanced_vit

# Create model
model = create_advanced_vit(num_classes=1000)

# Example forward pass
batch_size = 8
x = torch.randn(batch_size, 3, 224, 224)
output = model(x)
print(f"Output shape: {output.shape}")  # [8, 1000]
```
## Model Configurations
| Parameter | Default | Description |
|---|---|---|
| hidden_size | 768 | Dimension of transformer layers |
| num_attention_heads | 12 | Number of attention heads |
| num_experts | 8 | Number of expert networks in MoE |
| expert_capacity | 32 | Tokens per expert in MoE |
| num_layers | 12 | Number of transformer blocks |
| patch_size | 16 | Size of image patches |
| ssm_state_size | 16 | Hidden state size in SSM |
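Assuming `create_advanced_vit` accepts the table's parameters as keyword arguments (an unverified assumption; check the package source for the exact signature), a customized model might be configured like this:

```python
# Hypothetical: parameter names are taken from the table above; the
# factory's actual signature may differ.
model = create_advanced_vit(
    num_classes=1000,
    hidden_size=768,          # transformer width
    num_attention_heads=12,   # query heads in MQA
    num_experts=8,            # expert networks per MoE layer
    expert_capacity=32,       # tokens routed to each expert
    num_layers=12,            # transformer blocks
    patch_size=16,            # patch embedding size
    ssm_state_size=16,        # SSM hidden state size
)
```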
## Performance
Note: Benchmarks coming soon
## Citation
If you use OmegaViT in your research, please cite:
```bibtex
@article{omegavit2024,
  title={OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts},
  author={Agora Lab},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```
## Contributing
We welcome contributions! Please see our contributing guidelines for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
Special thanks to the Agora Lab AI team and the open-source community for their valuable contributions and feedback.