VFormer
A modular PyTorch library for Vision Transformers
Library Features
- Contains implementations of prominent ViT architectures, broken down into modular components such as the encoder, attention mechanism, and decoder.
- Makes it easy to develop custom models by composing components from different architectures.
Installation
git clone https://github.com/SforAiDl/vformer.git
cd vformer/
python setup.py install
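Once installed, a quick sanity check is to import the package:

import vformer  # should import without errors if the installation succeeded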
Models supported
- Vanilla ViT
- Swin Transformer
- Pyramid Vision Transformer
- CrossViT
- Compact Vision Transformer
- Compact Convolutional Transformer
- Visformer
- Vision Transformers for Dense Prediction
- CvT
- ConViT
- ViViT
Example usage
To instantiate and use a Swin Transformer model -
import torch
from vformer.models.classification import SwinTransformer
image = torch.randn(1, 3, 224, 224)  # batch of one 224x224 RGB image

model = SwinTransformer(
    img_size=224,              # input image resolution
    patch_size=4,              # patch size
    in_channels=3,             # RGB input
    n_classes=10,              # number of output classes
    embed_dim=96,              # embedding dimension of the first stage
    depths=[2, 2, 6, 2],       # number of blocks in each of the four stages
    num_heads=[3, 6, 12, 24],  # attention heads per stage
    window_size=7,             # size of the attention window
    drop_rate=0.2,             # dropout rate
)
logits = model(image)
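The forward pass returns one logit per class, so with n_classes=10 and a batch of one image the output should have shape (1, 10):

print(logits.shape)  # torch.Size([1, 10])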
VFormer has a modular design and allows for easy experimentation using blocks/modules of different architectures. For example, you can use just the encoder or the windowed attention layer of the Swin Transformer model.
from vformer.attention import WindowAttention

window_attn = WindowAttention(
    dim=128,        # embedding dimension
    window_size=7,  # attention window size
    num_heads=2,    # number of attention heads
    # further optional keyword arguments can be passed here
)
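As a minimal sketch of how this layer might be exercised (the input shape is an assumption based on the standard Swin Transformer formulation, where windowed attention receives tensors of shape (num_windows * batch, window_size**2, dim), not something documented here):

tokens = torch.randn(4, 7 * 7, 128)  # 4 windows of 49 tokens each, dim=128 (assumed layout)
out = window_attn(tokens)            # expected shape: (4, 49, 128)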
from vformer.encoder import SwinEncoder

swin_encoder = SwinEncoder(
    dim=128,                      # embedding dimension
    input_resolution=(224, 224),  # resolution of the input feature map
    depth=2,                      # number of encoder blocks
    num_heads=2,                  # number of attention heads
    window_size=7,                # attention window size
    # further optional keyword arguments can be passed here
)
Please refer to our documentation to learn more.