Vision Transformers (ViT)
Project description
vision_transformers
A repository for everything Vision Transformers.
Currently Supported Models
- Image Classification
- ViT Base Patch 16 | 224x224: Torchvision pretrained weights
- ViT Base Patch 32 | 224x224: Torchvision pretrained weights
- ViT Tiny Patch 16 | 224x224: Timm pretrained weights
- ViT Tiny Patch 16 | 384x384: Timm pretrained weights
- Swin Transformer Tiny Patch 4 Window 7 | 224x224: Official Microsoft weights
- Swin Transformer Small Patch 4 Window 7 | 224x224: Official Microsoft weights
- Swin Transformer Base Patch 4 Window 7 | 224x224: Official Microsoft weights
- Swin Transformer Large Patch 4 Window 7 | 224x224: No pretrained weights
Quick Setup
Stable PyPi Package
pip install vision-transformers
OR
Latest Git Updates
git clone https://github.com/sovit-123/vision_transformers.git
cd vision_transformers
Installation in the environment of your choice:
pip install .
Importing Models and Usage
If you have your own training pipeline and just want the model
Replace num_classes=1000 with your own number of classes.
from vision_transformers.models import vit
model = vit.vit_b_p16_224(num_classes=1000, pretrained=True)
# model = vit.vit_b_p32_224(num_classes=1000, pretrained=True)
# model = vit.vit_ti_p16_224(num_classes=1000, pretrained=True)
from vision_transformers.models import swin_transformer
model = swin_transformer.swin_t_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_s_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_b_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_l_p4_w7_224(num_classes=1000)
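If you want a quick sanity check after instantiating a model, a dummy forward pass confirms that the classification head matches your number of classes. This is a minimal sketch assuming PyTorch is installed and the 224x224 input size of the models above (the 10-class head is an arbitrary, illustrative value):

```python
import torch

from vision_transformers.models import vit

# Illustrative example: a 10-class head on ViT Base Patch 16.
model = vit.vit_b_p16_224(num_classes=10, pretrained=True)
model.eval()

# Dummy batch of two 3-channel 224x224 images.
dummy = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    logits = model(dummy)

print(logits.shape)  # expected: torch.Size([2, 10])
```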
If you want to use the training pipeline
- Clone the repository:
git clone https://github.com/sovit-123/vision_transformers.git
cd vision_transformers
- Install
pip install .
From the vision_transformers directory:
- If you have no validation split:
python tools/train_classifier.py --data data/diabetic_retinopathy/colored_images/ 0.15 --epochs 5 --model vit_ti_p16_224
  - In the above command:
    - data/diabetic_retinopathy/colored_images/ represents the data folder where the images will be inside the respective class folders (see the layout sketch below)
    - 0.15 represents the validation split, as the dataset does not contain a validation folder
- If you have a validation split:
python tools/train_classifier.py --train-dir data/plant_disease_recognition/train/ --valid-dir data/plant_disease_recognition/valid/ --epochs 5 --model vit_ti_p16_224
  - In the above command:
    - --train-dir should be the path to the training directory where the images will be inside their respective class folders.
    - --valid-dir should be the path to the validation directory where the images will be inside their respective class folders.
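Both training commands expect the images to be grouped into one sub-folder per class. The repository's own data loader is not shown here; as a rough sketch (paths and class names are hypothetical), torchvision's datasets.ImageFolder consumes exactly this kind of layout:

```python
from torchvision import datasets, transforms

# Hypothetical class-folder layout expected by the training commands above:
# data/plant_disease_recognition/train/
# ├── healthy/   img_001.jpg, img_002.jpg, ...
# ├── powdery/   ...
# └── rust/      ...

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # match the 224x224 models listed above
    transforms.ToTensor(),
])

train_dataset = datasets.ImageFolder(
    "data/plant_disease_recognition/train/", transform=transform
)
print(train_dataset.classes)  # class names are inferred from the folder names
```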
All Available Model Flags for --model
vit_b_p32_224
vit_ti_p16_224
vit_ti_p16_384
vit_b_p16_224
swin_b_p4_w7_224
swin_t_p4_w7_224
swin_s_p4_w7_224
swin_l_p4_w7_224
Examples
- ViT Base 16 | 224x224 pretrained fine-tuning on CIFAR10
- ViT Tiny 16 | 224x224 pretrained fine-tuning on CIFAR10
- DETR image inference notebook
- DETR video inference script (fine-tuning coming soon); see the commands below
DETR Video Inference Commands
All commands are to be executed from the root project directory (vision_transformers).
- Using default video:
python examples/detr_video_inference.py
- Using CPU only:
python examples/detr_video_inference.py --device cpu
- Using another video file:
python examples/detr_video_inference.py --input /path/to/video/file
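The video inference script lives in the repository, so its exact internals are not reproduced here. As a rough, hypothetical sketch of what per-frame DETR inference generally looks like, the official facebookresearch/detr model from torch.hub can be run on a single image (this is not necessarily the code used by examples/detr_video_inference.py):

```python
import torch
from PIL import Image
from torchvision import transforms

# Official DETR (ResNet-50 backbone) from torch.hub; not the repository's own wrapper.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()

transform = transforms.Compose([
    transforms.Resize(800),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open('example.jpg').convert('RGB')  # hypothetical input frame
with torch.no_grad():
    outputs = model(transform(image).unsqueeze(0))

# Keep detections whose best class score (ignoring the no-object class) exceeds 0.9.
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.9
print(outputs['pred_boxes'][0, keep])  # normalized (cx, cy, w, h) boxes
```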
Source Distribution
Hashes for vision_transformers-0.1.0.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 22bc10d45bb7edc454a4e8b92bfed48936abb4cd65ecb6f74e9de2830b36d5e0
MD5 | 6299d355dfbdfe0b4064202ad34093bb
BLAKE2b-256 | 4fb498853da4a0a6474918daa5a5dca9bba039c5c41b7f7b45faecda4ac1bdc8