Vision Transformer (ViT) - PyTorch
PyTorch implementation of ViT.
Original paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Alexey Dosovitskiy et al.)
Install
$ pip install vit-pytorch-implementation
Usage
import torch
from vit_pytorch import lilViT

# Build a ViT with the default (ViT-Base style) hyperparameters.
v = lilViT(
    img_size=224,
    in_channels=3,
    patch_size=16,
    num_transformer_layers=12,
    embedding_dim=768,
    mlp_size=3072,
    num_heads=12,
    attn_dropout=0,
    mlp_dropout=0.1,
    embedding_dropout=0.1,
    num_classes=1000
)

# One random RGB image, shaped (batch, channels, height, width).
img = torch.randn(1, 3, 224, 224)
preds = v(img)  # (1, 1000)
preds.shape
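To make the shapes in the usage example easier to follow, here is a minimal sketch in plain Python that traces how tensor shapes would change through a standard ViT forward pass. It assumes the model follows the layout from the paper (patch embedding, one prepended class token, a classification head on that token); `trace_shapes` is a hypothetical helper written for illustration and is not part of this package.

```python
def trace_shapes(batch=1, in_channels=3, img_size=224, patch_size=16,
                 embedding_dim=768, num_classes=1000):
    """Return (stage, shape) pairs for a standard ViT forward pass.

    Assumption: paper-style layout with one class token prepended
    to the patch sequence. The actual lilViT internals may differ.
    """
    grid = img_size // patch_size      # patches per image side
    num_patches = grid * grid          # total patches per image
    return [
        ("input image", (batch, in_channels, img_size, img_size)),
        ("patch embeddings", (batch, num_patches, embedding_dim)),
        ("with class token", (batch, num_patches + 1, embedding_dim)),
        ("encoder output", (batch, num_patches + 1, embedding_dim)),
        ("class logits", (batch, num_classes)),
    ]


# Print the trace for the default configuration used above.
for stage, shape in trace_shapes():
    print(f"{stage:>18}: {shape}")
```

With the defaults, the final entry is `(1, 1000)`, which matches the shape of `preds` in the usage example.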
Parameters
img_size: int. Image resolution. Default: 224 (224x224).
in_channels: int. Number of image channels. Default: 3.
patch_size: int. Size of each patch. img_size must be divisible by patch_size. The number of patches is n = (img_size // patch_size) ** 2, and n must be greater than 16. Default: 16.
num_transformer_layers: int. Depth (number of transformer blocks). Default: 12.
embedding_dim: int. Embedding dimension. Default: 768.
mlp_size: int. MLP size. Default: 3072.
num_heads: int. Number of heads in the multi-head attention layer. Default: 12.
attn_dropout: float. Dropout for the attention projection. Default: 0.
mlp_dropout: float. Dropout for the dense/MLP layers. Default: 0.1.
embedding_dropout: float. Dropout for the patch and position embeddings. Default: 0.1.
num_classes: int. Number of classes to classify. Default: 1000.
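The constraints on img_size and patch_size can be checked before building a model. The sketch below is a hypothetical helper (`check_vit_config` is not part of this package) that applies the two rules stated above and, as an added assumption, the usual multi-head attention requirement that embedding_dim be divisible by num_heads. Whether lilViT enforces any of these checks itself is not confirmed here.

```python
def check_vit_config(img_size=224, patch_size=16, embedding_dim=768, num_heads=12):
    """Validate ViT hyperparameters and return derived sizes.

    Hypothetical helper for illustration; not part of the package.
    """
    # Rule from the parameter list: the image must tile evenly into patches.
    if img_size % patch_size != 0:
        raise ValueError(
            f"img_size ({img_size}) must be divisible by patch_size ({patch_size})"
        )

    # Assumption: standard multi-head attention splits the embedding
    # evenly across heads, so the division must be exact.
    if embedding_dim % num_heads != 0:
        raise ValueError(
            f"embedding_dim ({embedding_dim}) must be divisible by num_heads ({num_heads})"
        )

    # Rule from the parameter list: n = (img_size // patch_size) ** 2, n > 16.
    num_patches = (img_size // patch_size) ** 2
    if num_patches <= 16:
        raise ValueError(f"number of patches ({num_patches}) must be greater than 16")

    return {"num_patches": num_patches, "head_dim": embedding_dim // num_heads}


print(check_vit_config())  # {'num_patches': 196, 'head_dim': 64}
```

For the defaults, a 224x224 image split into 16x16 patches gives a 14x14 grid, so n = 196, and each of the 12 heads would work on 768 / 12 = 64 dimensions.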