
FasterViT: Fast Vision Transformers with Hierarchical Attention


FasterViT achieves a new state-of-the-art (SOTA) Pareto front in terms of accuracy vs. image throughput, without extra training data!

Note: Please use the latest NVIDIA TensorRT release to enjoy the benefits of optimized FasterViT ops.

Quick Start

Pretrained FasterViT models can be imported with a single line of code. First, install FasterViT:

pip install fastervit

A pretrained FasterViT model with default hyperparameters can be created as follows:

>>> from fastervit import create_model
>>> # Define FasterViT-0 model with 224 x 224 resolution
>>> model = create_model('faster_vit_0_224',
...                      pretrained=True,
...                      model_path="/tmp/faster_vit_0.pth.tar")

model_path sets the local path to which the pretrained checkpoint is downloaded.

We can also simply test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 224, 224)
>>> output = model(image) # torch.Size([1, 1000])
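Since the output is raw logits, a softmax is needed to turn it into class probabilities. The following is a minimal plain-Python sketch that uses a made-up 5-class logit vector in place of the real 1000-class output; the helper names softmax and top_k are illustrative, not part of fastervit. With the actual tensor, PyTorch's output.softmax(dim=-1).topk(5) performs the same computation.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: List[float], k: int = 5) -> List[Tuple[int, float]]:
    """Return (class_index, probability) pairs for the k most likely classes."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Hypothetical logits for a 5-class problem (the FasterViT ImageNet head has 1000 classes).
logits = [2.0, 0.5, 3.0, -1.0, 0.0]
for idx, p in top_k(logits, k=3):
    print(f"class {idx}: {p:.3f}")
```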

The any-resolution FasterViT model accommodates arbitrary image resolutions. The following example defines an any-resolution FasterViT-0 model with an input resolution of 576 x 960, window sizes of 12 and 6 in the 3rd and 4th stages, a carrier token size of 2 and an embedding dimension of 64:

>>> from fastervit import create_model
>>> # Define any-resolution FasterViT-0 model with 576 x 960 resolution
>>> model = create_model('faster_vit_0_any_res',
...                      resolution=[576, 960],
...                      window_size=[7, 7, 12, 6],
...                      ct_size=2,
...                      dim=64,
...                      pretrained=True)
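Whether a resolution is compatible with given window sizes can be checked with simple arithmetic. The sketch below is plain Python and does not call fastervit. It assumes that the stem downsamples by 4 and each later stage by a further 2 (so stages 3 and 4 see feature maps at 1/16 and 1/32 of the input), that only stages 3 and 4 use windowed attention, and that each window is summarized by a ct_size x ct_size grid of carrier tokens. The helper window_layout is hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Assumptions (not taken from the fastervit API): total stride per stage, and
# only stages 3 and 4 use windowed attention.
STAGE_STRIDES = {1: 4, 2: 8, 3: 16, 4: 32}
ATTENTION_STAGES = (3, 4)


def window_layout(resolution: Sequence[int], window_size: Sequence[int],
                  ct_size: int) -> Dict[int, Tuple[Tuple[int, int], Tuple[int, int], int]]:
    """Map each attention stage to (feature map HxW, window grid, carrier-token count)."""
    height, width = resolution
    layout = {}
    for stage in ATTENTION_STAGES:
        stride = STAGE_STRIDES[stage]
        win = window_size[stage - 1]
        fh, fw = height // stride, width // stride
        # Both the downsampling and the window partition must divide evenly.
        if height % stride or width % stride or fh % win or fw % win:
            raise ValueError(
                f"stage {stage}: {fh}x{fw} feature map is not divisible by window {win}")
        grid = (fh // win, fw // win)
        # Assumption: ct_size x ct_size carrier tokens per window.
        carriers = grid[0] * grid[1] * ct_size * ct_size
        layout[stage] = ((fh, fw), grid, carriers)
    return layout


print(window_layout([576, 960], [7, 7, 12, 6], ct_size=2))
```

Under these assumptions, the 576 x 960 example yields 36 x 60 and 18 x 30 feature maps in stages 3 and 4, and window sizes 12 and 6 both split them evenly into a 3 x 5 grid of windows.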

Note that the above model is initialized from the original ImageNet-pretrained FasterViT, whose original resolution is 224 x 224. As a result, missing keys and mismatches are expected when loading the weights, since new layers are added (e.g. new carrier tokens).
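To see which pretrained weights are actually reused, the checkpoint can be compared against the new model by parameter name and shape. This is a minimal, framework-free sketch assuming both are given as {name: shape} dictionaries; the parameter names and shapes below are hypothetical, not FasterViT's real ones. In PyTorch, model.load_state_dict(state_dict, strict=False) reports missing and unexpected keys, but shape mismatches still raise an error, so shape-mismatched entries are usually filtered out before loading.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_state_dicts(model_shapes: Dict[str, Shape],
                        ckpt_shapes: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Partition parameter names into loadable, missing, unexpected and mismatched."""
    report: Dict[str, List[str]] = {
        "loadable": [], "missing": [], "unexpected": [], "shape_mismatch": []}
    for name, shape in model_shapes.items():
        if name not in ckpt_shapes:
            report["missing"].append(name)         # new layer: keeps its fresh initialization
        elif ckpt_shapes[name] != shape:
            report["shape_mismatch"].append(name)  # same name, different size
        else:
            report["loadable"].append(name)
    # Entries present only in the checkpoint.
    report["unexpected"] = [n for n in ckpt_shapes if n not in model_shapes]
    return report


# Hypothetical parameter names and shapes, for illustration only.
model_shapes = {
    "stem.conv.weight": (64, 3, 3, 3),
    "stage3.carrier_pos": (1, 60, 256),
    "stage3.extra_norm.weight": (256,),
    "head.weight": (1000, 512),
}
ckpt_shapes = {
    "stem.conv.weight": (64, 3, 3, 3),
    "stage3.carrier_pos": (1, 16, 256),
    "legacy.bias": (10,),
    "head.weight": (1000, 512),
}
print(compare_state_dicts(model_shapes, ckpt_shapes))
```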

We can simply test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 576, 960)
>>> output = model(image) # torch.Size([1, 1000])

Results + Pretrained Models

ImageNet-1K

FasterViT ImageNet-1K Pretrained Models

Name         Acc@1(%)  Acc@5(%)  Throughput(img/s)  Resolution  #Params(M)  FLOPs(G)  Download
FasterViT-0  82.1      95.9      5802               224x224     31.4        3.3       model
FasterViT-1  83.2      96.5      4188               224x224     53.4        5.3       model
FasterViT-2  84.2      96.8      3161               224x224     75.9        8.7       model
FasterViT-3  84.9      97.2      1780               224x224     159.5       18.2      model
FasterViT-4  85.4      97.3      849                224x224     424.6       36.6      model
FasterViT-5  85.6      97.4      449                224x224     975.5       113.0     model
FasterViT-6  85.8      97.4      352                224x224     1360.0      142.0     model
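The table can be read as an accuracy/throughput trade-off. The sketch below copies the top-1 accuracy and throughput values from the table, checks which models are Pareto-optimal within this family, and picks the fastest model that meets an accuracy target. The helper names are hypothetical. Within a single scaled family every model is expected to lie on the front; the Pareto-front claim at the top of this page concerns comparisons with other architectures, which this sketch does not reproduce.

```python
from typing import List, Optional, Tuple

# (name, top-1 accuracy %, throughput img/s), copied from the ImageNet-1K table.
MODELS: List[Tuple[str, float, int]] = [
    ("FasterViT-0", 82.1, 5802),
    ("FasterViT-1", 83.2, 4188),
    ("FasterViT-2", 84.2, 3161),
    ("FasterViT-3", 84.9, 1780),
    ("FasterViT-4", 85.4, 849),
    ("FasterViT-5", 85.6, 449),
    ("FasterViT-6", 85.8, 352),
]


def pareto_front(models: List[Tuple[str, float, int]]) -> List[str]:
    """Keep models that no other model matches or beats on both accuracy and throughput."""
    front = []
    for name, acc, thr in models:
        dominated = any(a >= acc and t >= thr and (a, t) != (acc, thr)
                        for _, a, t in models)
        if not dominated:
            front.append(name)
    return front


def fastest_with_accuracy(models: List[Tuple[str, float, int]],
                          min_top1: float) -> Optional[str]:
    """Return the highest-throughput model whose top-1 accuracy meets the target."""
    candidates = [(thr, name) for name, acc, thr in models if acc >= min_top1]
    return max(candidates)[1] if candidates else None


print(pareto_front(MODELS))
print(fastest_with_accuracy(MODELS, 84.0))
```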

ImageNet-21K

FasterViT ImageNet-21K Pretrained Models (ImageNet-1K Fine-tuned)

Name                 Acc@1(%)  Acc@5(%)  Resolution  #Params(M)  FLOPs(G)  Download
FasterViT-4-21K-224  86.6      97.8      224x224     271.9       40.8      model
FasterViT-4-21K-384  87.6      98.3      384x384     271.9       120.1     model
FasterViT-4-21K-512  87.8      98.4      512x512     271.9       213.5     model
FasterViT-4-21K-768  87.9      98.5      768x768     271.9       480.4     model
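In this table the parameter count is fixed while FLOPs grow with resolution. A quick arithmetic check on the reported values: scaling the 40.8 GFLOPs at 224x224 by the pixel ratio (r/224)^2 predicts about 119.9, 213.2 and 479.6 GFLOPs at 384, 512 and 768, each within roughly 0.2% of the reported numbers. This is an empirical observation on these four numbers, not a statement about the architecture's exact complexity; the helper name is illustrative.

```python
# (resolution, reported GFLOPs) for FasterViT-4-21K, copied from the table.
REPORTED = {224: 40.8, 384: 120.1, 512: 213.5, 768: 480.4}


def predicted_gflops(resolution: int, base_res: int = 224,
                     base_gflops: float = 40.8) -> float:
    """Scale the 224x224 cost by the ratio of pixel counts (assumes cost linear in pixels)."""
    return base_gflops * (resolution / base_res) ** 2


for res, reported in REPORTED.items():
    pred = predicted_gflops(res)
    rel_err = abs(pred - reported) / reported
    print(f"{res}x{res}: predicted {pred:.1f} G, reported {reported} G, "
          f"relative error {rel_err:.2%}")
```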

Robustness (ImageNet-A, ImageNet-R, ImageNet-V2)

All models use crop_pct=0.875. Results are obtained by running inference with the ImageNet-1K pretrained models, without fine-tuning.
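With crop_pct=0.875 and a 224 x 224 input, the image is typically resized so that its shorter side is 224 / 0.875 = 256 pixels and then center-cropped to 224 x 224. The sketch below mirrors that common timm/torchvision evaluation convention; the exact interpolation and rounding in the authors' pipeline are assumptions here, and eval_resize_and_crop is a hypothetical helper that only computes sizes and does not touch pixels.

```python
import math
from typing import Tuple


def eval_resize_and_crop(width: int, height: int, img_size: int = 224,
                         crop_pct: float = 0.875
                         ) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return the resized (width, height) and the (left, top, right, bottom) crop box."""
    scale_size = math.floor(img_size / crop_pct)  # 224 / 0.875 = 256
    # Resize the shorter side to scale_size, keeping the aspect ratio.
    if width <= height:
        new_w, new_h = scale_size, int(scale_size * height / width)
    else:
        new_w, new_h = int(scale_size * width / height), scale_size
    # Center crop of img_size x img_size.
    left = int(round((new_w - img_size) / 2.0))
    top = int(round((new_h - img_size) / 2.0))
    return (new_w, new_h), (left, top, left + img_size, top + img_size)


print(eval_resize_and_crop(640, 480))  # landscape image
print(eval_resize_and_crop(500, 500))  # square image
```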

Name         A-Acc@1(%)  A-Acc@5(%)  R-Acc@1(%)  R-Acc@5(%)  V2-Acc@1(%)  V2-Acc@5(%)
FasterViT-0  23.9        57.6        45.9        60.4        70.9         90.0
FasterViT-1  31.2        63.3        47.5        61.9        72.6         91.0
FasterViT-2  38.2        68.9        49.6        63.4        73.7         91.6
FasterViT-3  44.2        73.0        51.9        65.6        75.0         92.2
FasterViT-4  49.0        75.4        56.0        69.6        75.7         92.7
FasterViT-5  52.7        77.6        56.9        70.0        76.0         93.0
FasterViT-6  53.7        78.4        57.1        70.1        76.1         93.0

A, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2 respectively.

Citation

Please consider citing FasterViT if this repository is useful for your work.

@article{hatamizadeh2023fastervit,
  title={FasterViT: Fast Vision Transformers with Hierarchical Attention},
  author={Hatamizadeh, Ali and Heinrich, Greg and Yin, Hongxu and Tao, Andrew and Alvarez, Jose M and Kautz, Jan and Molchanov, Pavlo},
  journal={arXiv preprint arXiv:2306.06189},
  year={2023}
}

Licenses

Copyright © 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC.

For license information regarding the timm repository, please refer to its repository.

For license information regarding the ImageNet dataset, please see the ImageNet official website.

Acknowledgement

This repository is built on top of the timm repository. We thank Ross Wightman for creating and maintaining this high-quality library.

Download files

Download the file for your platform.

Source Distribution

fastervit-0.9.6.tar.gz (156.3 kB)

Built Distribution

fastervit-0.9.6-py3-none-any.whl (165.7 kB)

File details

Details for the file fastervit-0.9.6.tar.gz.

File metadata

  • Download URL: fastervit-0.9.6.tar.gz
  • Size: 156.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.10

File hashes

Hashes for fastervit-0.9.6.tar.gz
Algorithm Hash digest
SHA256 d41983720fd481a4c880127aad0416060a1c22bfac15f9bf230048a76b81175e
MD5 a58a1c4dd0c9da43a67fdb3611c2e6e0
BLAKE2b-256 2493b2d0070baf4517e2026974a33a37d9de3d94f6cc1a6e58db135bab0eb27e

File details

Details for the file fastervit-0.9.6-py3-none-any.whl.

File metadata

  • Download URL: fastervit-0.9.6-py3-none-any.whl
  • Size: 165.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.10

File hashes

Hashes for fastervit-0.9.6-py3-none-any.whl
Algorithm Hash digest
SHA256 42ab9e345f3c94ae0b2bee86120aff49521dfad5b57d6233206acaa6b0260d3b
MD5 99722973b0cba95ac4527c8799f43c38
BLAKE2b-256 7bce227494443d97c660c464f34b85938e8704fd4fc61a8d7d68a672c215fdce
