
FasterViT: Fast Vision Transformers with Hierarchical Attention


Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention.

Ali Hatamizadeh, Greg Heinrich, Hongxu (Danny) Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing


FasterViT achieves a new state-of-the-art Pareto front for accuracy vs. image throughput, without extra training data.

💥 News 💥

  • [06.18.2023] 🔥 We have released the FasterViT pip package!
  • [06.17.2023] 🔥 An any-resolution FasterViT model is now available! The model can be used for a variety of applications, such as detection, segmentation, or high-resolution fine-tuning with arbitrary input image resolutions.
  • [06.09.2023] 🔥🔥 We have released the source code and ImageNet-1K FasterViT models!

Quick Start

FasterViT can be installed with pip:

pip install fastervit

A FasterViT model with default hyperparameters can be created as follows:

>>> from fastervit import create_model

# Define fastervit-0 model with 224 x 224 resolution
>>> model = create_model('faster_vit_0_224')

The any-resolution FasterViT model accommodates arbitrary image resolutions. The following defines an any-resolution FasterViT-1 model with an input resolution of 576 x 960, window sizes of 12 and 6 in the 3rd and 4th stages, a carrier token size of 2, and an embedding dimension of 128:

>>> from fastervit import create_model

# Define any-resolution FasterViT-1 model with 576 x 960 resolution
>>> model = create_model('faster_vit_1_any_res', 
                          resolution=[576, 960],
                          window_size=[7, 7, 12, 6],
                          ct_size=2,
                          dim=128)
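
As a rough sanity check on the settings above, the sketch below computes the feature-map size at each stage and the number of windows along each axis in the 3rd and 4th stages. The downsampling factors of 4, 8, 16 and 32 are an assumption based on the usual hierarchical layout, and the requirement that the window size divide the feature map evenly is also an assumption. Both helper functions are hypothetical and not part of the fastervit package.

```python
def stage_feature_sizes(resolution, strides=(4, 8, 16, 32)):
    """Return the (height, width) of the feature map at each stage.

    The strides are an assumed hierarchical layout, not read from the package.
    """
    height, width = resolution
    return [(height // s, width // s) for s in strides]


def windows_per_stage(resolution, window_size, attention_stages=(2, 3)):
    """Count windows along each axis for the windowed stages (0-indexed).

    Raises ValueError if a window size does not divide the feature map evenly.
    """
    sizes = stage_feature_sizes(resolution)
    counts = {}
    for stage in attention_stages:
        h, w = sizes[stage]
        win = window_size[stage]
        if h % win or w % win:
            raise ValueError(
                f"stage {stage + 1}: {h}x{w} is not divisible by window {win}"
            )
        # Key by 1-indexed stage number to match the text above.
        counts[stage + 1] = (h // win, w // win)
    return counts


if __name__ == "__main__":
    print(stage_feature_sizes([576, 960]))
    print(windows_per_stage([576, 960], [7, 7, 12, 6]))
```

Under these assumptions, a 576 x 960 input with window sizes 12 and 6 yields a 3 x 5 grid of windows in both the 3rd and 4th stages.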

Results + Pretrained Models

ImageNet-1K

FasterViT ImageNet-1K Pretrained Models

Name Acc@1(%) Acc@5(%) Throughput(Img/Sec) Resolution #Params(M) FLOPs(G) Download
FasterViT-0 82.1 95.9 5802 224x224 31.4 3.3 model
FasterViT-1 83.2 96.5 4188 224x224 53.4 5.3 model
FasterViT-2 84.2 96.8 3161 224x224 75.9 8.7 model
FasterViT-3 84.9 97.2 1780 224x224 159.5 18.2 model
FasterViT-4 85.4 97.3 849 224x224 424.6 36.6 model
FasterViT-5 85.6 97.4 449 224x224 975.5 113.0 model
FasterViT-6 85.8 97.4 352 224x224 1360.0 142.0 model
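
One way to use the table above is to pick the fastest variant that meets an accuracy target. The sketch below copies the Acc@1 and throughput columns from the table; `fastest_model` is a hypothetical helper for illustration only, and the throughput figures are those reported above, which may differ on other hardware.

```python
# (name, Acc@1 in %, throughput in img/sec), copied from the table above.
MODELS = [
    ("FasterViT-0", 82.1, 5802),
    ("FasterViT-1", 83.2, 4188),
    ("FasterViT-2", 84.2, 3161),
    ("FasterViT-3", 84.9, 1780),
    ("FasterViT-4", 85.4, 849),
    ("FasterViT-5", 85.6, 449),
    ("FasterViT-6", 85.8, 352),
]


def fastest_model(min_acc1):
    """Return the highest-throughput variant with Acc@1 >= min_acc1, or None."""
    candidates = [m for m in MODELS if m[1] >= min_acc1]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[2])[0]


if __name__ == "__main__":
    print(fastest_model(84.0))
```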

Robustness (ImageNet-A - ImageNet-R - ImageNet-V2)

All models use crop_pct=0.875. Results are obtained by running inference on ImageNet-1K pretrained models without finetuning.

Name A-Acc@1(%) A-Acc@5(%) R-Acc@1(%) R-Acc@5(%) V2-Acc@1(%) V2-Acc@5(%)
FasterViT-0 23.9 57.6 45.9 60.4 70.9 90.0
FasterViT-1 31.2 63.3 47.5 61.9 72.6 91.0
FasterViT-2 38.2 68.9 49.6 63.4 73.7 91.6
FasterViT-3 44.2 73.0 51.9 65.6 75.0 92.2
FasterViT-4 49.0 75.4 56.0 69.6 75.7 92.7
FasterViT-5 52.7 77.6 56.9 70.0 76.0 93.0
FasterViT-6 53.7 78.4 57.1 70.1 76.1 93.0

A, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2 respectively.
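
To make the crop_pct setting concrete: in the timm convention, an evaluation image is first resized so that its shorter side is the crop size divided by crop_pct, then center-cropped to the crop size. The sketch below assumes that convention applies here and assumes a 224 x 224 crop; `resize_size_for_crop` is a hypothetical helper, not a function from timm or fastervit.

```python
import math


def resize_size_for_crop(crop_size, crop_pct):
    """Shorter-side resize target before center-cropping to crop_size.

    Assumes the timm-style rule: floor(crop_size / crop_pct).
    """
    return int(math.floor(crop_size / crop_pct))


if __name__ == "__main__":
    # With crop_pct=0.875, a 224 crop is taken from an image resized to 256.
    print(resize_size_for_crop(224, 0.875))
```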

Training

Please see TRAINING.md for detailed training instructions for all models.

Evaluation

The FasterViT models can be evaluated on the ImageNet-1K validation set using the following command:

python validate.py \
--model <model-name> \
--checkpoint <checkpoint-path> \
--data_dir <imagenet-path> \
--batch-size <batch-size-per-gpu>

Here --model is the FasterViT variant (e.g. faster_vit_0_224_1k), --checkpoint is the path to the pretrained model weights, --data_dir is the path to the ImageNet-1K validation set and --batch-size is the batch size per GPU. We also provide a sample script here.
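
When evaluating several variants, it can help to assemble the command programmatically. The sketch below builds the argument list from the four flags described above; `build_validate_command` is a hypothetical helper, the batch size of 128 is only an illustrative value, and the placeholder paths must be replaced before running.

```python
import shlex


def build_validate_command(model, checkpoint, data_dir, batch_size):
    """Assemble the argument list for validate.py from the documented flags."""
    return [
        "python", "validate.py",
        "--model", model,
        "--checkpoint", str(checkpoint),
        "--data_dir", str(data_dir),
        "--batch-size", str(batch_size),
    ]


if __name__ == "__main__":
    cmd = build_validate_command(
        "faster_vit_0_224_1k",   # variant name from the example above
        "<checkpoint-path>",     # placeholder, replace with a real path
        "<imagenet-path>",       # placeholder, replace with a real path
        128,                     # illustrative batch size per GPU
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.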

Installation

The dependencies can be installed by running:

pip install -r requirements.txt

Third-party Extensions

We always welcome third-party extensions, implementations, and uses for other purposes. If you would like your work to be listed in this repository, please raise an issue and provide us with detailed information.

Acknowledgement

This repository is built on top of the timm repository. We thank Ross Wightman for creating and maintaining this high-quality library.

Licenses

Copyright © 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.

For license information regarding the timm repository, please refer to its repository.

For license information regarding the ImageNet dataset, please see the ImageNet official website.

