fastervit

FasterViT: Fast Vision Transformers with Hierarchical Attention

Project description

FasterViT: Fast Vision Transformers with Hierarchical Attention.

FasterViT achieves a new state-of-the-art (SOTA) Pareto front in terms of accuracy vs. image throughput, without extra training data.

Note: Please use the latest NVIDIA TensorRT release to enjoy the benefits of optimized FasterViT ops.

Quick Start

Pre-trained FasterViT models can be imported with one line of code. First, install FasterViT with pip:

pip install fastervit

A pre-trained FasterViT model with default hyper-parameters can then be created as follows:

>>> from fastervit import create_model
>>> # Define a FasterViT-0 model with 224 x 224 resolution
>>> model = create_model('faster_vit_0_224',
...                      pretrained=True,
...                      model_path="/tmp/faster_vit_0.pth.tar")

model_path sets the local path where the pre-trained checkpoint is downloaded and saved.

We can also simply test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 224, 224)
>>> output = model(image) # torch.Size([1, 1000])
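
The logits are unnormalized scores, so a softmax is needed to read them as class probabilities. The following is a minimal sketch in plain Python that runs without the model; the helper names `softmax` and `top_k` and the toy logits are illustrative and not part of fastervit. With a real model, one row of the output could be passed in as `output[0].detach().tolist()`.

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=5):
    """Return the k (class_index, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for a 4-class toy problem (a real output row has 1000 entries).
toy_logits = [2.0, 0.5, -1.0, 3.0]
predictions = top_k(toy_logits, k=2)
print(predictions)
```

The class indices refer to positions in the 1000-way ImageNet-1K output; mapping them to label names requires a separate label file.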

We can also use the any-resolution FasterViT model to accommodate arbitrary image resolutions. In the following, we define an any-resolution FasterViT-0 model with an input resolution of 576 x 960, window sizes of 12 and 6 in the 3rd and 4th stages, a carrier token size of 2 and an embedding dimension of 64:

>>> from fastervit import create_model
>>> # Define an any-resolution FasterViT-0 model with 576 x 960 resolution
>>> model = create_model('faster_vit_0_any_res',
...                      resolution=[576, 960],
...                      window_size=[7, 7, 12, 6],
...                      ct_size=2,
...                      dim=64,
...                      pretrained=True)

Note that the above model is initialized from the original ImageNet pre-trained FasterViT, which was trained at a resolution of 224 x 224. As a result, missing keys and shape mismatches can be expected when loading the weights, since new layers are being added (e.g. new carrier tokens).

We can simply test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 576, 960)
>>> output = model(image) # torch.Size([1, 1000])
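
To see how the window sizes in the any-resolution example relate to the input resolution, the arithmetic can be sketched as below. This assumes the four stages operate at strides of 4, 8, 16 and 32 relative to the input, which is an assumption about the architecture and is not stated above; the helper names are illustrative. Under that assumption, the windows of 12 and 6 tile the stage 3 and stage 4 feature maps evenly. Whether the library requires even tiling or pads otherwise is not covered here.

```python
# Assumed downsampling factor of each of the four stages relative to the input.
ASSUMED_STRIDES = [4, 8, 16, 32]

def feature_map_sizes(resolution, strides=ASSUMED_STRIDES):
    """Return the (height, width) of the feature map at each stage."""
    height, width = resolution
    return [(height // s, width // s) for s in strides]

def windows_tile_evenly(resolution, window_size, stages=(2, 3)):
    """Check whether the window size divides the feature map at the given 0-based stages."""
    sizes = feature_map_sizes(resolution)
    return all(
        sizes[i][0] % window_size[i] == 0 and sizes[i][1] % window_size[i] == 0
        for i in stages
    )

sizes = feature_map_sizes([576, 960])
print(sizes)  # [(144, 240), (72, 120), (36, 60), (18, 30)]
print(windows_tile_evenly([576, 960], [7, 7, 12, 6]))  # True
```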

Results + Pretrained Models

ImageNet-1K

FasterViT ImageNet-1K Pretrained Models

Name	Acc@1(%)	Acc@5(%)	Throughput(Img/Sec)	Resolution	#Params(M)	FLOPs(G)	Download
FasterViT-0	82.1	95.9	5802	224x224	31.4	3.3	model
FasterViT-1	83.2	96.5	4188	224x224	53.4	5.3	model
FasterViT-2	84.2	96.8	3161	224x224	75.9	8.7	model
FasterViT-3	84.9	97.2	1780	224x224	159.5	18.2	model
FasterViT-4	85.4	97.3	849	224x224	424.6	36.6	model
FasterViT-5	85.6	97.4	449	224x224	975.5	113.0	model
FasterViT-6	85.8	97.4	352	224x224	1360.0	142.0	model
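
One way to read the table is to pick the most accurate variant that still meets a throughput target. The sketch below copies the Acc@1 and throughput columns from the table; the function name `best_under_budget` is illustrative, and the throughput figures are the reported ones, so actual numbers will depend on hardware and runtime.

```python
# (name, Acc@1 in %, throughput in img/sec) copied from the table above.
MODELS = [
    ("FasterViT-0", 82.1, 5802),
    ("FasterViT-1", 83.2, 4188),
    ("FasterViT-2", 84.2, 3161),
    ("FasterViT-3", 84.9, 1780),
    ("FasterViT-4", 85.4, 849),
    ("FasterViT-5", 85.6, 449),
    ("FasterViT-6", 85.8, 352),
]

def best_under_budget(min_throughput):
    """Return the most accurate model whose reported throughput meets the target, or None."""
    candidates = [m for m in MODELS if m[2] >= min_throughput]
    return max(candidates, key=lambda m: m[1]) if candidates else None

print(best_under_budget(3000))   # ('FasterViT-2', 84.2, 3161)
print(best_under_budget(10000))  # None
```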

Robustness (ImageNet-A - ImageNet-R - ImageNet-V2)

All models use crop_pct=0.875. Results are obtained by running inference on ImageNet-1K pretrained models without finetuning.

Name	A-Acc@1(%)	A-Acc@5(%)	R-Acc@1(%)	R-Acc@5(%)	V2-Acc@1(%)	V2-Acc@5(%)
FasterViT-0	23.9	57.6	45.9	60.4	70.9	90.0
FasterViT-1	31.2	63.3	47.5	61.9	72.6	91.0
FasterViT-2	38.2	68.9	49.6	63.4	73.7	91.6
FasterViT-3	44.2	73.0	51.9	65.6	75.0	92.2
FasterViT-4	49.0	75.4	56.0	69.6	75.7	92.7
FasterViT-5	52.7	77.6	56.9	70.0	76.0	93.0
FasterViT-6	53.7	78.4	57.1	70.1	76.1	93.0

A, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2 respectively.

Citation

Please consider citing FasterViT if this repository is useful for your work.

@article{hatamizadeh2023fastervit,
  title={FasterViT: Fast Vision Transformers with Hierarchical Attention},
  author={Hatamizadeh, Ali and Heinrich, Greg and Yin, Hongxu and Tao, Andrew and Alvarez, Jose M and Kautz, Jan and Molchanov, Pavlo},
  journal={arXiv preprint arXiv:2306.06189},
  year={2023}
}

Licenses

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.

For license information regarding the timm repository, please refer to its repository.

For license information regarding the ImageNet dataset, please see the ImageNet official website.

Acknowledgement

This repository is built on top of the timm repository. We thank Ross Wightman for creating and maintaining this high-quality library.

Release history

Version	Release date
1.0.0	Jul 22, 2025
0.9.8	Sep 1, 2023
0.9.7	Aug 28, 2023
0.9.6	Aug 21, 2023
0.9.5 (this version)	Aug 12, 2023
0.9.4	Aug 7, 2023
0.9.3	Jul 20, 2023
0.9.2	Jul 7, 2023
0.9.1	Jul 5, 2023
0.9.0	Jul 1, 2023
0.8.9	Jun 23, 2023
0.8.8	Jun 21, 2023
0.8.7	Jun 19, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastervit-0.9.5.tar.gz (153.6 kB view details)

Uploaded Aug 12, 2023 Source

Built Distribution

fastervit-0.9.5-py3-none-any.whl (157.8 kB view details)

Uploaded Aug 12, 2023 Python 3

File details

Details for the file fastervit-0.9.5.tar.gz.

File metadata

Download URL: fastervit-0.9.5.tar.gz
Upload date: Aug 12, 2023
Size: 153.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.10

File hashes

Hashes for fastervit-0.9.5.tar.gz
Algorithm	Hash digest
SHA256	`d0de13959dd0d4d1f04f37aed1dcc1702d640b2fad8c3125e5771a62fa99422f`
MD5	`65233382c9379c15268b83e81ea9f8c8`
BLAKE2b-256	`5d1a88348362060d5eb1c722ae73492337a59a6f8d4c9e39f403945cab14fdca`

File details

Details for the file fastervit-0.9.5-py3-none-any.whl.

File metadata

Download URL: fastervit-0.9.5-py3-none-any.whl
Upload date: Aug 12, 2023
Size: 157.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.10

File hashes

Hashes for fastervit-0.9.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b2a62722b61ea73f22ae0c1c6bf656aecc622885d8e326d04ecede72daaa3f4d`
MD5	`e235db2a6520d72ddf0e28ff8b8292f4`
BLAKE2b-256	`b5622e159d0532c4f31dcf6f4c1ea2c4ad740cf80c2ec48712a560bfebb9401b`
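
The published SHA256 digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the function names and the local path passed in are illustrative, and the expected digests are copied from the hash tables above.

```python
import hashlib

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "fastervit-0.9.5.tar.gz":
        "d0de13959dd0d4d1f04f37aed1dcc1702d640b2fad8c3125e5771a62fa99422f",
    "fastervit-0.9.5-py3-none-any.whl":
        "b2a62722b61ea73f22ae0c1c6bf656aecc622885d8e326d04ecede72daaa3f4d",
}

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, filename):
    """Return True if the file at `path` has the digest published for `filename`."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

For example, `matches_published_hash("fastervit-0.9.5.tar.gz", "fastervit-0.9.5.tar.gz")` would return True for an intact copy of the source distribution in the current directory.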
