

Spatial Shift ViT

S²-ViT is a hierarchical vision transformer with shifted window attention. In contrast to Swin, the shift operation is based on S²-MLP: channel groups are shifted in all four directions simultaneously, with no roll or unroll operation. It also adopts the patch embedding and positional encoding from Twins-SVT, and the StarReLU activation from MetaFormer.
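
For reference, here is a minimal sketch of the four-direction spatial shift described in the S²-MLP paper: channel groups are shifted one pixel left, right, up, and down, with border values left in place rather than wrapped around. The exact group ordering and padding used inside s2vit may differ.

import torch

def spatial_shift(x: torch.Tensor) -> torch.Tensor:
    """Four-direction spatial shift (S²-MLP style) on a channels-last (B, H, W, C) tensor.

    Channels are split into four groups, each shifted by one pixel in a
    different direction. Border rows/columns keep their original values,
    so nothing wraps around (no roll/unroll).
    """
    g = x.shape[-1] // 4
    out = x.clone()
    out[:, :, 1:, 0 * g:1 * g] = x[:, :, :-1, 0 * g:1 * g]  # shift right
    out[:, :, :-1, 1 * g:2 * g] = x[:, :, 1:, 1 * g:2 * g]  # shift left
    out[:, 1:, :, 2 * g:3 * g] = x[:, :-1, :, 2 * g:3 * g]  # shift down
    out[:, :-1, :, 3 * g:4 * g] = x[:, 1:, :, 3 * g:4 * g]  # shift up
    return out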

Prerequisites

  • Python 3.10+
  • PyTorch 2.0+

Installation

pip install s2vit

Usage

import torch
from s2vit import S2ViT

vit = S2ViT(
    depths=(2, 2, 6, 2),
    dims=(64, 128, 160, 320),
    global_pool=True,
    num_classes=1000,
)

img = torch.randn(1, 3, 256, 256)
vit(img) # (1, 1000)
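
The blocks use the StarReLU activation mentioned above. As a point of reference, here is a minimal sketch of StarReLU as defined in the MetaFormer paper (a squared ReLU with a learnable scalar scale and bias); s2vit's own implementation and default values may differ.

import torch
from torch import nn

class StarReLU(nn.Module):
    """StarReLU from "MetaFormer Baselines for Vision": s * relu(x)**2 + b."""

    def __init__(self, scale: float = 0.8944, bias: float = -0.4472):
        super().__init__()
        # Learnable scalar scale and bias. The default initial values follow
        # the paper's derivation for roughly zero-mean, unit-variance outputs.
        self.scale = nn.Parameter(torch.tensor(scale))
        self.bias = nn.Parameter(torch.tensor(bias))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * torch.relu(x) ** 2 + self.bias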

Acknowledgements

Thanks to lucidrains for his excellent work, including vit-pytorch, x-transformers, and his discovery of shared key / value attention.

Citations

@article{Yu2021S2MLPSM,
  title={S2-MLP: Spatial-Shift MLP Architecture for Vision},
  author={Tan Yu and Xu Li and Yunfeng Cai and Mingming Sun and Ping Li},
  journal={2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2022},
  pages={3615-3624},
  url={https://api.semanticscholar.org/CorpusID:235422259}
}
@article{Liu2021SwinTH,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
  journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={9992-10002},
  url={https://api.semanticscholar.org/CorpusID:232352874}
}
@article{Liu2021SwinTV,
  title={Swin Transformer V2: Scaling Up Capacity and Resolution},
  author={Ze Liu and Han Hu and Yutong Lin and Zhuliang Yao and Zhenda Xie and Yixuan Wei and Jia Ning and Yue Cao and Zheng Zhang and Li Dong and Furu Wei and Baining Guo},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  pages={11999-12009},
  url={https://api.semanticscholar.org/CorpusID:244346076}
}
@inproceedings{Chu2021TwinsRT,
  title={Twins: Revisiting the Design of Spatial Attention in Vision Transformers},
  author={Xiangxiang Chu and Zhi Tian and Yuqing Wang and Bo Zhang and Haibing Ren and Xiaolin Wei and Huaxia Xia and Chunhua Shen},
  booktitle={Neural Information Processing Systems},
  year={2021},
  url={https://api.semanticscholar.org/CorpusID:234364557}
}
@article{Yu2022MetaFormerBF,
  title={MetaFormer Baselines for Vision},
  author={Weihao Yu and Chenyang Si and Pan Zhou and Mi Luo and Yichen Zhou and Jiashi Feng and Shuicheng Yan and Xinchao Wang},
  journal={ArXiv},
  year={2022},
  volume={abs/2210.13452},
  url={https://api.semanticscholar.org/CorpusID:253098429}
}
@article{Touvron2022ThreeTE,
  title={Three things everyone should know about Vision Transformers},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Jakob Verbeek and Herv{\'e} J{\'e}gou},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.09795},
  url={https://api.semanticscholar.org/CorpusID:247594673}
}
@article{Chowdhery2022PaLMSL,
  title={PaLM: Scaling Language Modeling with Pathways},
  author={Aakanksha Chowdhery and Sharan Narang and Jacob Devlin and Maarten Bosma and Gaurav Mishra and Adam Roberts and Paul Barham and Hyung Won Chung and Charles Sutton and Sebastian Gehrmann and Parker Schuh and Kensen Shi and Sasha Tsvyashchenko and Joshua Maynez and Abhishek Rao and Parker Barnes and Yi Tay and Noam M. Shazeer and Vinodkumar Prabhakaran and Emily Reif and Nan Du and Benton C. Hutchinson and Reiner Pope and James Bradbury and Jacob Austin and Michael Isard and Guy Gur-Ari and Pengcheng Yin and Toju Duke and Anselm Levskaya and Sanjay Ghemawat and Sunipa Dev and Henryk Michalewski and Xavier Garc{\'i}a and Vedant Misra and Kevin Robinson and Liam Fedus and Denny Zhou and Daphne Ippolito and David Luan and Hyeontaek Lim and Barret Zoph and Alexander Spiridonov and Ryan Sepassi and David Dohan and Shivani Agrawal and Mark Omernick and Andrew M. Dai and Thanumalayan Sankaranarayana Pillai and Marie Pellat and Aitor Lewkowycz and Erica Moreira and Rewon Child and Oleksandr Polozov and Katherine Lee and Zongwei Zhou and Xuezhi Wang and Brennan Saeta and Mark D{\'i}az and Orhan Firat and Michele Catasta and Jason Wei and Kathleen S. Meier-Hellstern and Douglas Eck and Jeff Dean and Slav Petrov and Noah Fiedel},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.02311},
  url={https://api.semanticscholar.org/CorpusID:247951931}
}
@article{Bondarenko2023QuantizableTR,
  title={Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing},
  author={Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort},
  journal={ArXiv},
  year={2023},
  volume={abs/2306.12929},
  url={https://api.semanticscholar.org/CorpusID:259224568}
}

