Self-attention building blocks for computer vision applications in PyTorch
Implementation of self-attention mechanisms for computer vision in PyTorch, built with einsum and einops. The repository focuses on self-attention modules for vision tasks. This is an ongoing project; a pip package is coming soon.
Related articles on attention, transformers, and einsum:
- How Attention works in Deep Learning
- How Transformers work in deep learning and NLP
- How the Vision Transformer (ViT) works in 10 minutes: an image is worth 16x16 words
- Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch
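All of the modules below build on the same core operation: scaled dot-product attention expressed with einsum. As an illustrative, self-contained sketch (in NumPy rather than PyTorch, and not the library's actual implementation):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Scaled dot-product attention over [batch, tokens, dim] arrays."""
    dim = q.shape[-1]
    # Score every query against every key: [batch, tokens, tokens]
    scores = np.einsum('bid,bjd->bij', q, k) / np.sqrt(dim)
    # Softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values: [batch, tokens, dim]
    return np.einsum('bij,bjd->bid', weights, v)

x = np.random.rand(16, 10, 64)  # [batch, tokens, dim]
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (16, 10, 64)
```

The library's modules wrap this computation in `nn.Module`s and add learned projections, multiple heads, and optional masking.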
Code Examples
Multi-head attention
import torch
from self_attention_cv import MultiHeadSelfAttention
model = MultiHeadSelfAttention(dim=64)
x = torch.rand(16, 10, 64) # [batch, tokens, dim]
mask = torch.zeros(10, 10) # tokens X tokens
mask[5:8, 5:8] = 1
y = model(x, mask)
Axial attention
import torch
from self_attention_cv import AxialAttentionBlock
model = AxialAttentionBlock(in_channels=256, dim=64, heads=8)
x = torch.rand(1, 256, 64, 64) # [batch, in_channels, dim, dim]
y = model(x)
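Axial attention factorizes full 2D self-attention into two 1D passes, one along the width axis and one along the height axis, reducing the cost from O((HW)²) to O(HW·(H+W)). A minimal NumPy sketch of the idea (illustrative only; the library's block additionally uses heads and axial positional embeddings):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention_2d(x):
    """Self-attention along each axis of a [height, width, dim] feature map."""
    h, w, d = x.shape
    # Attend along the width axis: each row is an independent sequence.
    scores_w = np.einsum('hid,hjd->hij', x, x) / np.sqrt(d)
    x = np.einsum('hij,hjd->hid', softmax(scores_w), x)
    # Attend along the height axis: each column is an independent sequence.
    scores_h = np.einsum('iwd,jwd->wij', x, x) / np.sqrt(d)
    x = np.einsum('wij,jwd->iwd', softmax(scores_h), x)
    return x

fmap = np.random.rand(8, 8, 32)  # [height, width, dim]
out = axial_attention_2d(fmap)
print(out.shape)  # (8, 8, 32)
```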
Vanilla Transformer Encoder
import torch
from self_attention_cv import TransformerEncoder
model = TransformerEncoder(dim=64, blocks=6, heads=8)
x = torch.rand(16, 10, 64) # [batch, tokens, dim]
mask = torch.zeros(10, 10) # tokens X tokens
mask[5:8, 5:8] = 1
y = model(x, mask)
1D Positional Embeddings
import torch
from self_attention_cv.pos_embeddings import AbsPosEmb1D, RelPosEmb1D
model = AbsPosEmb1D(tokens=20, dim_head=64)
# batch heads tokens dim_head
q = torch.rand(2, 3, 20, 64)
y1 = model(q)
model = RelPosEmb1D(tokens=20, dim_head=64, heads=3)
q = torch.rand(2, 3, 20, 64)
y2 = model(q)
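Both modules take the query tensor and return position-dependent attention logits of shape [batch, heads, tokens, tokens], which are added to the content logits. Assuming the absolute variant scores each query against a learned per-position embedding (the relative variant additionally re-indexes a table of 2·tokens−1 relative offsets, omitted here), the core computation can be sketched in NumPy as:

```python
import numpy as np

tokens, dim_head = 20, 64
# Learned absolute position embedding, one vector per token position
# (randomly initialized here purely for illustration).
pos_emb = np.random.randn(tokens, dim_head)

q = np.random.rand(2, 3, tokens, dim_head)  # [batch, heads, tokens, dim_head]
# Position logits: query at position i scored against position embedding j.
pos_logits = np.einsum('bhid,jd->bhij', q, pos_emb)
print(pos_logits.shape)  # (2, 3, 20, 20)
```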
2D Positional Embeddings
import torch
from self_attention_cv.pos_embeddings import RelPosEmb2D
dim = 32 # spatial dim of the feat map
model = RelPosEmb2D(
feat_map_size=(dim, dim),
dim_head=128)
q = torch.rand(2, 4, dim*dim, 128)
y = model(q)
Bottleneck Attention block
import torch
from self_attention_cv.bottleneck_transformer import BottleneckBlock
inp = torch.rand(1, 512, 32, 32)
bottleneck_block = BottleneckBlock(in_channels=512, fmap_size=(32, 32), heads=4, out_channels=1024, pooling=True)
y = bottleneck_block(inp)
References
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
- Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., & Chen, L. C. (2020, August). Axial-deeplab: Stand-alone axial-attention for panoptic segmentation. In European Conference on Computer Vision (pp. 108-126). Springer, Cham.
- Srinivas, A., Lin, T. Y., Parmar, N., Shlens, J., Abbeel, P., & Vaswani, A. (2021). Bottleneck Transformers for Visual Recognition. arXiv preprint arXiv:2101.11605.
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.