perceiver-model

Multimodal Perceiver - Pytorch

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.6
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

Perceiver - Pytorch

Implementation of Perceiver: General Perception with Iterative Attention, in PyTorch. Extended from Phil Wang's perceiver-pytorch.

Yannic Kilcher explanation!

Install

$ pip install perceiver-model

Usage

import torch
from perceiver_pytorch import Perceiver

model = Perceiver(
    input_channels = 3,          # number of channels for each token of the input
    input_axis = 2,              # number of axes for input data (2 for images, 3 for video)
    num_freq_bands = 6,          # number of freq bands, with original value (2 * K + 1)
    max_freq = 10.,              # maximum frequency, hyperparameter depending on how fine the data is
    depth = 6,                   # depth of net. The shape of the final attention mechanism will be:
                                 #   depth * (cross attention -> self_per_cross_attn * self attention)
    num_latents = 256,           # number of latents, or induced set points, or centroids (different papers use different names)
    latent_dim = 512,            # latent dimension
    cross_heads = 1,             # number of heads for cross attention. paper said 1
    latent_heads = 8,            # number of heads for latent self attention, 8
    cross_dim_head = 64,         # number of dimensions per cross attention head
    latent_dim_head = 64,        # number of dimensions per latent self attention head
    num_classes = 1000,          # output number of classes
    attn_dropout = 0.,
    ff_dropout = 0.,
    weight_tie_layers = False,   # whether to weight tie layers (optional, as indicated in the diagram)
    fourier_encode_data = True,  # whether to auto-fourier encode the data, using the input_axis given. defaults to True, but can be turned off if you are fourier encoding the data yourself
    self_per_cross_attn = 2      # number of self attention blocks per cross attention
)

img = torch.randn(1, 224, 224, 3) # 1 imagenet image, pixelized

model(img) # (1, 1000)
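
To make the configuration above more concrete, the following dependency-free sketch works out how many tokens the example image becomes, how wide each token is after Fourier encoding, and why attending from a small latent array is cheaper than full self-attention over the input. The helper name `perceiver_input_stats` is hypothetical, and the channel formula (`input_axis * (2 * num_freq_bands + 1)` extra channels per token) is an assumption based on the `(2 * K + 1)` comment above and on the upstream perceiver-pytorch code; it has not been checked against this package's source.

```python
import math


def perceiver_input_stats(spatial_shape, input_channels, num_freq_bands,
                          num_latents, fourier_encode_data=True):
    """Hypothetical helper: rough size bookkeeping for a Perceiver input."""
    input_axis = len(spatial_shape)        # 2 for images, 3 for video
    num_tokens = math.prod(spatial_shape)  # one token per pixel / voxel

    # Assumed encoding: each axis contributes sin and cos for every band,
    # plus the raw coordinate, i.e. (2 * K + 1) channels per axis.
    fourier_channels = (
        input_axis * (2 * num_freq_bands + 1) if fourier_encode_data else 0
    )

    return {
        "num_tokens": num_tokens,
        "channels_per_token": input_channels + fourier_channels,
        # Attention score entries, per head and per layer:
        "cross_attention_scores": num_latents * num_tokens,  # latents attend to input
        "full_self_attention_scores": num_tokens ** 2,       # input attends to itself
    }


stats = perceiver_input_stats(
    spatial_shape=(224, 224), input_channels=3,
    num_freq_bands=6, num_latents=256,
)
for name, value in stats.items():
    print(f"{name}: {value:,}")
```

Under these assumptions, the cross-attention score matrix grows linearly with the number of input tokens, while full self-attention over the pixels grows quadratically.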

For the backbone of Perceiver IO, the follow-up paper that allows a flexible output sequence length, import PerceiverIO instead:

import torch
from perceiver_pytorch import PerceiverIO

model = PerceiverIO(
    dim = 32,                    # dimension of sequence to be encoded
    queries_dim = 32,            # dimension of decoder queries
    logits_dim = 100,            # dimension of final logits
    depth = 6,                   # depth of net
    num_latents = 256,           # number of latents, or induced set points, or centroids (different papers use different names)
    latent_dim = 512,            # latent dimension
    cross_heads = 1,             # number of heads for cross attention. paper said 1
    latent_heads = 8,            # number of heads for latent self attention, 8
    cross_dim_head = 64,         # number of dimensions per cross attention head
    latent_dim_head = 64,        # number of dimensions per latent self attention head
    weight_tie_layers = False    # whether to weight tie layers (optional, as indicated in the diagram)
)

seq = torch.randn(1, 512, 32)
queries = torch.randn(1, 128, 32)

logits = model(seq, queries = queries) # (1, 128, 100) - (batch, decoder seq, logits dim)
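
The output length above follows the number of queries rather than the input length. The sketch below illustrates that idea with a bare single-head cross-attention in plain PyTorch. It is not this package's decoder: the function `decode_with_queries` is hypothetical, there are no learned projections, and queries and latents are assumed to share one feature dimension for simplicity.

```python
import torch


def decode_with_queries(latents, queries):
    """Hypothetical single-head cross-attention: queries read from latents.

    latents: (batch, num_latents, dim)
    queries: (batch, num_queries, dim)
    returns: (batch, num_queries, dim)
    """
    scale = latents.shape[-1] ** -0.5
    # Similarity between every query and every latent vector.
    scores = torch.einsum("bqd,bnd->bqn", queries, latents) * scale
    weights = scores.softmax(dim=-1)  # normalise over the latents
    # Each query receives a weighted average of the latents.
    return torch.einsum("bqn,bnd->bqd", weights, latents)


latents = torch.randn(1, 256, 32)  # fixed-size latent array
for num_queries in (1, 128, 1000):
    out = decode_with_queries(latents, torch.randn(1, num_queries, 32))
    print(num_queries, tuple(out.shape))
```

Because the latent array has a fixed size, the same latents can be decoded with any number of queries, which is what makes the output sequence length flexible.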

Citations

@misc{jaegle2021perceiver,
    title   = {Perceiver: General Perception with Iterative Attention},
    author  = {Andrew Jaegle and Felix Gimeno and Andrew Brock and Andrew Zisserman and Oriol Vinyals and Joao Carreira},
    year    = {2021},
    eprint  = {2103.03206},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

@misc{jaegle2021perceiverio,
    title   = {Perceiver IO: A General Architecture for Structured Inputs & Outputs},
    author  = {Andrew Jaegle and Sebastian Borgeaud and Jean-Baptiste Alayrac and Carl Doersch and Catalin Ionescu and David Ding and Skanda Koppula and Andrew Brock and Evan Shelhamer and Olivier Hénaff and Matthew M. Botvinick and Andrew Zisserman and Oriol Vinyals and João Carreira},
    year    = {2021},
    eprint  = {2107.14795},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

Release history

- 0.7.6: Oct 29, 2021
- 0.7.5: Oct 26, 2021
- 0.7.4: Oct 5, 2021 (this version)
- 0.7.3: Oct 5, 2021
- 0.7.2: Sep 28, 2021
- 0.7.1: Sep 8, 2021
- 0.7.0: Sep 8, 2021

Download files

Source Distribution

perceiver-model-0.7.4.tar.gz (18.7 kB)

Uploaded Oct 5, 2021 Source

Built Distribution

perceiver_model-0.7.4-py3-none-any.whl (25.7 kB)

Uploaded Oct 5, 2021 Python 3

Hashes for perceiver-model-0.7.4.tar.gz
Algorithm	Hash digest
SHA256	`4d9477da223cd1b65795cbc96091bec06aab3834b3ff06835a9b8cfa43fcd751`
MD5	`1aca9a8aa34a2f5dcae1ec3682339ba7`
BLAKE2b-256	`9fd5454ef555bc56d701ca6abc4b7c92048abe50917a7c7ec9962bd21169aff3`

Hashes for perceiver_model-0.7.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1bd4c8d715c243afcbf9a45a2c0c0b7ac87050998bb3a49fba8efa4ab04bb90a`
MD5	`d9d86fffc0d29eee03466dc2ad8a2f2a`
BLAKE2b-256	`7bb5bce1fd973972508e802df1be855fbcf26c8272cdc079ae0a7c4ce5dc1dd2`
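
One way to use the digests listed above is to check a downloaded file against the published SHA256 value. The sketch below uses only the Python standard library; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location of the downloaded source distribution.
    sdist = Path("perceiver-model-0.7.4.tar.gz")
    expected = "4d9477da223cd1b65795cbc96091bec06aab3834b3ff06835a9b8cfa43fcd751"
    if sdist.exists():
        print("hash OK" if matches_published_hash(sdist, expected) else "hash MISMATCH")
```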