
Perceiver IO


A PyTorch implementation of Perceiver IO: A General Architecture for Structured Inputs & Outputs (Jaegle et al., 2021).

This project supports training of Perceiver IO models with PyTorch Lightning. Training examples are given in section Tasks, inference examples in section Notebooks. Perceiver IO models are constructed with generic encoder and decoder classes and task-specific input and output adapters (see Model API). The command line interface is implemented with Lightning CLI.

Setup

conda env create -f environment.yml
conda activate perceiver-io
poetry install

Tasks

In the following subsections, Perceiver IO models are trained on a rather small scale. In particular, hyperparameters are set such that parallel training on two NVIDIA GTX 1080 GPUs (8 GB memory each) works quite well. I didn't tune model architectures or other hyperparameters, so you'll probably get better results with a bit of experimentation. Support for more datasets and tasks will be added later.

Masked language modeling

Pretrain a Perceiver IO model on masked language modeling (MLM) with text from the IMDB training set. The pretrained encoder is then used for training a sentiment classification model. Predictions of masked tokens are logged to TensorBoard.

python perceiver/scripts/mlm.py fit \
  --model.num_latent_channels=64 \
  --model.encoder.num_layers=3 \
  --model.encoder.dropout=0.0 \
  --model.decoder.dropout=0.0 \
  --data=IMDBDataModule \
  --data.max_seq_len=512 \
  --data.batch_size=64 \
  --optimizer.lr=3e-3 \
  --optimizer.weight_decay=0.0 \
  --lr_scheduler.pct_start=0.1 \
  --trainer.accelerator=gpu \
  --trainer.devices=-1 \
  --trainer.max_steps=50000 \
  --trainer.check_val_every_n_epoch=5

To save GPU memory and scale model training, activation checkpointing can be enabled with --model.activation_checkpoint=true (disabled by default).
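
For example, a minimal MLM run with activation checkpointing enabled (remaining options as in the full command above):

python perceiver/scripts/mlm.py fit \
  --model.activation_checkpoint=true \
  --data=IMDBDataModule \
  --trainer.accelerator=gpu \
  --trainer.devices=-1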

Sentiment classification

Train a classification decoder using a frozen encoder from masked language modeling. If you ran MLM yourself, you'll need to adjust the --model.mlm_ckpt argument accordingly; otherwise, download checkpoints from here and extract them in the root directory of this project.

python perceiver/scripts/seq_clf.py fit \
  --model.mlm_ckpt='logs/mlm/version_0/checkpoints/epoch=254-val_loss=4.556.ckpt' \
  --model.num_latent_channels=64 \
  --model.encoder.num_layers=3 \
  --model.encoder.freeze=true \
  --model.encoder.dropout=0.0 \
  --model.decoder.dropout=0.0 \
  --data=IMDBDataModule \
  --data.max_seq_len=512 \
  --data.batch_size=128 \
  --optimizer.lr=1e-3 \
  --optimizer.weight_decay=0.01 \
  --trainer.accelerator=gpu \
  --trainer.devices=-1 \
  --trainer.max_epochs=30

Unfreeze the encoder and fine-tune it jointly with the decoder trained in the previous step. If you ran the previous step yourself, you'll need to adjust the --model.clf_ckpt argument accordingly; otherwise, download checkpoints from here.

python perceiver/scripts/seq_clf.py fit \
  --model.clf_ckpt='logs/seq_clf/version_0/checkpoints/epoch=024-val_loss=0.352.ckpt' \
  --model.num_latent_channels=64 \
  --model.encoder.num_layers=3 \
  --model.encoder.dropout=0.1 \
  --model.decoder.dropout=0.1 \
  --data=IMDBDataModule \
  --data.max_seq_len=512 \
  --data.batch_size=128 \
  --optimizer.lr=1e-4 \
  --optimizer.weight_decay=0.01 \
  --trainer.accelerator=gpu \
  --trainer.devices=-1 \
  --trainer.max_epochs=30

Image classification

Classify MNIST images. See also Model API for details about the underlying Perceiver IO model.

python perceiver/scripts/img_clf.py fit \
  --model.num_latent_channels=128 \
  --model.encoder.num_layers=3 \
  --model.encoder.dropout=0.0 \
  --model.decoder.dropout=0.0 \
  --data=MNISTDataModule \
  --data.batch_size=128 \
  --optimizer.lr=1e-3 \
  --optimizer.weight_decay=0.01 \
  --trainer.accelerator=gpu \
  --trainer.devices=-1 \
  --trainer.max_epochs=20

Notebooks

Start the notebook server with:

PYTHONPATH=.. jupyter notebook

Model API

The model API is based on generic encoder and decoder classes (PerceiverEncoder and PerceiverDecoder) and task-specific input and output adapters. For example, the following snippet shows how they can be used to create an MNIST image classifier:

from perceiver.model import (
    PerceiverIO,
    PerceiverEncoder,
    PerceiverDecoder,
    ImageInputAdapter,
    ClassificationOutputAdapter,
)

# Fourier-encode pixel positions and flatten along spatial dimensions
input_adapter = ImageInputAdapter(image_shape=(28, 28, 1), num_frequency_bands=32)

# Project generic Perceiver decoder output to specified number of classes
output_adapter = ClassificationOutputAdapter(num_classes=10, num_output_channels=128)

# Generic Perceiver encoder
encoder = PerceiverEncoder(
    input_adapter=input_adapter,
    num_latents=32,
    num_latent_channels=128,
    num_layers=3,
    num_cross_attention_heads=4,
    num_self_attention_heads=4,
    num_self_attention_layers_per_block=3,
    dropout=0.0,
)

# Generic Perceiver decoder
decoder = PerceiverDecoder(
    output_adapter=output_adapter,
    num_latent_channels=128,
    num_cross_attention_heads=1,
    dropout=0.0,
)

# MNIST classifier implemented as Perceiver IO model
mnist_classifier = PerceiverIO(encoder, decoder)
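
As a quick smoke test, the classifier can be applied to a dummy batch. This is a minimal sketch: it assumes the model accepts channels-last image batches matching the image_shape passed to ImageInputAdapter and returns one logit per class, which is not verified against the library.

import torch

# Dummy batch of four 28x28 grayscale images, channels-last to match
# image_shape=(28, 28, 1) above. The output shape (batch, num_classes)
# is an assumption.
dummy_images = torch.randn(4, 28, 28, 1)
logits = mnist_classifier(dummy_images)
print(logits.shape)  # expected: torch.Size([4, 10])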

Development environment

Update the project dependencies in the conda environment:

invoke install

Install the pre-commit hooks:

invoke precommit-install

Run code quality checks:

invoke cc

Run tests:

invoke test

The project and task structure presented here is based on the Python Project Template.

Citations

@misc{jaegle2021perceiver,
    title   = {Perceiver: General Perception with Iterative Attention},
    author  = {Andrew Jaegle and Felix Gimeno and Andrew Brock and Andrew Zisserman and Oriol Vinyals and Joao Carreira},
    year    = {2021},
    eprint  = {2103.03206},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

@misc{jaegle2021perceiverio,
    title   = {Perceiver IO: A General Architecture for Structured Inputs & Outputs},
    author  = {Andrew Jaegle and Sebastian Borgeaud and Jean-Baptiste Alayrac and Carl Doersch and Catalin Ionescu and David Ding and Skanda Koppula and Andrew Brock and Evan Shelhamer and Olivier Hénaff and Matthew M. Botvinick and Andrew Zisserman and Oriol Vinyals and João Carreira},
    year    = {2021},
    eprint  = {2107.14795},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
