Keras implementation of ViT (Vision Transformer)

Project description

vit-keras

This is a Keras implementation of the models described in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. It is based on an earlier implementation by tuvovan, modified to match the Flax implementation in the official repository.

The weights here are ported from those provided in the official repository. See utils.load_weights_numpy for how this is done (it's not pretty, but it does the job).

Usage

Install this package using pip install vit-keras

You can use the model out-of-the-box with ImageNet 2012 classes using something like the following. The weights will be downloaded automatically.

from vit_keras import vit, utils

image_size = 384
classes = utils.get_imagenet_classes()
model = vit.vit_b16(
    image_size=image_size,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=True
)
url = 'https://upload.wikimedia.org/wikipedia/commons/d/d7/Granny_smith_and_cross_section.jpg'
image = utils.read(url, image_size)
X = vit.preprocess_inputs(image).reshape(1, image_size, image_size, 3)
y = model.predict(X)
print(classes[y[0].argmax()]) # Granny smith
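
The same constructor pattern covers the other model sizes. For example, assuming the package also exposes vit_b32 and vit_l16 with the same signature as the vit_b16 and vit_l32 used in this README, loading a different variant only changes the constructor name:

model_b32 = vit.vit_b32(
    image_size=image_size,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=True
)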

You can fine-tune a model loaded as follows.

image_size = 224
model = vit.vit_l32(
    image_size=image_size,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=False,
    classes=200
)
# Train this model on your data as desired.
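
As a minimal sketch of that training step using the standard Keras API (the random x_train/y_train, optimizer, loss, and hyperparameters below are illustrative placeholders, not recommendations from this package):

import numpy as np
import tensorflow as tf

# Placeholder data: replace with your own images (preprocessed with
# vit.preprocess_inputs) and one-hot labels for your 200 classes.
x_train = np.random.rand(8, image_size, image_size, 3).astype('float32')
y_train = tf.keras.utils.to_categorical(
    np.random.randint(0, 200, size=8), num_classes=200
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
model.fit(x_train, y_train, epochs=1, batch_size=4)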

Visualizing Attention Maps

There's some functionality for plotting attention maps for a given image and model; see the example below. I'm not sure I'm doing this correctly (the official repository didn't have example code). Feedback/corrections welcome!

import numpy as np
import matplotlib.pyplot as plt
from vit_keras import vit, utils, visualize

# Load a model
image_size = 384
classes = utils.get_imagenet_classes()
model = vit.vit_b16(
    image_size=image_size,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=True
)

# Get an image and compute the attention map
url = 'https://upload.wikimedia.org/wikipedia/commons/b/bc/Free%21_%283987584939%29.jpg'
image = utils.read(url, image_size)
attention_map = visualize.attention_map(model=model, image=image)
print('Prediction:', classes[
    model.predict(vit.preprocess_inputs(image)[np.newaxis])[0].argmax()]
)  # Prediction: Eskimo dog, husky

# Plot results
fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.axis('off')
ax2.axis('off')
ax1.set_title('Original')
ax2.set_title('Attention Map')
_ = ax1.imshow(image)
_ = ax2.imshow(attention_map)
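
For reference, attention maps like this are often computed with attention rollout (Abnar & Zuidema, 2020): average each layer's attention over heads, add the identity matrix to account for residual connections, renormalize, and multiply the per-layer matrices together. Whether visualize.attention_map implements exactly this is an assumption on my part; a minimal NumPy sketch of the idea:

import numpy as np

def attention_rollout(attentions):
    # attentions: array of shape (num_layers, num_heads, num_tokens, num_tokens)
    num_tokens = attentions.shape[-1]
    rollout = np.eye(num_tokens)
    for layer_attention in attentions.mean(axis=1):  # average over heads
        layer_attention = layer_attention + np.eye(num_tokens)  # residual connections
        layer_attention = layer_attention / layer_attention.sum(axis=-1, keepdims=True)
        rollout = layer_attention @ rollout
    # Attention flowing from the class token (index 0) to each image patch
    return rollout[0, 1:]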

(Example output: the original image on the left and its attention map on the right.)

Download files

Download the file for your platform.

Source Distribution

vit_keras-0.2.0.tar.gz (140.9 kB)

Built Distribution

vit_keras-0.2.0-py3-none-any.whl (24.6 kB)

File details

Details for the file vit_keras-0.2.0.tar.gz.

File metadata

  • Download URL: vit_keras-0.2.0.tar.gz
  • Size: 140.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.6

File hashes

Hashes for vit_keras-0.2.0.tar.gz

  • SHA256: fcff0397f94187823cbf8f5a453b7836b1cd365a7c9e6c422c2e959f8babb1dc
  • MD5: c886149f10ce57d9759ba9e926f5e254
  • BLAKE2b-256: d29c2cc182b43a0924aa23f03a5b4d22052d5353dfb06ee2bbc9ba6ee4dc2026

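To verify a download against the digests above, you can use Python's standard-library hashlib; a minimal sketch (the expected value is the SHA256 digest listed above):

import hashlib

expected = 'fcff0397f94187823cbf8f5a453b7836b1cd365a7c9e6c422c2e959f8babb1dc'
with open('vit_keras-0.2.0.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == expected, 'SHA256 mismatch: the download may be corrupted'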

File details

Details for the file vit_keras-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: vit_keras-0.2.0-py3-none-any.whl
  • Size: 24.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.6

File hashes

Hashes for vit_keras-0.2.0-py3-none-any.whl

  • SHA256: a1a48b8cbe01895d420cb2b4e96512e7839022347c8733a8fb223fb8fec7510d
  • MD5: 004985716d0cccea2c10eeacb2482809
  • BLAKE2b-256: 660db2ea088fd10306b9914229e6eb7e82cb02ac0daaa3e81aff7412b75c278b

