
Glasses 😎


Compact, concise and customizable deep learning computer vision library

So far I have the following pretrained weights, and I am working on porting more. They are hosted on GitHub if < 100MB and on AWS (thanks to Francis Ukpeh) if > 100MB.

The documentation is here

TL;DR

This library has

  • human-readable code, no research code
  • common components shared across models
  • the same APIs for all models (you learn them once and they are always the same)
  • clear and easy-to-use model customization (see here)
  • classification and segmentation
  • an emoji in the name ;)

Installation

You can install glasses using pip by running

pip install git+https://github.com/FrancescoSaverioZuppichini/glasses

Tutorials

Check out ./tutorials.

Motivations

Almost all existing implementations of the most famous models are written with very bad coding practices, what is today called research code. I struggled to understand some implementations that, in the end, were just a few lines of code.

Most of them lack a global structure, rely on tons of code repetition, are not easily customizable, and are not tested. Since I do computer vision for a living, I needed a way to make my life easier.

Getting started

The API is shared across all models!

import torch
from glasses.models import AutoModel, AutoConfig
from torch import nn
# load one model
model = AutoModel.from_pretrained('resnet18')
cfg = AutoConfig.from_name('resnet18')
model.summary(device='cpu')  # thanks to torchsummary
AutoModel.models() # 'resnet18', 'resnet26', 'resnet26d', 'resnet34', 'resnet50', ...

Interpretability

import requests
from PIL import Image
from io import BytesIO
from glasses.interpretability import GradCam, SaliencyMap
from torchvision.transforms import Normalize
r = requests.get('https://i.insider.com/5df126b679d7570ad2044f3e?width=700&format=jpeg&auto=webp')
im = Image.open(BytesIO(r.content))
# undo the normalization for visualization
postprocessing = Normalize(-cfg.mean / cfg.std, (1.0 / cfg.std))
# apply the preprocessing
x = cfg.transform(im).unsqueeze(0)
_ = model.interpret(x, using=GradCam(), postprocessing=postprocessing).show()


Classification

from glasses.models import ResNet
# change the activation
ResNet.resnet18(activation=nn.SELU)
# change the number of classes
ResNet.resnet18(n_classes=100)
# freeze only the convolution weights
model = ResNet.resnet18(pretrained=True)
model.freeze(who=model.encoder)
# get the last layer, useful to hook into it if you want the embedded vector
model.encoder.layers[-1]
# what about a resnet with inverted residuals?
from glasses.models.classification.efficientnet import InvertedResidualBlock
ResNet.resnet18(block=InvertedResidualBlock)

Segmentation

from functools import partial
from glasses.models.segmentation.unet import UNet, UNetDecoder
# vanilla Unet
unet = UNet()
# let's change the encoder
unet = UNet.from_encoder(partial(AutoModel.from_name, 'efficientnet_b1'))
# mmm I want more layers in the decoder!
unet = UNet(decoder=partial(UNetDecoder, widths=[256, 128, 64, 32, 16]))
# maybe resnet was better
unet = UNet(encoder=lambda **kwargs: ResNet.resnet26(**kwargs).encoder)
# same API
unet.summary(input_shape=(1,224,224))
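
A quick forward pass confirms the output shape. This is a sketch under an assumption: judging from the summary call above, the default UNet expects a single-channel input.

# a vanilla UNet again; assuming a 1-channel input, as in the summary call above
unet = UNet()
x = torch.rand((1, 1, 224, 224))
out = unet(x)
out.shape  # e.g. torch.Size([1, n_classes, 224, 224]); n_classes depends on the default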

More examples

# change the head
model = ResNet.resnet18(pretrained=True)
my_head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(model.encoder.widths[-1], 512),
    nn.Dropout(0.2),
    nn.ReLU(),
    nn.Linear(512, 1000))

model.head = my_head

x = torch.rand((1,3,224,224))
model(x).shape  # torch.Size([1, 1000])
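
Putting these pieces together, here is a minimal fine-tuning sketch: freeze the pretrained encoder and train only the custom head. The random tensors are placeholders for a real DataLoader; only APIs already shown above are used.

# freeze the pretrained encoder and train only the custom head
model = ResNet.resnet18(pretrained=True)
model.head = my_head
model.freeze(who=model.encoder)

optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one training step on dummy data (replace with a real DataLoader)
x, y = torch.rand((8, 3, 224, 224)), torch.randint(0, 1000, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()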

Pretrained Models

I am currently working on the pretrained models and on the best way to make them available.

This is a list of all the pretrained models available so far! They are all trained on ImageNet.

I used a batch_size=64 and a GTX 1080ti to evaluate the models.
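
The exact benchmark script is not included here, but a minimal top-1/top-5 evaluation sketch looks like this; val_loader is a hypothetical ImageNet validation DataLoader built with the matching cfg.transform:

# minimal top-1/top-5 evaluation sketch
# val_loader is a hypothetical ImageNet validation DataLoader (batch_size=64)
model = AutoModel.from_pretrained('resnet18').eval().cuda()
correct1, correct5, total = 0, 0, 0
with torch.no_grad():
    for x, y in val_loader:
        x, y = x.cuda(), y.cuda()
        top5 = model(x).topk(5, dim=1).indices
        correct1 += (top5[:, 0] == y).sum().item()
        correct5 += (top5 == y[:, None]).any(dim=1).sum().item()
        total += y.size(0)
print(correct1 / total, correct5 / total)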

name top1 top5 time
efficientnet_b3 0.82034 0.9603 199.599
regnety_032 0.81958 0.95964 136.518
resnet50d 0.80492 0.95128 97.5827
cse_resnet50 0.80292 0.95048 108.765
efficientnet_b2 0.80126 0.95124 127.177
resnext101_32x8d 0.7921 0.94556 290.38
wide_resnet101_2 0.7891 0.94344 277.755
wide_resnet50_2 0.78464 0.94064 201.634
efficientnet_b1 0.7831 0.94096 98.7143
resnet152 0.7825 0.93982 186.191
regnetx_032 0.7792 0.93996 319.558
resnext50_32x4d 0.77628 0.9368 114.325
regnety_016 0.77604 0.93702 96.547
efficientnet_b0 0.77332 0.93566 67.2147
resnet101 0.77314 0.93556 134.148
densenet161 0.77146 0.93602 239.388
resnet34d 0.77118 0.93418 59.9938
densenet201 0.76932 0.9339 158.514
regnetx_016 0.76684 0.9328 91.7536
resnet26d 0.766 0.93188 70.6453
regnety_008 0.76238 0.93026 54.1286
resnet50 0.76012 0.92934 89.7976
densenet169 0.75628 0.9281 127.077
resnet26 0.75394 0.92584 65.5801
resnet34 0.75096 0.92246 56.8985
regnety_006 0.75068 0.92474 55.5611
regnetx_008 0.74788 0.92194 57.9559
densenet121 0.74472 0.91974 104.13
vgg19_bn 0.74216 0.91848 169.357
regnety_004 0.73766 0.91638 68.4893
regnetx_006 0.73682 0.91568 81.4703
vgg16_bn 0.73476 0.91536 150.317
vgg19 0.7236 0.9085 155.851
regnetx_004 0.72298 0.90644 58.0049
vgg16 0.71628 0.90368 135.398
vgg13_bn 0.71618 0.9036 129.077
vgg11_bn 0.70408 0.89724 86.9459
vgg13 0.69984 0.89306 116.052
regnety_002 0.6998 0.89422 46.804
resnet18 0.69644 0.88982 46.2029
vgg11 0.68872 0.88658 79.4136
regnetx_002 0.68658 0.88244 45.9211

Assuming you want to load efficientnet_b1:

from glasses.models import EfficientNet, AutoModel, AutoConfig

# load it using AutoModel
model = AutoModel.from_pretrained('efficientnet_b1')
# or from its own class
model = EfficientNet.efficientnet_b1(pretrained=True)
# you may also need the correct transformation to apply to the input
cfg = AutoConfig.from_name('efficientnet_b1')
transform = cfg.transform

In this case, transform is

Compose(
    Resize(size=240, interpolation=PIL.Image.BICUBIC)
    CenterCrop(size=(240, 240))
    ToTensor()
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
)
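
Putting it together, a short inference sketch (reusing the image from the Interpretability example; the top-5 indices are raw ImageNet class ids):

import requests, torch
from io import BytesIO
from PIL import Image

r = requests.get('https://i.insider.com/5df126b679d7570ad2044f3e?width=700&format=jpeg&auto=webp')
im = Image.open(BytesIO(r.content))

model = AutoModel.from_pretrained('efficientnet_b1').eval()
x = transform(im).unsqueeze(0)  # the matching preprocessing from AutoConfig
with torch.no_grad():
    probs = model(x).softmax(dim=1)
print(probs.topk(5, dim=1))  # top-5 class indices and their probabilities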

Deep Customization

All models are composed of shareable parts:

  • Block
  • Layer
  • Encoder
  • Head
  • Decoder

Block

Each model has its own building block, denoted by the *Block suffix. In each block, all the weights are in the .block field. This makes it very easy to customize a specific model.

from glasses.models.classification.vgg import VGGBasicBlock
from glasses.models.classification.resnet import ResNetBasicBlock, ResNetBottleneckBlock, ResNetBasicPreActBlock, ResNetBottleneckPreActBlock
from glasses.models.classification.senet import SENetBasicBlock, SENetBottleneckBlock
from glasses.models.classification.resnetxt import ResNetXtBottleNeckBlock
from glasses.models.classification.densenet import DenseBottleNeckBlock
from glasses.models.classification.wide_resnet import WideResNetBottleNeckBlock
from glasses.models.classification.efficientnet import EfficientNetBasicBlock

For example, if we want to add Squeeze and Excitation to the resnet bottleneck block, we can just

from glasses.nn.att import SpatialSE
from  glasses.models.classification.resnet import ResNetBottleneckBlock

class SEResNetBottleneckBlock(ResNetBottleneckBlock):
    def __init__(self, in_features: int, out_features: int, squeeze: int = 16, *args, **kwargs):
        super().__init__(in_features, out_features, *args, **kwargs)
        # all the weights are in block, we want to apply se after the weights
        self.block.add_module('se', SpatialSE(out_features, reduction=squeeze))

SEResNetBottleneckBlock(32, 64)
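
A quick shape check of the new block (a sketch, assuming the block keeps the spatial resolution by default):

block = SEResNetBottleneckBlock(32, 64)
x = torch.rand((1, 32, 56, 56))
block(x).shape  # torch.Size([1, 64, 56, 56]), assuming no downsampling by default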

Then, we can use the class methods to create new models following the existing architecture blueprint; for example, to create se_resnet50:

ResNet.resnet50(block=SEResNetBottleneckBlock)

The cool thing is that each model has the same API; if I want to create a vgg13 with the SEResNetBottleneckBlock, I can just

from glasses.models import VGG
model = VGG.vgg13(block=SEResNetBottleneckBlock)
model.summary()

Some models may require additional parameters for the block; for example, MobileNetV2 also requires an expansion parameter, so our SEResNetBottleneckBlock won't work there.

Layer

A Layer is a collection of blocks; it is used to stack multiple blocks together following some logic. For example, ResNetLayer:

from glasses.models.classification.resnet import ResNetLayer

ResNetLayer(64, 128, depth=2)
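
As a quick sketch (assuming, as in the original ResNet, that the first block of a layer downsamples by a stride of 2):

layer = ResNetLayer(64, 128, depth=2)
x = torch.rand((1, 64, 56, 56))
layer(x).shape  # torch.Size([1, 128, 28, 28]) if the first block downsamples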

Encoder

The encoder is what encodes the input into a vector: the convolution layers. It always has two very important parameters.

  • widths
  • depths

widths is the width at each layer, i.e. how many features there are; depths is the depth at each layer, i.e. how many blocks there are.

For example, ResNetEncoder will create multiple ResNetLayers based on the lengths of widths and depths. Let's see an example.

from glasses.models.classification.resnet import ResNetEncoder
# 3 layers, with 32, 64, 128 features and 1, 2, 3 blocks each
ResNetEncoder(
    widths=[32,64,128],
    depths=[1,2,3])

All encoders are subclasses of Encoder, which allows us to hook into specific stages to get the features. All you have to do is first call .features to notify the model that you want to receive the features, and then pass an input.

enc = ResNetEncoder()
enc.features
enc(torch.randn((1,3,224,224)))
print([f.shape for f in enc.features])

Remember that each model always has an .encoder field. The encoder knows the number of output features; you can access them by:

from glasses.models import ResNet

model = ResNet.resnet18()
model.encoder.widths[-1]


Features

Each encoder can return a list of features accessible through the .features field. You need to call it once beforehand to notify the encoder that we wish to also store the features.

from glasses.models.classification.resnet import ResNetEncoder

x = torch.randn(1,3,224,224)
enc = ResNetEncoder()
enc.features # call it once
enc(x)
features = enc.features # now we have all the features from each layer (stage)
for f in features:
    print(f.shape)
# torch.Size([1, 64, 112, 112])
# torch.Size([1, 64, 56, 56])
# torch.Size([1, 128, 28, 28])
# torch.Size([1, 256, 14, 14])

Head

Head is the last part of the model; it usually performs the classification.

from glasses.models.classification.resnet import ResNetHead


ResNetHead(512, n_classes=1000)
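
The head consumes the final encoder feature map; a sketch (assuming, like the custom head above, that it pools, flattens, and classifies an NCHW tensor):

head = ResNetHead(512, n_classes=1000)
x = torch.rand((1, 512, 7, 7))  # e.g. resnet18's last feature map for a 224x224 input
head(x).shape  # torch.Size([1, 1000])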

Decoder

The decoder takes the last feature from the .encoder and decodes it. This is usually done in segmentation models, such as UNet.

from glasses.models.segmentation.unet import UNetDecoder
x = torch.randn(1,3,224,224)
enc = ResNetEncoder()
enc.features # call it once
x = enc(x)
features = enc.features
# we need to tell the decoder the first feature size and the size of the lateral features
dec = UNetDecoder(start_features=enc.widths[-1],
                  lateral_widths=enc.features_widths[::-1])
out = dec(x, features[::-1])
out.shape

This object-oriented structure allows most of the code to be reused across models.

Models

The models so far

name Parameters Size (MB)
resnet18 11,689,512 44.59
resnet26 15,995,176 61.02
resnet26d 16,014,408 61.09
resnet34 21,797,672 83.15
resnet50 25,557,032 97.49
resnet50d 25,576,264 97.57
resnet101 44,549,160 169.94
resnet152 60,192,808 229.62
resnet200 64,673,832 246.71
se_resnet18 11,776,552 44.92
se_resnet34 21,954,856 83.75
se_resnet50 28,071,976 107.09
se_resnet101 49,292,328 188.04
se_resnet152 66,770,984 254.71
cse_resnet18 11,778,592 44.93
cse_resnet34 21,958,868 83.77
cse_resnet50 28,088,024 107.15
cse_resnet101 49,326,872 188.17
cse_resnet152 66,821,848 254.91
resnext50_32x4d 25,028,904 95.48
resnext101_32x8d 88,791,336 338.71
resnext101_32x16d 194,026,792 740.15
resnext101_32x32d 468,530,472 1787.3
resnext101_32x48d 828,411,176 3160.14
regnetx_002 2,684,792 10.24
regnetx_004 5,157,512 19.67
regnetx_006 6,196,040 23.64
regnetx_008 7,259,656 27.69
regnetx_016 9,190,136 35.06
regnetx_032 15,296,552 58.35
regnety_002 3,162,996 12.07
regnety_004 4,344,144 16.57
regnety_006 6,055,160 23.1
regnety_008 6,263,168 23.89
regnety_016 11,202,430 42.73
regnety_032 19,436,338 74.14
wide_resnet50_2 68,883,240 262.77
wide_resnet101_2 126,886,696 484.03
densenet121 7,978,856 30.44
densenet169 14,149,480 53.98
densenet201 20,013,928 76.35
densenet161 28,681,000 109.41
fishnet99 16,630,312 63.44
fishnet150 24,960,808 95.22
vgg11 132,863,336 506.83
vgg13 133,047,848 507.54
vgg16 138,357,544 527.79
vgg19 143,667,240 548.05
vgg11_bn 132,868,840 506.85
vgg13_bn 133,053,736 507.56
vgg16_bn 138,365,992 527.82
vgg19_bn 143,678,248 548.09
efficientnet_b0 5,288,548 20.17
efficientnet_b1 7,794,184 29.73
efficientnet_b2 9,109,994 34.75
efficientnet_b3 12,233,232 46.67
efficientnet_b4 19,341,616 73.78
efficientnet_b5 30,389,784 115.93
efficientnet_b6 43,040,704 164.19
efficientnet_b7 66,347,960 253.1
efficientnet_b8 87,413,142 333.45
efficientnet_l2 480,309,308 1832.23
efficientnet_lite0 4,652,008 17.75
efficientnet_lite1 5,416,680 20.66
efficientnet_lite2 6,092,072 23.24
efficientnet_lite3 8,197,096 31.27
efficientnet_lite4 13,006,568 49.62
mobilenetv2 3,504,872 13.37
unet 23,202,530 88.51

Credits

Most of the weights were trained by other people and adapted to glasses; credit goes to the original authors.
