Keras_cv_attention_models
- coco_train_script.py is under testing
General Usage
Basic
- Currently the recommended TF version is tensorflow==2.8.0, especially for training or TFLite conversion.
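For example, pinning that version (just an illustrative pip command, not part of the package requirements):
pip install tensorflow==2.8.0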
- Default imports:
import os
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow import keras
- Install as pip package:
pip install -U keras-cv-attention-models
# Or
pip install -U git+https://github.com/leondgarse/keras_cv_attention_models
Refer to each sub-directory for detailed usage.
- Basic model prediction
from keras_cv_attention_models import volo
mm = volo.VOLO_d1(pretrained="imagenet")
""" Run predict """
import tensorflow as tf
from tensorflow import keras
from skimage.data import chelsea
img = chelsea() # Chelsea the cat
imm = keras.applications.imagenet_utils.preprocess_input(img, mode='torch')
pred = mm(tf.expand_dims(tf.image.resize(imm, mm.input_shape[1:3]), 0)).numpy()
pred = tf.nn.softmax(pred).numpy() # If classifier activation is not softmax
print(keras.applications.imagenet_utils.decode_predictions(pred)[0])
# [('n02124075', 'Egyptian_cat', 0.9692954),
# ('n02123045', 'tabby', 0.020203391),
# ('n02123159', 'tiger_cat', 0.006867502),
# ('n02127052', 'lynx', 0.00017674894),
# ('n02123597', 'Siamese_cat', 4.9493494e-05)]
Or just use the model preset preprocess_input and decode_predictions:
from keras_cv_attention_models import coatnet
from skimage.data import chelsea
mm = coatnet.CoAtNet0()
preds = mm(mm.preprocess_input(chelsea()))
print(mm.decode_predictions(preds))
# [[('n02124075', 'Egyptian_cat', 0.9653769), ('n02123159', 'tiger_cat', 0.018427467), ...]
- Exclude model top layers by setting num_classes=0:
from keras_cv_attention_models import resnest
mm = resnest.ResNest50(num_classes=0)
print(mm.output_shape)
# (None, 7, 7, 2048)
- Reload your own model weights by setting pretrained="xxx.h5". Helpful when reloading a model with a different input_shape and with weight shapes not matching.
import os
from keras_cv_attention_models import coatnet
pretrained = os.path.expanduser('~/.keras/models/coatnet0_224_imagenet.h5')
mm = coatnet.CoAtNet1(input_shape=(384, 384, 3), pretrained=pretrained)
- The alias name kecam can be used instead of keras_cv_attention_models. Its __init__.py contains only one line: from keras_cv_attention_models import *.
import kecam
mm = kecam.yolor.YOLOR_CSP()
imm = kecam.test_images.dog_cat()
preds = mm(mm.preprocess_input(imm))
bboxes, labels, confidences = mm.decode_predictions(preds)[0]
kecam.coco.show_image_with_bboxes(imm, bboxes, labels, confidences)
- The FLOPs calculation method is from TF 2.0 Feature: Flops calculation #32809.
from keras_cv_attention_models import coatnet, resnest, model_surgery
model_surgery.get_flops(coatnet.CoAtNet0())
# >>>> Flops: 4,221,908,559, GFlops: 4.2219G
model_surgery.get_flops(resnest.ResNest50())
# >>>> Flops: 5,378,399,992, GFlops: 5.3784G
Layers
- attention_layers is an __init__.py only, which imports core layers defined in the model architectures, like RelativePositionalEmbedding from botnet or outlook_attention from volo.
from keras_cv_attention_models import attention_layers
aa = attention_layers.RelativePositionalEmbedding()
print(f"{aa(tf.ones([1, 4, 14, 16, 256])).shape = }")
# aa(tf.ones([1, 4, 14, 16, 256])).shape = TensorShape([1, 4, 14, 16, 14, 16])
Model surgery
- model_surgery includes functions used to change model parameters after the model is built.
from keras_cv_attention_models import model_surgery
mm = keras.applications.ResNet50() # Trainable params: 25,583,592
# Replace all ReLU with PReLU. Trainable params: 25,606,312
mm = model_surgery.replace_ReLU(mm, target_activation='PReLU')
# Fuse conv and batch_norm layers. Trainable params: 25,553,192
mm = model_surgery.convert_to_fused_conv_bn_model(mm)
ImageNet training and evaluating
- ImageNet contains more detailed usage and some comparison results.
- Init Imagenet dataset using tensorflow_datasets #9.
- For a custom dataset, the recommended method is using tfds.load; refer to Writing custom datasets and Creating private tensorflow_datasets from tfds #48 by @Medicmind. custom_dataset_script.py can also be used to create a JSON format file, which can be passed as --data_name xxx.json for training; detailed usage can be found in Custom recognition dataset.
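A minimal sketch of the tfds.load route (the dataset name below is only a placeholder for a dataset already built / registered with tensorflow_datasets):
import tensorflow_datasets as tfds
# "my_custom_dataset" is a placeholder; it must already be registered or built with tensorflow_datasets.
train_ds, info = tfds.load("my_custom_dataset", split="train", with_info=True, as_supervised=True)
print(info.features["label"].num_classes)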
- aotnet.AotNet50 default parameter set is a typical ResNet50 architecture with Conv2D use_bias=False and PyTorch-like padding.
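A minimal sketch of that default behavior (the parameter count is given only as a rough expectation):
from keras_cv_attention_models import aotnet
mm = aotnet.AotNet50()  # default parameters build a plain ResNet50-style model
print(mm.count_params())  # expected to be roughly in line with a standard ResNet50, ~25M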
- Default parameters for train_script.py are like the A3 configuration from ResNet strikes back: An improved training procedure in timm, with batch_size=256, input_shape=(160, 160).
# `antialias` is enabled by default for resize; it can be turned off by setting `--disable_antialias`.
CUDA_VISIBLE_DEVICES='0' TF_XLA_FLAGS="--tf_xla_auto_jit=2" ./train_script.py --seed 0 -s aotnet50
# Evaluation using input_shape (224, 224).
# `antialias` usage should be same with training.
CUDA_VISIBLE_DEVICES='1' ./eval_script.py -m aotnet50_epoch_103_val_acc_0.7674.h5 -i 224 --central_crop 0.95
# >>>> Accuracy top1: 0.78466 top5: 0.94088
- For progressive training, refer to PDF 2104.00298 EfficientNetV2: Smaller Models and Faster Training. AotNet50 A3 with progressive input shapes 96 128 160:
CUDA_VISIBLE_DEVICES='1' TF_XLA_FLAGS="--tf_xla_auto_jit=2" ./progressive_train_script.py \
--progressive_epochs 33 66 -1 \
--progressive_input_shapes 96 128 160 \
--progressive_magnitudes 2 4 6 \
-s aotnet50_progressive_3_lr_steps_100 --seed 0
- eval_script.py is used for evaluating model accuracy.
# evaluating pretrained builtin model
CUDA_VISIBLE_DEVICES='1' ./eval_script.py -m regnet.RegNetZD8
# evaluating pretrained timm model
CUDA_VISIBLE_DEVICES='1' ./eval_script.py -m timm.models.resmlp_12_224 --input_shape 224
# evaluating specific h5 model
CUDA_VISIBLE_DEVICES='1' ./eval_script.py -m checkpoints/xxx.h5
# evaluating specific tflite model
CUDA_VISIBLE_DEVICES='1' ./eval_script.py -m xxx.tflite
COCO training and evaluating
- Currently still under testing.
- COCO contains more detailed usage.
- custom_dataset_script.py can be used to create a JSON format file, which can be passed as --data_name xxx.json for training; detailed usage can be found in Custom detection dataset.
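For example, a hypothetical run on such a file (the JSON name is only a placeholder):
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --data_name custom_detection.json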
- Default parameters for coco_train_script.py are EfficientDetD0 with input_shape=(256, 256, 3), batch_size=64, mosaic_mix_prob=0.5, freeze_backbone_epochs=32, total_epochs=105. Technically, any pyramid-structure backbone + EfficientDet / YOLOX / YOLOR header + anchor_free / yolor_anchors / efficientdet_anchors combination is supported.
- Currently 3 types of anchors are supported:
  - use_anchor_free_mode controls whether to use the typical YOLOX anchor_free mode strategy.
  - use_yolor_anchors_mode controls whether to use yolor anchors.
  - Default is use_anchor_free_mode=False, use_yolor_anchors_mode=False, meaning efficientdet preset anchors are used.
| anchors_mode | use_object_scores | num_anchors | anchor_scale | aspect_ratios | num_scales | grid_zero_start |
| ------------- | ----------------- | ----------- | ------------ | ------------- | ---------- | --------------- |
| efficientdet | False | 9 | 4 | [1, 2, 0.5] | 3 | False |
| anchor_free | True | 1 | 1 | [1] | 1 | True |
| yolor_anchors | True | 3 | None | presets | None | offset=0.5 |
# Default EfficientDetD0
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py
# Default EfficientDetD0 using input_shape 512, optimizer adamw, freezing backbone 16 epochs, total 50 + 5 epochs
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py -i 512 -p adamw --freeze_backbone_epochs 16 --lr_decay_steps 50
# EfficientNetV2B0 backbone + EfficientDetD0 detection header
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --backbone efficientnet.EfficientNetV2B0 --det_header efficientdet.EfficientDetD0
# ResNest50 backbone + EfficientDetD0 header using yolox like anchor_free_mode
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --backbone resnest.ResNest50 --use_anchor_free_mode
# UniformerSmall32 backbone + EfficientDetD0 header using yolor anchors
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --backbone uniformer.UniformerSmall32 --use_yolor_anchors_mode
# Typical YOLOXS with anchor_free_mode
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --det_header yolox.YOLOXS --use_anchor_free_mode
# YOLOXS with efficientdet anchors
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --det_header yolox.YOLOXS
# CoAtNet0 backbone + YOLOX header with yolor anchors
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --backbone coatnet.CoAtNet0 --det_header yolox.YOLOX --use_yolor_anchors_mode
# Typical YOLOR_P6 with yolor anchors
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --det_header yolor.YOLOR_P6 --use_yolor_anchors_mode
# YOLOR_P6 with anchor_free_mode
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --det_header yolor.YOLOR_P6 --use_anchor_free_mode
# ConvNeXtTiny backbone + YOLOR header with efficientdet anchors
CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --backbone convnext.ConvNeXtTiny --det_header yolor.YOLOR
Note: COCO training is still under testing; parameters and default behaviors may change. Take the risk if you would like to help with development.
- coco_eval_script.py is used for evaluating model AP / AR on the COCO validation set. It has a dependency pip install pycocotools which is not in the package requirements. More usage can be found in COCO Evaluation.
# resize method for EfficientDetD0 is bilinear w/o antialias
CUDA_VISIBLE_DEVICES='1' ./coco_eval_script.py -m efficientdet.EfficientDetD0 --resize_method bilinear --disable_antialias
# Specify --use_anchor_free_mode for YOLOX, and BGR input format
CUDA_VISIBLE_DEVICES='1' ./coco_eval_script.py -m yolox.YOLOXTiny --use_anchor_free_mode --use_bgr_input --nms_method hard --nms_iou_or_sigma 0.65
# Specify --use_yolor_anchors_mode for YOLOR. Note: results are still lower than the official ones
CUDA_VISIBLE_DEVICES='1' ./coco_eval_script.py -m yolor.YOLOR_CSP --use_yolor_anchors_mode --nms_method hard --nms_iou_or_sigma 0.65
# Specific h5 model
CUDA_VISIBLE_DEVICES='1' ./coco_eval_script.py -m checkpoints/yoloxtiny_yolor_anchor.h5 --use_yolor_anchors_mode
Visualizing
- Visualizing is for visualizing convnet filters or attention map scores.
- make_and_apply_gradcam_heatmap is for Grad-CAM class activation visualization.
from keras_cv_attention_models import visualizing, test_images, resnest
mm = resnest.ResNest50()
img = test_images.dog()
superimposed_img, heatmap, preds = visualizing.make_and_apply_gradcam_heatmap(mm, img, layer_name="auto")
- plot_attention_score_maps visualizes model attention score maps.
from keras_cv_attention_models import visualizing, test_images, botnet
img = test_images.dog()
_ = visualizing.plot_attention_score_maps(botnet.BotNetSE33T(), img)
TFLite Conversion
- Currently TFLite does not support Conv2D with groups>1 / gelu / tf.image.extract_patches / tf.transpose with len(perm) > 4. Some operations may be supported in the tf-nightly version; try it if encountering issues. More discussion can be found in Converting a trained keras CV attention model to TFLite #17. Some speed test results can be found in How to speed up inference on a quantized model #44.
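A minimal conversion sketch for a model that avoids the unsupported operations above (assuming default AotNet50, which uses relu and no grouped Conv2D):
import tensorflow as tf
from keras_cv_attention_models import aotnet
mm = aotnet.AotNet50()  # plain ResNet50-like model: no groups>1 / gelu / extract_patches
converter = tf.lite.TFLiteConverter.from_keras_model(mm)
open(mm.name + ".tflite", "wb").write(converter.convert())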
- tf.nn.gelu(inputs, approximate=True) activation works for TFLite. Defining a model with activation="gelu/approximate" or activation="gelu/app" will set approximate=True for gelu. Better to decide before training, or there may be an accuracy loss.
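For instance (assuming the model constructor exposes an activation argument, as most models in this package do):
from keras_cv_attention_models import coatnet
# "gelu/app" is expected to set `approximate=True` for every gelu activation, keeping the model TFLite friendly.
mm = coatnet.CoAtNet0(activation="gelu/app")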
- model_surgery.convert_groups_conv2d_2_split_conv2d converts model Conv2D with groups>1 layers to SplitConv2D using split -> conv -> concat:
from keras_cv_attention_models import regnet, model_surgery
from keras_cv_attention_models.imagenet import eval_func
bb = regnet.RegNetZD32()
mm = model_surgery.convert_groups_conv2d_2_split_conv2d(bb) # converts all `Conv2D` using `groups` to `SplitConv2D`
test_inputs = np.random.uniform(size=[1, *mm.input_shape[1:]])
print(np.allclose(mm(test_inputs), bb(test_inputs)))
# True
converter = tf.lite.TFLiteConverter.from_keras_model(mm)
open(mm.name + ".tflite", "wb").write(converter.convert())
print(np.allclose(mm(test_inputs), eval_func.TFLiteModelInterf(mm.name + '.tflite')(test_inputs), atol=1e-7))
# True
- model_surgery.convert_gelu_and_extract_patches_for_tflite converts the model's gelu activation to gelu with approximate=True, and tf.image.extract_patches to a Conv2D version:
from keras_cv_attention_models import cotnet, model_surgery
from keras_cv_attention_models.imagenet import eval_func
mm = cotnet.CotNetSE50D()
mm = model_surgery.convert_groups_conv2d_2_split_conv2d(mm)
mm = model_surgery.convert_gelu_and_extract_patches_for_tflite(mm)
converter = tf.lite.TFLiteConverter.from_keras_model(mm)
open(mm.name + ".tflite", "wb").write(converter.convert())
test_inputs = np.random.uniform(size=[1, *mm.input_shape[1:]])
print(np.allclose(mm(test_inputs), eval_func.TFLiteModelInterf(mm.name + '.tflite')(test_inputs), atol=1e-7))
# True
- model_surgery.prepare_for_tflite is just a combination of the above 2 functions:
from keras_cv_attention_models import beit, model_surgery
mm = beit.BeitBasePatch16()
mm = model_surgery.prepare_for_tflite(mm)
converter = tf.lite.TFLiteConverter.from_keras_model(mm)
open(mm.name + ".tflite", "wb").write(converter.convert())
- Converting VOLO / HaloNet models is not supported, as they require a longer tf.transpose perm.
Recognition Models
AotNet
- Keras AotNet is just a ResNet / ResNetV2-like framework that sets parameters like attn_types, se_ratio and others, which are used to apply different types of attention layers. It works like byoanet / byobnet from timm.
- The default parameter set is a typical ResNet architecture with Conv2D use_bias=False and PyTorch-like padding.
from keras_cv_attention_models import aotnet
# Mixing se and outlook and halo and mhsa and cot_attention, 21M parameters.
# 50 is just a picked number larger than the relative `num_block`.
attn_types = [None, "outlook", ["bot", "halo"] * 50, "cot"]
se_ratio = [0.25, 0, 0, 0]
model = aotnet.AotNet50V2(attn_types=attn_types, se_ratio=se_ratio, stem_type="deep", strides=1)
model.summary()
BEIT
BotNet
CMT
| Model | Params | Image resolution | Top1 Acc | Download |
| ----- | ------ | ---------------- | -------- | -------- |
| CMTTiny, (Self trained 105 epochs) | 9.5M | 160 | 77.4 | |
| - 305 epochs | 9.5M | 160 | 78.8 | cmt_tiny_160_imagenet |
| - evaluate 224 (not fine-tuned) | 9.5M | 224 | 80.1 | |
| CMTTiny, 1000 epochs | 9.5M | 160 | 79.2 | |
| CMTXS | 15.2M | 192 | 81.8 | |
| CMTSmall | 25.1M | 224 | 83.5 | |
| CMTBig | 45.7M | 256 | 84.5 | |
CoaT
CoAtNet
| Model | Params | Image resolution | Top1 Acc | Download |
| ----- | ------ | ---------------- | -------- | -------- |
| CoAtNet0 (Self trained 105 epochs) | 23.8M | 160 | 80.50 | coatnet0_160_imagenet.h5 |
| - fine-tune 224, 37 epochs | 23.8M | 224 | 82.23 | coatnet0_224_imagenet.h5 |
| CoAtNet0 | 25M | 224 | 81.6 | |
| CoAtNet0, Strided DConv | 25M | 224 | 82.0 | |
| CoAtNet1 | 42M | 224 | 83.3 | |
| CoAtNet1, Strided DConv | 42M | 224 | 83.5 | |
| CoAtNet2 | 75M | 224 | 84.1 | |
| CoAtNet2, Strided DConv | 75M | 224 | 84.1 | |
| CoAtNet2, ImageNet-21k pretrain | 75M | 224 | 87.1 | |
| CoAtNet3 | 168M | 224 | 84.5 | |
| CoAtNet3, ImageNet-21k pretrain | 168M | 224 | 87.6 | |
| CoAtNet3, ImageNet-21k pretrain | 168M | 512 | 87.9 | |
| CoAtNet4, ImageNet-21k pretrain | 275M | 512 | 88.1 | |
| CoAtNet4, ImageNet-21K + PT-RA-E150 | 275M | 512 | 88.56 | |
JFT pre-trained models accuracy
| Model | Image resolution | Reported Params | Self-defined Params | Top1 Acc |
| ----- | ---------------- | --------------- | ------------------- | -------- |
| CoAtNet3 | 384 | 168M | 162.96M | 88.52 |
| CoAtNet3 | 512 | 168M | 163.57M | 88.81 |
| CoAtNet4 | 512 | 275M | 273.10M | 89.11 |
| CoAtNet5 | 512 | 688M | 680.47M | 89.77 |
| CoAtNet6 | 512 | 1.47B | 1.340B | 90.45 |
| CoAtNet7 | 512 | 2.44B | 2.422B | 90.88 |
ConvNeXt
CoTNet
EfficientNet
GMLP
| Model | Params | Image resolution | Top1 Acc | Download |
| ----- | ------ | ---------------- | -------- | -------- |
| GMLPTiny16 | 6M | 224 | 72.3 | |
| GMLPS16 | 20M | 224 | 79.6 | gmlp_s16_imagenet.h5 |
| GMLPB16 | 73M | 224 | 81.6 | |
HaloNet
LeViT
MLP mixer
MobileViT
NFNets
RegNetY
RegNetZ
ResMLP
ResNeSt
ResNetD
ResNetQ
| Model | Params | Image resolution | Top1 Acc | Download |
| ----- | ------ | ---------------- | -------- | -------- |
| ResNet51Q | 35.7M | 224 | 82.36 | resnet51q.h5 |
ResNeXt
UniFormer
VOLO
WaveMLP
Detection Models
EfficientDet
YOLOR
YOLOX
Other implemented tensorflow or keras models