pip install -U keras-cv-attention-models
# Or
pip install -U git+https://github.com/leondgarse/keras_cv_attention_models
Refer to each subdirectory for detailed usage.
Basic model prediction
from keras_cv_attention_models import volo
mm = volo.VOLO_d1(pretrained="imagenet")

""" Run predict """
import tensorflow as tf
from tensorflow import keras
from skimage.data import chelsea
img = chelsea()  # Chelsea the cat
imm = keras.applications.imagenet_utils.preprocess_input(img, mode='torch')
pred = mm(tf.expand_dims(tf.image.resize(imm, mm.input_shape[1:3]), 0)).numpy()
pred = tf.nn.softmax(pred).numpy()  # If classifier activation is not softmax
print(keras.applications.imagenet_utils.decode_predictions(pred)[0])
# [('n02124075', 'Egyptian_cat', 0.9692954),
#  ('n02123045', 'tabby', 0.020203391),
#  ('n02123159', 'tiger_cat', 0.006867502),
#  ('n02127052', 'lynx', 0.00017674894),
#  ('n02123597', 'Siamese_cat', 4.9493494e-05)]
attention_layers contains only an __init__.py, which imports the core layers defined in the model architectures, such as RelativePositionalEmbedding from botnet and outlook_attention from volo.
# Not sure how useful resize_antialias is; the default behavior for timm is using `bicubic`
CUDA_VISIBLE_DEVICES='0' TF_XLA_FLAGS="--tf_xla_auto_jit=2" ./train_script.py --seed 0 --resize_antialias -s aotnet50
# Evaluation using input_shape (224, 224).
# `antialias` usage should be the same as in training.
CUDA_VISIBLE_DEVICES='1' ./eval_script.py -m aotnet50_epoch_103_val_acc_0.7674.h5 -i 224 --central_crop 0.95 --antialias
# >>>> Accuracy top1: 0.78466 top5: 0.94088
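As a reminder of what the top1 / top5 numbers above mean, here is a minimal sketch of top-k accuracy in plain Python. It is not the code used by eval_script.py; the function name `top_k_accuracy` and the toy scores are made up for illustration.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda idx: row[idx], reverse=True)[:k]
        hits += int(label in top_k)
    return hits / len(labels)


# Toy data (made-up numbers): 4 samples, 5 classes
scores = [
    [0.10, 0.60, 0.20, 0.05, 0.05],
    [0.50, 0.10, 0.30, 0.05, 0.05],
    [0.20, 0.20, 0.10, 0.40, 0.10],
    [0.30, 0.25, 0.20, 0.15, 0.10],
]
labels = [1, 2, 3, 4]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print("top1:", top1, "top2:", top2)
```

A larger k can only keep or raise the score, which is why top5 is always at least as high as top1.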
Currently TFLite does not support Conv2D with groups>1 / gelu / tf.image.extract_patches / tf.transpose with len(perm) > 4. Some of these operations may be supported in the tf-nightly version, which is worth trying if you run into an issue. More discussion can be found in Converting a trained keras CV attention model to TFLite #17.
tf.nn.gelu(inputs, approximate=True) activation works for TFLite. Defining a model with activation="gelu/approximate" or activation="gelu/app" sets approximate=True for gelu. It is better to decide this before training, otherwise there may be some accuracy loss.
model_surgery.convert_groups_conv2d_2_split_conv2d converts the model's Conv2D layers with groups>1 to SplitConv using split -> conv -> concat:
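To give a feel for why switching gelu variants after training can cost a little accuracy, the sketch below compares the exact (erf-based) GELU with the tanh approximation using only the standard library. It is an illustration of the two formulas, not code from this package; the function names are made up here.

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU (the `approximate=True` form)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Scan [-5, 5] and report the largest gap between the two variants
xs = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on [-5, 5]: {max_gap:.6f}")
```

The gap per activation is small but nonzero, and it is applied at every gelu in the network, so weights trained with one variant are not guaranteed to behave identically with the other.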
import numpy as np
import tensorflow as tf
from keras_cv_attention_models import regnet, model_surgery
from keras_cv_attention_models.imagenet import eval_func

bb = regnet.RegNetZD32()
mm = model_surgery.convert_groups_conv2d_2_split_conv2d(bb)  # converts all `Conv2D` using `groups` to `SplitConv2D`

test_inputs = np.random.uniform(size=[1, *mm.input_shape[1:]])
print(np.allclose(mm(test_inputs), bb(test_inputs)))
# True

converter = tf.lite.TFLiteConverter.from_keras_model(mm)
open(mm.name + ".tflite", "wb").write(converter.convert())
print(np.allclose(mm(test_inputs), eval_func.TFLiteModelInterf(mm.name + '.tflite')(test_inputs), atol=1e-7))
# True
model_surgery.convert_gelu_and_extract_patches_for_tflite converts the model's gelu activations to gelu approximate=True, and tf.image.extract_patches to a Conv2D version.
Converting VOLO / HaloNet models is not supported, because they need a longer tf.transpose perm.
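The split -> conv -> concat idea can be checked on a toy case in plain NumPy. The sketch below uses a 1x1 (pointwise) grouped convolution, where a grouped conv is equivalent to a block-diagonal weight matrix. It is an illustration only, not the implementation in model_surgery; both function names are made up here.

```python
import numpy as np


def grouped_pointwise_conv(x, kernels):
    """Reference 1x1 grouped conv, built as one block-diagonal weight matrix.

    x: [H, W, C_in]; kernels: one [C_in / groups, C_out / groups] array per group.
    """
    groups = len(kernels)
    cin_g, cout_g = kernels[0].shape
    full = np.zeros((groups * cin_g, groups * cout_g))
    for g, kernel in enumerate(kernels):
        full[g * cin_g:(g + 1) * cin_g, g * cout_g:(g + 1) * cout_g] = kernel
    return x @ full


def split_conv_concat(x, kernels):
    """Same result using split -> per-group conv -> concat on the channel axis."""
    parts = np.split(x, len(kernels), axis=-1)
    outs = [part @ kernel for part, kernel in zip(parts, kernels)]
    return np.concatenate(outs, axis=-1)


rng = np.random.default_rng(0)
x = rng.uniform(size=(4, 4, 8))                          # 8 input channels
kernels = [rng.uniform(size=(2, 3)) for _ in range(4)]   # 4 groups, 12 output channels
same = np.allclose(grouped_pointwise_conv(x, kernels), split_conv_concat(x, kernels))
print(same)
```

Because each group only ever reads its own slice of input channels, the rewrite changes the graph but not the math, which is why the converted model can reuse the trained weights.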
Models
AotNet
Keras AotNet is just a ResNet / ResNetV2 like framework that exposes parameters such as attn_types and se_ratio, which are used to apply different types of attention layers. It works like byoanet / byobnet from timm.
The default parameter set is a typical ResNet architecture, with Conv2D use_bias=False and PyTorch-style padding.
from keras_cv_attention_models import aotnet
# Mixing se and outlook and halo and mhsa and cot_attention, 21M parameters.
# 50 is just a picked number that is larger than the relative `num_block`.
attn_types = [None, "outlook", ["bot", "halo"] * 50, "cot"]
se_ratio = [0.25, 0, 0, 0]
model = aotnet.AotNet50V2(attn_types=attn_types, se_ratio=se_ratio, stem_type="deep", strides=1)
model.summary()
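One way to read the attn_types value above: each entry configures one stack, and a list entry is consumed block by block, which is why it only needs to be at least as long as that stack's `num_block`. The sketch below illustrates that reading. The helper `per_block_attn_types` is hypothetical and not part of the package, and the stack depths [3, 4, 6, 3] are assumed from the usual ResNet50 layout.

```python
def per_block_attn_types(attn_types, num_blocks):
    """Hypothetical helper: expand per-stack settings into per-block settings."""
    expanded = []
    for stack_attn, blocks in zip(attn_types, num_blocks):
        if isinstance(stack_attn, (list, tuple)):
            if len(stack_attn) < blocks:
                raise ValueError("per-block list is shorter than the stack depth")
            # A list gives one attention type per block; extra entries are ignored
            expanded.append(list(stack_attn[:blocks]))
        else:
            # A single value (or None) is shared by every block in the stack
            expanded.append([stack_attn] * blocks)
    return expanded


num_blocks = [3, 4, 6, 3]  # assumed ResNet50-style stack depths
attn_types = [None, "outlook", ["bot", "halo"] * 50, "cot"]
plan = per_block_attn_types(attn_types, num_blocks)
for stack_id, stack_plan in enumerate(plan, start=1):
    print(f"stack{stack_id}: {stack_plan}")
```

Under this reading, the third stack alternates bot and halo across its six blocks, and the remaining entries of the 100-item list are simply unused.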