Pretrained Keras 3 vision models

KVMM: Keras Vision Models 🚀

📖 Introduction

Keras Vision Models (KVMM) is a collection of vision models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including segmentation, object detection, vision-language modeling (VLMs), and classification. KVMM also provides custom layers and lets models serve as backbones for other architectures. Backbones come in multiple pretrained weight variants, such as in1k, in21k, fb_dist_in1k, ms_in22k, fb_in22k_ft_in1k, ns_jft_in1k, aa_in1k, cvnets_in1k, augreg_in21k_ft_in1k, augreg_in21k, and many more.

⚡Installation

From PyPI (recommended)

pip install -U kvmm

From Source

pip install -U git+https://github.com/IMvision12/keras-vision-models
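
After either install command, it can be useful to confirm which version ended up in the active environment. The sketch below uses only the Python standard library; `installed_version` is a hypothetical helper name chosen for this example, not part of KVMM.

```python
from importlib import metadata


def installed_version(package: str = "kvmm"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version() or "kvmm is not installed in this environment")
```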

🛠️ Usage

🔎 Listing Available Models

kvmm.list_models() lists all available models, including backbones, segmentation models, object detection models, and vision-language models (VLMs), together with the weight names available for each model variant.

import kvmm
print(kvmm.list_models())

# Output:
"""
CaiTM36 : fb_dist_in1k_384
CaiTM48 : fb_dist_in1k_448
CaiTS24 : fb_dist_in1k_224, fb_dist_in1k_384
...
ConvMixer1024D20 : in1k
ConvMixer1536D20 : in1k
...
ConvNeXtAtto : d2_in1k
ConvNeXtBase : fb_in1k, fb_in22k, fb_in22k_ft_in1k, fb_in22k_ft_in1k_384
...
"""

🔎 List Specific Model Variant

import kvmm
print(kvmm.list_models("swin"))

# Output:
"""
SwinBaseP4W12 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinBaseP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W12 : ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W7 : ms_in22k, ms_in22k_ft_in1k
SwinSmallP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinTinyP4W7 : ms_in1k, ms_in22k
"""

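The listings above follow a "ModelName : weight1, weight2" pattern. As a rough sketch, assuming you have copied that printed text into a string, the snippet below turns it into a dictionary so you can filter models by weight name. `parse_model_listing` is a hypothetical helper written for this example; it is not part of KVMM, and it makes no assumption about what `kvmm.list_models()` itself returns.

```python
def parse_model_listing(text: str) -> dict[str, list[str]]:
    """Parse 'Name : w1, w2' lines into {name: [w1, w2]}."""
    weights_by_model = {}
    for line in text.splitlines():
        name, sep, weights = line.partition(":")
        if not sep:
            continue  # skip blank lines and "..." placeholders
        weights_by_model[name.strip()] = [
            w.strip() for w in weights.split(",") if w.strip()
        ]
    return weights_by_model


listing = """
SwinLargeP4W7 : ms_in22k, ms_in22k_ft_in1k
...
SwinTinyP4W7 : ms_in1k, ms_in22k
"""
models = parse_model_listing(listing)

# Models that offer the ms_in22k_ft_in1k weight variant
print([name for name, weights in models.items() if "ms_in22k_ft_in1k" in weights])
```
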
⚙️ Layers

KVMM provides custom layers such as StochasticDepth, LayerScale, EfficientMultiheadSelfAttention, and more. These layers can be integrated into your own models and workflows 🚀

import kvmm

# Example 1: `input_tensor` is a placeholder for your own tensor
layer = kvmm.layers.StochasticDepth(drop_path_rate=0.1)
output = layer(input_tensor, training=True)

# Example 2: `features` is a placeholder for your own feature tensor
window_partition = kvmm.layers.WindowPartition(window_size=7)
windowed_features = window_partition(features, height=28, width=28)
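
To make the idea behind window partitioning concrete, here is a minimal NumPy sketch of the general technique: a grid of tokens is cut into non-overlapping square windows. It assumes the input is laid out as (batch, height * width, channels) and returns (num_windows * batch, window_size * window_size, channels). This is an illustration only; the exact input and output layout of KVMM's own layer may differ.

```python
import numpy as np


def window_partition(features, height, width, window_size):
    """Split (batch, height*width, channels) tokens into non-overlapping windows."""
    batch, tokens, channels = features.shape
    assert tokens == height * width, "token count must equal height * width"
    assert height % window_size == 0 and width % window_size == 0
    x = features.reshape(
        batch,
        height // window_size, window_size,
        width // window_size, window_size,
        channels,
    )
    # Group the two window-grid axes first, then the two within-window axes
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, window_size * window_size, channels)


features = np.arange(1 * 28 * 28 * 4, dtype=np.float32).reshape(1, 28 * 28, 4)
windows = window_partition(features, height=28, width=28, window_size=7)
print(windows.shape)  # (16, 49, 4)
```

A 28x28 grid with 7x7 windows yields 4 x 4 = 16 windows of 49 tokens each.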

🏗️ Backbone Usage (Classification)

🛠️ Basic Usage

import kvmm
import numpy as np

# default configuration
model = kvmm.models.vit.ViTTiny16()

# For Fine-Tuning (default weight)
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224,224,3))
# Custom Weight
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224,224,3), weights="augreg_in21k_224")

# Backbone Support
model = kvmm.models.vit.ViTTiny16(include_top=False, as_backbone=True, input_shape=(224,224,3), weights="augreg_in21k_224")
random_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
features = model(random_input)
print(f"Number of feature maps: {len(features)}")
for i, feature in enumerate(features):
    print(f"Feature {i} shape: {feature.shape}")

"""
Output:

Number of feature maps: 13
Feature 0 shape: (1, 197, 192)
Feature 1 shape: (1, 197, 192)
Feature 2 shape: (1, 197, 192)
...
"""    

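The (1, 197, 192) shapes above are consistent with standard ViT arithmetic: a 224x224 input cut into 16x16 patches gives 14 x 14 = 196 patch tokens, and one extra token brings the total to 197, while 192 is the ViT-Tiny embedding width. The sketch below reproduces that count. `vit_token_count` is a hypothetical helper, and the single extra token is an assumption inferred from the printed shapes rather than from KVMM's source.

```python
def vit_token_count(image_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Tokens in a ViT sequence: one per patch plus any extra (e.g. class) tokens."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + extra_tokens


print(vit_token_count(224, 16))  # 197
```
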
Example Inference

from keras import ops
from keras.applications.imagenet_utils import decode_predictions
import kvmm
from PIL import Image

model = kvmm.models.swin.SwinTinyP4W7(input_shape=[224, 224, 3])

# Convert to RGB so PNGs with an alpha channel still yield 3 channels
image = Image.open("bird.png").convert("RGB").resize((224, 224))
x = ops.convert_to_tensor(image)
x = ops.expand_dims(x, axis=0)

# Predict
preds = model.predict(x)
print("Predicted:", decode_predictions(preds, top=3)[0])

# Output:
# Predicted: [('n01537544', 'indigo_bunting', np.float32(0.9135666)), ('n01806143', 'peacock', np.float32(0.0003379386)), ('n02017213', 'European_gallinule', np.float32(0.00027174334))]

🧩 Segmentation

🛠️ Basic Usage

import kvmm

# Pre-trained weights (cityscapes, ade20k, or mit (in1k))
# ade20k and cityscapes weights can be used for fine-tuning by passing a custom `num_classes`
# If `num_classes` is not specified, it defaults to 150 for ade20k and 19 for cityscapes
model = kvmm.models.segformer.SegFormerB0(weights="ade20k", input_shape=(512,512,3))
model = kvmm.models.segformer.SegFormerB0(weights="cityscapes", input_shape=(512,512,3))

# Fine-Tune using `MiT` backbone (This will load `in1k` weights)
model = kvmm.models.segformer.SegFormerB0(weights="mit", input_shape=(512,512,3))

🚀 Custom Backbone Support

import kvmm

# With no backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights=None, include_top=False, input_shape=(224,224,3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224,224,3))

# With backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights="tv_in1k", include_top=False, input_shape=(224,224,3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224,224,3))

🚀 Example Inference

import kvmm
from PIL import Image
import numpy as np

model = kvmm.models.segformer.SegFormerB0(weights="ade20k_512")

image = Image.open("ADE_train_00000586.jpg")
processed_img = kvmm.models.segformer.SegFormerImageProcessor(image=image,
    do_resize=True,
    size={"height": 512, "width": 512},
    do_rescale=True,
    do_normalize=True)
outs = model.predict(processed_img)
outs = np.argmax(outs[0], axis=-1)
# `visualize_segmentation` is not defined in this snippet; supply your own plotting helper
visualize_segmentation(outs, image)
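
The snippet above calls `visualize_segmentation` without defining it. One possible way to build such a helper is sketched below with NumPy only: each class index is mapped to a color and the result is blended onto the image. `colorize_mask` and `overlay` are hypothetical names, the random palette is a stand-in for a real dataset palette, and the sketch assumes the mask and the image share the same height and width.

```python
import numpy as np


def colorize_mask(mask, num_classes, seed=0):
    """Map an (H, W) integer class mask to an (H, W, 3) uint8 color image."""
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes, 3), dtype=np.uint8)
    return palette[mask]


def overlay(image, color_mask, alpha=0.5):
    """Alpha-blend a color mask onto an RGB image of the same height and width."""
    blended = (1.0 - alpha) * image.astype(np.float32)
    blended += alpha * color_mask.astype(np.float32)
    return blended.round().astype(np.uint8)


mask = np.array([[0, 1], [2, 1]])          # stand-in for the argmax output
image = np.full((2, 2, 3), 255, np.uint8)  # stand-in for the resized input image
print(overlay(image, colorize_mask(mask, num_classes=150)).shape)  # (2, 2, 3)
```

With real data, resize the input image to the mask's resolution before blending, then display the result with the plotting library of your choice.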

VLMs 🚧

🛠️ Basic Usage

import keras

import kvmm

processor = kvmm.models.clip.CLIPProcessor()
model = kvmm.models.clip.ClipVitBase16(
    weights="openai_224",
    input_shape=(224, 224, 3),  # Other input sizes can be used for fine-tuning or inference
)
inputs = processor(text=["mountains", "tortoise", "cat"], image_paths="cat1.jpg")
output = model(
    {
        "images": inputs["images"],
        "token_ids": inputs["input_ids"],
        "padding_mask": inputs["attention_mask"],
    }
)

print("Raw Model Output:")
print(output)

preds = keras.ops.softmax(output["image_logits"]).numpy().squeeze()
result = dict(zip(["mountains", "tortoise", "cat"], preds))
print("\nPrediction probabilities:")
print(result)

# Output:
"""{'image_logits': <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[11.042501, 10.388493, 18.414747]], dtype=float32)>, 'text_logits': <tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[11.042501],
       [10.388493],
       [18.414747]], dtype=float32)>}

Prediction probabilities:
{'mountains': np.float32(0.0006278555), 'tortoise': np.float32(0.000326458), 'cat': np.float32(0.99904567)}"""
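
The probabilities above are just a softmax over `image_logits`. As a quick check, the sketch below recomputes them in plain Python from the logits printed in the output; the `softmax` helper here is written for this example and stands in for `keras.ops.softmax`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["mountains", "tortoise", "cat"]
image_logits = [11.042501, 10.388493, 18.414747]  # copied from the output above
probs = dict(zip(labels, softmax(image_logits)))
print(max(probs, key=probs.get))  # cat
```

Because softmax depends only on differences between logits, the gap of roughly 7 between "cat" and the other two prompts is what drives its probability so close to 1.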

📜 License

This project leverages timm and transformers for converting pretrained weights from PyTorch to Keras. For licensing details, please refer to the respective repositories.

🌟 Credits

  • The Keras team for their powerful and user-friendly deep learning framework
  • The Transformers library for its robust tools for loading and adapting pretrained models
  • The pytorch-image-models (timm) project for pioneering many computer vision model implementations
  • All contributors to the original papers and architectures implemented in this library

Citing

BibTeX

@misc{gc2025kvmm,
  author = {Gitesh Chawda},
  title = {Keras Vision Models},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/IMvision12/keras-vision-models}}
}
