Pretrained Keras 3 vision models

Maintainers

IMvision12

KVMM: Keras Vision Models 🚀

📌 Table of Contents

📖 Introduction
⚡ Installation
🛠️ Usage
📑 Models
📜 License
🌟 Credits

📖 Introduction

Keras Vision Models (KVMM) is a collection of vision models with pretrained weights, built entirely with Keras 3. It covers classification, segmentation, object detection, and vision-language modeling (VLMs). KVMM also provides custom layers, and its models can be used as backbones inside other architectures. Backbones ship with several weight variants, such as in1k, in21k, fb_dist_in1k, ms_in22k, fb_in22k_ft_in1k, ns_jft_in1k, aa_in1k, cvnets_in1k, augreg_in21k_ft_in1k, augreg_in21k, and many more.

⚡ Installation

From PyPI (recommended)

pip install -U kvmm

From Source

pip install -U git+https://github.com/IMvision12/keras-vision-models
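
After either install route, it can be useful to confirm which release ended up in the environment. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name chosen for this example and is not part of kvmm.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if kvmm is installed, otherwise None
print(installed_version("kvmm"))
```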

🛠️ Usage

🔎 Listing Available Models

`kvmm.list_models()` shows all available models, including backbones, segmentation models, object detection models, and vision-language models (VLMs). It also lists the weight names available for each model variant.

import kvmm
print(kvmm.list_models())

# Output:
"""
CaiTM36 : fb_dist_in1k_384
CaiTM48 : fb_dist_in1k_448
CaiTS24 : fb_dist_in1k_224, fb_dist_in1k_384
...
ConvMixer1024D20 : in1k
ConvMixer1536D20 : in1k
...
ConvNeXtAtto : d2_in1k
ConvNeXtBase : fb_in1k, fb_in22k, fb_in22k_ft_in1k, fb_in22k_ft_in1k_384
...
"""

🔎 List Specific Model Variant

import kvmm
print(kvmm.list_models("swin"))

# Output:
"""
SwinBaseP4W12 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinBaseP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W12 : ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W7 : ms_in22k, ms_in22k_ft_in1k
SwinSmallP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinTinyP4W7 : ms_in1k, ms_in22k
"""

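The listings above follow a `Name : weight1, weight2` pattern, which makes them easy to filter programmatically. Below is a small sketch that parses text in that shape into a dictionary. It assumes the listing is available as plain text in the format shown; the README does not state what `kvmm.list_models()` returns, and `parse_model_listing` is a hypothetical helper, not a kvmm function.

```python
def parse_model_listing(text):
    """Parse lines shaped like 'Name : w1, w2' into {name: [weights]}."""
    catalog = {}
    for line in text.splitlines():
        name, sep, weights = line.partition(" : ")
        if not sep:  # skip blank lines, '...' and anything without the separator
            continue
        catalog[name.strip()] = [w.strip() for w in weights.split(",") if w.strip()]
    return catalog


listing = """
SwinBaseP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W7 : ms_in22k, ms_in22k_ft_in1k
SwinTinyP4W7 : ms_in1k, ms_in22k
"""
catalog = parse_model_listing(listing)

# Keep only the models that list an `ms_in22k_ft_in1k` weight
print([name for name, weights in catalog.items() if "ms_in22k_ft_in1k" in weights])
```

With the three sample lines, this keeps `SwinBaseP4W7` and `SwinLargeP4W7` and drops `SwinTinyP4W7`.
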
⚙️ Layers

KVMM provides various custom layers like StochasticDepth, LayerScale, EfficientMultiheadSelfAttention, and more. These layers can be seamlessly integrated into your custom models and workflows 🚀

import kvmm

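# Example 1 (`input_tensor` is a placeholder for your own tensor)
layer = kvmm.layers.StochasticDepth(drop_path_rate=0.1)
output = layer(input_tensor, training=True)

# Example 2 (`features` is a placeholder for your own feature tensor)
window_partition = kvmm.layers.WindowPartition(window_size=7)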
windowed_features = window_partition(features, height=28, width=28)
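
To make the window-partition idea concrete, here is a dependency-free sketch of splitting a 2D grid into non-overlapping square windows, as used in Swin-style attention. It illustrates the general technique on plain Python lists and is not kvmm's implementation; the actual layer operates on tensors, and its exact output layout is not described in this README.

```python
def window_partition(grid, window_size):
    """Split a 2D grid (list of rows) into non-overlapping square windows.

    Returns the windows in row-major order; each window is a flat list of
    window_size * window_size cells.
    """
    height, width = len(grid), len(grid[0])
    if height % window_size or width % window_size:
        raise ValueError("height and width must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                grid[r][c]
                for r in range(top, top + window_size)
                for c in range(left, left + window_size)
            ])
    return windows


# A 28x28 grid whose cells record their own (row, col) position
grid = [[(r, c) for c in range(28)] for r in range(28)]
windows = window_partition(grid, window_size=7)
print(len(windows), len(windows[0]))  # 16 49
```

With `height=28`, `width=28` and `window_size=7`, the grid yields (28 / 7)² = 16 windows of 49 positions each.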

🏗️ Backbone Usage (Classification)

🛠️ Basic Usage

import kvmm
import numpy as np

# default configuration
model = kvmm.models.vit.ViTTiny16()

# For Fine-Tuning (default weight)
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224,224,3))
# Custom Weight
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224,224,3), weights="augreg_in21k_224")

# Backbone Support
model = kvmm.models.vit.ViTTiny16(include_top=False, as_backbone=True, input_shape=(224,224,3), weights="augreg_in21k_224")
random_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
features = model(random_input)
print(f"Number of feature maps: {len(features)}")
for i, feature in enumerate(features):
    print(f"Feature {i} shape: {feature.shape}")

"""
Output:

Number of feature maps: 13
Feature 0 shape: (1, 197, 192)
Feature 1 shape: (1, 197, 192)
Feature 2 shape: (1, 197, 192)
...
"""

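The feature shapes above are consistent with standard ViT arithmetic: a 224×224 image cut into 16×16 patches gives 14 × 14 = 196 patch tokens, plus one class token, for 197 tokens of width 192 (the ViT-Tiny embedding size). The short sketch below reproduces that count; `vit_sequence_length` is a hypothetical helper for illustration. That the 13 feature maps correspond to the embedding output plus 12 transformer blocks is an assumption, since the README does not say.

```python
def vit_sequence_length(image_size, patch_size, extra_tokens=1):
    """Number of tokens a ViT processes: one per patch plus any extra tokens."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + extra_tokens


print(vit_sequence_length(224, 16))  # 197
print(vit_sequence_length(384, 16))  # 577
```
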
Example Inference

from keras import ops
from keras.applications.imagenet_utils import decode_predictions
import kvmm
import numpy as np
from PIL import Image

model = kvmm.models.swin.SwinTinyP4W7(input_shape=[224, 224, 3])

# Force 3 channels: PNG files may carry an alpha channel
image = Image.open("bird.png").convert("RGB").resize((224, 224))
x = ops.convert_to_tensor(np.asarray(image))
x = ops.expand_dims(x, axis=0)  # add the batch dimension

# Predict
preds = model.predict(x)
print("Predicted:", decode_predictions(preds, top=3)[0])

# Output:
# Predicted: [('n01537544', 'indigo_bunting', np.float32(0.9135666)), ('n01806143', 'peacock', np.float32(0.0003379386)), ('n02017213', 'European_gallinule', np.float32(0.00027174334))]

🧩 Segmentation

🛠️ Basic Usage

import kvmm

# Pretrained weights: "cityscapes", "ade20k", or "mit" (in1k)
# The ade20k and cityscapes weights can be fine-tuned by passing a custom `num_classes`
# If `num_classes` is not specified, it defaults to 150 for ade20k and 19 for cityscapes
model = kvmm.models.segformer.SegFormerB0(weights="ade20k", input_shape=(512,512,3))
model = kvmm.models.segformer.SegFormerB0(weights="cityscapes", input_shape=(512,512,3))

# Fine-Tune using `MiT` backbone (This will load `in1k` weights)
model = kvmm.models.segformer.SegFormerB0(weights="mit", input_shape=(512,512,3))

🚀 Custom Backbone Support

import kvmm

# With no backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights=None, include_top=False, input_shape=(224,224,3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224,224,3))

# With backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights="tv_in1k", include_top=False, input_shape=(224,224,3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224,224,3))

🚀 Example Inference

import kvmm
from PIL import Image
import numpy as np

model = kvmm.models.segformer.SegFormerB0(weights="ade20k_512")

image = Image.open("ADE_train_00000586.jpg")
processed_img = kvmm.models.segformer.SegFormerImageProcessor(image=image,
    do_resize=True,
    size={"height": 512, "width": 512},
    do_rescale=True,
    do_normalize=True)
outs = model.predict(processed_img)
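outs = np.argmax(outs[0], axis=-1)  # per-pixel class index
# `visualize_segmentation` is not defined in this snippet; supply your own plotting helper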
visualize_segmentation(outs, image)
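
The snippet above calls `visualize_segmentation` without defining it. The core of such a helper is usually two steps: take the per-pixel argmax over class scores, then map each class index to a color. The sketch below shows both steps on plain Python lists so that it runs without dependencies. The function names, toy scores, and palette are all hypothetical; a real helper would more likely use NumPy indexing and a plotting library.

```python
def argmax_classes(scores):
    """scores[h][w][c] -> class_map[h][w] holding the best class index."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def colorize(class_map, palette):
    """Map each class index to an (R, G, B) tuple from the palette."""
    return [[palette[index] for index in row] for row in class_map]


# Toy 2x2 "prediction" with 3 class scores per pixel (illustrative values only)
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0)]  # hypothetical class colors

class_map = argmax_classes(scores)
print(class_map)                        # [[1, 0], [2, 1]]
print(colorize(class_map, palette)[0])  # [(255, 0, 0), (0, 0, 0)]
```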

VLMs 🚧

🛠️ Basic Usage

import keras

import kvmm

processor = kvmm.models.clip.CLIPProcessor()
model = kvmm.models.clip.ClipVitBase16(
    weights="openai_224",
    input_shape=(224, 224, 3),  # other input sizes can be used for fine-tuning or inference
)
inputs = processor(text=["mountains", "tortoise", "cat"], image_paths="cat1.jpg")
output = model(
    {
        "images": inputs["images"],
        "token_ids": inputs["input_ids"],
        "padding_mask": inputs["attention_mask"],
    }
)

print("Raw Model Output:")
print(output)

preds = keras.ops.softmax(output["image_logits"]).numpy().squeeze()
result = dict(zip(["mountains", "tortoise", "cat"], preds))
print("\nPrediction probabilities:")
print(result)

#output:
"""{'image_logits': <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[11.042501, 10.388493, 18.414747]], dtype=float32)>, 'text_logits': <tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[11.042501],
       [10.388493],
       [18.414747]], dtype=float32)>}

Prediction probabilities:
{'mountains': np.float32(0.0006278555), 'tortoise': np.float32(0.000326458), 'cat': np.float32(0.99904567)}"""
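
The probabilities in the output follow directly from the `image_logits` values via softmax. The sketch below recomputes them with the standard library only, using the logits copied from the output above; `softmax` here is a local helper, not the `keras.ops.softmax` call used in the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["mountains", "tortoise", "cat"]
image_logits = [11.042501, 10.388493, 18.414747]  # copied from the output above

probs = softmax(image_logits)
for label, prob in zip(labels, probs):
    print(f"{label}: {prob:.6f}")
```

The "cat" logit is about 7.4 higher than the next largest, so almost all of the probability mass lands on that label.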

📑 Models

Backbones:

🏷️ Model Name	📜 Reference Paper	📦 Source of Weights
CaiT	Going deeper with Image Transformers	`timm`
ConvMixer	Patches Are All You Need?	`timm`
ConvNeXt	A ConvNet for the 2020s	`timm`
ConvNeXt V2	ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders	`timm`
DeiT	Training data-efficient image transformers & distillation through attention	`timm`
DenseNet	Densely Connected Convolutional Networks	`timm`
EfficientNet	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks	`timm`
EfficientNet-Lite	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks	`timm`
EfficientNetV2	EfficientNetV2: Smaller Models and Faster Training	`timm`
FlexiViT	FlexiViT: One Model for All Patch Sizes	`timm`
InceptionNeXt	InceptionNeXt: When Inception Meets ConvNeXt	`timm`
Inception-ResNet-v2	Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning	`timm`
Inception-v3	Rethinking the Inception Architecture for Computer Vision	`timm`
Inception-v4	Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning	`timm`
MiT	SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	`transformers`
MLP-Mixer	MLP-Mixer: An all-MLP Architecture for Vision	`timm`
MobileNetV2	MobileNetV2: Inverted Residuals and Linear Bottlenecks	`timm`
MobileNetV3	Searching for MobileNetV3	`keras`
MobileViT	MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer	`timm`
MobileViTV2	Separable Self-attention for Mobile Vision Transformers	`timm`
PiT	Rethinking Spatial Dimensions of Vision Transformers	`timm`
PoolFormer	MetaFormer is Actually What You Need for Vision	`timm`
Res2Net	Res2Net: A New Multi-scale Backbone Architecture	`timm`
ResMLP	ResMLP: Feedforward networks for image classification with data-efficient training	`timm`
ResNet	Deep Residual Learning for Image Recognition	`timm`
ResNetV2	Identity Mappings in Deep Residual Networks	`timm`
ResNeXt	Aggregated Residual Transformations for Deep Neural Networks	`timm`
SENet	Squeeze-and-Excitation Networks	`timm`
Swin Transformer	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	`timm`
VGG	Very Deep Convolutional Networks for Large-Scale Image Recognition	`timm`
ViT	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale	`timm`
Xception	Xception: Deep Learning with Depthwise Separable Convolutions	`keras`

Segmentation

🏷️ Model Name	📜 Reference Paper	📦 Source of Weights
SegFormer	SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	`transformers`

Vision-Language-Models (VLMs)

🏷️ Model Name	📜 Reference Paper	📦 Source of Weights
CLIP	Learning Transferable Visual Models From Natural Language Supervision	`transformers`
SigLIP	Sigmoid Loss for Language Image Pre-Training	`transformers`
SigLIP2	SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features	`transformers`

📜 License

This project leverages timm and transformers for converting pretrained weights from PyTorch to Keras. For licensing details, please refer to the respective repositories.

🔖 kvmm Code: This repository is licensed under the Apache 2.0 License.

🌟 Credits

The Keras team for their powerful and user-friendly deep learning framework
The Transformers library for its robust tools for loading and adapting pretrained models
The pytorch-image-models (timm) project for pioneering many computer vision model implementations
All contributors to the original papers and architectures implemented in this library

Citing

BibTeX

@misc{gc2025kvmm,
  author = {Gitesh Chawda},
  title = {Keras Vision Models},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/IMvision12/keras-vision-models}}
}

Release history

0.1.8 (Aug 4, 2025)
0.1.7 (Jun 28, 2025, this version)
0.1.6 (Jun 22, 2025)
0.1.5 (Jun 17, 2025)
0.1.4 (May 16, 2025)
0.1.3 (May 6, 2025)
0.1.2 (May 1, 2025)
0.1.1 (Apr 11, 2025)

Download files

Source Distribution

kvmm-0.1.7.tar.gz (237.1 kB)

Uploaded Jun 28, 2025 Source

Built Distribution

kvmm-0.1.7-py3-none-any.whl (322.6 kB)

Uploaded Jun 28, 2025 Python 3

File details

Details for the file kvmm-0.1.7.tar.gz.

File metadata

Download URL: kvmm-0.1.7.tar.gz
Upload date: Jun 28, 2025
Size: 237.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kvmm-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`dfd0ad486e4dcfc011731d51116095a615f87aa9f5347146d4c1b5614a259389`
MD5	`662706af6cd88f3ed8912064daca91e5`
BLAKE2b-256	`7cbd9a6dc1756f4411d253285e178d1dda52120c6e160869c487301848fd5418`
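
A downloaded file can be checked against the published SHA256 digest before installing. The following is a minimal sketch using `hashlib`; the helper names `sha256_of` and `matches_digest` are hypothetical, and the demo runs on a throwaway temporary file so that it is self-contained.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA-256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; for a real download, pass its path and the
# SHA256 value published for that file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
try:
    expected = hashlib.sha256(b"example payload").hexdigest()
    print(matches_digest(tmp.name, expected))  # True
    print(matches_digest(tmp.name, "0" * 64))  # False
finally:
    os.remove(tmp.name)
```

For the source distribution listed here, the call would be `matches_digest("kvmm-0.1.7.tar.gz", "dfd0ad486e4dcfc011731d51116095a615f87aa9f5347146d4c1b5614a259389")`, assuming the file was saved under that name in the current directory.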

Provenance

The following attestation bundles were made for kvmm-0.1.7.tar.gz:

Publisher: release.yml on IMvision12/keras-vision-models

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kvmm-0.1.7.tar.gz
- Subject digest: dfd0ad486e4dcfc011731d51116095a615f87aa9f5347146d4c1b5614a259389
- Sigstore transparency entry: 254953774
- Sigstore integration time: Jun 28, 2025
Source repository:
- Permalink: IMvision12/keras-vision-models@80a9dd43681276501a0b4fa6709b7f64ca59ee2f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IMvision12
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@80a9dd43681276501a0b4fa6709b7f64ca59ee2f
- Trigger Event: push

File details

Details for the file kvmm-0.1.7-py3-none-any.whl.

File metadata

Download URL: kvmm-0.1.7-py3-none-any.whl
Upload date: Jun 28, 2025
Size: 322.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kvmm-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b0f979b0164899efd4aaf6313ede9692b285fe405e5acd359fb41b2b8c5e82cd`
MD5	`2923ee60e52a411c5af877a9a6392117`
BLAKE2b-256	`00fd95e077ca1b3b2c4848489c7ac4179adb312d79ad24a6dd9af00d183022cc`

Provenance

The following attestation bundles were made for kvmm-0.1.7-py3-none-any.whl:

Publisher: release.yml on IMvision12/keras-vision-models

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kvmm-0.1.7-py3-none-any.whl
- Subject digest: b0f979b0164899efd4aaf6313ede9692b285fe405e5acd359fb41b2b8c5e82cd
- Sigstore transparency entry: 254953786
- Sigstore integration time: Jun 28, 2025
Source repository:
- Permalink: IMvision12/keras-vision-models@80a9dd43681276501a0b4fa6709b7f64ca59ee2f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IMvision12
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@80a9dd43681276501a0b4fa6709b7f64ca59ee2f
- Trigger Event: push
