Pretrained Keras 3 vision models
KVMM: Keras Vision Models 🚀
📌 Table of Contents
- 📖 Introduction
- ⚡ Installation
- 🛠️ Usage
- 📑 Models
- 📜 License
- 🌟 Credits
📖 Introduction
Keras Vision Models (KVMM) is a collection of vision models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, segmentation, object detection, and vision-language modeling (VLMs). KVMM also ships custom layers and backbone support, so its building blocks can be reused across different vision applications. Backbones come with multiple pretrained weight variants, such as in1k, in21k, fb_dist_in1k, ms_in22k, fb_in22k_ft_in1k, ns_jft_in1k, aa_in1k, cvnets_in1k, augreg_in21k_ft_in1k, augreg_in21k, and many more.
⚡ Installation
From PyPI (recommended)
```bash
pip install -U kvmm
```
From Source
```bash
pip install -U git+https://github.com/IMvision12/keras-vision-models
```
🛠️ Usage
🔎 Listing Available Models
`kvmm.list_models()` lists all available models, including backbones, segmentation models, object detection models, and vision-language models (VLMs), together with the names of the weights available for each model variant.
```python
import kvmm

print(kvmm.list_models())
# Output:
"""
CaiTM36 : fb_dist_in1k_384
CaiTM48 : fb_dist_in1k_448
CaiTS24 : fb_dist_in1k_224, fb_dist_in1k_384
...
ConvMixer1024D20 : in1k
ConvMixer1536D20 : in1k
...
ConvNeXtAtto : d2_in1k
ConvNeXtBase : fb_in1k, fb_in22k, fb_in22k_ft_in1k, fb_in22k_ft_in1k_384
...
"""
```
🔎 Listing Specific Model Variants
```python
import kvmm

print(kvmm.list_models("swin"))
# Output:
"""
SwinBaseP4W12 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinBaseP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W12 : ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W7 : ms_in22k, ms_in22k_ft_in1k
SwinSmallP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinTinyP4W7 : ms_in1k, ms_in22k
"""
```
⚙️ Layers
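KVMM provides various custom layers, such as StochasticDepth, LayerScale, and EfficientMultiheadSelfAttention. These layers can be integrated directly into your own models and workflows 🚀
```python
import keras
import kvmm

# Example 1: stochastic depth on a dummy (batch, tokens, channels) tensor
input_tensor = keras.random.normal((2, 197, 192))
layer = kvmm.layers.StochasticDepth(drop_path_rate=0.1)
output = layer(input_tensor, training=True)

# Example 2: split a 28x28 feature map into 7x7 windows
# `features` stands for your own 28x28 feature map (e.g. the output of a Swin stage)
window_partition = kvmm.layers.WindowPartition(window_size=7)
windowed_features = window_partition(features, height=28, width=28)
```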
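As a hedged illustration of what the two layers in the example compute, the sketch below implements the standard stochastic-depth (drop path) and Swin-style window-partition operations in plain NumPy. It is not KVMM's implementation: it assumes a channels-last `(batch, height, width, channels)` layout for window partitioning, and `drop_path` and `window_partition` are hypothetical helper names, not the library's API.

```python
import numpy as np


def drop_path(x, drop_rate, training, rng):
    """Hypothetical helper: zero whole samples of a residual branch with probability drop_rate."""
    if not training or drop_rate == 0.0:
        return x  # identity at inference time
    keep_prob = 1.0 - drop_rate
    # One Bernoulli draw per sample, broadcast over all remaining axes
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = (rng.random(mask_shape) < keep_prob).astype(x.dtype)
    # Rescale kept samples so the expected output equals the input
    return x * mask / keep_prob


def window_partition(x, window_size):
    """Hypothetical helper: (batch, H, W, C) -> (batch * num_windows, ws, ws, C)."""
    b, h, w, c = x.shape
    x = x.reshape(b, h // window_size, window_size, w // window_size, window_size, c)
    # Put the two window-grid axes next to each other, then flatten them
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, window_size, window_size, c)


rng = np.random.default_rng(0)
tokens = np.ones((8, 197, 192), dtype=np.float32)
dropped = drop_path(tokens, drop_rate=0.1, training=True, rng=rng)

features = np.arange(28 * 28 * 96, dtype=np.float32).reshape(1, 28, 28, 96)
windows = window_partition(features, window_size=7)
print(windows.shape)  # (16, 7, 7, 96)
```

Rescaling by `1 / keep_prob` during training is what lets the layer act as a plain identity at inference, and a 28x28 map with 7x7 windows yields a 4x4 grid, hence 16 windows.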
🏗️ Backbone Usage (Classification)
🛠️ Basic Usage
```python
import kvmm
import numpy as np

# Default configuration
model = kvmm.models.vit.ViTTiny16()

# For fine-tuning (default weights)
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224, 224, 3))

# Custom weights
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224, 224, 3), weights="augreg_in21k_224")

# Backbone support: the model returns a list of intermediate feature maps
model = kvmm.models.vit.ViTTiny16(include_top=False, as_backbone=True, input_shape=(224, 224, 3), weights="augreg_in21k_224")

random_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
features = model(random_input)
print(f"Number of feature maps: {len(features)}")
for i, feature in enumerate(features):
    print(f"Feature {i} shape: {feature.shape}")
"""
Output:
Number of feature maps: 13
Feature 0 shape: (1, 197, 192)
Feature 1 shape: (1, 197, 192)
Feature 2 shape: (1, 197, 192)
...
"""
```
Example Inference
```python
import numpy as np
from keras.applications.imagenet_utils import decode_predictions
from PIL import Image

import kvmm

model = kvmm.models.swin.SwinTinyP4W7(input_shape=[224, 224, 3])

# Load as 3-channel RGB (PNG files may carry an alpha channel) and resize
image = Image.open("bird.png").convert("RGB").resize((224, 224))
x = np.asarray(image, dtype="float32")
x = np.expand_dims(x, axis=0)  # add batch dimension -> (1, 224, 224, 3)

# Predict
preds = model.predict(x)
print("Predicted:", decode_predictions(preds, top=3)[0])
# Output:
# Predicted: [('n01537544', 'indigo_bunting', np.float32(0.9135666)), ('n01806143', 'peacock', np.float32(0.0003379386)), ('n02017213', 'European_gallinule', np.float32(0.00027174334))]
```
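`decode_predictions` only knows the 1,000 ImageNet class names. After fine-tuning on your own classes, the same top-k ranking can be done directly on the prediction array. A small NumPy sketch; `top_k_labels` and the label list are hypothetical, not part of KVMM or Keras:

```python
import numpy as np


def top_k_labels(preds, class_names, k=3):
    """Hypothetical helper: k highest-scoring (label, score) pairs for each row of preds."""
    # argsort is ascending; reverse it for descending order and keep the first k
    top_idx = np.argsort(preds, axis=-1)[:, ::-1][:, :k]
    return [
        [(class_names[i], float(row[i])) for i in idx]
        for row, idx in zip(preds, top_idx)
    ]


preds = np.array([[0.05, 0.70, 0.20, 0.05]])
labels = ["cat", "dog", "bird", "fish"]
print(top_k_labels(preds, labels, k=2))  # [[('dog', 0.7), ('bird', 0.2)]]
```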
🧩 Segmentation
🛠️ Basic Usage
```python
import kvmm

# Pretrained weights: "ade20k", "cityscapes", or "mit" (MiT backbone pretrained on in1k)
# ade20k and cityscapes weights can be fine-tuned on new data by passing a custom `num_classes`
# If `num_classes` is not specified, it defaults to 150 for ade20k and 19 for cityscapes
model = kvmm.models.segformer.SegFormerB0(weights="ade20k", input_shape=(512, 512, 3))
model = kvmm.models.segformer.SegFormerB0(weights="cityscapes", input_shape=(512, 512, 3))

# Fine-tune using the `MiT` backbone (this loads the `in1k` weights)
model = kvmm.models.segformer.SegFormerB0(weights="mit", input_shape=(512, 512, 3))
```
🚀 Custom Backbone Support
```python
import kvmm

# Without backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights=None, include_top=False, input_shape=(224, 224, 3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224, 224, 3))

# With backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights="tv_in1k", include_top=False, input_shape=(224, 224, 3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224, 224, 3))
```
🚀 Example Inference
```python
import numpy as np
from PIL import Image

import kvmm

model = kvmm.models.segformer.SegFormerB0(weights="ade20k_512")
image = Image.open("ADE_train_00000586.jpg")
processed_img = kvmm.models.segformer.SegFormerImageProcessor(
    image=image,
    do_resize=True,
    size={"height": 512, "width": 512},
    do_rescale=True,
    do_normalize=True,
)
outs = model.predict(processed_img)
outs = np.argmax(outs[0], axis=-1)  # per-pixel class ids

# `visualize_segmentation` is a plotting helper that is not defined in this snippet
visualize_segmentation(outs, image)
```
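`visualize_segmentation` is not defined in the snippet above. One possible stand-in, sketched in NumPy under assumptions: the mask is the `(H, W)` array of class ids produced by `np.argmax`, the image is an RGB array, and the mask may be smaller than the image, so it is upsampled with nearest-neighbour indexing. `colorize_mask` and `overlay_mask` are hypothetical names, not KVMM functions, and the random palette is an arbitrary choice.

```python
import numpy as np


def colorize_mask(mask, num_classes, seed=0):
    """Hypothetical helper: map an (H, W) array of class ids to an (H, W, 3) uint8 image."""
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes, 3)).astype(np.uint8)
    return palette[mask]


def overlay_mask(image, mask, num_classes, alpha=0.5):
    """Hypothetical helper: blend a colorized class mask over an (H, W, 3) RGB image."""
    h, w = image.shape[:2]
    mh, mw = mask.shape
    # Nearest-neighbour upsampling of the mask to the image resolution
    rows = np.arange(h) * mh // h
    cols = np.arange(w) * mw // w
    mask = mask[rows[:, None], cols[None, :]]
    color = colorize_mask(mask, num_classes).astype(np.float32)
    blended = (1 - alpha) * image.astype(np.float32) + alpha * color
    return blended.astype(np.uint8)


image = np.zeros((512, 512, 3), dtype=np.uint8)
mask = np.zeros((128, 128), dtype=np.int64)
mask[64:, :] = 1  # bottom half belongs to class 1
result = overlay_mask(image, mask, num_classes=150)
print(result.shape)  # (512, 512, 3)
```

With the variables from the example, a call could look like `overlay_mask(np.asarray(image.convert("RGB")), outs, num_classes=150)`, using the 150 ADE20K classes mentioned earlier; the returned array can be displayed with any image viewer or plotting library.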
VLMs 🚧
🛠️ Basic Usage
```python
import keras
import kvmm

processor = kvmm.models.clip.CLIPProcessor()
model = kvmm.models.clip.ClipVitBase16(
    weights="openai_224",
    input_shape=(224, 224, 3),  # other input sizes are supported for fine-tuning and inference
)
labels = ["mountains", "tortoise", "cat"]
inputs = processor(text=labels, image_paths="cat1.jpg")
output = model(
    {
        "images": inputs["images"],
        "token_ids": inputs["input_ids"],
        "padding_mask": inputs["attention_mask"],
    }
)
print("Raw Model Output:")
print(output)

# Softmax over the text prompts; convert_to_numpy works on every Keras backend
preds = keras.ops.convert_to_numpy(keras.ops.softmax(output["image_logits"])).squeeze()
result = dict(zip(labels, preds))
print("\nPrediction probabilities:")
print(result)
# Output:
"""
{'image_logits': <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[11.042501, 10.388493, 18.414747]], dtype=float32)>, 'text_logits': <tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[11.042501],
       [10.388493],
       [18.414747]], dtype=float32)>}

Prediction probabilities:
{'mountains': np.float32(0.0006278555), 'tortoise': np.float32(0.000326458), 'cat': np.float32(0.99904567)}
"""
```
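The probabilities above are a softmax over the three `image_logits` (one image scored against three text prompts). As a worked check of that arithmetic, here is the same computation in plain NumPy on the logits printed above. Subtracting the maximum before exponentiating is a standard numerical-stability choice; it does not change the result and is not necessarily what Keras does internally.

```python
import numpy as np

# image_logits from the output above: one image scored against three prompts
logits = np.array([11.042501, 10.388493, 18.414747])

# Subtract the max before exponentiating to avoid overflow; the ratios are unchanged
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

for label, p in zip(["mountains", "tortoise", "cat"], probs):
    print(f"{label}: {p:.6f}")
# mountains: 0.000628
# tortoise: 0.000326
# cat: 0.999046
```

A logit gap of about 7.4 between "cat" and "mountains" already corresponds to a probability ratio of roughly e^7.4 ≈ 1,600, which is why the top label dominates.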
📑 Models
- Backbones
- Segmentation

| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| SegFormer | SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | transformers |

- Vision-Language-Models (VLMs)

| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| CLIP | Learning Transferable Visual Models From Natural Language Supervision | transformers |
| SigLIP | Sigmoid Loss for Language Image Pre-Training | transformers |
| SigLIP2 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | transformers |
📜 License
This project leverages timm and transformers for converting pretrained weights from PyTorch to Keras. For licensing details, please refer to the respective repositories.
- 🔖 kvmm Code: This repository is licensed under the Apache 2.0 License.
🌟 Credits
- The Keras team for their powerful and user-friendly deep learning framework
- The Transformers library for its robust tools for loading and adapting pretrained models
- The pytorch-image-models (timm) project for pioneering many computer vision model implementations
- All contributors to the original papers and architectures implemented in this library
Citing
BibTeX
```bibtex
@misc{gc2025kvmm,
  author = {Gitesh Chawda},
  title = {Keras Vision Models},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/IMvision12/keras-vision-models}}
}
```
Download files
Source Distribution
Built Distribution
File details
Details for the file kvmm-0.1.6.tar.gz.
File metadata
- Download URL: kvmm-0.1.6.tar.gz
- Upload date:
- Size: 236.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | f959408f1729c167a2559074aa5f32e4fc55aed5e4f0dbbb3b4a1fdd6e1ee11e |
| MD5 | c291fbefe8ad96d3f679bbcb23d3d55f |
| BLAKE2b-256 | 4f5e5eda0d64922896a9b60bd51e568bdf44204ef06115f55a234118593dc1c9 |
Provenance
The following attestation bundles were made for kvmm-0.1.6.tar.gz:
- Publisher: release.yml on IMvision12/keras-vision-models
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: kvmm-0.1.6.tar.gz
  - Subject digest: f959408f1729c167a2559074aa5f32e4fc55aed5e4f0dbbb3b4a1fdd6e1ee11e
  - Sigstore transparency entry: 246017597
  - Sigstore integration time:
- Permalink: IMvision12/keras-vision-models@ea52313331f540cfedc5a1cb2894d09c5a47ef02
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IMvision12
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ea52313331f540cfedc5a1cb2894d09c5a47ef02
- Trigger Event: push
File details
Details for the file kvmm-0.1.6-py3-none-any.whl.
File metadata
- Download URL: kvmm-0.1.6-py3-none-any.whl
- Upload date:
- Size: 322.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | f22f309105600f1ff016274505bf77032a1a03235788f63958c7507c0cf57414 |
| MD5 | 8a98b63048922b030e36c8fafe74359a |
| BLAKE2b-256 | 94d9bfffd91ad6e20c3f5370bf8dea50f60b984c769d6113c8a390a9616ac48a |
Provenance
The following attestation bundles were made for kvmm-0.1.6-py3-none-any.whl:
- Publisher: release.yml on IMvision12/keras-vision-models
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: kvmm-0.1.6-py3-none-any.whl
  - Subject digest: f22f309105600f1ff016274505bf77032a1a03235788f63958c7507c0cf57414
  - Sigstore transparency entry: 246017598
  - Sigstore integration time:
- Permalink: IMvision12/keras-vision-models@ea52313331f540cfedc5a1cb2894d09c5a47ef02
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IMvision12
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ea52313331f540cfedc5a1cb2894d09c5a47ef02
- Trigger Event: push