Pretrained Keras 3 vision models
KerasFormers 🚀
📖 Introduction
KerasFormers is a collection of models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, object detection (DETR, RT-DETR, RT-DETRv2, RF-DETR, D-FINE, OWL-ViT, OWLv2), segmentation (SAM, SAM2, SAM3, SegFormer, DeepLabV3, EoMT, MaskFormer, Mask2Former, MobileViT-DeepLabV3), monocular depth estimation (Depth Anything V1, Depth Anything V2), feature extraction (DINO, DINOv2, DINOv3), vision-language modeling (CLIP, SigLIP, SigLIP2, MetaCLIP 2), speech recognition (Whisper), and more. It includes hybrid architectures like MaxViT alongside traditional CNNs and pure transformers. KerasFormers also provides custom layers and backbone support. Backbones come in multiple weight variants, such as in1k, in21k, fb_dist_in1k, ms_in22k, fb_in22k_ft_in1k, ns_jft_in1k, aa_in1k, cvnets_in1k, augreg_in21k_ft_in1k, augreg_in21k, and many more.
⚡ Installation
From PyPI (recommended)
pip install -U kerasformers
From Source
pip install -U git+https://github.com/IMvision12/KerasFormers
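To confirm that either install route worked, here is a minimal sketch using only the Python standard library. It assumes the distribution is registered under the name `kerasformers`, as in the pip command above. The helper name `installed_version` is hypothetical and not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "kerasformers" is the distribution name taken from the pip command above.
    found = installed_version("kerasformers")
    if found is None:
        print("kerasformers is not installed in this environment")
    else:
        print(f"kerasformers {found} is installed")
```

Querying package metadata instead of importing the package keeps the check fast, since it does not load Keras or a backend.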
📑 Documentation
Per-model guides with architecture notes, usage examples, and available pretrained weights live in the docs/ folder. You'll find dedicated pages for classification backbones (CaiT, ViT, ResNet, ConvNeXt, EfficientNet, Swin, and the 30+ other backbones listed below — all share the same XModel / XImageClassify two-class structure), segmentation (SAM family, SegFormer, DeepLabV3, EoMT, MaskFormer, Mask2Former, MobileViT), object detection (DETR variants, D-FINE, OWL-ViT, OWLv2), feature extraction (DINO v1/v2/v3), depth estimation (Depth Anything v1/v2), vision-language models (CLIP, SigLIP, SigLIP2, MetaCLIP 2), and speech recognition (Whisper).
📑 Models
Backbones
Object Detection
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| D-FINE | D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement | transformers |
| DETR | End-to-End Object Detection with Transformers | transformers |
| RT-DETR | DETRs Beat YOLOs on Real-time Object Detection | transformers |
| RT-DETRv2 | RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformers | transformers |
| RF-DETR | RF-DETR: Neural Architecture Search for Real-Time Detection Transformers | rfdetr |
| OWL-ViT | Simple Open-Vocabulary Object Detection with Vision Transformers | transformers |
| OWLv2 | Scaling Open-Vocabulary Object Detection | transformers |
Segmentation
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| DeepLabV3 | Rethinking Atrous Convolution for Semantic Image Segmentation | torchvision |
| EoMT | Your ViT is Secretly an Image Segmentation Model | transformers |
| MaskFormer | Per-Pixel Classification is Not All You Need for Semantic Segmentation | transformers |
| Mask2Former | Masked-attention Mask Transformer for Universal Image Segmentation | transformers |
| MobileViT | MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | transformers |
| MobileViTV2 | Separable Self-attention for Mobile Vision Transformers | transformers |
| SAM | Segment Anything | transformers |
| SAM2 | SAM 2: Segment Anything in Images and Videos | transformers |
| SAM3 | SAM 3: Segment Anything with Concepts | transformers (gated) |
| SegFormer | SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | transformers |
Feature Extraction
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| DINO | Emerging Properties in Self-Supervised Vision Transformers | torch.hub |
| DINOv2 | DINOv2: Learning Robust Visual Features without Supervision | transformers |
| DINOv3 | DINOv3: Self-Supervised Visual Representation Learning at Scale | transformers (gated) |
Depth Estimation
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| Depth Anything V1 | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | transformers |
| Depth Anything V2 | Depth Anything V2 | transformers |
Multimodal Models
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| CLIP | Learning Transferable Visual Models From Natural Language Supervision | transformers |
| MetaCLIP 2 | MetaCLIP 2: A Worldwide Scaling Recipe | transformers |
| SigLIP | Sigmoid Loss for Language Image Pre-Training | transformers |
| SigLIP2 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | transformers |
Speech
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|---|---|---|
| Whisper | Robust Speech Recognition via Large-Scale Weak Supervision | transformers |
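The weight sources listed above are PyTorch-side libraries, so porting their weights to Keras means re-laying-out tensors. The sketch below is a general illustration of that layout difference for two common layer types. It is not this project's converter, and the names `TORCH_TO_KERAS_AXES` and `keras_kernel_shape` are hypothetical.

```python
from typing import Dict, Tuple

# Axis permutations from the PyTorch weight layout to the Keras kernel layout.
# torch.nn.Linear.weight: (out_features, in_features)
#   -> keras.layers.Dense kernel: (in_features, units)
# torch.nn.Conv2d.weight: (out_channels, in_channels, kH, kW)
#   -> keras.layers.Conv2D kernel: (kH, kW, in_channels, filters)
TORCH_TO_KERAS_AXES: Dict[str, Tuple[int, ...]] = {
    "dense": (1, 0),
    "conv2d": (2, 3, 1, 0),
}


def keras_kernel_shape(kind: str, torch_shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the Keras kernel shape for a PyTorch weight of the given shape."""
    perm = TORCH_TO_KERAS_AXES[kind]
    if len(torch_shape) != len(perm):
        raise ValueError(
            f"{kind} expects a rank-{len(perm)} weight, got shape {torch_shape}"
        )
    # Reorder the dimensions; the same permutation can be passed to a
    # transpose call on the actual weight array.
    return tuple(torch_shape[axis] for axis in perm)


if __name__ == "__main__":
    print(keras_kernel_shape("dense", (1000, 768)))     # (768, 1000)
    print(keras_kernel_shape("conv2d", (64, 3, 7, 7)))  # (7, 7, 3, 64)
```

Only standard (non-grouped) `Linear` and `Conv2d` weights are covered here. Depthwise and grouped convolutions use different kernel layouts and would need their own permutations.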
📜 License
This project leverages timm and transformers for converting pretrained weights from PyTorch to Keras. For licensing details, please refer to the respective repositories.
- 🔖 kerasformers Code: This repository is licensed under the Apache 2.0 License.
🌟 Credits
- The Keras team for their powerful and user-friendly deep learning framework
- The Transformers library for its robust tools for loading and adapting pretrained models
- The pytorch-image-models (timm) project for pioneering many computer vision model implementations
- All contributors to the original papers and architectures implemented in this library
Citing
BibTeX
@misc{gc2025kerasformers,
author = {Gitesh Chawda},
title = {KerasFormers},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/IMvision12/KerasFormers}}
}