KerasFormers: Open-source Keras 3 collection of pretrained models
KerasFormers
Introduction
KerasFormers is a collection of models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, object detection (DETR, RT-DETR, RT-DETRv2, RF-DETR, D-FINE, OWL-ViT, OWLv2), segmentation (SAM, SAM2, SAM3, SegFormer, DeepLabV3, EoMT, MaskFormer, Mask2Former, MobileViT-DeepLabV3), monocular depth estimation (Depth Anything V1, Depth Anything V2), feature extraction (DINO, DINOv2, DINOv3), vision-language modeling (CLIP, SigLIP, SigLIP2, MetaCLIP 2), speech recognition (Whisper, Speech2Text), text generation with large language models (Qwen2, Qwen3, Qwen3.5), multimodal vision-language generation (Qwen2-VL, Qwen2.5-VL, Qwen3-VL), and more.
The collection covers hybrid architectures such as MaxViT alongside traditional CNNs and pure transformers, and it provides custom layers and backbone support. Backbones come with multiple weight variants, including in1k, in21k, fb_dist_in1k, ms_in22k, fb_in22k_ft_in1k, ns_jft_in1k, aa_in1k, cvnets_in1k, augreg_in21k_ft_in1k, augreg_in21k, and many more.
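The weight variant names listed above look like timm-style tags, where `_ft_` separates the pretraining part (recipe and dataset) from the fine-tuning dataset. Assuming that convention holds, a tag can be split as sketched below. The helper `split_weight_tag` is hypothetical and is not part of KerasFormers.

```python
def split_weight_tag(tag: str) -> dict:
    """Split a weight tag on "_ft_" (assumed timm-style naming).

    Hypothetical helper for illustration only; not a KerasFormers API.
    """
    # partition() returns (head, separator, tail); the separator is empty
    # when "_ft_" does not occur in the tag.
    pretrain, sep, finetune = tag.partition("_ft_")
    return {"pretrain": pretrain, "finetune": finetune if sep else None}


for tag in ("in1k", "fb_in22k_ft_in1k", "augreg_in21k_ft_in1k"):
    print(tag, "->", split_weight_tag(tag))
```

Tags without `_ft_`, such as `in1k` or `ns_jft_in1k`, are returned whole in the `pretrain` field, since nothing in the name marks a separate fine-tuning stage.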
Installation
From PyPI (recommended)
pip install -U kerasformers
From Source
pip install -U git+https://github.com/IMvision12/KerasFormers
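After installing, a quick check along the following lines can confirm which versions are present. This is a minimal sketch that relies only on the standard library. It assumes the distribution names `keras` and `kerasformers`, and it picks `jax` as the backend purely as an example (Keras 3 also accepts `tensorflow` and `torch`).

```python
import os
from importlib import metadata

# Keras 3 reads KERAS_BACKEND when keras is first imported, so set it early.
# Assumption: "jax" is installed; swap in "tensorflow" or "torch" as needed.
os.environ.setdefault("KERAS_BACKEND", "jax")


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("keras", "kerasformers"):
        print(name, installed_version(name) or "not installed")
    print("backend:", os.environ["KERAS_BACKEND"])
```

Because `setdefault` is used, a `KERAS_BACKEND` value already exported in the shell takes precedence over the one in the script.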
Documentation
Per-model guides live in the docs/ folder. Each guide covers architecture notes, usage examples, and available pretrained weights, with one page per model across every supported task (classification, object detection, segmentation, depth estimation, feature extraction, vision-language, speech recognition, and language modeling). Classification backbones share a single page because they all follow the same XModel / XImageClassify two-class structure; every other model has its own page. Browse docs/ for the complete, up-to-date list.
Models
Text Models
Text LLMs (text → text)
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| Qwen2 | Qwen2 Technical Report | on-the-fly hf: |
| Qwen3 | Qwen3 Technical Report | on-the-fly hf: |
| Qwen3.5 | Qwen3 Technical Report | on-the-fly hf: |
Vision Models
Backbones
Object Detection
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| D-FINE | D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement | transformers |
| DETR | End-to-End Object Detection with Transformers | transformers |
| RT-DETR | DETRs Beat YOLOs on Real-time Object Detection | transformers |
| RT-DETRv2 | RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformers | transformers |
| RF-DETR | RF-DETR: Neural Architecture Search for Real-Time Detection Transformers | rfdetr |
| OWL-ViT | Simple Open-Vocabulary Object Detection with Vision Transformers | transformers |
| OWLv2 | Scaling Open-Vocabulary Object Detection | transformers |
Segmentation
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| DeepLabV3 | Rethinking Atrous Convolution for Semantic Image Segmentation | torchvision |
| EoMT | Your ViT is Secretly an Image Segmentation Model | transformers |
| MaskFormer | Per-Pixel Classification is Not All You Need for Semantic Segmentation | transformers |
| Mask2Former | Masked-attention Mask Transformer for Universal Image Segmentation | transformers |
| MobileViT | MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | transformers |
| MobileViTV2 | Separable Self-attention for Mobile Vision Transformers | transformers |
| SAM | Segment Anything | transformers |
| SAM2 | SAM 2: Segment Anything in Images and Videos | transformers |
| SAM3 | SAM 3: Segment Anything with Concepts | transformers (gated) |
| SegFormer | SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | transformers |
Feature Extraction
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| DINO | Emerging Properties in Self-Supervised Vision Transformers | torch.hub |
| DINOv2 | DINOv2: Learning Robust Visual Features without Supervision | transformers |
| DINOv3 | DINOv3: Self-Supervised Visual Representation Learning at Scale | transformers (gated) |
Depth Estimation
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| Depth Anything V1 | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | transformers |
| Depth Anything V2 | Depth Anything V2 | transformers |
Multimodal Models
Vision-Language Encoders
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| CLIP | Learning Transferable Visual Models From Natural Language Supervision | transformers |
| MetaCLIP 2 | MetaCLIP 2: A Worldwide Scaling Recipe | transformers |
| SigLIP | Sigmoid Loss for Language Image Pre-Training | transformers |
| SigLIP2 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | transformers |
Multimodal LLMs (image + text → text)
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| Qwen2-VL | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | on-the-fly hf: |
| Qwen2.5-VL | Qwen2.5-VL Technical Report | on-the-fly hf: |
| Qwen3-VL | Qwen3 Technical Report | on-the-fly hf: |
Audio Models
Speech (speech → text)
| Model Name | Reference Paper | Source of Weights |
|---|---|---|
| Whisper | Robust Speech Recognition via Large-Scale Weak Supervision | transformers |
| Speech2Text | fairseq S2T: Fast Speech-to-Text Modeling with fairseq | transformers |
License
This project leverages timm and transformers for converting pretrained weights from PyTorch to Keras. For licensing details, please refer to the respective repositories.
- KerasFormers code: this repository is licensed under the Apache 2.0 License.
Credits
- The Keras team for their powerful and user-friendly deep learning framework
- The Transformers library for its robust tools for loading and adapting pretrained models
- The pytorch-image-models (timm) project for pioneering many computer vision model implementations
- All contributors to the original papers and architectures implemented in this library
Citing
BibTeX
@misc{gc2025kerasformers,
author = {Gitesh Chawda},
title = {KerasFormers},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/IMvision12/KerasFormers}}
}