# Vision Transformers Zoo
A clean, extensible factory for creating HuggingFace-based Vision Transformer models (ViT, DeiT, DINO, DINOv2, DINOv3, CLIP) with flexible heads and easy backbone freezing.
## Installation

```bash
pip install vit_zoo
```

From source:

```bash
git clone https://github.com com/jbindaAI/vit_zoo.git
cd vit_zoo
pip install -e .
```

For development: `pip install -e ".[dev]"`
## Quick start

```python
from vit_zoo.factory import build_model

model = build_model("dinov2_vit", head=10, freeze_backbone=True)
logits = model(images)  # (batch_size, 10)
```
## Basic usage

```python
from vit_zoo.factory import build_model

# Simple classification
model = build_model("vanilla_vit", head=10, freeze_backbone=True)
predictions = model(images)  # Shape: (batch_size, 10)
```
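With the backbone frozen, only the head receives gradient updates; this is the usual linear-probe setup. A minimal sketch of what that implies for the training loop, using tiny dummy modules in place of the real backbone and head (these stand-ins are not vit_zoo internals):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(768, 768)  # dummy stand-in for the ViT backbone
head = nn.Linear(768, 10)       # dummy stand-in for the classification head

for p in backbone.parameters():
    p.requires_grad = False  # frozen: excluded from gradient updates

# Only the head's parameters go to the optimizer.
trainable = [p for p in head.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```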
## Custom MLP Head

```python
from vit_zoo.factory import build_model
from vit_zoo.components import MLPHead

mlp_head = MLPHead(
    input_dim=768,
    hidden_dims=[512, 256],
    output_dim=100,
    dropout=0.1,
    activation="gelu",  # or "relu", "tanh", or an nn.Module
)

model = build_model("dinov2_vit", head=mlp_head)
```
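The arguments above describe a 768 → 512 → 256 → 100 stack with GELU activations and dropout between layers. A sketch of that shape in plain torch (illustrative only, not the vit_zoo source):

```python
import torch
import torch.nn as nn

# Two hidden layers with GELU + dropout, then a 100-way output layer.
mlp = nn.Sequential(
    nn.Linear(768, 512), nn.GELU(), nn.Dropout(0.1),
    nn.Linear(512, 256), nn.GELU(), nn.Dropout(0.1),
    nn.Linear(256, 100),
)

out = mlp(torch.randn(2, 768))
print(out.shape)  # torch.Size([2, 100])
```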
## Embedding Extraction

```python
model = build_model("clip_vit", head=None)

outputs = model(images, output_embeddings=True)
embeddings = outputs["embeddings"]   # Shape: (batch_size, seq_len, embedding_dim)
cls_embedding = embeddings[:, 0, :]  # Shape: (batch_size, embedding_dim)
```
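Besides taking the CLS token, a common alternative is mean-pooling the patch tokens. A sketch on dummy tensors (the shapes assume a 224×224 input with 16×16 patches, i.e. 1 CLS token + 196 patch tokens):

```python
import torch

# Dummy embeddings with the shape described above: (batch, seq_len, dim).
embeddings = torch.randn(4, 197, 768)

cls_embedding = embeddings[:, 0, :]            # CLS token: (4, 768)
mean_patch = embeddings[:, 1:, :].mean(dim=1)  # mean over patch tokens: (4, 768)
```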
## Attention Weights

```python
model = build_model(
    "vanilla_vit",
    head=10,
    config_kwargs={"attn_implementation": "eager"},
)

outputs = model(images, output_attentions=True)
attentions = outputs["attentions"]
```
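For HuggingFace ViT-style backbones, the attention output is typically a tuple with one tensor per layer, each shaped `(batch, num_heads, seq_len, seq_len)`. A common use is visualizing where the CLS token attends; a sketch on dummy values (shapes again assume 196 patch tokens, i.e. a 14×14 grid):

```python
import torch

# Dummy per-layer attention tensor: (batch, num_heads, seq_len, seq_len).
attn = torch.softmax(torch.randn(2, 12, 197, 197), dim=-1)

# CLS-to-patch attention: take the CLS row, drop the CLS column,
# average over heads, then reshape to the spatial patch grid.
cls_to_patches = attn[:, :, 0, 1:].mean(dim=1)  # (2, 196)
attn_map = cls_to_patches.reshape(2, 14, 14)    # 196 patches -> 14x14 grid
```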
## Custom Head

```python
from vit_zoo.factory import build_model
from vit_zoo.components import BaseHead
import torch.nn as nn

class CustomHead(BaseHead):
    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self._input_dim = input_dim
        self.fc = nn.Linear(input_dim, num_classes)

    @property
    def input_dim(self) -> int:
        return self._input_dim

    def forward(self, embeddings):
        return self.fc(embeddings)

head = CustomHead(input_dim=768, num_classes=10)
model = build_model("vanilla_vit", head=head)
```
## Direct Usage (Any HuggingFace Model)

```python
from vit_zoo.factory import build_model
from transformers import ViTModel

model = build_model(
    model_name="google/vit-large-patch16-224",
    backbone_cls=ViTModel,
    head=10,
)
```
## API Reference

### build_model()

```python
build_model(
    model_type: Optional[str] = None,
    model_name: Optional[str] = None,
    backbone_cls: Optional[Type[ViTBackboneProtocol]] = None,
    head: Optional[Union[int, BaseHead]] = None,
    freeze_backbone: bool = False,
    load_pretrained: bool = True,
    backbone_dropout: float = 0.0,
    config_kwargs: Optional[Dict[str, Any]] = None,
) -> ViTModel
```

Parameters:

- `model_type`: Registry key (`"vanilla_vit"`, `"deit_vit"`, `"dinov2_vit"`, etc.)
- `head`: `int` (creates a `LinearHead`), a `BaseHead` instance, or `None` (embedding extraction)
- `freeze_backbone`: Freeze all backbone parameters
- `config_kwargs`: Extra config options (e.g., `{"attn_implementation": "eager"}`)
Usage:

- Registry: `build_model("vanilla_vit", head=10)`
- Override: `build_model("vanilla_vit", model_name="google/vit-large-patch16-224", head=10)`
- Direct: `build_model(model_name="...", backbone_cls=ViTModel, head=10)`
### ViTModel.forward()

```python
forward(
    pixel_values: torch.Tensor,
    output_attentions: bool = False,
    output_embeddings: bool = False,
) -> Union[torch.Tensor, Dict[str, Any]]
```

Returns a predictions tensor, or a dict with `"predictions"`, `"attentions"`, and `"embeddings"` keys.
### ViTModel.freeze_backbone()

```python
model.freeze_backbone(freeze: bool = True)  # Freeze/unfreeze the backbone
```
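A freeze switch like this typically maps onto torch's `requires_grad` flag, which enables staged fine-tuning: linear-probe first, then unfreeze. A sketch with a dummy stand-in module (`set_frozen` is an illustrative helper, not vit_zoo API):

```python
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))  # dummy stand-in

def set_frozen(module: nn.Module, freeze: bool) -> None:
    # Toggle gradient tracking for every parameter in the module.
    for p in module.parameters():
        p.requires_grad = not freeze

set_frozen(backbone, True)   # stage 1: train only the head (linear probe)
# ... train the head for a few epochs ...
set_frozen(backbone, False)  # stage 2: unfreeze for end-to-end fine-tuning
```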
### list_models()

```python
from vit_zoo.factory import list_models

available = list_models()  # Returns a list of registered model types
## Default Models

- `vanilla_vit`: Google ViT (`google/vit-base-patch16-224`)
- `deit_vit`: Facebook DeiT (`facebook/deit-base-distilled-patch16-224`)
- `dino_vit`: Facebook DINO (`facebook/dino-vitb16`)
- `dinov2_vit`: Facebook DINOv2 (`facebook/dinov2-base`)
- `dinov2_reg_vit`: DINOv2 with registers (`facebook/dinov2-with-registers-base`)
- `dinov3_vit`: Facebook DINOv3 (`facebook/dinov3-vitb16-pretrain-lvd1689m`)
- `clip_vit`: OpenAI CLIP Vision (`openai/clip-vit-base-patch16`)
## Import Patterns

```python
from vit_zoo import ViTModel
from vit_zoo.factory import build_model, list_models
from vit_zoo.components import ViTBackbone, BaseHead, LinearHead, MLPHead, IdentityHead
```
## Available Heads

- `LinearHead`: Simple linear layer (auto-created when `head=int`)
- `MLPHead`: Multi-layer perceptron with configurable depth, activation, and dropout
- `IdentityHead`: Returns embeddings unchanged

All heads must implement the `input_dim` property. Create custom heads by subclassing `BaseHead`.
## License

GPL-3.0
## File details

Details for the file `vit_zoo-0.1.4.tar.gz`.

File metadata:

- Download URL: vit_zoo-0.1.4.tar.gz
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `2f8f36bc8f9178851b1e47ac3130c85a117eea9733439c6116a64bbbfca750e7` |
| MD5 | `961056dc6aec65c642b92b41913fb608` |
| BLAKE2b-256 | `037f10244882721c5d9193041f01389846064b2358e1351e2a640f1ec95a0b48` |
Provenance:

The following attestation bundles were made for `vit_zoo-0.1.4.tar.gz`:

Publisher: release.yml on jbindaAI/vit_zoo

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vit_zoo-0.1.4.tar.gz
- Subject digest: 2f8f36bc8f9178851b1e47ac3130c85a117eea9733439c6116a64bbbfca750e7
- Sigstore transparency entry: 836549494
- Permalink: jbindaAI/vit_zoo@35536a79f8cc5f9323e55ba749da6c10d86e4323
- Branch / Tag: refs/tags/0.1.4
- Owner: https://github.com/jbindaAI
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@35536a79f8cc5f9323e55ba749da6c10d86e4323
- Trigger Event: push
## File details

Details for the file `vit_zoo-0.1.4-py3-none-any.whl`.

File metadata:

- Download URL: vit_zoo-0.1.4-py3-none-any.whl
- Size: 13.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4c38410b8b58a3e722cefd1f405e11c60bff9012e829914c6d423d06de434261` |
| MD5 | `56399715357e3fd7bae4c1b1bd8d83ac` |
| BLAKE2b-256 | `36d03aff3564eb3ba3641645b415d8696024897a4e516b0abe10b564c7154865` |
Provenance:

The following attestation bundles were made for `vit_zoo-0.1.4-py3-none-any.whl`:

Publisher: release.yml on jbindaAI/vit_zoo

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vit_zoo-0.1.4-py3-none-any.whl
- Subject digest: 4c38410b8b58a3e722cefd1f405e11c60bff9012e829914c6d423d06de434261
- Sigstore transparency entry: 836549500
- Permalink: jbindaAI/vit_zoo@35536a79f8cc5f9323e55ba749da6c10d86e4323
- Branch / Tag: refs/tags/0.1.4
- Owner: https://github.com/jbindaAI
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@35536a79f8cc5f9323e55ba749da6c10d86e4323
- Trigger Event: push