
Vision Transformers Zoo

Project description


A clean, extensible factory for creating HuggingFace-based Vision Transformer models (ViT, DeiT, DINO, DINOv2, DINOv3, CLIP) with flexible heads and easy backbone freezing.

Installation

pip install vit_zoo

From source:

git clone https://github.com/jbindaAI/vit_zoo.git
cd vit_zoo
pip install -e .

For development: pip install -e ".[dev]"

Quick start

from vit_zoo.factory import build_model

model = build_model("dinov2_vit", head=10, freeze_backbone=True)
logits = model(images)  # (batch_size, 10)

Basic usage

from vit_zoo.factory import build_model

# Simple classification
model = build_model("vanilla_vit", head=10, freeze_backbone=True)
predictions = model(images)  # Shape: (batch_size, 10)
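
As a quick sanity check, the sketch below runs a dummy batch through the model; it assumes the "vanilla_vit" default checkpoint (google/vit-base-patch16-224) expects 224×224 RGB inputs and that the returned model behaves as a standard torch.nn.Module.

import torch

model = build_model("vanilla_vit", head=10, freeze_backbone=True)
model.eval()

images = torch.randn(4, 3, 224, 224)  # dummy batch of 4 RGB images
with torch.no_grad():
    predictions = model(images)

print(predictions.shape)  # expected: torch.Size([4, 10])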

Custom MLP Head

from vit_zoo.factory import build_model
from vit_zoo.components import MLPHead

mlp_head = MLPHead(
    input_dim=768,
    hidden_dims=[512, 256],
    output_dim=100,
    dropout=0.1,
    activation="gelu"  # or 'relu', 'tanh', or nn.Module
)

model = build_model("dinov2_vit", head=mlp_head)

Embedding Extraction

model = build_model("clip_vit", head=None)
outputs = model(images, output_embeddings=True)
embeddings = outputs["embeddings"]  # Shape: (batch_size, seq_len, embedding_dim)
cls_embedding = embeddings[:, 0, :]  # Shape: (batch_size, embedding_dim)
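
Continuing from the block above, a common next step is collapsing the token embeddings into one vector per image, e.g. for retrieval. The mean-pooling sketch below is generic and assumes the first token is a CLS-style summary token; token layouts differ across backbones (DeiT adds a distillation token, DINOv2 with registers adds register tokens).

import torch.nn.functional as F

cls_embedding = embeddings[:, 0, :]            # (batch_size, embedding_dim)
patch_mean = embeddings[:, 1:, :].mean(dim=1)  # mean over the remaining tokens

# Illustrative: cosine similarity between the first two images in the batch
similarity = F.cosine_similarity(cls_embedding[0], cls_embedding[1], dim=0)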

Attention Weights

model = build_model(
    "vanilla_vit",
    head=10,
    config_kwargs={"attn_implementation": "eager"}
)
outputs = model(images, output_attentions=True)
attentions = outputs["attentions"]
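
The structure of attentions follows the underlying HuggingFace backbone: typically a tuple with one tensor per layer, each of shape (batch_size, num_heads, seq_len, seq_len). Continuing from the block above, the sketch assumes a vanilla ViT layout (one CLS token followed by a 14×14 grid of patch tokens for 224×224 inputs with patch size 16) and turns the last layer's CLS attention into a coarse spatial map.

last_layer = attentions[-1]               # (batch_size, num_heads, seq_len, seq_len)
cls_to_patches = last_layer[:, :, 0, 1:]  # CLS token attending to patch tokens
cls_map = cls_to_patches.mean(dim=1)      # average over heads
cls_map = cls_map.reshape(-1, 14, 14)     # 224 / 16 = 14 patches per side (assumed)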

Custom Head

from vit_zoo.components import BaseHead
import torch.nn as nn

class CustomHead(BaseHead):
    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self._input_dim = input_dim
        self.fc = nn.Linear(input_dim, num_classes)
    
    @property
    def input_dim(self) -> int:
        return self._input_dim
    
    def forward(self, embeddings):
        return self.fc(embeddings)

head = CustomHead(input_dim=768, num_classes=10)
model = build_model("vanilla_vit", head=head)

Direct Usage (Any HuggingFace Model)

from vit_zoo.factory import build_model
from transformers import ViTModel

model = build_model(
    model_name="google/vit-large-patch16-224",
    backbone_cls=ViTModel,
    head=10
)

API Reference

build_model()

build_model(
    model_type: Optional[str] = None,
    model_name: Optional[str] = None,
    backbone_cls: Optional[Type[ViTBackboneProtocol]] = None,
    head: Optional[Union[int, BaseHead]] = None,
    freeze_backbone: bool = False,
    load_pretrained: bool = True,
    backbone_dropout: float = 0.0,
    config_kwargs: Optional[Dict[str, Any]] = None,
) -> ViTModel

Parameters:

  • model_type: Registry key ("vanilla_vit", "deit_vit", "dinov2_vit", etc.)
  • head: int (creates LinearHead), BaseHead instance, or None (embedding extraction)
  • freeze_backbone: Freeze all backbone parameters
  • config_kwargs: Extra config options (e.g., {"attn_implementation": "eager"})

Usage:

  • Registry: build_model("vanilla_vit", head=10)
  • Override: build_model("vanilla_vit", model_name="google/vit-large-patch16-224", head=10)
  • Direct: build_model(model_name="...", backbone_cls=ViTModel, head=10)

ViTModel.forward()

forward(
    pixel_values: torch.Tensor,
    output_attentions: bool = False,
    output_embeddings: bool = False,
) -> Union[torch.Tensor, Dict[str, Any]]

Returns the predictions tensor by default, or a dict with "predictions", "attentions", and "embeddings" keys when extra outputs are requested.
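
As a minimal illustration of the two return forms based on the signature above:

predictions = model(images)  # plain tensor, no extra outputs requested

outputs = model(images, output_attentions=True, output_embeddings=True)
predictions = outputs["predictions"]
attentions = outputs["attentions"]
embeddings = outputs["embeddings"]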

ViTModel.freeze_backbone()

model.freeze_backbone(freeze: bool = True)  # Freeze/unfreeze backbone
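
A typical use is linear probing: freeze the backbone and train only the head. The optimizer setup below is generic PyTorch and assumes ViTModel is a standard torch.nn.Module.

import torch

model.freeze_backbone(True)  # backbone parameters stop receiving gradients

# Optimize only the parameters that remain trainable (the head)
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-3)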

list_models()

from vit_zoo.factory import list_models
available = list_models()  # Returns list of registered model types
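
For example, the registry can act as a guard before building; the sketch below uses load_pretrained=False (from the build_model signature above) to skip downloading checkpoint weights.

from vit_zoo.factory import build_model, list_models

model_type = "dinov2_vit"
if model_type in list_models():
    # load_pretrained=False builds the architecture with randomly initialized weights
    model = build_model(model_type, head=10, load_pretrained=False)
else:
    raise ValueError(f"Unknown model type: {model_type}")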

Supported Models

  • vanilla_vit: Google ViT (google/vit-base-patch16-224)
  • deit_vit: Facebook DeiT (facebook/deit-base-distilled-patch16-224)
  • dino_vit: Facebook DINO (facebook/dino-vitb16)
  • dinov2_vit: Facebook DINOv2 (facebook/dinov2-base)
  • dinov2_reg_vit: DINOv2 with registers (facebook/dinov2-with-registers-base)
  • dinov3_vit: Facebook DINOv3 (facebook/dinov3-vitb16-pretrain-lvd1689m)
  • clip_vit: OpenAI CLIP Vision (openai/clip-vit-base-patch16)

Import Patterns

from vit_zoo import ViTModel
from vit_zoo.factory import build_model, list_models
from vit_zoo.components import ViTBackbone, BaseHead, LinearHead, MLPHead, IdentityHead

Available Heads

  • LinearHead: Simple linear layer (auto-created when head=int)
  • MLPHead: Multi-layer perceptron with configurable depth, activation, dropout
  • IdentityHead: Returns embeddings unchanged

All heads must implement the input_dim property; create custom heads by subclassing BaseHead.

License

GPL-3.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vit_zoo-0.1.3.tar.gz (13.1 kB)

Built Distribution

vit_zoo-0.1.3-py3-none-any.whl (13.2 kB)

File details

Details for the file vit_zoo-0.1.3.tar.gz.

File metadata

  • Download URL: vit_zoo-0.1.3.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vit_zoo-0.1.3.tar.gz:

  • SHA256: f1f909a7d2eadb640b5b5f2898777a3291a92015cdf64f63e4c3fba86b567ac3
  • MD5: 2644bbdda21ac99339b8c1249301eb87
  • BLAKE2b-256: 2e21fb96111025100aafdcb64c4acbc53b0455eac8235e2a0eb44a49d211ef6f


Provenance

The following attestation bundles were made for vit_zoo-0.1.3.tar.gz:

Publisher: release.yml on jbindaAI/vit_zoo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vit_zoo-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: vit_zoo-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vit_zoo-0.1.3-py3-none-any.whl:

  • SHA256: 1d5b2dfbb7723d63609b6bf036083435e286812a12592f86b3d79b5841cb8d65
  • MD5: 7b5fadfca293aaafd192db4a21f390be
  • BLAKE2b-256: 64cf2bbd852c911074656733958cb5273758e75cd54433129ff0ce0bb3aedf08


Provenance

The following attestation bundles were made for vit_zoo-0.1.3-py3-none-any.whl:

Publisher: release.yml on jbindaAI/vit_zoo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
