Implementations of popular vision models in JAX


Equimo: Modern Vision Models in JAX/Equinox

WARNING: This is a research library implementing recent computer vision models. The implementations are based on paper descriptions and may not be exact replicas of the original implementations. Use with caution in production environments.

Equimo (Equinox Image Models) provides JAX/Equinox implementations of recent computer vision models, currently focusing on, but not limited to, transformer and state-space architectures.

Features

Pure JAX/Equinox implementations
Focus on recent architectures (2023-2024 papers)
Modular design for easy experimentation
Extensive documentation and type hints
Experimental support for text embedding

Installation

From PyPI

pip install equimo

From Source

git clone https://github.com/clementpoiret/equimo.git
cd equimo
pip install -e .

Implemented Models

Beyond standard ViTs (e.g., DINOv2 or SigLIP), Equimo provides other state-of-the-art architectures:

Model	Paper	Year	Status
FasterViT	FasterViT: Fast Vision Transformers with Hierarchical Attention	2023	✅
Castling-ViT	Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference	2023	Partial*
MLLA	Mamba-like Linear Attention	2024	✅
PartialFormer	Efficient Vision Transformers with Partial Attention	2024	✅
SHViT	SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design	2024	✅
VSSD	VSSD: Vision Mamba with Non-Causal State Space Duality	2024	✅
ReduceFormer	ReduceFormer: Attention with Tensor Reduction by Summation	2024	✅
LowFormer	LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones	2024	✅

*: Only contains the Linear Angular Attention module. It is straightforward to build a ViT around it, but doing so may require an additional __call__ kwarg to control the sparse_reg bool.

Basic Usage

import jax

import equimo.models as em

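# Create a model (e.g. `faster_vit_0_224`)
key = jax.random.PRNGKey(0)
# Use a separate key for each random operation instead of reusing one
model_key, data_key, call_key = jax.random.split(key, 3)
model = em.FasterViT(
    img_size=224,
    in_channels=3,
    dim=64,
    in_dim=64,
    depths=[2, 3, 6, 5],
    num_heads=[2, 4, 8, 16],
    hat=[False, False, True, False],
    window_size=[7, 7, 7, 7],
    ct_size=2,
    key=model_key,
)

# Generate a random input: a single unbatched image of shape (channels, height, width)
x = jax.random.normal(data_key, (3, 224, 224))

# Run inference
output = model(x, enable_dropout=False, key=call_key)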
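The example above runs the model on a single unbatched image. The text-embedding example in this README batches an encoder with jax.vmap, and the same pattern can be sketched for images. In the sketch below, toy_model is a hypothetical stand-in that only mimics the call signature shown above (x, enable_dropout, key); it is not an Equimo model, and whether a given Equimo model needs extra handling under vmap is not verified here.

```python
import jax
import jax.numpy as jnp


def toy_model(x, enable_dropout=False, key=None):
    """Hypothetical stand-in: maps one (C, H, W) image to a (C,) vector."""
    return x.mean(axis=(1, 2))


key = jax.random.PRNGKey(0)
data_key, call_key = jax.random.split(key)

# A batch of 4 images, each shaped like the single-image input above.
batch = jax.random.normal(data_key, (4, 3, 224, 224))
# One PRNG key per image, so stochastic layers would get independent randomness.
call_keys = jax.random.split(call_key, batch.shape[0])


def forward(image, k):
    return toy_model(image, enable_dropout=False, key=k)


# vmap maps `forward` over the leading axis of both arguments.
outputs = jax.vmap(forward)(batch, call_keys)
print(outputs.shape)  # (4, 3)
```

Replacing toy_model with a real model should only require that the model accepts one image and one key per call, as in the single-image example.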

Working with text embeddings

Warning: this is experimental and can break or change at any time.

equimo.experimental.text was added in v0.3.0. It allows working with both text and images. It is especially useful for models like SigLIP or TIPS, although only TIPS is currently supported.

Text tokenization currently relies on tensorflow_text. Install Equimo with the text extra, for example: uv add equimo[text].

Here is a very simple example of zero-shot classification based on comparing text and image embeddings:

import jax
from einops import rearrange

from equimo.experimental.text import Tokenizer
from equimo.io import load_image, load_model
from equimo.utils import PCAVisualizer, normalize, plot_image_and_feature_map

# Random demo inputs
key = jax.random.PRNGKey(42)
image = load_image("./demo.jpg", size=448)
text = [
    "A baby discovering happiness",
    "A computer",
]

# Loading pretrained models
image_encoder = load_model("vit", "tips_vits14_hr")
text_encoder = load_model("experimental.textencoder", "tips_vits14_hr_text")

# Encoding text and image
ids, paddings = Tokenizer(identifier="sentencepiece_tips").tokenize(text, max_len=64)

text_embedding = normalize(
    jax.vmap(text_encoder, in_axes=(0, 0, None))(ids, paddings, key)
)
image_embedding = jax.vmap(image_encoder.norm)(image_encoder.features(image, key))
cls_token = normalize(image_embedding[0])
spatial_features = rearrange(
    image_embedding[2:], "(h w) d -> h w d", h=int(448 / 14), w=int(448 / 14)
)

# Getting probabilities based on Cosine Similarity
cos_sim = jax.nn.softmax(
    ((cls_token[None, :] @ text_embedding.T) / text_encoder.temperature), axis=-1
)

# Plot the results
label_idxs = jax.numpy.argmax(cos_sim, axis=-1)
cos_sim_max = jax.numpy.max(cos_sim, axis=-1)
label_predicted = text[label_idxs[0]]
similarity = cos_sim_max[0]
pca_obj = PCAVisualizer(spatial_features)
image_pca = pca_obj(spatial_features)

plot_image_and_feature_map(
    image.transpose(1, 2, 0),
    image_pca,
    "./out.png",
    "Input Image",
    f"{label_predicted}, prob: {similarity * 100:.2f}%",
)

This produces the following result:

[Figure: Output of TIPS zero-shot classification]
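To make the arithmetic behind the probability step explicit, here is a dependency-free sketch of the same idea: normalize the embeddings, take cosine similarities, divide by a temperature, and apply a softmax. The vectors and the temperature value are made up for illustration; the real example reads the temperature from text_encoder.temperature and uses jax.nn.softmax.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_vec, text_vecs, temperature):
    """Softmax over temperature-scaled cosine similarities."""
    img = l2_normalize(image_vec)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_vecs
    ]
    return softmax([s / temperature for s in sims])


# Toy 2-D embeddings (hypothetical values, not model outputs).
image_vec = [1.0, 0.0]
text_vecs = [[2.0, 0.0], [0.0, 3.0]]

probs = zero_shot_probs(image_vec, text_vecs, temperature=0.1)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, probs)
```

A smaller temperature sharpens the distribution toward the best-matching text, while a larger one flattens it.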

Saving and Loading Models

Equimo provides utilities for saving models locally and loading pre-trained models from the official repository.

Saving Models Locally

from pathlib import Path
from equimo.io import save_model

# Save model with compression (creates .tar.lz4 file)
save_model(
    Path("path/to/save/model"),
    model,  # can be any model you created using Equimo
    model_config,
    torch_hub_cfg,  # This can be an empty list; it mainly keeps track of where the weights come from
    compression=True
)

# Save model without compression (creates directory)
save_model(
    Path("path/to/save/model"),
    model,
    model_config,
    torch_hub_cfg,
    compression=False
)

Loading Models

from pathlib import Path

from equimo.io import load_model

# Load a pre-trained model from the official repository
model = load_model(cls="vit", identifier="dinov2_vits14_reg")

# Load a local model (compressed)
model = load_model(cls="vit", path=Path("path/to/model.tar.lz4"))

# Load a local model (uncompressed directory)
model = load_model(cls="vit", path=Path("path/to/model/"))

Parameters passed to the model class can be overridden when loading, for example:

model = load_model(
    cls="vit",
    identifier="siglip2_vitb16_256",
    dynamic_img_size=True,  # passed to the VisionTransformer class
)

List of pretrained models

The following models have pretrained weights available in Equimo:

Model identifiers are used to download weights from Equimo's repository on Hugging Face.

Identifiers are filenames without their extensions, such as:

dinov2_vitb14
dinov2_vits14_reg
siglip2_vitl16_512
siglip2_vitso400m16_384
tips_vitg14_lr
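As an illustration of the "filename without extension" rule, the sketch below derives an identifier from a filename. The helper name identifier_from_filename is hypothetical and not part of Equimo, and the .tar.lz4 suffix is assumed from the compressed format mentioned in the saving section; the files hosted in the repository may be named differently.

```python
from pathlib import PurePosixPath

# Assumed archive suffix, taken from the compressed save format described above.
ARCHIVE_SUFFIX = ".tar.lz4"


def identifier_from_filename(filename: str) -> str:
    """Return the filename with its directory and archive suffix removed."""
    name = PurePosixPath(filename).name
    if name.endswith(ARCHIVE_SUFFIX):
        return name[: -len(ARCHIVE_SUFFIX)]
    return name


print(identifier_from_filename("dinov2_vits14_reg.tar.lz4"))  # dinov2_vits14_reg
```

The resulting string is what would be passed as the identifier argument of load_model.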

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Equimo in your research, please cite:

@software{equimo2024,
  author = {Clément POIRET},
  title = {Equimo: Modern Vision Models in JAX/Equinox},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/clementpoiret/equimo}
}



