Pocket-Sized Multimodal AI for Content Understanding and Generation

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

unum

These details have not been verified by PyPI

Project description

UForm

Pocket-Sized Multimodal AI
For Content Understanding and Generation

Multimodal Embeddings from 64 to 768 Dimensions • 1B Parameter Chat
Short Texts • Images • 🔜 Video Clips • 🔜 Long Documents
ONNX • CoreML • PyTorch
Python • JavaScript • Swift

Welcome to UForm, a multimodal AI library that's as versatile as it is efficient. UForm's tiny embedding models will help you understand and search visual and textual content across various languages. UForm's small generative models, on the other hand, not only support conversational and chat use cases, but are also great for fast image captioning and Visual Question Answering (VQA). With compact custom pre-trained transformer models, UForm can run anywhere from your server farm down to your smartphone.

Features

Tiny Embeddings: 64-dimensional Matryoshka-style embeddings for extremely fast search.
Throughput: Thanks to the small size, the inference speed is 2-4x faster than competitors.
Portable: Models come with native ONNX support, making them easy to deploy on any platform.
Quantization Aware: Down-cast embeddings from f32 to i8 without losing much recall.
Multilingual: Trained on a balanced dataset, the recall is great across over 20 languages.

Models

For accuracy and speed benchmarks refer to the evaluation page.

Embedding Models

Model	Parameters	Languages	Architecture
`uform3-image-text-english-large` 🆕	365 M	1	12 layer BERT, ViT-L/14
`uform3-image-text-english-base`	143 M	1	4 layer BERT, ViT-B/16
`uform3-image-text-english-small` 🆕	79 M	1	4 layer BERT, ViT-S/16
`uform3-image-text-multilingual-base`	206 M	21	12 layer BERT, ViT-B/16

Generative Models

Model	Parameters	Purpose	Architecture
`uform-gen2-dpo` 🆕	1.2 B	Chat, Image Captioning, VQA	qwen1.5-0.5B, ViT-H/14
`uform-gen2-qwen-500m`	1.2 B	Chat, Image Captioning, VQA	qwen1.5-0.5B, ViT-H/14
`uform-gen` ⚠️	1.5 B	Image Captioning, VQA	llama-1.3B, ViT-B/16

Quick Start Examples

Embedding Models

First, pip install uform. Then, load the model:

from uform import get_model, Modality

processors, models = get_model('unum-cloud/uform3-image-text-english-small')

model_text = models[Modality.TEXT_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]
processor_text = processors[Modality.TEXT_ENCODER]
processor_image = processors[Modality.IMAGE_ENCODER]

Embed images:

import requests
from io import BytesIO
from PIL import Image

image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
image = Image.open(BytesIO(requests.get(image_url).content))
image_data = processor_image(image)
image_features, image_embedding = model_image.encode(image_data, return_features=True)

Embed queries:

text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'
text_data = processor_text(text)
text_features, text_embedding = model_text.encode(text_data, return_features=True)
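
Once an image and a text have been embedded into the same space, they are usually compared with cosine similarity. The following is a minimal, dependency-free sketch of that comparison. It assumes the embeddings have been converted to flat lists of floats; the vectors and the `cosine_similarity` helper are hypothetical stand-ins, not part of the UForm API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for one image embedding and two text embeddings.
image_vec = [0.12, -0.40, 0.33, 0.08]
matching_text_vec = [0.10, -0.38, 0.30, 0.11]
unrelated_text_vec = [-0.30, 0.25, -0.05, 0.40]

score_match = cosine_similarity(image_vec, matching_text_vec)
score_other = cosine_similarity(image_vec, unrelated_text_vec)
```

A higher score means the text describes the image more closely, so ranking candidates by this score gives a basic text-to-image search.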

For more details check out:

Python docs on embedding models in python/README.md
JavaScript docs on embedding models in javascript/README.md
Swift docs on embedding models in swift/README.md

Generative Models

The generative models are natively compatible with the Hugging Face transformers library:

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# trust_remote_code=True lets transformers run the custom model code shipped with the checkpoint
model = AutoModel.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)

prompt = 'Question or Instruction'
image = Image.open('image.jpg')

inputs = processor(text=[prompt], images=[image], return_tensors='pt')

# Greedy decoding without gradient tracking
with torch.inference_mode():
    output = model.generate(
        **inputs,
        do_sample=False,
        use_cache=True,
        max_new_tokens=256,
        eos_token_id=151645,
        pad_token_id=processor.tokenizer.pad_token_id
    )
# Drop the prompt tokens so that only the newly generated text is decoded
prompt_len = inputs['input_ids'].shape[1]
decoded_text = processor.batch_decode(output[:, prompt_len:])[0]

For more details check out:

Python docs on generative models in python/README.md
JavaScript docs on generative models 🔜
Swift docs on generative models 🔜

Technical Details

Down-casting, Quantization, Matryoshka, and Slicing

Depending on the application, the embeddings can be downcast to smaller numeric representations without losing much recall. Switching from f32 to f16 is recommended in almost all cases, unless you are running on very old hardware without half-precision support. Switching to i8 with linear scaling is also possible, but the loss in recall becomes noticeable on larger collections with millions of searchable entries. Similarly, for higher-dimensional embeddings (512 or 768), a common strategy is to quantize them into single-bit representations for faster search.

import numpy as np

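# model_text and text_data come from the embedding quick-start example
f32_embedding: np.ndarray = model_text.encode(text_data, return_features=False)
f16_embedding: np.ndarray = f32_embedding.astype(np.float16)
# Linear scaling assumes components lie within [-1, 1], as for unit-normalized embeddings
i8_embedding: np.ndarray = (f32_embedding * 127).astype(np.int8)
# One bit per dimension: 1 where the component is positive, packed 8 per byte
b1_embedding: np.ndarray = np.packbits((f32_embedding > 0).astype(np.uint8))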
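
To get a feel for how much information i8 linear scaling preserves, one can quantize a vector and measure the cosine similarity between the original and the quantized copy. The sketch below is dependency-free and uses a made-up 8-dimensional vector. It assumes the embedding is unit-normalized, and it rounds and clamps rather than truncating, which is a small variation on the cast shown above. All helper names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Rescale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def quantize_i8(unit_vec: Sequence[float]) -> List[int]:
    """Scale components from [-1, 1] to [-127, 127], rounding and clamping."""
    return [max(-127, min(127, round(x * 127))) for x in unit_vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embedding, normalized so every component lies within [-1, 1].
original = normalize([0.9, -0.3, 0.05, 0.6, -0.75, 0.2, 0.0, 0.45])
quantized = quantize_i8(original)

# Close to 1.0 when quantization preserved the direction of the vector.
agreement = cosine(original, quantized)
```

On a single vector the direction is almost unchanged; the recall loss mentioned above shows up only when many near-identical neighbors compete in a large collection.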

An alternative approach to quantization is to use Matryoshka embeddings, where the embeddings are sliced into smaller parts and the search is performed in a hierarchical manner.

import numpy as np

large_embedding: np.ndarray = model_text.encode(text_data, return_features=False)
small_embedding: np.ndarray = large_embedding[:, :256]
tiny_embedding: np.ndarray = large_embedding[:, :64]
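
One way to read "hierarchical" search is as a two-stage procedure: shortlist candidates using only the leading dimensions, then rerank the shortlist with the full vectors. The sketch below illustrates that idea on toy 4-dimensional vectors. It is an assumption about how such a search could be organized, not a description of how UForm or USearch implement it, and every name in it is hypothetical. Note that a sliced vector is generally no longer unit-length, so the sketch re-normalizes after slicing.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0.0 else head


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    coarse_dims: int,
    shortlist: int,
) -> int:
    """Shortlist with truncated vectors, then rerank the shortlist at full size."""
    q_small = truncate_and_normalize(query, coarse_dims)
    coarse = sorted(
        range(len(corpus)),
        key=lambda i: dot(q_small, truncate_and_normalize(corpus[i], coarse_dims)),
        reverse=True,
    )[:shortlist]
    full_dims = len(query)
    q_full = truncate_and_normalize(query, full_dims)
    return max(
        coarse,
        key=lambda i: dot(q_full, truncate_and_normalize(corpus[i], full_dims)),
    )


# Toy corpus; real Matryoshka embeddings would use e.g. 64 of 256 dimensions.
corpus = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.1, 0.7, 0.0],
    [0.6, 0.2, 0.1, 0.75],
]
query = [0.65, 0.15, 0.6, 0.05]
best = two_stage_search(query, corpus, coarse_dims=2, shortlist=2)
```

The coarse pass touches only a fraction of each vector, so most of the cost of the full-size comparison is paid for the shortlist alone.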

Both approaches are natively supported by the USearch vector-search engine and the SimSIMD numerics library. When dealing with small collections (up to millions of entries) and looking for low-latency cosine distance calculations, you can achieve 5x-2500x performance improvement over Torch, NumPy, SciPy, and vanilla Python using SimSIMD.

from simsimd import cosine, hamming

distance: float = cosine(f32_embedding, f32_embedding) # 32x SciPy performance on Apple M2 CPU
distance: float = cosine(f16_embedding, f16_embedding) # 79x SciPy performance on Apple M2 CPU
distance: float = cosine(i8_embedding, i8_embedding) # 133x SciPy performance on Apple M2 CPU
distance: float = hamming(b1_embedding, b1_embedding) # 17x SciPy performance on Apple M2 CPU
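
For readers unfamiliar with the binary case, the Hamming distance between two bit-packed embeddings is simply the number of positions where their sign bits differ. The following plain-Python sketch shows the idea on made-up 10-dimensional vectors. It packs the most significant bit first, which matches NumPy's default `packbits` bit order. The helper names are hypothetical, and the code is meant to explain the computation rather than to compete with SimSIMD on speed.

```python
from typing import Sequence


def pack_sign_bits(vec: Sequence[float]) -> bytes:
    """Pack one bit per component (1 if positive), most significant bit first."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, value in enumerate(vec):
        if value > 0:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length packed vectors."""
    if len(a) != len(b):
        raise ValueError("packed vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Made-up embeddings; only the sign of each component matters here.
first = pack_sign_bits([0.3, -0.1, 0.8, -0.5, 0.2, 0.9, -0.4, 0.1, 0.6, -0.2])
second = pack_sign_bits([0.1, 0.4, 0.7, -0.6, -0.3, 0.5, -0.2, 0.3, -0.1, -0.9])
differing_bits = hamming_distance(first, second)
```

The two vectors disagree in sign at three positions, so `differing_bits` is 3.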

Similarly, when dealing with large collections (up to billions of entries per server) and looking for high-throughput search, you can achieve 100x performance improvement over FAISS and other vector-search solutions using USearch. Here are a couple of examples:

from usearch.index import Index

f32_index = Index(ndim=64, metric='cos', dtype='f32') # for Matryoshka embeddings
f16_index = Index(ndim=64, metric='cos', dtype='f16') # for Matryoshka embeddings
i8_index = Index(ndim=256, metric='cos', dtype='i8') # for quantized embeddings
b1_index = Index(ndim=768, metric='hamming', dtype='b1') # for binary embeddings
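
The four configurations above trade precision and dimensionality for memory. A quick back-of-the-envelope calculation makes the difference concrete. The sketch below counts only the raw vector payload and ignores whatever the index itself adds on top (graph links, keys, alignment), so the numbers are a lower bound. The helper name is hypothetical.

```python
BITS_PER_SCALAR = {"f32": 32, "f16": 16, "i8": 8, "b1": 1}


def bytes_per_vector(ndim: int, dtype: str) -> int:
    """Storage for one vector's payload, rounded up to whole bytes."""
    return (ndim * BITS_PER_SCALAR[dtype] + 7) // 8


configs = [(64, "f32"), (64, "f16"), (256, "i8"), (768, "b1")]
footprints = {
    (ndim, dtype): bytes_per_vector(ndim, dtype) for ndim, dtype in configs
}

# Baseline for comparison: a full 768-dimensional f32 embedding.
baseline = bytes_per_vector(768, "f32")

for (ndim, dtype), size in footprints.items():
    print(f"{ndim:>4} x {dtype}: {size:>4} bytes ({baseline // size}x smaller than 768 x f32)")
```

Under these assumptions, a 768-dimensional binary vector takes 96 bytes, compared with 3072 bytes for the same vector in f32.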

Compact Packaging

PyTorch is a heavy dependency to carry, especially if you run on Edge or IoT devices. Using vanilla ONNX runtime, one can significantly reduce memory consumption and deployment latency.

$ conda create -n uform_torch python=3.10 -y
$ conda create -n uform_onnx python=3.10 -y
$ conda activate uform_torch && pip install -e ".[torch]" && conda deactivate
$ conda activate uform_onnx && pip install -e ".[onnx]" && conda deactivate
$ du -sh $(conda info --envs | grep 'uform_torch' | awk '{print $2}')
> 5.2G    ~/conda/envs/uform_torch
$ du -sh $(conda info --envs | grep 'uform_onnx' | awk '{print $2}')
> 461M    ~/conda/envs/uform_onnx

Most of that weight can be further reduced, down to 100 MB for both the model and the runtime. You can pick one of many supported ONNX execution providers, which include XNNPACK, CUDA and TensorRT for Nvidia GPUs, OpenVINO on Intel, DirectML on Windows, ROCm on AMD, CoreML on Apple devices, and more to come.
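
When the same code has to run on several kinds of hardware, a common pattern is to keep an ordered preference list of execution providers and use whichever ones are present on the current machine. The sketch below shows that selection logic in plain Python. The preference order and the `pick_providers` helper are assumptions made for illustration; the provider strings follow ONNX Runtime's naming.

```python
from typing import List, Sequence

# Hypothetical preference order: fastest accelerators first, CPU as the fallback.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(
    available: Sequence[str],
    preferred: Sequence[str] = PREFERRED_PROVIDERS,
) -> List[str]:
    """Return the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    # The CPU provider ships with every ONNX Runtime build, so keep it as a fallback.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# Example: a machine with an Nvidia GPU but no TensorRT installed.
providers = pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
```

With ONNX Runtime installed, the `available` list would typically come from `onnxruntime.get_available_providers()`, and the result can be passed as the `providers` argument of `onnxruntime.InferenceSession`.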

Multimodal Chat in CLI

The generative models can be used for chat-like experiences in the command line. For that, you can use the uform-chat CLI tool, which is available in the UForm package.

$ pip install uform
$ uform-chat --model unum-cloud/uform-gen2-dpo --image=zebra.jpg
$ uform-chat --model unum-cloud/uform-gen2-dpo \
>     --image="https://bit.ly/3tIVg9M" \
>     --device="cuda:0" \
>     --fp16

Release history

Version	Release date
3.1.3 (this version)	Sep 3, 2025
3.1.2	Jun 21, 2025
3.1.1	Jan 3, 2025
3.1.0	Dec 20, 2024
3.0.3	Oct 1, 2024
3.0.2	Apr 25, 2024
3.0.1	Apr 25, 2024
2.1.1	Apr 16, 2024
2.1.0	Apr 14, 2024
2.0.2	Mar 28, 2024
1.1.1	Feb 23, 2024
1.1.0	Feb 15, 2024
1.0.3	Dec 29, 2023
1.0.2	Dec 28, 2023
1.0.1	Dec 28, 2023
0.4.8	Oct 13, 2023
0.4.7	Oct 13, 2023
0.4.6	Oct 13, 2023
0.4.5	Oct 13, 2023
0.4.4	Sep 20, 2023
0.4.3	Sep 1, 2023
0.4.2	Aug 17, 2023
0.4.1	Aug 17, 2023
0.4.0	Aug 17, 2023
0.3.2	Aug 4, 2023
0.3.1	Aug 4, 2023
0.3.0	Aug 1, 2023
0.2.1	May 2, 2023
0.2.0	Mar 29, 2023
0.1.3	Mar 27, 2023
0.1.2	Mar 27, 2023
0.1.1	Mar 23, 2023
0.1.0	Mar 23, 2023
0.0.6	Mar 15, 2023
0.0.5	Feb 28, 2023
0.0.4	Feb 24, 2023
0.0.3	Feb 24, 2023
0.0.2	Feb 23, 2023
0.0.1	Feb 23, 2023

Download files

Source Distribution

uform-3.1.3.tar.gz (28.0 kB)

Uploaded Sep 3, 2025 Source

Built Distribution

uform-3.1.3-py3-none-any.whl (26.0 kB)

Uploaded Sep 3, 2025 Python 3

File details

Details for the file uform-3.1.3.tar.gz.

File metadata

Download URL: uform-3.1.3.tar.gz
Upload date: Sep 3, 2025
Size: 28.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for uform-3.1.3.tar.gz
Algorithm	Hash digest
SHA256	`c9cafc0efdb7702a78fdca3c9d74392d8b67f1f4102f324ce01d59647c1c9cc1`
MD5	`2d8bbea7ef012e9313817a27295149ac`
BLAKE2b-256	`54e07c0d8cc16ad964fc91aa60cae29bc9289e1434d929b9a47d27c0c00c163a`
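
A downloaded archive can be checked against the published SHA256 digest before it is installed. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, computed in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For the source distribution listed above, the call would be `matches_expected(Path("uform-3.1.3.tar.gz"), "c9cafc0efdb7702a78fdca3c9d74392d8b67f1f4102f324ce01d59647c1c9cc1")`, assuming the file was saved under that name in the current directory.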

Provenance

The following attestation bundles were made for uform-3.1.3.tar.gz:

Publisher: release.yml on unum-cloud/uform

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: uform-3.1.3.tar.gz
- Subject digest: c9cafc0efdb7702a78fdca3c9d74392d8b67f1f4102f324ce01d59647c1c9cc1
- Sigstore transparency entry: 463012472
- Sigstore integration time: Sep 3, 2025
Source repository:
- Permalink: unum-cloud/uform@98eac0c3bd52ca378442c342a5f92e5eb5367bea
- Branch / Tag: refs/heads/main
- Owner: https://github.com/unum-cloud
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@98eac0c3bd52ca378442c342a5f92e5eb5367bea
- Trigger Event: push

File details

Details for the file uform-3.1.3-py3-none-any.whl.

File metadata

Download URL: uform-3.1.3-py3-none-any.whl
Upload date: Sep 3, 2025
Size: 26.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for uform-3.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`00cc8f840f997b007959df36c24cf2ff123a3f5a7c36e4b63a3b50d4955d329d`
MD5	`b9473f4b0a603618f35cb8fb6152813c`
BLAKE2b-256	`5dac9efa1c3143c83a4b9b3d3726d309429dff20fb9074f4b5e9e45adc02d57d`

Provenance

The following attestation bundles were made for uform-3.1.3-py3-none-any.whl:

Publisher: release.yml on unum-cloud/uform

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: uform-3.1.3-py3-none-any.whl
- Subject digest: 00cc8f840f997b007959df36c24cf2ff123a3f5a7c36e4b63a3b50d4955d329d
- Sigstore transparency entry: 463012490
- Sigstore integration time: Sep 3, 2025
Source repository:
- Permalink: unum-cloud/uform@98eac0c3bd52ca378442c342a5f92e5eb5367bea
- Branch / Tag: refs/heads/main
- Owner: https://github.com/unum-cloud
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@98eac0c3bd52ca378442c342a5f92e5eb5367bea
- Trigger Event: push

uform 3.1.3
