
Production-grade multimodal embedding model with ONNX export and int8 quantization


OmniVector-Embed

Python 3.9+ | License: Apache 2.0

Production-grade multimodal embedding model that produces unified 4096-dimensional embeddings for text, code, images, video, and audio. Built on Mistral-7B, with ONNX export and int8 quantization for deployment.

Replicates and extends NV-Embed-v2 with multimodal support and CPU-friendly inference.

Installation

pip install omnivector-embed

With vision support:

pip install "omnivector-embed[vision]"

Quick Start

from omnivector.model import OmniVectorModel

# Load a trained model
model = OmniVectorModel.from_pretrained("AuralithAI/omnivector-embed-v1")

# Encode text
embeddings = model.encode(["What is machine learning?", "ML is a subset of AI."])

# Encode with Matryoshka dimensionality
embeddings_512 = model.encode(["query"], output_dim=512)
embeddings_4096 = model.encode(["query"], output_dim=4096)
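Matryoshka Representation Learning trains the model so that the leading coordinates of an embedding carry most of its information, so a smaller embedding can be obtained by keeping the first k coordinates and re-normalizing. The sketch below assumes `output_dim` behaves this way (truncate, then L2-normalize), which this README does not state explicitly; `truncate_embedding` is a hypothetical helper, and random vectors stand in for real model output.

```python
import numpy as np

MATRYOSHKA_DIMS = (512, 1024, 2048, 4096)

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each row, then L2-normalize again."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}, got {dim}")
    head = emb[..., :dim]
    norms = np.linalg.norm(head, axis=-1, keepdims=True)
    return head / np.clip(norms, 1e-12, None)  # guard against zero vectors

# Stand-in for full 4096-d model output (random, for illustration only)
rng = np.random.default_rng(0)
full = rng.standard_normal((2, 4096)).astype(np.float32)
full /= np.linalg.norm(full, axis=-1, keepdims=True)

small = truncate_embedding(full, 512)
# With unit-norm rows, cosine similarity reduces to a dot product
cosine = small @ small.T
```

Because each truncated vector is re-normalized, similarity scores computed at one dimension are comparable with each other, but scores from different dimensions should not be mixed in the same index.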

Key Features

| Feature | Description |
|---|---|
| Multimodal | Text, code, image, video, and audio in one embedding space |
| Matryoshka | Flexible output dimensions: 512, 1024, 2048, 4096 |
| ONNX Export | Opset 17 with dynamic int8 quantization |
| CPU Inference | Full ONNX Runtime support; no GPU required |
| LoRA Training | Fine-tune with 0.1% of parameters (rank 16) |
| 3-Stage Pipeline | Retrieval → Generalist → Multimodal training |
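The 0.1% figure can be sanity-checked with LoRA's parameter count: each adapted weight of shape d_out × d_in gains rank × (d_in + d_out) trainable parameters. The arithmetic below uses the public Mistral-7B-v0.1 layer shapes and assumes adapters on the attention `q_proj` and `v_proj` matrices only, a common PEFT choice for Mistral-style models but an assumption here, since the README does not list its target modules.

```python
# Mistral-7B-v0.1 shapes, taken from its public model config
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)
VOCAB = 32000

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to one frozen weight."""
    return rank * (d_in + d_out)

# Base-model size: attention + MLP + 2 RMSNorms per layer, embeddings, LM head, final norm
attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q/o and k/v projections
mlp = 3 * HIDDEN * INTERMEDIATE                   # gate, up, down projections
total = NUM_LAYERS * (attn + mlp + 2 * HIDDEN) + 2 * VOCAB * HIDDEN + HIDDEN

RANK = 16
# Assumption: adapters on q_proj (4096 -> 4096) and v_proj (4096 -> 1024) only
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = NUM_LAYERS * per_layer
fraction = trainable / total
print(f"{trainable:,} trainable of {total:,} ({fraction:.3%})")
# prints: 6,815,744 trainable of 7,241,732,096 (0.094%)
```

Adapting every attention and MLP projection at the same rank would raise the fraction to roughly 0.6%, so the README's figure is consistent with a narrower set of target modules.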

Architecture

Input → Mistral-7B (bidirectional, eager attention, LoRA)
      → Latent Attention Pooling (512 latents × 4096, 8 heads)
      → Matryoshka dimensions [512, 1024, 2048, 4096]
      → L2 normalize
  • Backbone: Mistral-7B-v0.1 with bidirectional attention
  • Pooling: Cross-attention with learned latent queries
  • Vision: SigLIP-SO400M (1152 → 4096 projection)
  • Audio: Whisper-tiny (384 → 4096 MLP projection)
  • Loss: InfoNCE + Matryoshka Representation Learning + cross-modal contrastive
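The pooling step above can be illustrated with a minimal numpy sketch of multi-head cross-attention in which learned latent queries attend over the backbone's token states, followed by an average over latents and L2 normalization. The function name, the toy dimensions, the random weights, and the final mean over latents are assumptions for illustration; a full implementation would likely also include a feed-forward layer, omitted here, and this is not the package's actual module.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention_pool(hidden, mask, latents, w_q, w_k, w_v, w_o, num_heads):
    """Pool token states into one unit-norm vector per sequence.

    hidden: (batch, seq, d) backbone outputs; mask: (batch, seq), 1 = real token;
    latents: (num_latents, d) learned queries; w_*: (d, d) projections.
    """
    b, s, d = hidden.shape
    n = latents.shape[0]
    hd = d // num_heads
    q = (latents @ w_q).reshape(n, num_heads, hd).transpose(1, 0, 2)       # (h, n, hd)
    k = (hidden @ w_k).reshape(b, s, num_heads, hd).transpose(0, 2, 1, 3)  # (b, h, s, hd)
    v = (hidden @ w_v).reshape(b, s, num_heads, hd).transpose(0, 2, 1, 3)  # (b, h, s, hd)
    scores = np.einsum("hnd,bhsd->bhns", q, k) / np.sqrt(hd)               # (b, h, n, s)
    # Padding positions get a large negative score, so softmax gives them ~0 weight
    scores = np.where(mask[:, None, None, :] > 0, scores, -1e9)
    out = np.einsum("bhns,bhsd->bhnd", softmax(scores), v)                 # (b, h, n, hd)
    out = out.transpose(0, 2, 1, 3).reshape(b, n, d) @ w_o                 # (b, n, d)
    pooled = out.mean(axis=1)                                              # average over latents
    return pooled / np.linalg.norm(pooled, axis=-1, keepdims=True)

# Toy sizes; the README's configuration is 512 latents x 4096 dims, 8 heads
rng = np.random.default_rng(0)
d, n_latents, heads = 64, 16, 8
params = [rng.standard_normal((n_latents, d))] + [
    rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
]  # latents, w_q, w_k, w_v, w_o
hidden = rng.standard_normal((2, 5, d))
mask = np.array([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
emb = latent_attention_pool(hidden, mask, *params, num_heads=heads)
```

Compared with mean pooling, the learned latents let the model weight tokens differently per query slot, while the mask keeps padded positions from influencing the result.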

ONNX Export

from omnivector.export import OnnxExporter, OnnxQuantizer

# Export the model to ONNX with opset 17
exporter = OnnxExporter(model_path="path/to/model", opset_version=17)
exporter.export("model.onnx")

# Apply dynamic int8 quantization to the exported graph
OnnxQuantizer.quantize_dynamic("model.onnx", "model_int8.onnx")
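Dynamic quantization stores weights as int8 ahead of time but computes activation scales on the fly for each input, which is why it needs no calibration data. The numpy sketch below illustrates the idea for a single linear layer, using per-tensor symmetric int8 weights and asymmetric uint8 activations, a common dynamic-quantization scheme; it is a conceptual illustration with hypothetical helper names, not what `OnnxQuantizer` does internally.

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Symmetric per-tensor int8; done once, when the model is quantized."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale

def quantize_activations_uint8(x: np.ndarray):
    """Asymmetric per-tensor uint8; range taken from the live input at run time."""
    x_min = min(0.0, float(x.min()))  # range always includes 0
    x_max = max(0.0, float(x.max()))
    scale = (x_max - x_min) / 255.0 or 1.0
    zero_point = int(np.clip(np.round(-x_min / scale), 0, 255))
    x_q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return x_q, scale, zero_point

def dynamic_int8_linear(x, w_q, w_scale):
    """y ~= x @ w computed with integer arithmetic and one float rescale."""
    x_q, x_scale, x_zp = quantize_activations_uint8(x)
    acc = (x_q.astype(np.int32) - x_zp) @ w_q.astype(np.int32)  # int32 accumulator
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 64)).astype(np.float32)
x = rng.standard_normal((4, 256)).astype(np.float32)
w_q, w_scale = quantize_weights_int8(w)        # offline step
y_int8 = dynamic_int8_linear(x, w_q, w_scale)  # per-input step
y_ref = x @ w
rel_err = float(np.abs(y_int8 - y_ref).max() / np.abs(y_ref).max())
```

The int8 weights take a quarter of the float32 storage, and the small `rel_err` is the accuracy cost that an MTEB run before and after quantization would measure end to end.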

Evaluation

Built-in MTEB evaluation:

python scripts/evaluate.py --model-path path/to/model --tasks retrieval
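MTEB retrieval tasks report nDCG@10 as their main score. A minimal sketch of that metric for one query with binary relevance (MTEB also supports graded judgments, which this simplification ignores); with L2-normalized embeddings, ranking by dot product is ranking by cosine similarity. The toy vectors and the `ndcg_at_k` helper are illustrative, not taken from `scripts/evaluate.py`.

```python
import numpy as np

def ndcg_at_k(scores: np.ndarray, relevant: set, k: int = 10) -> float:
    """Binary-relevance nDCG@k for a single query.

    scores: (num_docs,) similarity of the query to each document;
    relevant: indices of the documents judged relevant.
    """
    ranking = np.argsort(-scores)[:k]
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, doc in enumerate(ranking) if int(doc) in relevant)
    ideal = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Toy unit-norm embeddings: 4 documents and one query
doc_emb = np.eye(4)
query_emb = np.array([0.1, 0.9, 0.3, 0.8])
query_emb /= np.linalg.norm(query_emb)
scores = doc_emb @ query_emb  # cosine similarity, since all vectors are unit-norm

# The relevant doc (index 3) is ranked 2nd, so nDCG@10 = 1 / log2(3) ~ 0.631
score = ndcg_at_k(scores, relevant={3})
```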

Training

3-stage training pipeline with DeepSpeed ZeRO-2:

# Stage 1: Retrieval (text pairs with hard negatives)
python scripts/training.py --config configs/stage1_retrieval.yaml

# Stage 2: Generalist (55M+ pairs)
python scripts/training.py --config configs/stage2_generalist.yaml

# Stage 3: Multimodal (image/video/audio + text)
python scripts/train_multimodal.py --config configs/stage3_multimodal.yaml
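The loss listed under Architecture combines InfoNCE with Matryoshka Representation Learning. A minimal numpy sketch of that combination for one batch: in-batch negatives plus optional hard negatives (as in the Stage 1 description), with the same loss applied to each nested prefix of the embedding and averaged. The temperature of 0.05, the equal weighting across dimensions, and the function names are assumptions; the cross-modal contrastive term and the DeepSpeed training loop are not shown.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(q, d, hard_neg=None, temperature=0.05):
    """q[i] should match d[i]; other rows of d (and any hard negatives) act as negatives."""
    cands = d if hard_neg is None else np.concatenate([d, hard_neg], axis=0)
    logits = l2_normalize(q) @ l2_normalize(cands).T / temperature  # (B, B + N)
    idx = np.arange(q.shape[0])
    return float(-log_softmax(logits, axis=1)[idx, idx].mean())

def matryoshka_info_nce(q, d, dims, hard_neg=None, temperature=0.05):
    """MRL: apply the same loss to each nested prefix and average with equal weights."""
    return float(np.mean([
        info_nce(q[:, :m], d[:, :m],
                 None if hard_neg is None else hard_neg[:, :m], temperature)
        for m in dims
    ]))

# Toy batch: 4 queries of width 32, with nested dims (8, 16, 32)
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 32))
pos = q + 0.1 * rng.standard_normal((4, 32))  # near-duplicates act as positives
neg = rng.standard_normal((2, 32))            # stand-in hard negatives
loss_matched = matryoshka_info_nce(q, pos, dims=(8, 16, 32), hard_neg=neg)
loss_random = matryoshka_info_nce(q, rng.standard_normal((4, 32)), dims=(8, 16, 32), hard_neg=neg)
```

Training every prefix with the same objective is what lets a truncated 512-d embedding remain usable without a separate model.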

Stack

| Component | Version |
|---|---|
| Python | ≥ 3.9 |
| PyTorch | ≥ 2.2.0 |
| Transformers | 4.44.2 |
| PEFT | 0.12.0 |
| ONNX Runtime | ≥ 1.18.0 |
| DeepSpeed | ≥ 0.14.0 |


License

Apache 2.0 — see LICENSE.



Download files


Source Distribution

omnivector_embed-0.1.0.tar.gz (661.7 kB)

Uploaded Source

Built Distribution


omnivector_embed-0.1.0-py3-none-any.whl (49.0 kB)

Uploaded Python 3

File details

Details for the file omnivector_embed-0.1.0.tar.gz.

File metadata

  • Download URL: omnivector_embed-0.1.0.tar.gz
  • Upload date:
  • Size: 661.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omnivector_embed-0.1.0.tar.gz
Algorithm Hash digest
SHA256 02769b456c6236f8ae4f15d201748e2f59e53f965db66d94d1ee12d4d8fd99cc
MD5 0ed5f2de4ba2fa0462f454cdd70dd6b9
BLAKE2b-256 3883818f15066f96ae445ffcebdd2c957c1ed25bc407433eaef0b56a93b59a8f


Provenance

The following attestation bundles were made for omnivector_embed-0.1.0.tar.gz:

Publisher: release.yml on AuralithAI/OmniVector-Embed


File details

Details for the file omnivector_embed-0.1.0-py3-none-any.whl.


File hashes

Hashes for omnivector_embed-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 caffba33a36a04176a6206849cf2f345d64456c5f14507aabdce5dce82862a57
MD5 4cb3c91f09fe16951a146c226f0c17bb
BLAKE2b-256 e964e2fbaf204ee4cb349df63fd592e4b31860d4902aecc5b426ddb2100a6ee6


Provenance

The following attestation bundles were made for omnivector_embed-0.1.0-py3-none-any.whl:

Publisher: release.yml on AuralithAI/OmniVector-Embed

