Production-grade multimodal embedding model with ONNX export and int8 quantization
OmniVector-Embed
Production-grade multimodal embedding model — unified 4096-dimensional embeddings for text, code, image, video, and audio. Built on Mistral-7B with ONNX export and int8 quantization for deployment.
Replicates and extends NV-Embed-v2 with multimodal support and CPU-friendly inference.
Installation
```bash
pip install omnivector-embed
```
With vision support (the quotes keep shells such as zsh from expanding the brackets):
```bash
pip install "omnivector-embed[vision]"
```
Quick Start
```python
from omnivector.model import OmniVectorModel

# Load a trained model
model = OmniVectorModel.from_pretrained("AuralithAI/omnivector-embed-v1")

# Encode text
embeddings = model.encode(["What is machine learning?", "ML is a subset of AI."])

# Encode with Matryoshka dimensionality
embeddings_512 = model.encode(["query"], output_dim=512)
embeddings_4096 = model.encode(["query"], output_dim=4096)
```
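Because the pipeline ends with L2 normalization (see Architecture), cosine similarity between two embeddings reduces to a dot product. The following is a minimal sketch of ranking documents against a query. It assumes each row returned by `model.encode` can be treated as a sequence of floats; the toy 3-dimensional vectors stand in for real embeddings, and `rank_by_similarity` is a hypothetical helper, not part of the package.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending dot product.

    For unit-length vectors the dot product equals cosine similarity.
    """
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for real embeddings (hypothetical values).
query = l2_normalize([1.0, 0.0, 1.0])
docs = [l2_normalize(v) for v in ([0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5])]
ranking = rank_by_similarity(query, docs)

for index, score in ranking:
    print(f"doc {index}: {score:.4f}")
```

With real embeddings, `query` and `docs` would come from `model.encode(...)` and no extra normalization step should be needed if the output is already unit length.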
Key Features
| Feature | Description |
|---|---|
| Multimodal | Text, code, image, video, and audio in one embedding space |
| Matryoshka | Flexible output dimensions: 512, 1024, 2048, 4096 |
| ONNX Export | Opset 17 with dynamic int8 quantization |
| CPU Inference | Full ONNX Runtime support — no GPU required |
| LoRA Training | Fine-tune with 0.1% of parameters (rank 16) |
| 3-Stage Pipeline | Retrieval → Generalist → Multimodal training |
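The "0.1% of parameters" figure in the LoRA row can be sanity-checked with simple arithmetic. The sketch below assumes adapters on the `q_proj` and `v_proj` attention projections only (the target modules are not stated here, so this is an assumption) and uses the published Mistral-7B shapes: hidden size 4096, 32 layers, 8 key/value heads of dimension 128, and roughly 7.24 billion base parameters.

```python
def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)

RANK = 16
HIDDEN = 4096          # Mistral-7B hidden size
KV_DIM = 8 * 128       # 8 key/value heads x 128 head dim (grouped-query attention)
LAYERS = 32
BASE_PARAMS = 7.24e9   # approximate Mistral-7B parameter count

# Assumption: adapters on q_proj and v_proj only; the real target modules may differ.
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = per_layer * LAYERS
fraction = trainable / BASE_PARAMS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of base model: {fraction:.3%}")
```

Under these assumptions the adapters add about 6.8 million trainable parameters, which is a little under 0.1% of the backbone. Targeting more projection matrices would raise the share proportionally.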
Architecture
```text
Input → Mistral-7B (bidirectional, eager attention, LoRA)
      → Latent Attention Pooling (512 latents × 4096, 8 heads)
      → Matryoshka dimensions [512, 1024, 2048, 4096]
      → L2 normalize
```
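The last two stages of this pipeline can be illustrated without loading the model. In common Matryoshka usage, a smaller embedding is obtained by keeping the leading components of the full vector and normalizing again. The sketch below assumes `output_dim` behaves this way; the package's own implementation is not shown here, and `truncate_embedding` is a hypothetical helper.

```python
import math

MATRYOSHKA_DIMS = (512, 1024, 2048, 4096)

def truncate_embedding(vec, output_dim):
    """Keep the first `output_dim` components, then L2-normalize the result."""
    if output_dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"output_dim must be one of {MATRYOSHKA_DIMS}")
    if output_dim > len(vec):
        raise ValueError("output_dim exceeds the embedding length")
    head = vec[:output_dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

# Deterministic stand-in for a full 4096-dimensional embedding.
full = [math.sin(i + 1) for i in range(4096)]
small = truncate_embedding(full, 512)

print(len(small))
```

Re-normalizing after truncation matters: the prefix of a unit vector is generally shorter than unit length, so dot products between raw prefixes would no longer be cosine similarities.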
- Backbone: Mistral-7B-v0.1 with bidirectional attention
- Pooling: Cross-attention with learned latent queries
- Vision: SigLIP-SO400M (1152 → 4096 projection)
- Audio: Whisper-tiny (384 → 4096 MLP projection)
- Loss: InfoNCE + Matryoshka Representation Learning + cross-modal contrastive
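To make the loss line concrete, here is a minimal pure-Python sketch of InfoNCE with in-batch negatives, averaged over several Matryoshka prefix lengths. The temperature of 0.05, the equal weighting across dimensions, and all function names are assumptions for illustration; the cross-modal contrastive term and hard negatives are not shown, and real training would run on tensors rather than Python lists.

```python
import math

def _unit_prefix(vec, dim):
    """First `dim` components of `vec`, rescaled to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

def info_nce(queries, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives.

    queries[i] and positives[i] form a matching pair; every other
    positives[j] acts as a negative for queries[i]. Inputs are assumed
    to be L2-normalized so the dot product is cosine similarity.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, p)) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

def matryoshka_info_nce(queries, positives, dims, temperature=0.05):
    """Average the InfoNCE loss over several prefix lengths."""
    total = 0.0
    for dim in dims:
        q = [_unit_prefix(v, dim) for v in queries]
        p = [_unit_prefix(v, dim) for v in positives]
        total += info_nce(q, p, temperature)
    return total / len(dims)

# Toy 4-dimensional batch (hypothetical values); real training would use 512 to 4096.
queries = [_unit_prefix(v, 4) for v in ([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0])]
aligned = matryoshka_info_nce(queries, queries, dims=(2, 4))
swapped = matryoshka_info_nce(queries, queries[::-1], dims=(2, 4))

print(f"aligned pairs: {aligned:.6f}, swapped pairs: {swapped:.6f}")
```

The loss is near zero when each query is closest to its own positive and grows large when the pairs are swapped, which is the signal that pulls matching items together at every prefix length.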
ONNX Export
```python
from omnivector.export import OnnxExporter, OnnxQuantizer

# Export the model to ONNX (opset 17)
exporter = OnnxExporter(model_path="path/to/model", opset_version=17)
exporter.export("model.onnx")

# Quantize to int8
OnnxQuantizer.quantize_dynamic("model.onnx", "model_int8.onnx")
```
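For intuition about what int8 quantization does to a weight tensor, the sketch below applies a simplified symmetric, per-tensor scheme: one floating-point scale maps the weights onto integers in [-127, 127]. This is an illustration only, not the scheme `OnnxQuantizer` or ONNX Runtime actually uses; dynamic quantization additionally computes activation scales at run time, and real implementations may use zero points or per-channel scales. The weight values are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in quantized]

# Hypothetical weight values for illustration.
weights = [0.82, -0.31, 0.05, -1.27, 0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q, scale, max_error)
```

The round-trip error per weight is bounded by half the scale, so tensors with a few large outliers lose more precision than tensors with a narrow value range.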
Evaluation
Built-in MTEB evaluation:
```bash
python scripts/evaluate.py --model-path path/to/model --tasks retrieval
```
Training
3-stage training pipeline with DeepSpeed ZeRO-2:
```bash
# Stage 1: Retrieval (text pairs with hard negatives)
python scripts/training.py --config configs/stage1_retrieval.yaml

# Stage 2: Generalist (55M+ pairs)
python scripts/training.py --config configs/stage2_generalist.yaml

# Stage 3: Multimodal (image/video/audio + text)
python scripts/train_multimodal.py --config configs/stage3_multimodal.yaml
```
Stack
| Component | Version |
|---|---|
| Python | ≥ 3.9 |
| PyTorch | ≥ 2.2.0 |
| Transformers | 4.44.2 |
| PEFT | 0.12.0 |
| ONNX Runtime | ≥ 1.18.0 |
| DeepSpeed | ≥ 0.14.0 |
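A quick way to compare an environment against this table is to read installed versions with the standard library. The sketch below is a hypothetical helper, not part of the package. It assumes the usual PyPI distribution names (`torch`, `transformers`, `peft`, `onnxruntime`, `deepspeed`), treats the Transformers and PEFT rows as exact pins because the table lists them without "≥", and compares only the leading numeric part of each version string.

```python
import re
from importlib import metadata

# Versions taken from the Stack table; "==" entries are assumed to be exact pins.
REQUIREMENTS = {
    "torch": (">=", "2.2.0"),
    "transformers": ("==", "4.44.2"),
    "peft": ("==", "0.12.0"),
    "onnxruntime": (">=", "1.18.0"),
    "deepspeed": (">=", "0.14.0"),
}

def version_tuple(version):
    """Leading numeric components, padded to three: '2.2.0+cu121' -> (2, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = tuple(int(p) for p in match.group().split(".")) if match else ()
    return parts + (0,) * max(0, 3 - len(parts))

def satisfies(installed, op, required):
    """Compare two version strings with '==' or '>=' semantics."""
    a, b = version_tuple(installed), version_tuple(required)
    return a == b if op == "==" else a >= b

def check_environment():
    """Return a status string per package: 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, (op, required) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, op, required)
        report[name] = "ok" if ok else f"found {installed}, need {op} {required}"
    return report

for package, status in check_environment().items():
    print(f"{package}: {status}")
```

This simple comparison ignores pre-release and local version suffixes, so a dedicated version parser is the safer choice for anything beyond a quick check.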
Links
- GitHub: AuralithAI/OmniVector-Embed
- Paper: NV-Embed-v2 (arXiv:2405.17428)
- Documentation: docs/architecture.md
License
Apache 2.0 — see LICENSE.