An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Project description
muvera-rs
An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.
Disclaimer
- This is an unofficial implementation and is not affiliated with Google Research or the original authors.
- Some parts of this README were generated by AI.
Overview
MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.
The Algorithm
Multi-vector models (like ColBERT) produce multiple embeddings per query/document token, achieving superior retrieval performance compared to single-vector models. However, multi-vector retrieval is computationally expensive due to the complex Chamfer similarity scoring.
MuVERA solves this by:
- Fixed Dimensional Encoding (FDE): Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
- Asymmetric Encoding: Uses sum aggregation for queries and average aggregation for documents
- Single-Vector MIPS: Enables the use of highly-optimized Maximum Inner Product Search algorithms
- Theoretical Guarantees: Provides ε-approximation guarantees for Chamfer similarity
The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
Features
- ✅ Core FDE Implementation: Complete Fixed Dimensional Encoding with configurable buckets
- ✅ Batch Processing: Efficient parallel encoding for large datasets
- ✅ Type Safety: Generic implementation supporting
f32andf64 - ✅ Memory Efficient: Uses
ndarrayfor optimized vector operations - ✅ Deterministic: Reproducible results with seed-based randomization
- ✅ Comprehensive Testing: Extensive test suite covering edge cases
Installation
Rust
Add to your Cargo.toml:
[dependencies]
muvera-rs = "0.1.0"
Python
Install from PyPI:
pip install muvera
Or install from source:
pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings
Testing the Python Bindings
After building with maturin, you can test the Python bindings interactively:
import numpy as np
import muvera
embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, "mean")
print(result)
Or save the above as a script and run with python my_test.py.
Usage
Python Example
import numpy as np
import muvera
# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)
# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")
# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")
Rust Example
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::{Array2, ArrayView2};
fn main() {
// Create encoder with 1024 buckets for 768-dimensional embeddings
let encoder = FDEEncoder::new(128, 768, 42);
// Example token embeddings (num_tokens, embedding_dim)
let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
// Encode query (sum aggregation)
let query_fde = encoder.encode_query(tokens.view());
// Encode document (average aggregation)
let doc_fde = encoder.encode_doc(tokens.view());
}
Batch Processing
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::{Array3, ArrayView3};
fn encode_batch() {
let encoder = FDEEncoder::new(128, 768, 42);
// Batch of token embeddings (batch_size, num_tokens, embedding_dim)
let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();
// Encode entire batch
let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
println!("Encoded batch shape: {:?}", batch_fdes.shape());
}
Custom Aggregation
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
fn custom_encoding() {
let encoder = FDEEncoder::new(128, 768, 42);
let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
// Use custom aggregation mode
let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}
API Reference
FDEEncoder<T>
The main encoder struct that implements the Fixed Dimensional Encoding algorithm.
Constructor
pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
buckets: Number of hash buckets (hyperplanes)dim: Dimensionality of input token embeddingsseed: Random seed for reproducible hyperplane initialization
Methods
encode(tokens, mode): Encode single multi-vectorbatch_encode(tokens, mode): Encode batch of multi-vectors (parallel)encode_query(tokens): Encode query with sum aggregationencode_doc(tokens): Encode document with average aggregationencode_query_batch(tokens): Batch encode queriesencode_doc_batch(tokens): Batch encode documents
Aggregation
Enum defining aggregation modes:
Sum: Sum all tokens in each bucketAvg: Average all tokens in each bucket
Performance Considerations
- Buckets: More buckets = better approximation but higher memory usage
- Batch Size: Use batch encoding for multiple vectors to leverage parallelism
- Memory: FDE vectors are
buckets * dimin size - Precision:
f32is typically sufficient and faster thanf64
References
Original Paper
- MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
arXiv:2405.19504
Google Research Blog
- MuVERA: Making Multi-Vector Retrieval as Fast as Single-Vector Search
Google Research Blog
Original Implementation
- Google Graph Mining Repository
github.com/google/graph-mining/tree/main/sketching/point_cloud
Future Work
Planned Features
-
Benchmark Suite
- Integration with BEIR datasets (MS MARCO, etc.)
- Memory usage and compression analysis
-
Advanced Features
- Support BLAS
- Product Quantization (PQ): 32x compression with minimal quality loss
- Final Projections: Dimensionality reduction techniques
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file muvera-0.2.0.tar.gz.
File metadata
- Download URL: muvera-0.2.0.tar.gz
- Upload date:
- Size: 62.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.9.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
61390f9b2e32ffb7f8022a2efc7acaef404fb2556883d14a3c4f5b527c59a477
|
|
| MD5 |
a39a75e77458c553c82505cc8b67bbcb
|
|
| BLAKE2b-256 |
be578624c02b45978e7dce6cbe91f664284718055ce67e5b2d56c6ea3c81045c
|
File details
Details for the file muvera-0.2.0-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: muvera-0.2.0-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 182.7 kB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.9.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ff216042f6253473f44d8e4405657c60f030ea4d8238ca2afd50876d7876f31a
|
|
| MD5 |
4c7e0aa1e03005279611ee7ab77b096b
|
|
| BLAKE2b-256 |
736aef5c0d64c3eb2a369042a7f2dc8617e71cc9c1558746b9fc3c50799f6130
|