An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

These details have not been verified by PyPI

Project links

Project description

muvera-rs

An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.

Disclaimer

This is an unofficial implementation and is not affiliated with Google Research or the original authors.
Some parts of this README were generated by AI.

Overview

MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.

The Algorithm

Multi-vector models (like ColBERT) produce multiple embeddings per query/document token, achieving superior retrieval performance compared to single-vector models. However, multi-vector retrieval is computationally expensive due to the complex Chamfer similarity scoring.

MuVERA solves this by:

Fixed Dimensional Encoding (FDE): Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
Asymmetric Encoding: Uses sum aggregation for queries and average aggregation for documents
Single-Vector MIPS: Enables the use of highly-optimized Maximum Inner Product Search algorithms
Theoretical Guarantees: Provides ε-approximation guarantees for Chamfer similarity

The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.

Features

✅ Core FDE Implementation: Complete Fixed Dimensional Encoding with configurable buckets
✅ Batch Processing: Efficient parallel encoding for large datasets
✅ Type Safety: Generic implementation supporting f32 and f64
✅ Memory Efficient: Uses ndarray for optimized vector operations
✅ Deterministic: Reproducible results with seed-based randomization
✅ Comprehensive Testing: Extensive test suite covering edge cases

Installation

Rust

Add to your Cargo.toml:

[dependencies]
muvera-rs = "0.1.0"

Python

Install from PyPI:

pip install muvera

Or install from source:

pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings

Testing the Python Bindings

After building with maturin, you can test the Python bindings interactively:

import numpy as np
import muvera

embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, "mean")
print(result)

Or save the above as a script and run with python my_test.py.

Usage

Python Example

import numpy as np
import muvera

# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)

# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")

# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")

Rust Example

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::{Array2, ArrayView2};

fn main() {
    // Create encoder with 1024 buckets for 768-dimensional embeddings
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Example token embeddings (num_tokens, embedding_dim)
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Encode query (sum aggregation)
    let query_fde = encoder.encode_query(tokens.view());
    
    // Encode document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view());
}

Batch Processing

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::{Array3, ArrayView3};

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Batch of token embeddings (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();
    
    // Encode entire batch
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}

Custom Aggregation

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Use custom aggregation mode
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}

API Reference

`FDEEncoder<T>`

The main encoder struct that implements the Fixed Dimensional Encoding algorithm.

Constructor

pub fn new(buckets: usize, dim: usize, seed: u64) -> Self

buckets: Number of hash buckets (hyperplanes)
dim: Dimensionality of input token embeddings
seed: Random seed for reproducible hyperplane initialization

Methods

encode(tokens, mode): Encode single multi-vector
batch_encode(tokens, mode): Encode batch of multi-vectors (parallel)
encode_query(tokens): Encode query with sum aggregation
encode_doc(tokens): Encode document with average aggregation
encode_query_batch(tokens): Batch encode queries
encode_doc_batch(tokens): Batch encode documents

`Aggregation`

Enum defining aggregation modes:

Sum: Sum all tokens in each bucket
Avg: Average all tokens in each bucket

Performance Considerations

Buckets: More buckets = better approximation but higher memory usage
Batch Size: Use batch encoding for multiple vectors to leverage parallelism
Memory: FDE vectors are buckets * dim in size
Precision: f32 is typically sufficient and faster than f64

References

Original Paper

MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
arXiv:2405.19504

Google Research Blog

MuVERA: Making Multi-Vector Retrieval as Fast as Single-Vector Search
Google Research Blog

Original Implementation

Google Graph Mining Repository
github.com/google/graph-mining/tree/main/sketching/point_cloud

Future Work

Planned Features

Benchmark Suite
- Integration with BEIR datasets (MS MARCO, etc.)
- Memory usage and compression analysis
Advanced Features
- Support BLAS
- Product Quantization (PQ): 32x compression with minimal quality loss
- Final Projections: Dimensionality reduction techniques

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Jul 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

muvera-0.2.0.tar.gz (62.5 kB view details)

Uploaded Jul 12, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

muvera-0.2.0-cp312-cp312-win_amd64.whl (182.7 kB view details)

Uploaded Jul 12, 2025 CPython 3.12Windows x86-64

File details

Details for the file muvera-0.2.0.tar.gz.

File metadata

Download URL: muvera-0.2.0.tar.gz
Upload date: Jul 12, 2025
Size: 62.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.9.1

File hashes

Hashes for muvera-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`61390f9b2e32ffb7f8022a2efc7acaef404fb2556883d14a3c4f5b527c59a477`
MD5	`a39a75e77458c553c82505cc8b67bbcb`
BLAKE2b-256	`be578624c02b45978e7dce6cbe91f664284718055ce67e5b2d56c6ea3c81045c`

See more details on using hashes here.

File details

Details for the file muvera-0.2.0-cp312-cp312-win_amd64.whl.

File metadata

Download URL: muvera-0.2.0-cp312-cp312-win_amd64.whl
Upload date: Jul 12, 2025
Size: 182.7 kB
Tags: CPython 3.12, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.9.1

File hashes

Hashes for muvera-0.2.0-cp312-cp312-win_amd64.whl
Algorithm	Hash digest
SHA256	`ff216042f6253473f44d8e4405657c60f030ea4d8238ca2afd50876d7876f31a`
MD5	`4c7e0aa1e03005279611ee7ab77b096b`
BLAKE2b-256	`736aef5c0d64c3eb2a369042a7f2dc8617e71cc9c1558746b9fc3c50799f6130`

See more details on using hashes here.

muvera 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

muvera-rs

Disclaimer

Overview

The Algorithm

Features

Installation

Rust

Python

Testing the Python Bindings

Usage

Python Example

Rust Example

Batch Processing

Custom Aggregation

API Reference

FDEEncoder<T>

Constructor

Methods

Aggregation

Performance Considerations

References

Original Paper

Google Research Blog

Original Implementation

Future Work

Planned Features

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`FDEEncoder<T>`

`Aggregation`