Skip to main content

An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Project description

muvera-rs

An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.

Crates.io Documentation License

Disclaimer

  • This is an unofficial implementation and is not affiliated with Google Research or the original authors.
  • Some parts of this README were generated by AI.

Overview

MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.

The Algorithm

Multi-vector models (like ColBERT) produce multiple embeddings per query/document token, achieving superior retrieval performance compared to single-vector models. However, multi-vector retrieval is computationally expensive due to the complex Chamfer similarity scoring.

MuVERA solves this by:

  1. Fixed Dimensional Encoding (FDE): Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
  2. Asymmetric Encoding: Uses sum aggregation for queries and average aggregation for documents
  3. Single-Vector MIPS: Enables the use of highly-optimized Maximum Inner Product Search algorithms
  4. Theoretical Guarantees: Provides ε-approximation guarantees for Chamfer similarity

The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.

Features

  • Core FDE Implementation: Complete Fixed Dimensional Encoding with configurable buckets
  • Batch Processing: Efficient parallel encoding for large datasets
  • Type Safety: Generic implementation supporting f32 and f64
  • Memory Efficient: Uses ndarray for optimized vector operations
  • Deterministic: Reproducible results with seed-based randomization
  • Comprehensive Testing: Extensive test suite covering edge cases

Installation

Rust

Add to your Cargo.toml:

[dependencies]
muvera-rs = "0.1.0"

Python

Install from PyPI:

pip install muvera

Or install from source:

pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings

Testing the Python Bindings

After building with maturin, you can test the Python bindings interactively:

import numpy as np
import muvera

embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, "mean")
print(result)

Or save the above as a script and run with python my_test.py.

Usage

Python Example

import numpy as np
import muvera

# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)

# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")

# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")

Rust Example

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::{Array2, ArrayView2};

fn main() {
    // Create encoder with 1024 buckets for 768-dimensional embeddings
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Example token embeddings (num_tokens, embedding_dim)
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Encode query (sum aggregation)
    let query_fde = encoder.encode_query(tokens.view());
    
    // Encode document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view());
}

Batch Processing

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::{Array3, ArrayView3};

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Batch of token embeddings (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();
    
    // Encode entire batch
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}

Custom Aggregation

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Use custom aggregation mode
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}

API Reference

FDEEncoder<T>

The main encoder struct that implements the Fixed Dimensional Encoding algorithm.

Constructor

pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
  • buckets: Number of hash buckets (hyperplanes)
  • dim: Dimensionality of input token embeddings
  • seed: Random seed for reproducible hyperplane initialization

Methods

  • encode(tokens, mode): Encode single multi-vector
  • batch_encode(tokens, mode): Encode batch of multi-vectors (parallel)
  • encode_query(tokens): Encode query with sum aggregation
  • encode_doc(tokens): Encode document with average aggregation
  • encode_query_batch(tokens): Batch encode queries
  • encode_doc_batch(tokens): Batch encode documents

Aggregation

Enum defining aggregation modes:

  • Sum: Sum all tokens in each bucket
  • Avg: Average all tokens in each bucket

Performance Considerations

  • Buckets: More buckets = better approximation but higher memory usage
  • Batch Size: Use batch encoding for multiple vectors to leverage parallelism
  • Memory: FDE vectors are buckets * dim in size
  • Precision: f32 is typically sufficient and faster than f64

References

Original Paper

  • MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
    Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
    arXiv:2405.19504

Google Research Blog

Original Implementation

Future Work

Planned Features

  1. Benchmark Suite

    • Integration with BEIR datasets (MS MARCO, etc.)
    • Memory usage and compression analysis
  2. Advanced Features

    • Support BLAS
    • Product Quantization (PQ): 32x compression with minimal quality loss
    • Final Projections: Dimensionality reduction techniques

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

muvera-0.2.0.tar.gz (62.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

muvera-0.2.0-cp312-cp312-win_amd64.whl (182.7 kB view details)

Uploaded CPython 3.12Windows x86-64

File details

Details for the file muvera-0.2.0.tar.gz.

File metadata

  • Download URL: muvera-0.2.0.tar.gz
  • Upload date:
  • Size: 62.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.1

File hashes

Hashes for muvera-0.2.0.tar.gz
Algorithm Hash digest
SHA256 61390f9b2e32ffb7f8022a2efc7acaef404fb2556883d14a3c4f5b527c59a477
MD5 a39a75e77458c553c82505cc8b67bbcb
BLAKE2b-256 be578624c02b45978e7dce6cbe91f664284718055ce67e5b2d56c6ea3c81045c

See more details on using hashes here.

File details

Details for the file muvera-0.2.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: muvera-0.2.0-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 182.7 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.1

File hashes

Hashes for muvera-0.2.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 ff216042f6253473f44d8e4405657c60f030ea4d8238ca2afd50876d7876f31a
MD5 4c7e0aa1e03005279611ee7ab77b096b
BLAKE2b-256 736aef5c0d64c3eb2a369042a7f2dc8617e71cc9c1558746b9fc3c50799f6130

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page