Skip to main content

Fast CPU implementation of MaxSim scoring

Project description

maxsim-cpu

maxsim-cpu is a high-performance CPU implementation of MaxSim scoring for late-interaction (ColBERT, ColPali) workflows.

It is a python library written in Rust and powered by libsxmm on x86 CPUs and Apple Accelerate on ARM macs. It only supports Linux x86 machines and ARM Macs at the moment.

maxsim-cpu is built to run exclusively on CPU, and achieves speed-ups that scale with core count on the scoring machine. It's designed to be used in situations where index/scoring machines do not have access to GPUs, and achieves ~2-3x speed-ups on ARM macs and 5x speedups on Linux CPUs over common PyTorch maxsim implementations.

It also implements effective just-in-time batching and padding for variable documents, greatly reducing padding overhead and needless computations.

Getting Started

Pre-built wheels are available on Pypi for Python 3.9 through 3.13 and can be installed in the usual way:

uv pip install maxsim-cpu # You may use vanilla pip install but why would you? If you're sophisticated, you could use `uv add` too!

Once installed, the simple API exposes two methods. For uniform-length inputs, you may use:

import numpy as np
import maxsim_cpu

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

docs = np.random.randn(1000, 512, 128).astype(np.float32)  # [num_docs, doc_len, dim]
# Normalize document embeddings...

# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores(query, docs)  # Returns [num_docs] scores

For variable length inputs, you should use the alternate maxsim_scores_variable:

import numpy as np
import maxsim_cpu

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

# Create variable-length documents as a list
docs = [
    np.random.randn(np.random.randint(50, 800), 128).astype(np.float32)  # Variable length docs
    for _ in range(1000)
]
# Normalize document embeddings...

# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores_variable(query, docs)  # Returns [num_docs] scores

Platform Requirements

  • macOS ARM: Apple Silicon (M1+)
  • macOS Intel: x86_64 with AVX2 (Intel Haswell 2013+ - Core i3/i5/i7 4xxx series or newer)
  • Linux: x86_64 with AVX2 (Intel Haswell 2013+, AMD Excavator 2015+)

We currently do not support Windows or take advantage of AVX512 instructions, nor do we optimise caching for specific CPUs. Contributions/PRs in this direction are welcome!

Note: Pre-built wheels on PyPI are currently only available for Linux x86_64 and macOS ARM (Apple Silicon). For Intel Mac users, you'll need to build from source (see below).

Building

We use maturin as our build system.

Linux

The easy way to build maxsim-cpu from source on Linux is as follows:

# Install necessary system deps
apt-get install libssl-dev libopenblas-dev -y
apt-get install pkg-config -y
# Install tooling
uv pip install maturin patchelf numpy
# Install libxsmm
git@github.com:libxsmm/libxsmm.git && cd libxsmm && make STATIC=1 && make
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
RUSTFLAGS="-L native=$(pwd)/../libxsmm/lib" maturin build --release --features use-libxsmm

Step by step:

  • This installs OpenSSL and OpenBLAS, which will be required for compiling, as well as pkg-config so they can be found easily.
  • It then clones libxsmm, on which most of the performance depends, and installs it.
  • Installs RUST and enables its environment
  • Clones this repository and finally build it

You may modify it and remove any step depending on dependencies already present on your machine.

Mac

On Mac, the installation is simplified, assuming you use homebrew:

For Apple Silicon (M1+):

# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
maturin build --release

For Intel Mac (x86_64):

# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
# Build with AVX2 support (requires Intel Haswell 2013+ or newer)
RUSTFLAGS="-C target-cpu=haswell" maturin build --release

Performance

For documents of uniform lengths, performance on Linux is slower than Jax on 4 core machines and either somewhat faster or slower depending on the CPU at 8 cores, and always faster than alternatives on ARM Macs. For variable document lengths (evaluated as a uniform distribution between 128 and 1536 tokens), maxsim-cpu is always pretty fast thanks to more efficient batching.

Mac M4 Ultra

Mac M4 Ultra performance

Linux AMD EPYC

32 core limit performance

Linux AMD EPYC 32 core performance

16 core limit performance

It seems our performance was hindered during benchmarking due to a Rayon config issue when limiting the available cores. Leaving reporting as-is for now but performance is expected to be considerably better on an actual 16-core CPU.

Linux AMD EPYC 16 core performance

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

maxsim_fast-0.1.0-cp311-cp311-macosx_11_0_arm64.whl (215.3 kB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

File details

Details for the file maxsim_fast-0.1.0-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for maxsim_fast-0.1.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6f795213e14665219311b8ae3b7d99d9560c5497eec52575dc7c0a0c747363a8
MD5 ef87e3d0671babdb424879cf6aa3f5ee
BLAKE2b-256 3b4b1f40a5b1dcc88bb7032dc79eb8028bce566980da60a1fed8b23fa4b249f8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page