
Fast CPU implementation of MaxSim scoring

Project description

maxsim-cpu

maxsim-cpu is a high-performance CPU implementation of MaxSim scoring for late-interaction (ColBERT, ColPali) workflows.

It is a Python library written in Rust, powered by libxsmm on x86 CPUs and Apple Accelerate on ARM Macs. At the moment, it only supports Linux x86 machines and ARM Macs.

maxsim-cpu is built to run exclusively on CPU and achieves speed-ups that scale with the core count of the scoring machine. It is designed for situations where index/scoring machines do not have access to GPUs, and achieves ~2-3x speed-ups on ARM Macs and 5x speed-ups on Linux CPUs over common PyTorch MaxSim implementations.

It also implements effective just-in-time batching and padding for variable-length documents, greatly reducing padding overhead and needless computation.
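To make the padding overhead concrete, here is a minimal sketch (not the library's actual batching code) that counts how many token slots get computed when every document is padded to the global maximum length, versus sorting documents by length and padding each batch only to its own longest member. The function names and the batch size of 32 are illustrative assumptions.

```python
import random

def padded_slots_global(lengths):
    """Token slots computed if every document is padded to the longest one."""
    return max(lengths) * len(lengths)

def padded_slots_bucketed(lengths, batch_size=32):
    """Token slots computed if documents are sorted by length and each
    batch is padded only to the longest document in that batch."""
    ordered = sorted(lengths)
    total = 0
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        total += max(batch) * len(batch)
    return total

random.seed(0)
lengths = [random.randint(50, 800) for _ in range(1000)]  # hypothetical corpus
print("real tokens:     ", sum(lengths))
print("global padding:  ", padded_slots_global(lengths))
print("bucketed padding:", padded_slots_bucketed(lengths))
```

The bucketed count can never exceed the global one, since no batch is padded beyond the overall longest document.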

Getting Started

Pre-built wheels are available on PyPI for Python 3.9 through 3.13 and can be installed in the usual way:

uv pip install maxsim-cpu # You may use vanilla pip install but why would you? If you're sophisticated, you could use `uv add` too!

Once installed, the simple API exposes two functions. For uniform-length inputs, you may use:

import numpy as np
import maxsim_cpu

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

docs = np.random.randn(1000, 512, 128).astype(np.float32)  # [num_docs, doc_len, dim]
# Normalize document embeddings along the embedding dimension
docs /= np.linalg.norm(docs, axis=2, keepdims=True)

# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores(query, docs)  # Returns [num_docs] scores
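For reference, the quantity being computed can be sketched in plain NumPy. This is the textbook late-interaction definition (for each query token, take its best dot-product match among the document tokens, then sum over query tokens), not maxsim-cpu's internals; `maxsim_reference` is a hypothetical helper name. If the library follows this definition, its output should agree with the sketch up to floating-point tolerance.

```python
import numpy as np

def maxsim_reference(query, docs):
    """Reference MaxSim for uniform-length documents.

    query: [num_query_tokens, dim], docs: [num_docs, doc_len, dim].
    Returns one score per document."""
    # sims[d, q, t] = dot(query[q], docs[d, t])
    sims = np.einsum("qk,dtk->dqt", query, docs)
    # Best document token per query token, summed over query tokens
    return sims.max(axis=2).sum(axis=1)

rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8)).astype(np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)
docs = rng.standard_normal((3, 5, 8)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=2, keepdims=True)

scores = maxsim_reference(query, docs)
print(scores.shape)
```

With normalized vectors, each query token contributes at most 1, so a score is bounded by the number of query tokens.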

For variable length inputs, you should use the alternate maxsim_scores_variable:

import numpy as np
import maxsim_cpu

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

# Create variable-length documents as a list
docs = [
    np.random.randn(np.random.randint(50, 800), 128).astype(np.float32)  # Variable length docs
    for _ in range(1000)
]
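# Normalize each document's embeddings along the embedding dimension
docs = [d / np.linalg.norm(d, axis=1, keepdims=True) for d in docs]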

# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores_variable(query, docs)  # Returns [num_docs] scores
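The sketch below illustrates why variable-length inputs need care when they are batched. It compares a per-document loop against a padded batch in which padded positions are masked out, in plain NumPy. Both helpers are hypothetical and only show the general padding-and-masking idea; they are not how maxsim-cpu batches internally.

```python
import numpy as np

def maxsim_loop(query, docs):
    """Score each variable-length document on its own (no padding)."""
    return np.array([(query @ d.T).max(axis=1).sum() for d in docs],
                    dtype=np.float32)

def maxsim_padded(query, docs):
    """Pad documents to a common length, then mask padded positions with
    -inf so they can never win the max over document tokens."""
    max_len = max(d.shape[0] for d in docs)
    batch = np.zeros((len(docs), max_len, query.shape[1]), dtype=np.float32)
    mask = np.zeros((len(docs), max_len), dtype=bool)
    for i, d in enumerate(docs):
        batch[i, : d.shape[0]] = d
        mask[i, : d.shape[0]] = True
    sims = np.einsum("qk,dtk->dqt", query, batch)
    sims = np.where(mask[:, None, :], sims, -np.inf)
    return sims.max(axis=2).sum(axis=1)

rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8)).astype(np.float32)
docs = [rng.standard_normal((n, 8)).astype(np.float32) for n in (2, 5, 9)]
print(np.allclose(maxsim_loop(query, docs), maxsim_padded(query, docs), atol=1e-4))
```

Without the mask, a zero-padded position has similarity 0 and would beat any real token whose similarity is negative, which silently inflates scores for short documents.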

Platform Requirements

  • macOS: Apple Silicon (M1+)
  • Linux: x86_64 with AVX2 (Intel Haswell 2013+, AMD Excavator 2015+)

We currently do not support Windows or take advantage of AVX512 instructions, nor do we optimise caching for specific CPUs. Contributions/PRs in this direction are welcome!
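If you want to fail early on unsupported machines, a check along the following lines may help. It is a minimal sketch using only the standard library; the helper names are hypothetical, and the AVX2 check assumes the usual `flags` line format of `/proc/cpuinfo` on Linux.

```python
import platform

def is_supported_platform(system=None, machine=None):
    """True for Apple Silicon macOS or x86_64 Linux, per the list above."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Darwin":
        return machine == "arm64"
    if system == "Linux":
        return machine in ("x86_64", "amd64")
    return False

def cpuinfo_has_avx2(cpuinfo_text):
    """True if any 'flags' line of /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            if "avx2" in line.split(":", 1)[1].split():
                return True
    return False

print("supported platform:", is_supported_platform())
if platform.system() == "Linux":
    with open("/proc/cpuinfo") as f:
        print("AVX2 available:", cpuinfo_has_avx2(f.read()))
```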

Building

We use maturin as our build system.

Linux

The easy way to build maxsim-cpu from source on Linux is as follows:

# Install necessary system deps
apt-get install libssl-dev libopenblas-dev -y
apt-get install pkg-config -y
# Install tooling
uv pip install maturin patchelf numpy
# Clone and build libxsmm, then return to the parent directory
git clone git@github.com:libxsmm/libxsmm.git && cd libxsmm && make STATIC=1 && make && cd ..
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
RUSTFLAGS="-L native=$(pwd)/../libxsmm/lib" maturin build --release --features use-libxsmm

Step by step:

  • Installs OpenSSL and OpenBLAS, which are required for compiling, as well as pkg-config so they can be found easily.
  • Clones and builds libxsmm, on which most of the performance depends.
  • Installs Rust and enables its environment.
  • Clones this repository and finally builds it.

You may modify it and remove any step depending on dependencies already present on your machine.

Mac

On Mac, the installation is simpler, assuming you use Homebrew:

# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
maturin build --release -q

Performance

For documents of uniform length, maxsim-cpu on Linux is slower than JAX on 4-core machines, somewhat faster or slower at 8 cores depending on the CPU, and always faster than the alternatives on ARM Macs. For variable document lengths (evaluated with lengths drawn uniformly between 128 and 1536 tokens), maxsim-cpu is consistently fast thanks to more efficient batching.
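To get a rough feel for these comparisons on your own hardware, a small timing harness such as the one below can be used. It is a hypothetical sketch, not the benchmark behind the reported numbers: it times a plain NumPy baseline on documents whose lengths are drawn uniformly between 128 and 1536 tokens, and the corpus size of 50 is an arbitrary choice.

```python
import statistics
import time

import numpy as np

def median_seconds(fn, *args, repeats=5):
    """Median wall-clock time of fn(*args) over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

def numpy_maxsim_variable(query, docs):
    """Plain NumPy baseline: score each document independently."""
    return np.array([(query @ d.T).max(axis=1).sum() for d in docs])

rng = np.random.default_rng(0)
query = rng.standard_normal((32, 128)).astype(np.float32)
doc_lengths = rng.integers(128, 1537, size=50)  # upper bound is exclusive
docs = [rng.standard_normal((int(n), 128)).astype(np.float32) for n in doc_lengths]

elapsed = median_seconds(numpy_maxsim_variable, query, docs)
print(f"NumPy baseline: {elapsed * 1e3:.2f} ms")
```

With maxsim-cpu installed, passing `maxsim_cpu.maxsim_scores_variable` to `median_seconds` in place of the baseline gives a comparable figure.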

Mac M4 Ultra

[Figure: Mac M4 Ultra performance]

Linux AMD EPYC

32 core limit performance

[Figure: Linux AMD EPYC 32 core performance]

16 core limit performance

It seems our performance was hindered during benchmarking by a Rayon configuration issue when limiting the available cores. We are leaving the reported numbers as-is for now, but performance is expected to be considerably better on an actual 16-core CPU.
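If you need to cap the thread count yourself, one option that may apply is Rayon's `RAYON_NUM_THREADS` environment variable, which Rayon's default global thread pool reads when it is first created. Whether maxsim-cpu relies on that default pool is an assumption here, and `limit_rayon_threads` is a hypothetical helper, so treat this as a sketch to experiment with.

```python
import os

def limit_rayon_threads(num_threads):
    """Request at most num_threads threads from Rayon's default global pool.

    Must run before the Rust extension first creates its thread pool,
    so call it before importing or using the library."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    os.environ["RAYON_NUM_THREADS"] = str(num_threads)

limit_rayon_threads(16)
# import maxsim_cpu  # import only after the variable is set
```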

[Figure: Linux AMD EPYC 16 core performance]
