Fast CPU implementation of MaxSim scoring
maxsim-cpu
maxsim-cpu is a high-performance CPU implementation of MaxSim scoring for late-interaction (ColBERT, ColPali) workflows.
It is a Python library written in Rust, powered by libxsmm on x86 CPUs and Apple Accelerate on ARM Macs. At the moment, it supports only Linux x86 machines and ARM Macs.
maxsim-cpu is built to run exclusively on CPU and achieves speed-ups that scale with the core count of the scoring machine. It is designed for situations where indexing/scoring machines do not have access to GPUs, and achieves roughly 2-3x speed-ups on ARM Macs and 5x speed-ups on Linux CPUs over common PyTorch MaxSim implementations.
It also implements just-in-time batching and padding for variable-length documents, greatly reducing padding overhead and needless computation.
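For readers new to late interaction, here is a minimal NumPy sketch of the MaxSim score as it is usually defined for ColBERT-style models: each query token is matched to its most similar document token, and those maxima are summed. The function name `maxsim_reference` is hypothetical and is not part of maxsim-cpu. The sketch only illustrates the quantity being computed, not how the library computes it.

```python
import numpy as np

def maxsim_reference(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Reference MaxSim. query: [q_len, dim], docs: [num_docs, doc_len, dim]."""
    # Dot product of every query token with every document token:
    # result has shape [num_docs, q_len, doc_len]
    sims = np.einsum("qd,ntd->nqt", query, docs)
    # Keep the best-matching document token per query token,
    # then sum over query tokens to get one score per document
    return sims.max(axis=2).sum(axis=1)

rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8)).astype(np.float32)
docs = rng.standard_normal((3, 5, 8)).astype(np.float32)
scores = maxsim_reference(query, docs)
print(scores.shape)  # (3,)
```

A slow reference like this can be handy for spot-checking the output of an optimized implementation on small inputs.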
Getting Started
Pre-built wheels are available on PyPI for Python 3.9 through 3.13 and can be installed in the usual way:
uv pip install maxsim-cpu # You may use vanilla pip install but why would you? If you're sophisticated, you could use `uv add` too!
Once installed, the API exposes two functions. For uniform-length inputs, use maxsim_scores:
import numpy as np
import maxsim_cpu
# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32) # [num_query_tokens, dim]
# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)
docs = np.random.randn(1000, 512, 128).astype(np.float32) # [num_docs, doc_len, dim]
# Normalize document embeddings along the embedding dimension
docs /= np.linalg.norm(docs, axis=-1, keepdims=True)
# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores(query, docs) # Returns [num_docs] scores
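Since the library expects normalized float32 vectors, a small helper can make the preparation step less error-prone. The sketch below is one possible way to do it. `l2_normalize` is a hypothetical name, not part of maxsim-cpu, and the epsilon guard for all-zero rows is an assumption about what you would want for padded or empty tokens.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return a float32 copy of x with unit L2 norm along the last axis."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    # Guard against division by zero: all-zero rows stay all-zero
    return x / np.maximum(norms, eps)

# Works for both [num_tokens, dim] queries and [num_docs, doc_len, dim] documents
query = l2_normalize(np.random.randn(32, 128))
docs = l2_normalize(np.random.randn(10, 16, 128))
assert np.allclose(np.linalg.norm(docs, axis=-1), 1.0, atol=1e-5)
```

The resulting arrays have the same shapes and dtype as the inputs used in the example above.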
For variable-length inputs, use maxsim_scores_variable instead:
import numpy as np
import maxsim_cpu
# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32) # [num_query_tokens, dim]
# NOTE: maxsim-cpu expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)
# Create variable-length documents as a list
docs = [
    np.random.randn(np.random.randint(50, 800), 128).astype(np.float32) # Variable length docs
    for _ in range(1000)
]
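# Normalize each document's embeddings along the embedding dimension
docs = [d / np.linalg.norm(d, axis=1, keepdims=True) for d in docs]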
# Compute MaxSim scores
scores = maxsim_cpu.maxsim_scores_variable(query, docs) # Returns [num_docs] scores
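In a retrieval or reranking pipeline, the next step is usually to pick the best-scoring documents. The sketch below assumes `scores` is a 1-D NumPy float array with one entry per document, as the comments above describe. `top_k` is a hypothetical helper, not part of maxsim-cpu, and a stand-in array is used so the snippet runs on its own.

```python
import numpy as np

def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest scores, best first."""
    k = min(k, scores.shape[0])
    if k <= 0:
        return np.empty(0, dtype=np.intp)
    # argpartition selects the top-k set without fully sorting all scores
    candidates = np.argpartition(scores, -k)[-k:]
    # Sort only the k candidates, in descending score order
    return candidates[np.argsort(scores[candidates])[::-1]]

# Stand-in for the array returned by the scoring call
scores = np.array([0.2, 3.1, 1.7, 2.9, 0.5], dtype=np.float32)
print(top_k(scores, 3))  # [1 3 2]
```

For small candidate sets a plain `np.argsort` is just as good. The partition step only pays off when scoring many documents and keeping a few.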
Platform Requirements
- macOS: Apple Silicon (M1+)
- Linux: x86_64 with AVX2 (Intel Haswell 2013+, AMD Excavator 2015+)
We currently do not support Windows or take advantage of AVX512 instructions, nor do we optimise caching for specific CPUs. Contributions/PRs in this direction are welcome!
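To check a machine against these requirements before installing, something like the following sketch may help. It is an illustration using only the Python standard library. The helper names are hypothetical, and reading `/proc/cpuinfo` for the `avx2` flag is an assumption that holds on typical Linux systems but is not something maxsim-cpu itself documents.

```python
import platform

def has_avx2(cpuinfo_text: str) -> bool:
    """True if the first 'flags' line of /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... avx2 ..."
            return "avx2" in line.split(":", 1)[-1].split()
    return False

def is_supported_platform() -> bool:
    """Rough check against the platform list above."""
    system, machine = platform.system(), platform.machine()
    if system == "Darwin":
        return machine == "arm64"  # Apple Silicon
    if system == "Linux" and machine in ("x86_64", "AMD64"):
        try:
            with open("/proc/cpuinfo") as f:
                return has_avx2(f.read())
        except OSError:
            return False
    return False

print(is_supported_platform())
```

Note that a Python interpreter running under Rosetta on an Apple Silicon Mac reports `x86_64`, so this check would return False there.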
Building
We use maturin as our build system.
Linux
The easy way to build maxsim-cpu from source on Linux is as follows:
# Install necessary system deps
apt-get install libssl-dev libopenblas-dev -y
apt-get install pkg-config -y
# Install tooling
uv pip install maturin patchelf numpy
# Install libxsmm
git clone git@github.com:libxsmm/libxsmm.git && cd libxsmm && make STATIC=1 && make && cd ..
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
RUSTFLAGS="-L native=$(pwd)/../libxsmm/lib" maturin build --release --features use-libxsmm
Step by step:
- This installs OpenSSL and OpenBLAS, which are required for compiling, as well as pkg-config so that they can be found easily.
- It then clones libxsmm, on which most of the performance depends, and builds it.
- It installs Rust and enables its environment.
- It clones this repository and finally builds it.
You may adapt the script and skip any step whose dependencies are already present on your machine.
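To decide which steps can be skipped, a quick pre-flight check of the command-line tools used above can be sketched as follows. The tool list is taken from the build script. The helper `missing_tools` is hypothetical and only checks that each executable is on the `PATH`, not its version or the presence of system libraries such as OpenBLAS.

```python
import shutil

# Executables invoked by the Linux build script above
REQUIRED_TOOLS = ["git", "make", "pkg-config", "cargo", "maturin", "patchelf"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("All build tools found on PATH")
```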
Mac
On Mac, the build is simpler, assuming you use Homebrew:
# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install maxsim-cpu
git clone git@github.com:mixedbread-ai/maxsim-cpu.git
cd maxsim-cpu
maturin build --release -q
Performance
For documents of uniform length, performance on Linux is slower than JAX on 4-core machines and either somewhat faster or somewhat slower, depending on the CPU, at 8 cores. On ARM Macs, it is always faster than the alternatives. For variable document lengths (evaluated with lengths drawn uniformly between 128 and 1536 tokens), maxsim-cpu remains fast thanks to more efficient batching.
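To see why padding matters for the variable-length setting, consider the length distribution used in the evaluation. If every document were padded to the maximum length of 1536 tokens, the average document (about 832 tokens) would fill only around 54% of its slots, so roughly 46% of the computation would be spent on padding. The sketch below simulates this and compares it with padding per batch after sorting by length. It is a generic illustration of padding overhead, not a description of maxsim-cpu's actual batching strategy. The batch size of 32 is an arbitrary assumption.

```python
import random

random.seed(0)
# Document lengths drawn uniformly between 128 and 1536 tokens
lengths = [random.randint(128, 1536) for _ in range(1000)]

def padded_fraction(lengths, batch_size):
    """Fraction of token slots that are padding when each batch
    is padded to the length of its longest document."""
    real = sum(lengths)
    padded = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        padded += max(batch) * len(batch)
    return 1 - real / padded

# One big batch: everything is padded to the global maximum
naive = padded_fraction(lengths, batch_size=len(lengths))
# Sort by length first, then pad within small batches
bucketed = padded_fraction(sorted(lengths), batch_size=32)

print(f"padding, single batch:      {naive:.1%}")
print(f"padding, length-sorted x32: {bucketed:.1%}")
```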
Mac M4 Ultra
Linux AMD EPYC
32 core limit performance
16 core limit performance
Our benchmark results appear to have been hindered by a Rayon configuration issue when limiting the available cores. We are leaving the reported numbers as-is for now, but performance is expected to be considerably better on an actual 16-core CPU.