Fast CPU implementation of MaxSim scoring

fast-maxsim

Note: This project is forked from mixedbread-ai/maxsim-cpu. The main improvement is adding support for macOS Intel (x86_64) pre-built wheels, in addition to the original Linux x86_64 and macOS ARM support.

fast-maxsim is a high-performance CPU implementation of MaxSim scoring for late-interaction (ColBERT, ColPali) workflows.
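For reference, MaxSim scores a document by taking, for each query token, the maximum similarity over all document tokens, and then summing those maxima. The NumPy sketch below shows that computation for one query against a batch of equal-length documents. `maxsim_reference` is a hypothetical helper written for illustration only; it is not the library's implementation and makes no attempt to be fast.

```python
import numpy as np


def maxsim_reference(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Reference MaxSim: query [q, dim], docs [n, doc_len, dim] -> scores [n]."""
    # Similarity of every query token with every document token: [n, q, doc_len]
    sims = np.einsum("qd,nld->nql", query, docs)
    # Best-matching document token per query token, summed over query tokens
    return sims.max(axis=2).sum(axis=1)


rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8)).astype(np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)
docs = rng.standard_normal((3, 5, 8)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=2, keepdims=True)

scores = maxsim_reference(query, docs)
print(scores.shape)  # (3,)
```

A slow reference like this can be handy for sanity-checking the scores returned by an optimized implementation on small inputs.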

It is a Python library written in Rust, powered by libxsmm on x86 CPUs and Apple Accelerate on ARM Macs. It supports Linux x86_64, macOS ARM (Apple Silicon), and macOS Intel (x86_64).

fast-maxsim runs exclusively on CPU, and its speed-ups scale with the core count of the scoring machine. It is designed for situations where index/scoring machines do not have access to GPUs, and achieves roughly 2-3x speed-ups on ARM Macs and 5x speed-ups on Linux CPUs over common PyTorch MaxSim implementations.

It also implements effective just-in-time batching and padding for variable-length documents, greatly reducing padding overhead and needless computation.
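To illustrate why batching by length reduces padding, here is a minimal NumPy sketch of one plausible approach: sort documents by length, group neighbours into batches, pad each batch only to its own longest document, and mask the padding out of the max. `maxsim_bucketed` and its `batch_size` parameter are hypothetical names chosen for this example; the library's actual batching strategy may differ.

```python
import numpy as np


def maxsim_bucketed(query, docs, batch_size=32):
    """Score variable-length docs by batching documents of similar length."""
    # Sort by length so each batch pads only up to its own longest document
    order = np.argsort([len(d) for d in docs], kind="stable")
    scores = np.empty(len(docs), dtype=np.float32)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        max_len = max(len(docs[i]) for i in idx)
        batch = np.zeros((len(idx), max_len, query.shape[1]), dtype=np.float32)
        mask = np.zeros((len(idx), max_len), dtype=bool)
        for row, i in enumerate(idx):
            batch[row, :len(docs[i])] = docs[i]
            mask[row, :len(docs[i])] = True
        # Token-level similarities for the whole batch: [batch, q, max_len]
        sims = np.einsum("qd,nld->nql", query, batch)
        # Padded positions must never win the max
        sims = np.where(mask[:, None, :], sims, -np.inf)
        scores[idx] = sims.max(axis=2).sum(axis=1)
    return scores


rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8)).astype(np.float32)
docs = [rng.standard_normal((n, 8)).astype(np.float32) for n in (3, 10, 5, 7)]
scores = maxsim_bucketed(query, docs, batch_size=2)
```

With `batch_size=2`, the documents of length 3 and 5 share one batch and those of length 7 and 10 share another, so far fewer padded tokens are computed than if every document were padded to length 10.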

Getting Started

Pre-built wheels are available on PyPI for Python 3.9 through 3.13 and can be installed in the usual way:

uv pip install fast-maxsim # You may use vanilla pip install but why would you? If you're sophisticated, you could use `uv add` too!

Once installed, the API exposes two functions. For uniform-length inputs, use maxsim_scores:

import numpy as np
import fast_maxsim

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: fast-maxsim expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

docs = np.random.randn(1000, 512, 128).astype(np.float32)  # [num_docs, doc_len, dim]
# Normalize document embeddings along the embedding dimension
docs /= np.linalg.norm(docs, axis=2, keepdims=True)

# Compute MaxSim scores
scores = fast_maxsim.maxsim_scores(query, docs)  # Returns [num_docs] scores
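Since one score is returned per document, a typical next step is to rank documents and keep the best few. The sketch below shows one way to do that with NumPy. `top_k` is a hypothetical helper, and the `scores` array is a small stand-in with made-up values so that the example runs without the library installed.

```python
import numpy as np


def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest scores, best first."""
    k = min(k, scores.shape[0])
    if k == 0:
        return np.empty(0, dtype=np.intp)
    # argpartition isolates the k largest entries without a full sort;
    # only those k candidates are then sorted in descending order
    candidates = np.argpartition(scores, -k)[-k:]
    return candidates[np.argsort(scores[candidates])[::-1]]


# Stand-in for the [num_docs] array returned by the scoring call
scores = np.array([0.2, 3.1, 1.7, 2.4, 0.9], dtype=np.float32)
best = top_k(scores, 3)
print(best)  # [1 3 2]
```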

For variable-length inputs, use maxsim_scores_variable instead:

import numpy as np
import fast_maxsim

# Prepare normalized embeddings
query = np.random.randn(32, 128).astype(np.float32)  # [num_query_tokens, dim]

# NOTE: fast-maxsim expects normalized vectors.
query /= np.linalg.norm(query, axis=1, keepdims=True)

# Create variable-length documents as a list
docs = [
    np.random.randn(np.random.randint(50, 800), 128).astype(np.float32)  # Variable length docs
    for _ in range(1000)
]
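# Normalize each document's embeddings along the embedding dimension
docs = [d / np.linalg.norm(d, axis=1, keepdims=True) for d in docs]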

# Compute MaxSim scores
scores = fast_maxsim.maxsim_scores_variable(query, docs)  # Returns [num_docs] scores

Platform Requirements

  • macOS ARM: Apple Silicon (M1+)
  • macOS Intel: x86_64 with AVX2 (Intel Haswell 2013+ - Core i3/i5/i7 4xxx series or newer)
  • Linux: x86_64 with AVX2 (Intel Haswell 2013+, AMD Excavator 2015+)

We currently do not support Windows or take advantage of AVX512 instructions, nor do we optimise caching for specific CPUs. Contributions/PRs in this direction are welcome!
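As a rough aid, the sketch below maps the values reported by Python's standard `platform` module onto the supported platforms listed above. `describe_platform` and the `SUPPORTED` table are hypothetical names for this example. The check only looks at operating system and architecture and does not detect AVX2, which must be verified separately on x86_64 machines.

```python
import platform

# (platform.system(), platform.machine()) pairs for the supported targets
SUPPORTED = {
    ("Linux", "x86_64"): "Linux x86_64 (AVX2 required)",
    ("Darwin", "arm64"): "macOS ARM (Apple Silicon)",
    ("Darwin", "x86_64"): "macOS Intel (AVX2 required)",
}


def describe_platform(system: str, machine: str) -> str:
    """Return the matching supported platform, or an 'unsupported' message."""
    return SUPPORTED.get((system, machine), f"unsupported: {system} {machine}")


print(describe_platform(platform.system(), platform.machine()))
```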

Note: Pre-built wheels on PyPI are available for Linux x86_64, macOS ARM (Apple Silicon), and macOS Intel (x86_64).

Building

We use maturin as our build system.

Linux

The easy way to build fast-maxsim from source on Linux is as follows:

# Install necessary system deps
apt-get install libssl-dev libopenblas-dev -y
apt-get install pkg-config -y
# Install tooling
uv pip install maturin patchelf numpy
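# Clone and build libxsmm, then return to the parent directory
# (the RUSTFLAGS path below expects libxsmm to sit next to fast-maxsim)
git clone git@github.com:libxsmm/libxsmm.git && cd libxsmm && make STATIC=1 && make
cd ..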
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install fast-maxsim
git clone git@github.com:zhuwenxing/fast-maxsim.git
cd fast-maxsim
RUSTFLAGS="-L native=$(pwd)/../libxsmm/lib" maturin build --release --features use-libxsmm

Step by step:

  • Installs OpenSSL and OpenBLAS, which are required for compiling, as well as pkg-config so that they can be found easily.
  • Clones libxsmm, on which most of the performance depends, and builds it.
  • Installs Rust and enables its environment.
  • Clones this repository and builds it.

You may adapt these steps and skip any whose dependencies are already present on your machine.

Mac

On Mac, building is simpler, assuming you use Homebrew:

For Apple Silicon (M1+):

# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install fast-maxsim
git clone git@github.com:zhuwenxing/fast-maxsim.git
cd fast-maxsim
maturin build --release

For Intel Mac (x86_64):

# Install maturin
uv pip install maturin
# Install patchelf
brew install patchelf
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
# Clone and install fast-maxsim
git clone git@github.com:zhuwenxing/fast-maxsim.git
cd fast-maxsim
# Build with AVX2 support (requires Intel Haswell 2013+ or newer)
RUSTFLAGS="-C target-cpu=haswell" maturin build --release

Performance

For documents of uniform length, performance on Linux is slower than JAX on 4-core machines and either somewhat faster or somewhat slower at 8 cores, depending on the CPU. On ARM Macs it is always faster than the alternatives. For variable document lengths (evaluated with lengths drawn uniformly between 128 and 1536 tokens), fast-maxsim is consistently fast thanks to more efficient batching.

Mac M4 Ultra

[Figure: Mac M4 Ultra performance]

Linux AMD EPYC

32 core limit performance

[Figure: Linux AMD EPYC 32 core performance]

16 core limit performance

Performance in this benchmark appears to have been hindered by a Rayon configuration issue when limiting the available cores. The results are reported as-is for now, but performance is expected to be considerably better on an actual 16-core CPU.

[Figure: Linux AMD EPYC 16 core performance]
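When experimenting with core limits, note that Rayon's default global thread pool reads the `RAYON_NUM_THREADS` environment variable when it is first initialised. Assuming fast-maxsim relies on that default pool (an assumption not confirmed by this README), a thread cap could be sketched as follows. `limit_rayon_threads` is a hypothetical helper.

```python
import os


def limit_rayon_threads(num_threads: int) -> None:
    """Ask Rayon's default global pool to use at most num_threads threads."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Must be set before the Rust extension first initialises its thread pool
    os.environ["RAYON_NUM_THREADS"] = str(num_threads)


limit_rayon_threads(16)
# Import the extension only after the variable is set, e.g.:
# import fast_maxsim
```

Setting the variable after the pool has already been created has no effect, so the call belongs at the very top of the script.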


Built Distributions

Version 0.2.0 provides wheels for CPython 3.9 through 3.13 on each platform:

  • Linux x86-64 (manylinux_2_34, glibc 2.34+): 1.0 MB
  • macOS 11.0+ ARM64: 215.4 kB (215.6 kB for CPython 3.9)
  • macOS 10.12+ x86-64: 232.7 kB (233.0 kB for CPython 3.9)
