SIMD-accelerated similarity measures for x86 and Arm

SimSIMD 📏

Efficient Alternative to scipy.spatial.distance and numpy.inner

SimSIMD relies on hand-written SIMD intrinsics, because few compilers vectorize this kind of code effectively on their own. It supports the common AVX2 instructions on x86 and NEON on Arm, as well as the rarer AVX-512 FP16 instructions on x86 and the Scalable Vector Extension (SVE) on Arm. Designed for Machine Learning workloads, it is optimized for high-dimensional vector embeddings.

  • 3-200x faster than NumPy and SciPy distance functions.
  • ✅ Euclidean (L2), Inner Product, and Cosine (Angular) distances.
  • ✅ Single-precision f32, half-precision f16, and i8 vectors.
  • ✅ Compatible with NumPy, PyTorch, TensorFlow, and other tensors.
  • ✅ Has no dependencies, not even LibC.

Benchmarks

Apple M2 Pro

Given 10,000 embeddings from OpenAI Ada API with 1536 dimensions, running on the Apple M2 Pro Arm CPU with NEON support, here's how SimSIMD performs against conventional methods:

| Conventional | SimSIMD | f32 improvement | f16 improvement | i8 improvement |
| --- | --- | --- | --- | --- |
| scipy.spatial.distance.cosine | cosine | 39 x | 84 x | 196 x |
| scipy.spatial.distance.sqeuclidean | sqeuclidean | 8 x | 25 x | 22 x |
| numpy.inner | inner | 3 x | 10 x | 18 x |

Intel Sapphire Rapids

On the Intel Sapphire Rapids platform, SimSIMD was benchmarked against code auto-vectorized by GCC 12. GCC handles single-precision float and int8_t well. However, it performs poorly on _Float16 arrays, even though the type has been specified for C since the ISO/IEC TS 18661-3 extension to C11.

| Metric | GCC 12 f32 | GCC 12 f16 | SimSIMD f16 | f16 improvement |
| --- | --- | --- | --- | --- |
| cosine | 3.28 M/s | 336.29 k/s | 6.88 M/s | 20 x |
| sqeuclidean | 4.62 M/s | 147.25 k/s | 5.32 M/s | 36 x |
| inner | 3.81 M/s | 192.02 k/s | 5.99 M/s | 31 x |

Technical Insights:

  • Uses Arm SVE and x86 AVX-512's masked loads to eliminate tail for-loops.
  • Substitutes LibC's sqrt calls with bithacks using Jan Kadlec's constant.
  • Avoids slow PyBind11 and SWIG, directly using the CPython C API.
  • Avoids slow PyArg_ParseTuple and manually unpacks argument tuples.
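
The square-root bithack mentioned in the list can be sketched in Python. This is an illustration only, not SimSIMD's C code: it assumes the constant in question is 0x5F1FFFF9, the value usually attributed to Jan Kadlec in refinements of the fast inverse square root trick, and the names approx_inverse_sqrt and approx_sqrt are hypothetical.

```python
import struct

KADLEC_MAGIC = 0x5F1FFFF9  # assumed to be the constant the bullet refers to


def approx_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive, finite float32 value."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Shifting right halves the exponent; subtracting from the magic
    # constant negates it, which yields a rough first guess for 1/sqrt(x).
    guess_bits = (KADLEC_MAGIC - (bits >> 1)) & 0xFFFFFFFF
    y = struct.unpack("<f", struct.pack("<I", guess_bits))[0]
    # One refinement step with tuned coefficients.
    return y * 0.703952253 * (2.38924456 - x * y * y)


def approx_sqrt(x: float) -> float:
    """Approximate sqrt(x) as x * (1/sqrt(x)), avoiding a library sqrt call."""
    return x * approx_inverse_sqrt(x)
```

The approximation trades a small relative error for speed, which tends to be acceptable when distances are only compared or ranked.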

Using in Python

Installation

pip install simsimd

Distance Between 2 Vectors

import simsimd
import numpy as np

vec1 = np.random.randn(1536).astype(np.float32)
vec2 = np.random.randn(1536).astype(np.float32)
dist = simsimd.cosine(vec1, vec2)
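
To make the returned values easier to interpret, here is a dependency-free reference sketch of two of the metrics. It assumes SimSIMD follows the definitions of the SciPy functions it is benchmarked against (scipy.spatial.distance.cosine and scipy.spatial.distance.sqeuclidean); the helper names are hypothetical and not part of the SimSIMD API.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|), as defined by SciPy."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def sqeuclidean_distance(a, b):
    """Squared Euclidean (L2) distance: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Such a reference can be handy for spot-checking results on a few small vectors, keeping in mind that f16 and i8 inputs will differ slightly because of reduced precision.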

Distance Between 2 Batches

batch1 = np.random.randn(100, 1536).astype(np.float32)
batch2 = np.random.randn(100, 1536).astype(np.float32)
dist = simsimd.cosine(batch1, batch2)

All Pairwise Distances

For calculating distances between all possible pairs of rows across two matrices (akin to scipy.spatial.distance.cdist):

matrix1 = np.random.randn(1000, 1536).astype(np.float32)
matrix2 = np.random.randn(10, 1536).astype(np.float32)
distances = simsimd.cdist(matrix1, matrix2, metric="cosine")
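
The shape of the result can be illustrated with a plain-Python sketch. It assumes the output layout matches scipy.spatial.distance.cdist, where entry [i][j] is the distance between row i of the first matrix and row j of the second; cdist_reference is a hypothetical helper, not a SimSIMD function.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cdist_reference(rows_a, rows_b, metric=cosine_distance):
    """Return a len(rows_a) x len(rows_b) nested list of pairwise distances."""
    return [[metric(a, b) for b in rows_b] for a in rows_a]
```

Under that assumption, the 1000 x 1536 and 10 x 1536 inputs above would produce a 1000 x 10 result.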

Multithreading

By default, computations use a single CPU core. To use all CPU cores on Linux, pass the threads=0 argument. Alternatively, specify a custom number of threads:

distances = simsimd.cdist(matrix1, matrix2, metric="cosine", threads=0)
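
One way to picture the threads argument is as a split of the first matrix into row slices, one per worker. The sketch below is an assumption about the general approach, not SimSIMD's implementation; cdist_threaded is a hypothetical name, and pure-Python threads will not yield a real speedup because of the GIL.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sqeuclidean(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cdist_threaded(rows_a, rows_b, threads=1):
    """Pairwise distances, with the rows of rows_a split across worker threads."""
    if not rows_a:
        return []
    # threads=0 is read as "use every available core", as described above.
    workers = (os.cpu_count() or 1) if threads == 0 else threads
    workers = max(1, min(workers, len(rows_a)))
    # Contiguous slices of rows, one per worker (ceiling division for the size).
    step = -(-len(rows_a) // workers)
    slices = [rows_a[i:i + step] for i in range(0, len(rows_a), step)]

    def work(chunk):
        return [[sqeuclidean(a, b) for b in rows_b] for a in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, slices))  # map preserves slice order
    return [row for part in parts for row in part]
```

Because each slice writes to its own rows of the output, the workers need no synchronization beyond the final join.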

Hardware Backend Capabilities

To view a list of hardware backends that SimSIMD supports:

print(simsimd.get_capabilities())

Using Python API with USearch

Want to use it in Python with USearch? You can wrap the raw C function pointers of SimSIMD backends into a CompiledMetric and pass it to USearch, similar to how it handles Numba's JIT-compiled code.

from usearch.index import Index, CompiledMetric, MetricKind, MetricSignature
from simsimd import pointer_to_sqeuclidean, pointer_to_cosine, pointer_to_inner

metric = CompiledMetric(
    pointer=pointer_to_cosine("f16"),
    kind=MetricKind.Cos,
    signature=MetricSignature.ArrayArraySize,
)

index = Index(256, metric=metric)

Using SimSIMD in C

If you want to use the _Float16 functionality of SimSIMD, make sure your development environment is compatible with C11. For the rest of SimSIMD, C99 compatibility suffices.

For integration within a CMake-based project, add the following segment to your CMakeLists.txt:

# The FetchContent module must be included before its commands are used.
include(FetchContent)
FetchContent_Declare(
    simsimd
    GIT_REPOSITORY https://github.com/ashvardanian/simsimd.git
    GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(simsimd)
include_directories(${simsimd_SOURCE_DIR}/include)

Use the most recent compiler available for your platform, so that you benefit from the newest intrinsics.

To use SimSIMD within USearch, compile USearch with the flag USEARCH_USE_SIMSIMD=1. This is the default setting on most platforms.

Upcoming Features

Here's what is planned next:

  • Exposing Hamming and Tanimoto bitwise distances to the Python interface.
  • Intel AMX backend. Note: Currently, the intrinsics are functional only with Intel's latest compiler.

To rerun the experiments, use the following command:

cmake -DCMAKE_BUILD_TYPE=Release -DSIMSIMD_BUILD_BENCHMARKS=1 -B ./build_release && make -C ./build_release && ./build_release/simsimd_bench

To test with PyTest:

pip install -e . && pytest python/test.py -s -x

This version

2.0.4

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

| File | Size | Python | Platform |
| --- | --- | --- | --- |
| simsimd-2.0.4-cp311-cp311-manylinux_2_28_x86_64.whl | 179.6 kB | CPython 3.11 | manylinux: glibc 2.28+ x86-64 |
| simsimd-2.0.4-cp311-cp311-manylinux_2_28_aarch64.whl | 161.4 kB | CPython 3.11 | manylinux: glibc 2.28+ ARM64 |
| simsimd-2.0.4-cp311-cp311-macosx_11_0_arm64.whl | 21.8 kB | CPython 3.11 | macOS 11.0+ ARM64 |
| simsimd-2.0.4-cp311-cp311-macosx_10_9_x86_64.whl | 22.0 kB | CPython 3.11 | macOS 10.9+ x86-64 |
| simsimd-2.0.4-cp310-cp310-manylinux_2_28_x86_64.whl | 179.5 kB | CPython 3.10 | manylinux: glibc 2.28+ x86-64 |
| simsimd-2.0.4-cp310-cp310-manylinux_2_28_aarch64.whl | 161.3 kB | CPython 3.10 | manylinux: glibc 2.28+ ARM64 |
| simsimd-2.0.4-cp310-cp310-macosx_11_0_arm64.whl | 21.8 kB | CPython 3.10 | macOS 11.0+ ARM64 |
| simsimd-2.0.4-cp310-cp310-macosx_10_9_x86_64.whl | 22.0 kB | CPython 3.10 | macOS 10.9+ x86-64 |
| simsimd-2.0.4-cp39-cp39-manylinux_2_28_x86_64.whl | 179.4 kB | CPython 3.9 | manylinux: glibc 2.28+ x86-64 |
| simsimd-2.0.4-cp39-cp39-manylinux_2_28_aarch64.whl | 161.1 kB | CPython 3.9 | manylinux: glibc 2.28+ ARM64 |
| simsimd-2.0.4-cp39-cp39-macosx_11_0_arm64.whl | 21.8 kB | CPython 3.9 | macOS 11.0+ ARM64 |
| simsimd-2.0.4-cp39-cp39-macosx_10_9_x86_64.whl | 22.0 kB | CPython 3.9 | macOS 10.9+ x86-64 |
