Fastest SIMD-Accelerated Vector Similarity Functions for x86 and Arm
SimSIMD 📏
Hardware-Accelerated Similarity Metrics and Distance Functions
- Zero-dependency header-only C99 library with bindings for Python and JavaScript.
- Targets Arm NEON and SVE, as well as x86 AVX2 and AVX-512 (VNNI, FP16) hardware backends.
- Handles single-precision f32, half-precision f16, 8-bit integer i8, and binary vectors.
- Up to 200x faster than scipy.spatial.distance and numpy.inner.
- Compatible with NumPy, PyTorch, TensorFlow, and other tensor libraries.
- Used in USearch and several DBMS products.
Implemented distance functions include:
- Euclidean (L2), Inner Product, and Cosine (Angular) spatial distances.
- Hamming (~ Manhattan) and Jaccard (~ Tanimoto) binary distances.
- Kullback-Leibler and Jensen–Shannon divergences for probability distributions.
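The code below gives textbook definitions of the spatial distances and probability divergences listed above, written as slow pure-Python code. Use it to sanity-check results. It is an illustrative sketch, not SimSIMD's implementation, and the function names are hypothetical. SimSIMD's own conventions may differ. For example, inner may return the raw dot product or a distance derived from it. Its Jensen-Shannon kernel may return the divergence, or the square root of it as scipy.spatial.distance.jensenshannon does. Check the library before comparing numbers.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared element-wise differences (L2 distance without the sqrt)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine of the angle between a and b; assumes non-zero vectors."""
    return 1.0 - inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in nats.

    Terms with p[i] == 0 contribute nothing; assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL divergence of p and q from their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(inner_product(a, b))      # 32.0
print(squared_euclidean(a, b))  # 27.0
```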
Technical Insights and related articles:
- Uses Horner's method for polynomial approximations, beating GCC 12 by 119x.
- Uses Arm SVE and x86 AVX-512's masked loads to eliminate tail for-loops.
- Uses AVX-512 FP16 for half-precision operations, which few compilers vectorize.
- Substitutes LibC's sqrt calls with bithacks using Jan Kadlec's constant.
- For Python, avoids slow PyBind11, SWIG, and even PyArg_ParseTuple for speed.
- For JavaScript, uses typed arrays and NAPI for zero-copy calls.
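The sqrt substitution mentioned above relies on the classic fast inverse square root bithack. It reinterprets a float's bits as an integer and derives a first guess with a shift and a magic constant. It then refines the guess with one Newton-Raphson-style step.

The Python sketch below uses the constant 0x5F1FFFF9 and the coefficients 0.703952253 and 2.38924456 from Jan Kadlec's published variant. Treat both the constants and the way they are applied as assumptions to verify against SimSIMD's C sources. The sketch also shows why the trick matters for cosine distance, which needs the reciprocal square roots of both vector norms.

```python
import math
import struct
from typing import Sequence


def fast_rsqrt(x: float) -> float:
    """Approximate 1 / sqrt(x) for a positive float32 value using a bithack."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bits roughly halves the exponent; subtracting from the magic
    # constant negates it, giving a first guess for x ** -0.5.
    i = 0x5F1FFFF9 - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson-style refinement with Kadlec's tuned coefficients.
    return y * 0.703952253 * (2.38924456 - x * y * y)


def cosine_distance_rsqrt(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance with both norm square roots replaced by fast_rsqrt."""
    ab = sum(x * y for x, y in zip(a, b))
    a2 = sum(x * x for x in a)
    b2 = sum(y * y for y in b)
    return 1.0 - ab * fast_rsqrt(a2) * fast_rsqrt(b2)


print(fast_rsqrt(2.0), 1.0 / math.sqrt(2.0))  # close, but not identical
```

In C the bit reinterpretation is usually done with memcpy or a union rather than struct. The Python version only demonstrates the arithmetic and is not fast.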
Benchmarks
Apple M2 Pro
Given 1000 embeddings from OpenAI Ada API with 1536 dimensions, running on the Apple M2 Pro Arm CPU with NEON support, here's how SimSIMD performs against conventional methods:
Kind | f32 improvement | f16 improvement | i8 improvement | Conventional method | SimSIMD |
---|---|---|---|---|---|
Cosine | 32 x | 79 x | 133 x | scipy.spatial.distance.cosine | cosine |
Euclidean ² | 5 x | 26 x | 17 x | scipy.spatial.distance.sqeuclidean | sqeuclidean |
Inner Product | 2 x | 9 x | 18 x | numpy.inner | inner |
Jensen Shannon | 31 x | 53 x | | scipy.spatial.distance.jensenshannon | jensenshannon |
Intel Sapphire Rapids
On the Intel Sapphire Rapids platform, SimSIMD was benchmarked against auto-vectorized code compiled with GCC 12. GCC handles single-precision float well, but might not be the best choice for int8 and _Float16 arrays.
Kind | GCC 12 f32 | GCC 12 f16 | SimSIMD f16 | f16 improvement |
---|---|---|---|---|
Cosine | 3.28 M/s | 336.29 k/s | 6.88 M/s | 20 x |
Euclidean ² | 4.62 M/s | 147.25 k/s | 5.32 M/s | 36 x |
Inner Product | 3.81 M/s | 192.02 k/s | 5.99 M/s | 31 x |
Jensen Shannon | 1.18 M/s | 18.13 k/s | 2.14 M/s | 118 x |
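The f16 improvement column is consistent with dividing SimSIMD's f16 throughput by GCC 12's f16 throughput. The short script below reproduces the column from the table's own figures. It reads M/s and k/s as millions and thousands of operations per second, which is our interpretation of the units.

```python
# Throughput figures copied from the Sapphire Rapids table (operations per second).
gcc12_f16 = {
    "Cosine": 336.29e3,
    "Euclidean ²": 147.25e3,
    "Inner Product": 192.02e3,
    "Jensen Shannon": 18.13e3,
}
simsimd_f16 = {
    "Cosine": 6.88e6,
    "Euclidean ²": 5.32e6,
    "Inner Product": 5.99e6,
    "Jensen Shannon": 2.14e6,
}

# Speedup = SimSIMD throughput / GCC 12 throughput, rounded to whole multiples.
improvement = {kind: round(simsimd_f16[kind] / gcc12_f16[kind]) for kind in gcc12_f16}
print(improvement)
# {'Cosine': 20, 'Euclidean ²': 36, 'Inner Product': 31, 'Jensen Shannon': 118}
```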
Using SimSIMD in Python
Installation
pip install simsimd
Distance Between 2 Vectors
import simsimd
import numpy as np
vec1 = np.random.randn(1536).astype(np.float32)
vec2 = np.random.randn(1536).astype(np.float32)
dist = simsimd.cosine(vec1, vec2)
Supported functions include cosine, inner, sqeuclidean, hamming, and jaccard.
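hamming and jaccard apply to binary vectors. The sketch below assumes that each bit is one vector component and that bits are packed eight per byte. This is the layout numpy.packbits produces by default, but treat it as an assumption about SimSIMD's inputs.

The pure-Python code shows what the two distances count. pack_bits, hamming_bits, and jaccard_bits are hypothetical helpers, not SimSIMD APIs. Treating two all-zero vectors as identical is also our own convention.

```python
from typing import Sequence


def pack_bits(bits: Sequence[int]) -> bytes:
    """Pack 0/1 values into bytes, most significant bit first (numpy.packbits' default order)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def hamming_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equally sized packed vectors differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def jaccard_bits(a: bytes, b: bytes) -> float:
    """1 - |a AND b| / |a OR b|; two all-zero vectors are treated as identical (0.0)."""
    intersection = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return 1.0 - intersection / union if union else 0.0


a = pack_bits([1, 0, 1, 1, 0, 0, 0, 1])
b = pack_bits([1, 1, 1, 0, 0, 0, 0, 1])
print(hamming_bits(a, b))  # 2
print(jaccard_bits(a, b))  # 0.4 (1 - 3/5)
```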
Distance Between 2 Batches
batch1 = np.random.randn(100, 1536).astype(np.float32)
batch2 = np.random.randn(100, 1536).astype(np.float32)
dist = simsimd.cosine(batch1, batch2)
If either batch has more than one vector, the other batch must contain either exactly one vector or the same number of vectors. If it contains just one, that vector is broadcast against every vector of the other batch.
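The shape rule above is a restricted form of broadcasting. The sketch below spells it out in pure Python with a caller-supplied metric. paired_distances is a hypothetical helper, not part of SimSIMD, and it assumes the output holds one distance per vector of the larger batch.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def paired_distances(batch1: Sequence[Vector], batch2: Sequence[Vector],
                     metric: Callable[[Vector, Vector], float]) -> List[float]:
    """Row-by-row distances, broadcasting a single-vector batch against the other."""
    n1, n2 = len(batch1), len(batch2)
    if n1 != n2 and 1 not in (n1, n2):
        raise ValueError(f"incompatible batch sizes: {n1} and {n2}")
    n = max(n1, n2)
    # A batch holding one vector is reused for every row of the other batch.
    return [metric(batch1[i if n1 > 1 else 0], batch2[i if n2 > 1 else 0])
            for i in range(n)]


def sqeuclidean(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance, used here only as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


print(paired_distances([[0.0, 0.0]], [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]], sqeuclidean))
# [1.0, 4.0, 25.0]
```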
All Pairwise Distances
To calculate distances between all possible pairs of rows across two matrices (akin to scipy.spatial.distance.cdist):
matrix1 = np.random.randn(1000, 1536).astype(np.float32)
matrix2 = np.random.randn(10, 1536).astype(np.float32)
distances = simsimd.cdist(matrix1, matrix2, metric="cosine")
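For the example above, the result should presumably hold one distance per pair, or 1000 by 10 values. The pure-Python sketch below shows the layout that scipy.spatial.distance.cdist uses: entry [i][j] compares row i of the first matrix with row j of the second. all_pairs is a hypothetical name, and this is not SimSIMD's implementation.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def all_pairs(matrix1: Sequence[Vector], matrix2: Sequence[Vector],
              metric: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Return a len(matrix1) x len(matrix2) nested list of pairwise distances."""
    # Entry [i][j] compares row i of matrix1 with row j of matrix2, like scipy's cdist.
    return [[metric(u, v) for v in matrix2] for u in matrix1]


def sqeuclidean(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance, used here only as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


print(all_pairs([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]], sqeuclidean))
# [[0.0, 25.0, 2.0], [2.0, 13.0, 0.0]]
```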
Multithreading
By default, computations use a single CPU core. To use all CPU cores on Linux systems, pass the threads=0 argument. Alternatively, specify a custom number of threads:
distances = simsimd.cdist(matrix1, matrix2, metric="cosine", threads=0)
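A multithreaded all-pairs computation can split the rows of the first matrix into contiguous blocks, one per worker, and then concatenate the partial results in order. The sketch below illustrates that partitioning with Python threads. It interprets threads=0 as "use os.cpu_count() workers".

The helper name parallel_all_pairs and the partitioning scheme are both assumptions, not a description of SimSIMD's internals. Because of CPython's global interpreter lock, this pure-Python version will not actually run the arithmetic in parallel. Native code can.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def parallel_all_pairs(matrix1: Sequence[Vector], matrix2: Sequence[Vector],
                       metric: Metric, threads: int = 1) -> List[List[float]]:
    """All-pairs distances with the rows of matrix1 split across worker threads.

    threads=0 means one worker per available CPU core (an assumed reading of
    the threads=0 convention described for simsimd.cdist).
    """
    workers = (os.cpu_count() or 1) if threads == 0 else threads
    # Ceiling division so that at most `workers` contiguous row blocks are created.
    chunk = max(1, -(-len(matrix1) // workers))
    blocks = [matrix1[i:i + chunk] for i in range(0, len(matrix1), chunk)]

    def compute(block: Sequence[Vector]) -> List[List[float]]:
        # Each worker fills a horizontal slice of the final result matrix.
        return [[metric(u, v) for v in matrix2] for u in block]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so rows stay in order.
        return [row for block in pool.map(compute, blocks) for row in block]


def sqeuclidean(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance, used here only as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


print(parallel_all_pairs([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0]], sqeuclidean, threads=0))
# [[25.0], [13.0]]
```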
Hardware Backend Capabilities
To view a list of hardware backends that SimSIMD supports:
print(simsimd.get_capabilities())
Using Python API with USearch
Want to use it in Python with USearch?
You can wrap the raw C function pointers of SimSIMD backends into a CompiledMetric and pass it to USearch, similar to how USearch handles Numba's JIT-compiled code.
from usearch.index import Index, CompiledMetric, MetricKind, MetricSignature
from simsimd import pointer_to_sqeuclidean, pointer_to_cosine, pointer_to_inner
metric = CompiledMetric(
pointer=pointer_to_cosine("f16"),
kind=MetricKind.Cos,
signature=MetricSignature.ArrayArraySize,
)
index = Index(256, metric=metric)
Using SimSIMD in JavaScript
After you add simsimd as a dependency and run npm install, you can call SimSIMD functions on various TypedArray variants:
const { sqeuclidean, cosine, inner, hamming, jaccard } = require('simsimd');
const vectorA = new Float32Array([1.0, 2.0, 3.0]);
const vectorB = new Float32Array([4.0, 5.0, 6.0]);
const distance = sqeuclidean(vectorA, vectorB);
console.log('Squared Euclidean Distance:', distance);
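For these inputs you can work out the expected result by hand. Every coordinate differs by 3, so the squared Euclidean distance is 9 + 9 + 9 = 27. The Python lines below perform the same arithmetic.

All inputs are exactly representable in float32, so we would expect the JavaScript call to print 27 as well. That is our expectation, not documented output.

```python
vector_a = [1.0, 2.0, 3.0]
vector_b = [4.0, 5.0, 6.0]
# Sum of squared coordinate differences: (1-4)^2 + (2-5)^2 + (3-6)^2 = 9 + 9 + 9
distance = sum((a - b) ** 2 for a, b in zip(vector_a, vector_b))
print("Squared Euclidean Distance:", distance)  # Squared Euclidean Distance: 27.0
```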
Using SimSIMD in C
To use the _Float16 functionality of SimSIMD, make sure your development environment supports C11. For all other SimSIMD functionality, C99 is sufficient.
To integrate SimSIMD into a CMake-based project, add the following segment to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
simsimd
GIT_REPOSITORY https://github.com/ashvardanian/simsimd.git
GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(simsimd)
include_directories(${simsimd_SOURCE_DIR}/include)
Use the most recent compiler available for your platform to benefit from the newest intrinsics.
To integrate SimSIMD into USearch, compile USearch with the USEARCH_USE_SIMSIMD=1 flag. This is the default setting on most platforms.
Benchmarking and Contributing
To rerun the experiments, use the following commands:
cmake -DCMAKE_BUILD_TYPE=Release -DSIMSIMD_BUILD_BENCHMARKS=1 -B ./build_release
cmake --build build_release --config Release
./build_release/simsimd_bench # run all benchmarks
./build_release/simsimd_bench --benchmark_filter=js # run only benchmarks whose names match the regex "js"
To test and benchmark with Python bindings:
pip install -e .
pytest python/test.py -s -x
pip install numpy scipy scikit-learn # for comparison baselines
python python/bench.py # to run default benchmarks
python python/bench.py --n 1000 --ndim 1000000 # batch size and dimensions
To test and benchmark JavaScript bindings:
npm install --dev
npm test
npm run bench
To test and benchmark the Go bindings:
cd golang
go test # To test
go test -run=^$ -bench=. -benchmem # To benchmark