SIMD-accelerated similarity measures for x86 and Arm
SimSIMD 📏
SIMD-accelerated similarity measures, metrics, and distance functions for x86 and Arm. They are tuned for Machine Learning applications and mid-size vectors with 100-1024 dimensions. One can expect the following performance for Cosine (Angular) distance, the most common metric in AI.
Method | Vectors | Any Length | Speed on 256b | Speed on 1024b |
---|---|---|---|---|
Serial | f32 | ✅ | 5 GB/s | 5 GB/s |
SVE | f32 | ✅ | 34 GB/s | 40 GB/s |
SVE | f16 | ✅ | 28 GB/s | 35 GB/s |
NEON | f16 | ❌ | 16 GB/s | 18 GB/s |
The benchmarks were run on Arm-based "Graviton 3" CPUs powering AWS `c7g.metal` instances. We only use the Arm NEON implementation with vector lengths that are multiples of 128 bits, avoiding additional head or tail `for` loops for misaligned data. By default, we use GCC 12 with `-O3` and `-march=native` for benchmarks. Serial versions imply auto-vectorization pragmas.
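For reference, the quantity being benchmarked fits in a few lines. The following is a minimal pure-Python sketch of a serial cosine distance, assuming the common `1 - cos(a, b)` convention. The name `cosine_distance_serial` and the zero-vector handling are illustrative assumptions, not part of the SimSIMD API.

```python
import math
from array import array


def cosine_distance_serial(a, b):
    """Scalar reference: 1 - (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = norm_a = norm_b = 0.0
    # One pass accumulates the dot product and both squared norms.
    for x, y in zip(a, b):
        dot += x * y
        norm_a += x * x
        norm_b += y * y
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as maximally distant.
        return 1.0
    return 1.0 - dot / math.sqrt(norm_a * norm_b)


a = array("f", [1.0, 0.0, 0.0, 0.0])
b = array("f", [0.0, 1.0, 0.0, 0.0])
print(cosine_distance_serial(a, a))  # 0.0 for identical vectors
print(cosine_distance_serial(a, b))  # 1.0 for orthogonal vectors
```

SIMD variants of this loop typically keep the same three accumulators but update several lanes per instruction, which is where the speedups in the table come from.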
Need something like this in your CMake-based project?
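# The FetchContent module must be included before use (CMake 3.14+ for FetchContent_MakeAvailable)
include(FetchContent)
FetchContent_Declare(
    simsimd
    GIT_REPOSITORY https://github.com/ashvardanian/simsimd.git
    GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(simsimd)
include_directories(${simsimd_SOURCE_DIR}/include)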
Want to use it in Python with USearch?
from usearch import Index, CompiledMetric, MetricKind, MetricSignature
from simsimd import to_int, cos_f32x4_neon

metric = CompiledMetric(
    pointer=to_int(cos_f32x4_neon),
    kind=MetricKind.Cos,
    signature=MetricSignature.ArrayArraySize,
)
index = Index(256, metric=metric)
Available Metrics
In the C99 interface, all function names start with the `simsimd_` namespace prefix.
The signature defines the number of arguments:
- ✳️✳️📏: two pointers and a length,
- ✳️✳️: two pointers.
The latter is intended for cases where the number of dimensions is hard-coded. Constraints define the limitations on the number of dimensions `d` an argument vector can have.
Name | Signature | ISA Extension | Constraints |
---|---|---|---|
dot_f32_sve | ✳️✳️📏 | Arm SVE | |
dot_f32x4_neon | ✳️✳️📏 | Arm NEON | d % 4 == 0 |
cos_f32_sve | ✳️✳️📏 | Arm SVE | |
cos_f16_sve | ✳️✳️📏 | Arm SVE | |
cos_f16x4_neon | ✳️✳️📏 | Arm NEON | d % 4 == 0 |
cos_i8x16_neon | ✳️✳️📏 | Arm NEON | d % 16 == 0 |
cos_f32x4_neon | ✳️✳️📏 | Arm NEON | d % 4 == 0 |
cos_f16x16_avx512 | ✳️✳️📏 | x86 AVX-512 | d % 16 == 0 |
cos_f32x4_avx2 | ✳️✳️📏 | x86 AVX2 | d % 4 == 0 |
l2sq_f32_sve | ✳️✳️📏 | Arm SVE | |
l2sq_f16_sve | ✳️✳️📏 | Arm SVE | |
hamming_b1x8_sve | ✳️✳️📏 | Arm SVE | d % 8 == 0 |
hamming_b1x128_sve | ✳️✳️📏 | Arm SVE | d % 128 == 0 |
hamming_b1x128_avx512 | ✳️✳️📏 | x86 AVX-512 | d % 128 == 0 |
tanimoto_b1x8_naive | ✳️✳️📏 | | d % 8 == 0 |
tanimoto_maccs_naive | ✳️✳️ | | d == 166 |
tanimoto_maccs_neon | ✳️✳️ | Arm NEON | d == 166 |
tanimoto_maccs_sve | ✳️✳️ | Arm SVE | d == 166 |
tanimoto_maccs_avx512 | ✳️✳️ | x86 AVX-512 | d == 166 |
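Kernels with a `d % N == 0` constraint can still be applied to vectors of other lengths if both inputs are zero-padded first, since trailing zeros do not change dot products, cosine distances, or squared Euclidean distances. A minimal sketch of that idea follows; `pad_to_multiple` is a hypothetical helper, not a function provided by SimSIMD.

```python
from array import array


def pad_to_multiple(vector, lanes):
    """Return a copy of `vector` zero-padded to a multiple of `lanes`."""
    remainder = len(vector) % lanes
    padded = array(vector.typecode, vector)
    if remainder:
        # Append just enough zeros to reach the next multiple of `lanes`.
        padded.extend([0] * (lanes - remainder))
    return padded


vec = array("f", [0.5] * 250)      # 250 dimensions: 250 % 4 == 2
padded = pad_to_multiple(vec, 4)   # now satisfies a `d % 4 == 0` constraint
print(len(padded))                 # 252
```

Padding costs a copy, so for hot paths it is usually cheaper to allocate vectors at a compatible length up front.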
Benchmarks
The benchmarks are repeated for every function with a different number of cores involved.
Light-weight distance functions tend to be memory-bound, so multi-core performance may be lower if the memory bus cannot feed all the cores.
Similarly, heavy-weight distance functions running on all cores may cause CPU frequency downclocking.
This is well illustrated by the single-threaded and 32-thread results on the Intel `i9-13950HX`, equipped with DDR5 memory.
Method | Threads | Vector Size | Speed |
---|---|---|---|
dot_f32x4_avx2 | 1 | 1024 b | 96.2 GB/s |
dot_f32x4_avx2 | 32 | 1024 b | 23.6 GB/s |
cos_f32_naive | 1 | 1024 b | 15.3 GB/s |
cos_f32_naive | 32 | 1024 b | 4.5 GB/s |
cos_f32x4_avx2 | 1 | 1024 b | 56.3 GB/s |
cos_f32x4_avx2 | 32 | 1024 b | 13.9 GB/s |
tanimoto_maccs_naive | 1 | 21 b | 2.8 GB/s |
tanimoto_maccs_naive | 32 | 21 b | 1.2 GB/s |
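The GB/s figures relate bytes scanned to wall-clock time. As a rough sketch of how such a number can be derived, one might time a kernel as below. This assumes throughput counts the bytes of both input vectors on every call; the exact accounting used by `simsimd_bench` is not stated here, and `dot_serial` and `measure_throughput_gbs` are illustrative names.

```python
import time
from array import array


def dot_serial(a, b):
    """Plain-Python dot product used as the kernel under test."""
    return sum(x * y for x, y in zip(a, b))


def measure_throughput_gbs(kernel, a, b, repetitions=1000):
    """Time `kernel(a, b)` and convert bytes scanned per second to GB/s."""
    # Assumption: both inputs are read once per call.
    bytes_per_call = (len(a) + len(b)) * a.itemsize
    start = time.perf_counter()
    for _ in range(repetitions):
        kernel(a, b)
    elapsed = time.perf_counter() - start
    return bytes_per_call * repetitions / elapsed / 1e9


a = array("f", [1.0] * 256)   # 256 x f32 = 1024 bytes
b = array("f", [2.0] * 256)
print(f"{measure_throughput_gbs(dot_serial, a, b):.4f} GB/s")
```

An interpreted loop will land orders of magnitude below the numbers in the table; the sketch only shows the arithmetic behind the unit.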
Switching to the Intel Sapphire Rapids server platform, we can also evaluate some of the AVX-512 extensions, including `VPOPCNTDQ` and `F16`.
Method | Threads | Vector Size | Speed |
---|---|---|---|
dot_f32x4_avx2 | 1 | 1024 b | 57.8 GB/s |
dot_f32x4_avx2 | 224 | 1024 b | 16.1 GB/s |
cos_f32_naive | 1 | 1024 b | 10.7 GB/s |
cos_f32_naive | 224 | 1024 b | 3.0 GB/s |
cos_f32x4_avx2 | 1 | 1024 b | 39.5 GB/s |
cos_f32x4_avx2 | 224 | 1024 b | 15.1 GB/s |
cos_f16x16_avx512 | 1 | 1024 b | 50.6 GB/s |
cos_f16x16_avx512 | 224 | 1024 b | 15.9 GB/s |
hamming_b1x128_avx512 | 1 | 1024 b | 790.3 GB/s |
hamming_b1x128_avx512 | 224 | 1024 b | 259.3 GB/s |
tanimoto_maccs_naive | 1 | 21 b | 3.0 GB/s |
tanimoto_maccs_naive | 224 | 21 b | 1.3 GB/s |
tanimoto_maccs_avx512 | 1 | 21 b | 13.1 GB/s |
tanimoto_maccs_avx512 | 224 | 21 b | 3.7 GB/s |
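The 21 b vector size in the `tanimoto_maccs_*` rows matches 166-bit MACCS fingerprints packed into 21 bytes (166 bits rounded up to whole bytes). Below is a minimal reference for the Tanimoto (Jaccard) distance over such packed bit-strings. It assumes the `1 - |A ∩ B| / |A ∪ B|` convention and a distance of zero between two empty fingerprints; both choices, and the function name, are assumptions rather than SimSIMD's documented behavior.

```python
def tanimoto_distance_bits(a: bytes, b: bytes) -> float:
    """Tanimoto distance between two equally sized packed bit-strings."""
    if len(a) != len(b):
        raise ValueError("bit-strings must have the same length")
    intersection = union = 0
    for x, y in zip(a, b):
        intersection += bin(x & y).count("1")   # bits set in both
        union += bin(x | y).count("1")          # bits set in either
    if union == 0:
        return 0.0  # assumption: two empty fingerprints are identical
    return 1.0 - intersection / union


MACCS_BYTES = (166 + 7) // 8   # 166 bits round up to 21 bytes

a = bytes([0b00001111]) + bytes(MACCS_BYTES - 1)
b = bytes([0b00111100]) + bytes(MACCS_BYTES - 1)
print(MACCS_BYTES)                    # 21
print(tanimoto_distance_bits(a, b))   # 1 - 2/6: two shared bits out of six set
```

Hardware population-count instructions such as `VPOPCNTDQ` replace the per-byte `bin(...).count("1")` step with a single operation over many bytes at once.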
To replicate this on your hardware, please run the following on Linux:
git clone https://github.com/ashvardanian/SimSIMD.git && cd SimSIMD
cmake -DCMAKE_BUILD_TYPE=Release -DSIMSIMD_BUILD_BENCHMARKS=1 \
-DCMAKE_CXX_COMPILER="g++-12" -DCMAKE_C_COMPILER="gcc-12" \
-B ./build && make -C ./build && ./build/simsimd_bench
On macOS:
brew install llvm
git clone https://github.com/ashvardanian/SimSIMD.git && cd SimSIMD
cmake -B ./build \
-DCMAKE_C_COMPILER="/opt/homebrew/opt/llvm/bin/clang" \
-DCMAKE_CXX_COMPILER="/opt/homebrew/opt/llvm/bin/clang++" \
-DSIMSIMD_BUILD_BENCHMARKS=1 \
&& \
make -C ./build -j && ./build/simsimd_bench
Install and test locally:
pip install -e . && pytest python/test.py -s -x