Distance functions: a drop-in replacement for, and a superset of, the scipy.spatial.distance module.
Algorithms for Big Data: Distances (v1.0.3)
This package contains algorithms for computing distances between data points.
It is a thin Python wrapper around the distances crate, in Rust.
It provides drop-in replacements for the distance functions in scipy.spatial.distance.
Supported Distance Functions
Installation
pip install abd-distances
Usage
import math
import numpy
import abd_distances.simd as distance
a = numpy.array([i for i in range(10_000)], dtype=numpy.float32)
b = a + 1.0
dist = distance.euclidean(a, b)
assert math.fabs(dist - 100.0) < 1e-6
print(dist)
# 100.0
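The expected value follows from the definition: each of the 10,000 coordinates differs by exactly 1, so the Euclidean distance is sqrt(10,000) = 100. Here is a minimal pure-Python check of that arithmetic, independent of this package. The helper name `euclidean_reference` is hypothetical and only for illustration; it is not part of abd_distances.

```python
import math


def euclidean_reference(a, b):
    """Textbook Euclidean distance (illustrative, not this package's code)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same data as the usage example, built with plain Python floats.
a = [float(i) for i in range(10_000)]
b = [x + 1.0 for x in a]

reference = euclidean_reference(a, b)
print(reference)
```

A reference like this can be handy for sanity-checking results on small inputs before switching to the accelerated functions.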
Vector Distances
- Bray-Curtis: abd_distances.vector.braycurtis
- Canberra: abd_distances.vector.canberra
- Chebyshev: abd_distances.vector.chebyshev
- Correlation
- Cosine: abd_distances.vector.cosine
- Euclidean: abd_distances.vector.euclidean
- Jensen-Shannon
- Mahalanobis
- Manhattan: abd_distances.vector.manhattan and abd_distances.vector.cityblock
- Minkowski: abd_distances.vector.minkowski
- Standardized Euclidean
- Squared Euclidean: abd_distances.vector.sqeuclidean
- Pairwise Distances: abd_distances.vector.cdist and abd_distances.vector.pdist
- ...
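For readers unfamiliar with some of these metrics, the sketch below writes out a few of them in pure Python, following the formulas documented for scipy.spatial.distance (which this package describes itself as replacing). The `*_reference` names are hypothetical; these are not the package's Rust implementations, and edge-case behavior (for example, zero vectors) is an assumption of this sketch.

```python
import math


def braycurtis_reference(u, v):
    # sum |u_i - v_i| / sum |u_i + v_i|
    num = sum(abs(x - y) for x, y in zip(u, v))
    den = sum(abs(x + y) for x, y in zip(u, v))
    return num / den


def canberra_reference(u, v):
    # sum |u_i - v_i| / (|u_i| + |v_i|); terms where both are 0 are skipped
    total = 0.0
    for x, y in zip(u, v):
        den = abs(x) + abs(y)
        if den > 0:
            total += abs(x - y) / den
    return total


def chebyshev_reference(u, v):
    # Largest coordinate-wise absolute difference
    return max(abs(x - y) for x, y in zip(u, v))


def minkowski_reference(u, v, p=2.0):
    # (sum |u_i - v_i|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean
    return sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p)


def cosine_reference(u, v):
    # 1 - (u . v) / (||u|| * ||v||)
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


u = [1.0, 0.0, 0.0]
v = [0.0, 1.0, 0.0]
print(braycurtis_reference(u, v), canberra_reference(u, v), chebyshev_reference(u, v))
```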
Boolean Distances
- Dice
- Hamming
- Jaccard
- Kulczynski 1D
- Rogers-Tanimoto
- Russell-Rao
- Sokal-Michener
- Sokal-Sneath
- Yule
- ...
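The list above gives no function paths for the boolean distances, so the sketch below only illustrates the standard definitions, as documented for scipy.spatial.distance, in pure Python. All names are hypothetical, and returning 0.0 for an empty denominator is a choice made for this sketch rather than documented behavior of this package.

```python
def _counts(u, v):
    """Count (TT, TF, FT, FF) agreements between two boolean sequences."""
    c_tt = sum(1 for x, y in zip(u, v) if x and y)
    c_tf = sum(1 for x, y in zip(u, v) if x and not y)
    c_ft = sum(1 for x, y in zip(u, v) if not x and y)
    c_ff = sum(1 for x, y in zip(u, v) if not x and not y)
    return c_tt, c_tf, c_ft, c_ff


def _ratio(num, den):
    # Sketch-only convention: treat 0/0 as 0.0
    return num / den if den else 0.0


def hamming_reference(u, v):
    c_tt, c_tf, c_ft, c_ff = _counts(u, v)
    return _ratio(c_tf + c_ft, c_tt + c_tf + c_ft + c_ff)


def jaccard_reference(u, v):
    c_tt, c_tf, c_ft, _ = _counts(u, v)
    return _ratio(c_tf + c_ft, c_tt + c_tf + c_ft)


def dice_reference(u, v):
    c_tt, c_tf, c_ft, _ = _counts(u, v)
    return _ratio(c_tf + c_ft, 2 * c_tt + c_tf + c_ft)


def russellrao_reference(u, v):
    c_tt, c_tf, c_ft, c_ff = _counts(u, v)
    n = c_tt + c_tf + c_ft + c_ff
    return _ratio(n - c_tt, n)


u = [True, True, False, False]
v = [True, False, True, False]
print(hamming_reference(u, v), jaccard_reference(u, v))
```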
SIMD-Accelerated Vector Distances
- Euclidean: abd_distances.simd.euclidean
- Squared Euclidean: abd_distances.simd.sqeuclidean
- Cosine: abd_distances.simd.cosine
- Pairwise Distances: abd_distances.simd.cdist and abd_distances.simd.pdist
- ...
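The difference between cdist and pdist is easiest to see on a tiny input. The sketch below mirrors the output layout used by scipy.spatial.distance: cdist returns a full matrix of distances between two collections, while pdist returns a condensed list of the distances between all pairs (i, j) with i < j within one collection. It is assumed, not verified here, that this package follows the same layout, since it is described as a drop-in replacement. The `*_reference` names are hypothetical.

```python
import math
from itertools import combinations


def euclidean_reference(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cdist_reference(xa, xb, metric):
    # One row per point in xa, one column per point in xb
    return [[metric(u, v) for v in xb] for u in xa]


def pdist_reference(x, metric):
    # Condensed form: pairs (0,1), (0,2), ..., (n-2,n-1); length n*(n-1)/2
    return [metric(u, v) for u, v in combinations(x, 2)]


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
full = cdist_reference(points, points, euclidean_reference)
condensed = pdist_reference(points, euclidean_reference)
print(full)
print(condensed)
```

The condensed form stores each unordered pair once and omits the zero diagonal, which is why pdist does roughly half the work of cdist on the same input.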
String Distances
- Hamming: abd_distances.strings.hamming
- Levenshtein: abd_distances.strings.levenshtein
- Needleman-Wunsch: abd_distances.strings.needleman_wunsch
- Smith-Waterman
- Pairwise Distances
- ...
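As a point of reference for the first two string distances, here is a minimal pure-Python sketch of the textbook definitions. The `*_reference` names are hypothetical, and details such as how this package handles strings of unequal length for Hamming, or what integer type it returns, are not stated in this document and are not assumed here.

```python
def hamming_reference(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_reference(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


print(hamming_reference("karolin", "kathrin"))
print(levenshtein_reference("kitten", "sitting"))
```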
Benchmarks
SIMD-Accelerated Vector Distance Benchmarks
These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.
The "Min", "Max", and "Mean" columns show the minimum, maximum, and mean times (in seconds), respectively, taken to compute the pairwise distances using the functions from scipy.spatial.distance.
The "Min (+)", "Max (+)", and "Mean (+)" columns show the corresponding times for this package's functions, with the speedup over the scipy functions in parentheses.
All pairwise distances (cdist and pdist) were computed for 200x200 vectors of 500 dimensions, and the average time was taken over 100 runs.
All individual distances were computed for 20x20 vectors of 500 dimensions, and the average time was taken over 100 runs.
Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
---|---|---|---|---|---|---|
cdist, euclidean, f32 | 2.560 | 2.576 | 2.566 | 0.185 (13.9x) | 0.196 (13.2x) | 0.188 (13.7x) |
cdist, euclidean, f64 | 2.398 | 2.406 | 2.401 | 0.292 (8.2x) | 0.307 (7.8x) | 0.298 (8.0x) |
cdist, sqeuclidean, f32 | 2.519 | 2.527 | 2.523 | 0.182 (13.9x) | 0.197 (12.8x) | 0.187 (13.5x) |
cdist, sqeuclidean, f64 | 2.381 | 2.393 | 2.389 | 0.293 (8.1x) | 0.318 (7.5x) | 0.301 (7.9x) |
cdist, cosine, f32 | 4.011 | 4.021 | 4.016 | 0.625 (6.4x) | 0.637 (6.3x) | 0.632 (6.4x) |
cdist, cosine, f64 | 3.978 | 4.009 | 3.992 | 0.626 (6.4x) | 0.666 (6.0x) | 0.638 (6.3x) |
pdist, euclidean, f32 | 1.235 | 1.249 | 1.241 | 0.252 (4.9x) | 0.263 (4.7x) | 0.257 (4.8x) |
pdist, euclidean, f64 | 1.216 | 1.262 | 1.234 | 0.302 (4.0x) | 0.312 (4.0x) | 0.308 (4.0x) |
pdist, sqeuclidean, f32 | 1.229 | 1.250 | 1.237 | 0.251 (4.9x) | 0.303 (4.1x) | 0.265 (4.7x) |
pdist, sqeuclidean, f64 | 1.209 | 1.213 | 1.211 | 0.306 (3.9x) | 0.313 (3.9x) | 0.310 (3.9x) |
pdist, cosine, f32 | 2.001 | 2.017 | 2.006 | 0.468 (4.3x) | 0.484 (4.2x) | 0.478 (4.2x) |
pdist, cosine, f64 | 1.991 | 2.004 | 1.996 | 0.461 (4.3x) | 0.476 (4.2x) | 0.471 (4.2x) |
euclidean, f32 | 0.644 | 0.670 | 0.654 | 0.076 (8.5x) | 0.080 (8.4x) | 0.078 (8.3x) |
euclidean, f64 | 0.672 | 0.701 | 0.682 | 0.097 (6.9x) | 0.102 (6.9x) | 0.100 (6.8x) |
sqeuclidean, f32 | 0.506 | 0.512 | 0.508 | 0.076 (6.6x) | 0.079 (6.5x) | 0.078 (6.5x) |
sqeuclidean, f64 | 0.515 | 0.519 | 0.518 | 0.100 (5.1x) | 0.104 (5.0x) | 0.103 (5.0x) |
cosine, f32 | 0.668 | 0.687 | 0.677 | 0.110 (6.1x) | 0.113 (6.1x) | 0.111 (6.1x) |
cosine, f64 | 0.465 | 0.472 | 0.469 | 0.127 (3.7x) | 0.130 (3.6x) | 0.129 (3.6x) |
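The factor in parentheses is the scipy time divided by this package's time. A small sketch of that arithmetic, using the "cdist, cosine, f32" row from the table above (the function name `speedup` is hypothetical):

```python
def speedup(scipy_seconds, package_seconds):
    """Ratio of the baseline time to the improved time."""
    return scipy_seconds / package_seconds


# "cdist, cosine, f32": Mean = 4.016 s (scipy), Mean (+) = 0.632 s (this package)
factor = speedup(4.016, 0.632)
label = f"{factor:.1f}x"
print(label)
```

Recomputing a factor from the rounded times in the table can differ from the printed factor in the last digit, presumably because the printed factors were derived from unrounded measurements.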
Vector Distance Benchmarks (No SIMD)
These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.
The "Min", "Max", and "Mean" columns show the minimum, maximum, and mean times (in seconds), respectively, taken to compute the pairwise distances using the functions from scipy.spatial.distance.
The "Min (+)", "Max (+)", and "Mean (+)" columns show the corresponding times for this package's functions, with the speedup over the scipy functions in parentheses.
All pairwise distances (cdist and pdist) were computed for 200x200 vectors of 500 dimensions, and the average time was taken over 100 runs.
All individual distances were computed for 20x20 vectors of 500 dimensions, and the average time was taken over 100 runs.
These benchmarks were run using the richbench package.
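To get a rough feel for how min/max/mean timings like these are produced, the sketch below uses the standard library's timeit module as a stand-in. It is not richbench and does not reproduce the setup used for the table; the workload, the repeat counts, and the names `summarize` and `workload` are all assumptions made for illustration.

```python
import math
import statistics
import timeit


def summarize(func, repeat=5, number=100):
    """Time `func` and return (min, max, mean) of the per-repeat totals.

    Each total is the time in seconds for `number` consecutive calls.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals), max(totals), statistics.mean(totals)


def workload():
    # Hypothetical stand-in for a distance call on 500-dimensional vectors
    a = [float(i) for i in range(500)]
    b = [x + 1.0 for x in a]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


t_min, t_max, t_mean = summarize(workload, repeat=3, number=10)
print(f"min={t_min:.6f}s max={t_max:.6f}s mean={t_mean:.6f}s")
```

Running the same `summarize` call once on a scipy function and once on the matching function from this package would give the two sets of numbers needed for a row of the table.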
Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
---|---|---|---|---|---|---|
braycurtis, f32 | 1.103 | 1.134 | 1.114 | 0.323 (3.4x) | 0.324 (3.5x) | 0.323 (3.4x) |
braycurtis, f64 | 0.834 | 0.843 | 0.838 | 0.170 (4.9x) | 0.173 (4.9x) | 0.171 (4.9x) |
canberra, f32 | 2.524 | 2.529 | 2.526 | 0.153 (16.5x) | 0.155 (16.3x) | 0.154 (16.4x) |
canberra, f64 | 2.216 | 2.260 | 2.235 | 0.168 (13.2x) | 0.170 (13.3x) | 0.169 (13.2x) |
chebyshev, f32 | 2.738 | 2.774 | 2.753 | 0.149 (18.3x) | 0.151 (18.4x) | 0.150 (18.4x) |
chebyshev, f64 | 2.777 | 2.784 | 2.781 | 0.165 (16.8x) | 0.166 (16.8x) | 0.165 (16.8x) |
euclidean, f32 | 0.641 | 0.641 | 0.641 | 0.150 (4.3x) | 0.150 (4.3x) | 0.150 (4.3x) |
euclidean, f64 | 0.657 | 0.662 | 0.660 | 0.167 (3.9x) | 0.168 (3.9x) | 0.168 (3.9x) |
sqeuclidean, f32 | 0.506 | 0.509 | 0.507 | 0.149 (3.4x) | 0.149 (3.4x) | 0.149 (3.4x) |
sqeuclidean, f64 | 0.514 | 0.518 | 0.516 | 0.165 (3.1x) | 0.173 (3.0x) | 0.170 (3.0x) |
cityblock, f32 | 0.437 | 0.443 | 0.440 | 0.150 (2.9x) | 0.150 (2.9x) | 0.150 (2.9x) |
cityblock, f64 | 0.444 | 0.451 | 0.447 | 0.167 (2.7x) | 0.168 (2.7x) | 0.168 (2.7x) |
cosine, f32 | 0.659 | 0.668 | 0.664 | 0.314 (2.1x) | 0.315 (2.1x) | 0.314 (2.1x) |
cosine, f64 | 0.459 | 0.471 | 0.465 | 0.321 (1.4x) | 0.325 (1.4x) | 0.324 (1.4x) |
cdist, braycurtis, f32 | 4.902 | 4.906 | 4.904 | 1.802 (2.7x) | 1.875 (2.6x) | 1.833 (2.7x) |
cdist, braycurtis, f64 | 4.765 | 4.775 | 4.768 | 0.710 (6.7x) | 0.735 (6.5x) | 0.725 (6.6x) |
cdist, canberra, f32 | 6.914 | 6.943 | 6.930 | 1.356 (5.1x) | 1.385 (5.0x) | 1.367 (5.1x) |
cdist, canberra, f64 | 6.782 | 6.813 | 6.797 | 0.684 (9.9x) | 0.701 (9.7x) | 0.692 (9.8x) |
cdist, chebyshev, f32 | 2.763 | 2.768 | 2.765 | 0.640 (4.3x) | 0.663 (4.2x) | 0.649 (4.3x) |
cdist, chebyshev, f64 | 2.659 | 2.677 | 2.664 | 0.647 (4.1x) | 0.662 (4.0x) | 0.655 (4.1x) |
cdist, euclidean, f32 | 2.563 | 2.570 | 2.564 | 0.644 (4.0x) | 0.658 (3.9x) | 0.653 (3.9x) |
cdist, euclidean, f64 | 2.378 | 2.400 | 2.388 | 0.630 (3.8x) | 0.649 (3.7x) | 0.640 (3.7x) |
cdist, sqeuclidean, f32 | 2.516 | 2.523 | 2.519 | 0.648 (3.9x) | 0.660 (3.8x) | 0.652 (3.9x) |
cdist, sqeuclidean, f64 | 2.412 | 2.423 | 2.417 | 0.631 (3.8x) | 0.645 (3.8x) | 0.638 (3.8x) |
cdist, cityblock, f32 | 4.545 | 4.552 | 4.548 | 0.647 (7.0x) | 0.671 (6.8x) | 0.658 (6.9x) |
cdist, cityblock, f64 | 4.406 | 4.407 | 4.407 | 0.633 (7.0x) | 0.657 (6.7x) | 0.647 (6.8x) |
cdist, cosine, f32 | 4.010 | 4.020 | 4.013 | 2.254 (1.8x) | 2.292 (1.8x) | 2.270 (1.8x) |
cdist, cosine, f64 | 3.987 | 3.992 | 3.990 | 2.241 (1.8x) | 2.288 (1.7x) | 2.258 (1.8x) |
pdist, braycurtis, f32 | 2.382 | 2.387 | 2.385 | 1.062 (2.2x) | 1.074 (2.2x) | 1.069 (2.2x) |
pdist, braycurtis, f64 | 2.368 | 2.378 | 2.371 | 0.510 (4.6x) | 0.523 (4.5x) | 0.516 (4.6x) |
pdist, canberra, f32 | 3.374 | 3.411 | 3.389 | 0.831 (4.1x) | 0.841 (4.1x) | 0.835 (4.1x) |
pdist, canberra, f64 | 3.369 | 3.411 | 3.396 | 0.504 (6.7x) | 0.515 (6.6x) | 0.509 (6.7x) |
pdist, chebyshev, f32 | 1.362 | 1.364 | 1.363 | 0.478 (2.9x) | 0.488 (2.8x) | 0.484 (2.8x) |
pdist, chebyshev, f64 | 1.338 | 1.343 | 1.341 | 0.476 (2.8x) | 0.485 (2.8x) | 0.481 (2.8x) |
pdist, euclidean, f32 | 1.241 | 1.250 | 1.246 | 0.482 (2.6x) | 0.487 (2.6x) | 0.484 (2.6x) |
pdist, euclidean, f64 | 1.222 | 1.228 | 1.225 | 0.474 (2.6x) | 0.503 (2.4x) | 0.482 (2.5x) |
pdist, sqeuclidean, f32 | 1.224 | 1.247 | 1.233 | 0.477 (2.6x) | 0.490 (2.5x) | 0.481 (2.6x) |
pdist, sqeuclidean, f64 | 1.211 | 1.214 | 1.213 | 0.470 (2.6x) | 0.478 (2.5x) | 0.475 (2.6x) |
pdist, cityblock, f32 | 2.204 | 2.207 | 2.205 | 0.483 (4.6x) | 0.491 (4.5x) | 0.486 (4.5x) |
pdist, cityblock, f64 | 2.189 | 2.198 | 2.192 | 0.476 (4.6x) | 0.483 (4.5x) | 0.481 (4.6x) |
pdist, cosine, f32 | 2.000 | 2.004 | 2.002 | 1.292 (1.5x) | 1.302 (1.5x) | 1.296 (1.5x) |
pdist, cosine, f64 | 1.988 | 1.992 | 1.990 | 1.288 (1.5x) | 1.296 (1.5x) | 1.292 (1.5x) |
String Distance Benchmarks
These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.
All string distances were computed 100 times each, among different pairs of strings, and the average time was taken.
License
This package is licensed under the MIT license.