Distance functions: A drop-in replacement for, and a superset of, the scipy.spatial.distance module.

Algorithms for Big Data: Distances (v1.0.3)

This package contains algorithms for computing distances between data points. It is a thin Python wrapper around the distances crate, in Rust. It provides drop-in replacements for the distance functions in scipy.spatial.distance.

Supported Distance Functions

Installation

pip install abd-distances

Usage

import math

import numpy
import abd_distances.simd as distance

a = numpy.array([i for i in range(10_000)], dtype=numpy.float32)
b = a + 1.0

dist = distance.euclidean(a, b)

assert math.fabs(dist - 100.0) < 1e-6

print(dist)
# 100.0
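
The expected value of 100.0 can be cross-checked without this package. The following is a minimal sketch using only the Python standard library; the name `reference` is introduced here for illustration and is not part of abd_distances.

```python
import math

# Same vectors as in the usage example, as plain Python lists (no NumPy needed).
a = [float(i) for i in range(10_000)]
b = [x + 1.0 for x in a]

# Euclidean distance: square root of the sum of squared differences.
# Every difference is exactly 1, so the sum is 10_000 and the root is 100.
reference = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(reference)
```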

Vector Distances

  • Bray-Curtis: abd_distances.vector.braycurtis
  • Canberra: abd_distances.vector.canberra
  • Chebyshev: abd_distances.vector.chebyshev
  • Correlation
  • Cosine: abd_distances.vector.cosine
  • Euclidean: abd_distances.vector.euclidean
  • Jensen-Shannon
  • Mahalanobis
  • Manhattan: abd_distances.vector.manhattan and abd_distances.vector.cityblock
  • Minkowski: abd_distances.vector.minkowski
  • Standardized Euclidean
  • Squared Euclidean: abd_distances.vector.sqeuclidean
  • Pairwise Distances: abd_distances.vector.cdist and abd_distances.vector.pdist
  • ...
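
For reference, a few of the distances listed above can be written out in plain Python. This is only a sketch of the standard definitions (the same ones scipy.spatial.distance documents), not this package's implementation; the `ref_*` names are hypothetical helpers introduced here.

```python
import math


def ref_braycurtis(u, v):
    # sum |u_i - v_i| / sum |u_i + v_i|; raises ZeroDivisionError if the denominator is 0.
    num = sum(abs(x - y) for x, y in zip(u, v))
    den = sum(abs(x + y) for x, y in zip(u, v))
    return num / den


def ref_canberra(u, v):
    # sum |u_i - v_i| / (|u_i| + |v_i|), skipping terms where both entries are 0.
    total = 0.0
    for x, y in zip(u, v):
        den = abs(x) + abs(y)
        if den > 0.0:
            total += abs(x - y) / den
    return total


def ref_chebyshev(u, v):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(u, v))


def ref_minkowski(u, v, p=2.0):
    # p = 1 gives Manhattan (cityblock), p = 2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p)


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
print(ref_braycurtis(u, v), ref_canberra(u, v), ref_chebyshev(u, v), ref_minkowski(u, v, p=1.0))
```

Such reference functions are mainly useful for spot-checking results on small inputs.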

Boolean Distances

  • Dice
  • Hamming
  • Jaccard
  • Kulczynski 1D
  • Rogers-Tanimoto
  • Russell-Rao
  • Sokal-Michener
  • Sokal-Sneath
  • Yule
  • ...
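
The boolean distances above are all functions of the four agreement counts between two boolean vectors. As a sketch of the standard definitions (as documented for scipy.spatial.distance), and not of this package's code, three of them can be written as follows; the helper names are hypothetical.

```python
def ref_counts(u, v):
    # Count positions by (u_i, v_i): both true, true/false, false/true, both false.
    ctt = sum(1 for x, y in zip(u, v) if x and y)
    ctf = sum(1 for x, y in zip(u, v) if x and not y)
    cft = sum(1 for x, y in zip(u, v) if not x and y)
    cff = sum(1 for x, y in zip(u, v) if not x and not y)
    return ctt, ctf, cft, cff


def ref_hamming(u, v):
    # Fraction of positions where the vectors disagree.
    ctt, ctf, cft, cff = ref_counts(u, v)
    return (ctf + cft) / (ctt + ctf + cft + cff)


def ref_jaccard(u, v):
    # Disagreements over positions where at least one vector is true.
    ctt, ctf, cft, _ = ref_counts(u, v)
    return (ctf + cft) / (ctt + ctf + cft)


def ref_dice(u, v):
    # Like Jaccard, but joint-true positions are counted twice.
    ctt, ctf, cft, _ = ref_counts(u, v)
    return (ctf + cft) / (2 * ctt + ctf + cft)


u = [True, True, False, False, True]
v = [True, False, True, False, True]
print(ref_hamming(u, v), ref_jaccard(u, v), ref_dice(u, v))
```

The sketch does not handle all-false inputs, where the Jaccard and Dice denominators are zero.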

SIMD-Accelerated Vector Distances

  • Euclidean: abd_distances.simd.euclidean
  • Squared Euclidean: abd_distances.simd.sqeuclidean
  • Cosine: abd_distances.simd.cosine
  • Pairwise Distances: abd_distances.simd.cdist and abd_distances.simd.pdist
  • ...
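
In scipy.spatial.distance, cdist returns a full matrix of distances between two collections, while pdist returns a condensed vector holding each unordered pair from one collection once. Assuming this package mirrors those layouts (as the drop-in claim suggests, though that is not verified here), the difference can be sketched in plain Python; the `ref_*` names are hypothetical.

```python
import math


def ref_euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def ref_cdist(xa, xb, metric):
    # Full len(xa) x len(xb) matrix: entry [i][j] is metric(xa[i], xb[j]).
    return [[metric(u, v) for v in xb] for u in xa]


def ref_pdist(x, metric):
    # Condensed form: pairs (i, j) with i < j, in row-major order.
    n = len(x)
    return [metric(x[i], x[j]) for i in range(n) for j in range(i + 1, n)]


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
full = ref_cdist(points, points, ref_euclidean)
condensed = ref_pdist(points, ref_euclidean)

print(full)
print(condensed)
```

For n points, the condensed vector has n(n-1)/2 entries, which is why pdist does roughly half the work of cdist on the same data.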

String Distances

  • Hamming: abd_distances.strings.hamming
  • Levenshtein: abd_distances.strings.levenshtein
  • Needleman-Wunsch: abd_distances.strings.needleman_wunsch
  • Smith-Waterman
  • Pairwise Distances
  • ...
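
As a point of comparison for the string distances above, the textbook Hamming and Levenshtein distances can be sketched in plain Python. These use unit costs for every edit; the penalties this package uses (for Needleman-Wunsch in particular) are not stated here, so the functions below are illustrative references with hypothetical names, not its implementation.

```python
def ref_hamming(s, t):
    # Number of positions at which two equal-length strings differ.
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(1 for cs, ct in zip(s, t) if cs != ct)


def ref_levenshtein(s, t):
    # Classic dynamic program, keeping only the previous row of the table.
    if len(s) < len(t):
        s, t = t, s
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            substitution = previous[j - 1] + (cs != ct)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


print(ref_hamming("karolin", "kathrin"))
print(ref_levenshtein("kitten", "sitting"))
```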

Benchmarks

SIMD-Accelerated Vector Distance Benchmarks

These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.

The "Min", "Max", and "Mean" columns show the minimum, maximum, and mean times (in seconds), respectively, taken to compute the distances using the functions from scipy.spatial.distance. The "Min (+)", "Max (+)", and "Mean (+)" columns show the corresponding times for this package's functions, with the speedup over the scipy functions in parentheses. All pairwise distances (cdist and pdist) were computed for 200x200 vectors of 500 dimensions, and the average time was taken over 100 runs. All individual distances were computed for 20x20 vectors of 500 dimensions, and the average time was taken over 100 runs.

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| cdist, euclidean, f32 | 2.560 | 2.576 | 2.566 | 0.185 (13.9x) | 0.196 (13.2x) | 0.188 (13.7x) |
| cdist, euclidean, f64 | 2.398 | 2.406 | 2.401 | 0.292 (8.2x) | 0.307 (7.8x) | 0.298 (8.0x) |
| cdist, sqeuclidean, f32 | 2.519 | 2.527 | 2.523 | 0.182 (13.9x) | 0.197 (12.8x) | 0.187 (13.5x) |
| cdist, sqeuclidean, f64 | 2.381 | 2.393 | 2.389 | 0.293 (8.1x) | 0.318 (7.5x) | 0.301 (7.9x) |
| cdist, cosine, f32 | 4.011 | 4.021 | 4.016 | 0.625 (6.4x) | 0.637 (6.3x) | 0.632 (6.4x) |
| cdist, cosine, f64 | 3.978 | 4.009 | 3.992 | 0.626 (6.4x) | 0.666 (6.0x) | 0.638 (6.3x) |
| pdist, euclidean, f32 | 1.235 | 1.249 | 1.241 | 0.252 (4.9x) | 0.263 (4.7x) | 0.257 (4.8x) |
| pdist, euclidean, f64 | 1.216 | 1.262 | 1.234 | 0.302 (4.0x) | 0.312 (4.0x) | 0.308 (4.0x) |
| pdist, sqeuclidean, f32 | 1.229 | 1.250 | 1.237 | 0.251 (4.9x) | 0.303 (4.1x) | 0.265 (4.7x) |
| pdist, sqeuclidean, f64 | 1.209 | 1.213 | 1.211 | 0.306 (3.9x) | 0.313 (3.9x) | 0.310 (3.9x) |
| pdist, cosine, f32 | 2.001 | 2.017 | 2.006 | 0.468 (4.3x) | 0.484 (4.2x) | 0.478 (4.2x) |
| pdist, cosine, f64 | 1.991 | 2.004 | 1.996 | 0.461 (4.3x) | 0.476 (4.2x) | 0.471 (4.2x) |
| euclidean, f32 | 0.644 | 0.670 | 0.654 | 0.076 (8.5x) | 0.080 (8.4x) | 0.078 (8.3x) |
| euclidean, f64 | 0.672 | 0.701 | 0.682 | 0.097 (6.9x) | 0.102 (6.9x) | 0.100 (6.8x) |
| sqeuclidean, f32 | 0.506 | 0.512 | 0.508 | 0.076 (6.6x) | 0.079 (6.5x) | 0.078 (6.5x) |
| sqeuclidean, f64 | 0.515 | 0.519 | 0.518 | 0.100 (5.1x) | 0.104 (5.0x) | 0.103 (5.0x) |
| cosine, f32 | 0.668 | 0.687 | 0.677 | 0.110 (6.1x) | 0.113 (6.1x) | 0.111 (6.1x) |
| cosine, f64 | 0.465 | 0.472 | 0.469 | 0.127 (3.7x) | 0.130 (3.6x) | 0.129 (3.6x) |
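
To make the Min, Max, and Mean columns concrete, here is a minimal sketch of how such statistics and speedup ratios could be collected with the standard library's timeit module. It is not the harness used for the numbers above; `summarize`, `speedup`, and the two stand-in functions are hypothetical names introduced for illustration.

```python
import math
import statistics
import timeit

a = [float(i) for i in range(500)]
b = [x + 1.0 for x in a]


def baseline():
    # Stand-in for the slower function being compared against.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate():
    # Stand-in for the faster function (math.dist, Python 3.8+).
    return math.dist(a, b)


def summarize(func, runs=5, number=100):
    # Each entry of `times` is the total time for `number` calls of func.
    times = timeit.repeat(func, repeat=runs, number=number)
    return {"min": min(times), "max": max(times), "mean": statistics.fmean(times)}


def speedup(base_stats, cand_stats):
    # Ratio of baseline time to candidate time, per statistic.
    return {key: base_stats[key] / cand_stats[key] for key in base_stats}


base_stats = summarize(baseline)
cand_stats = summarize(candidate)
print(base_stats, cand_stats, speedup(base_stats, cand_stats))
```

Timings depend on the machine and its load, so no expected output is shown.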
[Plots: Euclidean, Squared Euclidean, and Cosine benchmarks, each for f32 and f64.]

Vector Distance Benchmarks (No SIMD)

These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.

The "Min", "Max", and "Mean" columns show the minimum, maximum, and mean times (in seconds), respectively, taken to compute the distances using the functions from scipy.spatial.distance. The "Min (+)", "Max (+)", and "Mean (+)" columns show the corresponding times for this package's functions, with the speedup over the scipy functions in parentheses. All pairwise distances (cdist and pdist) were computed for 200x200 vectors of 500 dimensions, and the average time was taken over 100 runs. All individual distances were computed for 20x20 vectors of 500 dimensions, and the average time was taken over 100 runs.

These benchmarks were run using the richbench package.

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| braycurtis, f32 | 1.103 | 1.134 | 1.114 | 0.323 (3.4x) | 0.324 (3.5x) | 0.323 (3.4x) |
| braycurtis, f64 | 0.834 | 0.843 | 0.838 | 0.170 (4.9x) | 0.173 (4.9x) | 0.171 (4.9x) |
| canberra, f32 | 2.524 | 2.529 | 2.526 | 0.153 (16.5x) | 0.155 (16.3x) | 0.154 (16.4x) |
| canberra, f64 | 2.216 | 2.260 | 2.235 | 0.168 (13.2x) | 0.170 (13.3x) | 0.169 (13.2x) |
| chebyshev, f32 | 2.738 | 2.774 | 2.753 | 0.149 (18.3x) | 0.151 (18.4x) | 0.150 (18.4x) |
| chebyshev, f64 | 2.777 | 2.784 | 2.781 | 0.165 (16.8x) | 0.166 (16.8x) | 0.165 (16.8x) |
| euclidean, f32 | 0.641 | 0.641 | 0.641 | 0.150 (4.3x) | 0.150 (4.3x) | 0.150 (4.3x) |
| euclidean, f64 | 0.657 | 0.662 | 0.660 | 0.167 (3.9x) | 0.168 (3.9x) | 0.168 (3.9x) |
| sqeuclidean, f32 | 0.506 | 0.509 | 0.507 | 0.149 (3.4x) | 0.149 (3.4x) | 0.149 (3.4x) |
| sqeuclidean, f64 | 0.514 | 0.518 | 0.516 | 0.165 (3.1x) | 0.173 (3.0x) | 0.170 (3.0x) |
| cityblock, f32 | 0.437 | 0.443 | 0.440 | 0.150 (2.9x) | 0.150 (2.9x) | 0.150 (2.9x) |
| cityblock, f64 | 0.444 | 0.451 | 0.447 | 0.167 (2.7x) | 0.168 (2.7x) | 0.168 (2.7x) |
| cosine, f32 | 0.659 | 0.668 | 0.664 | 0.314 (2.1x) | 0.315 (2.1x) | 0.314 (2.1x) |
| cosine, f64 | 0.459 | 0.471 | 0.465 | 0.321 (1.4x) | 0.325 (1.4x) | 0.324 (1.4x) |
| cdist, braycurtis, f32 | 4.902 | 4.906 | 4.904 | 1.802 (2.7x) | 1.875 (2.6x) | 1.833 (2.7x) |
| cdist, braycurtis, f64 | 4.765 | 4.775 | 4.768 | 0.710 (6.7x) | 0.735 (6.5x) | 0.725 (6.6x) |
| cdist, canberra, f32 | 6.914 | 6.943 | 6.930 | 1.356 (5.1x) | 1.385 (5.0x) | 1.367 (5.1x) |
| cdist, canberra, f64 | 6.782 | 6.813 | 6.797 | 0.684 (9.9x) | 0.701 (9.7x) | 0.692 (9.8x) |
| cdist, chebyshev, f32 | 2.763 | 2.768 | 2.765 | 0.640 (4.3x) | 0.663 (4.2x) | 0.649 (4.3x) |
| cdist, chebyshev, f64 | 2.659 | 2.677 | 2.664 | 0.647 (4.1x) | 0.662 (4.0x) | 0.655 (4.1x) |
| cdist, euclidean, f32 | 2.563 | 2.570 | 2.564 | 0.644 (4.0x) | 0.658 (3.9x) | 0.653 (3.9x) |
| cdist, euclidean, f64 | 2.378 | 2.400 | 2.388 | 0.630 (3.8x) | 0.649 (3.7x) | 0.640 (3.7x) |
| cdist, sqeuclidean, f32 | 2.516 | 2.523 | 2.519 | 0.648 (3.9x) | 0.660 (3.8x) | 0.652 (3.9x) |
| cdist, sqeuclidean, f64 | 2.412 | 2.423 | 2.417 | 0.631 (3.8x) | 0.645 (3.8x) | 0.638 (3.8x) |
| cdist, cityblock, f32 | 4.545 | 4.552 | 4.548 | 0.647 (7.0x) | 0.671 (6.8x) | 0.658 (6.9x) |
| cdist, cityblock, f64 | 4.406 | 4.407 | 4.407 | 0.633 (7.0x) | 0.657 (6.7x) | 0.647 (6.8x) |
| cdist, cosine, f32 | 4.010 | 4.020 | 4.013 | 2.254 (1.8x) | 2.292 (1.8x) | 2.270 (1.8x) |
| cdist, cosine, f64 | 3.987 | 3.992 | 3.990 | 2.241 (1.8x) | 2.288 (1.7x) | 2.258 (1.8x) |
| pdist, braycurtis, f32 | 2.382 | 2.387 | 2.385 | 1.062 (2.2x) | 1.074 (2.2x) | 1.069 (2.2x) |
| pdist, braycurtis, f64 | 2.368 | 2.378 | 2.371 | 0.510 (4.6x) | 0.523 (4.5x) | 0.516 (4.6x) |
| pdist, canberra, f32 | 3.374 | 3.411 | 3.389 | 0.831 (4.1x) | 0.841 (4.1x) | 0.835 (4.1x) |
| pdist, canberra, f64 | 3.369 | 3.411 | 3.396 | 0.504 (6.7x) | 0.515 (6.6x) | 0.509 (6.7x) |
| pdist, chebyshev, f32 | 1.362 | 1.364 | 1.363 | 0.478 (2.9x) | 0.488 (2.8x) | 0.484 (2.8x) |
| pdist, chebyshev, f64 | 1.338 | 1.343 | 1.341 | 0.476 (2.8x) | 0.485 (2.8x) | 0.481 (2.8x) |
| pdist, euclidean, f32 | 1.241 | 1.250 | 1.246 | 0.482 (2.6x) | 0.487 (2.6x) | 0.484 (2.6x) |
| pdist, euclidean, f64 | 1.222 | 1.228 | 1.225 | 0.474 (2.6x) | 0.503 (2.4x) | 0.482 (2.5x) |
| pdist, sqeuclidean, f32 | 1.224 | 1.247 | 1.233 | 0.477 (2.6x) | 0.490 (2.5x) | 0.481 (2.6x) |
| pdist, sqeuclidean, f64 | 1.211 | 1.214 | 1.213 | 0.470 (2.6x) | 0.478 (2.5x) | 0.475 (2.6x) |
| pdist, cityblock, f32 | 2.204 | 2.207 | 2.205 | 0.483 (4.6x) | 0.491 (4.5x) | 0.486 (4.5x) |
| pdist, cityblock, f64 | 2.189 | 2.198 | 2.192 | 0.476 (4.6x) | 0.483 (4.5x) | 0.481 (4.6x) |
| pdist, cosine, f32 | 2.000 | 2.004 | 2.002 | 1.292 (1.5x) | 1.302 (1.5x) | 1.296 (1.5x) |
| pdist, cosine, f64 | 1.988 | 1.992 | 1.990 | 1.288 (1.5x) | 1.296 (1.5x) | 1.292 (1.5x) |
[Plots: Chebyshev, Euclidean, Squared Euclidean, Manhattan, Canberra, and Cosine benchmarks, each for f32 and f64; Bray-Curtis benchmarks for u32 and u64.]

String Distance Benchmarks

These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.

All string distances were computed 100 times each, among different pairs of strings, and the average time was taken.

[Plots: Hamming, Levenshtein, and Needleman-Wunsch benchmarks.]

License

This package is licensed under the MIT license.
