Distance functions: A drop-in replacement for, and a super-set of the scipy.spatial.distance module.

Algorithms for Big Data: Distances (v1.0.4)

This package contains algorithms for computing distances between data points. It is a thin Python wrapper around the distances crate, in Rust. It provides drop-in replacements for the distance functions in scipy.spatial.distance.

Supported Distance Functions

Installation

pip install abd-distances

Usage

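import math

import numpy
import abd_distances.simd as distance

# Two 10,000-dimensional float32 vectors that differ by exactly 1.0 in every component.
a = numpy.arange(10_000, dtype=numpy.float32)
b = a + 1.0

# The Euclidean distance is sqrt(10_000 * 1.0**2) = 100.0.
dist = distance.euclidean(a, b)

assert math.fabs(dist - 100.0) < 1e-6

print(dist)
# 100.0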

Vector Distances

  • Bray-Curtis: abd_distances.vector.braycurtis
  • Canberra: abd_distances.vector.canberra
  • Chebyshev: abd_distances.vector.chebyshev
  • Correlation
  • Cosine: abd_distances.vector.cosine
  • Euclidean: abd_distances.vector.euclidean
  • Jensen-Shannon
  • Mahalanobis
  • Manhattan: abd_distances.vector.manhattan and abd_distances.vector.cityblock
  • Minkowski: abd_distances.vector.minkowski
  • Standardized Euclidean
  • Squared Euclidean: abd_distances.vector.sqeuclidean
  • Pairwise Distances: abd_distances.vector.cdist and abd_distances.vector.pdist
  • ...
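
As a reference for what several of the listed vector distances compute, here is a minimal pure-Python sketch of their textbook definitions. It is not this package's implementation (which is a Rust wrapper); the function names below merely mirror the names in the list, and the code is only meant to make the formulas concrete.

```python
import math
from typing import Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute component-wise differences (also called cityblock)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sqeuclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared component-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the squared Euclidean distance."""
    return math.sqrt(sqeuclidean(a, b))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute component-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p: (sum |a_i - b_i|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between a and b (both non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [0.0, 3.0, 4.0]
b = [0.0, 0.0, 0.0]
print(manhattan(a, b), euclidean(a, b), chebyshev(a, b))
# 7.0 5.0 4.0
```

Manhattan and Euclidean are the Minkowski distance with p = 1 and p = 2, respectively, and Chebyshev is its limit as p grows without bound.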

Boolean Distances

  • Dice
  • Hamming
  • Jaccard
  • Kulczynski 1D
  • Rogers-Tanimoto
  • Russell-Rao
  • Sokal-Michener
  • Sokal-Sneath
  • Yule
  • ...

SIMD-Accelerated Vector Distances

  • Euclidean: abd_distances.simd.euclidean
  • Squared Euclidean: abd_distances.simd.sqeuclidean
  • Cosine: abd_distances.simd.cosine
  • Pairwise Distances: abd_distances.simd.cdist and abd_distances.simd.pdist
  • ...
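
Both the vector and SIMD modules list cdist and pdist. The sketch below shows the output layout these functions have in scipy.spatial.distance: cdist returns a full matrix of distances between two collections, and pdist returns a condensed list of distances within one collection. Since this package describes itself as a drop-in replacement, it is assumed here to follow the same layout. The code is plain Python for illustration and does not call the package.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Textbook Euclidean distance, used here as the example metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cdist(xs: Sequence[Vector], ys: Sequence[Vector], metric: Metric) -> List[List[float]]:
    """Full len(xs) x len(ys) matrix: entry [i][j] is metric(xs[i], ys[j])."""
    return [[metric(x, y) for y in ys] for x in xs]


def pdist(xs: Sequence[Vector], metric: Metric) -> List[float]:
    """Condensed form: metric(xs[i], xs[j]) for every i < j, in row-major order."""
    n = len(xs)
    return [metric(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n)]


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(pdist(points, euclidean))
# [5.0, 10.0, 5.0]
print(cdist(points[:1], points, euclidean))
# [[0.0, 5.0, 10.0]]
```

For n points, the condensed form holds n * (n - 1) / 2 values, which is why the pdist rows in the benchmarks take roughly half the time of the corresponding cdist rows.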

String Distances

  • Hamming: abd_distances.strings.hamming
  • Levenshtein: abd_distances.strings.levenshtein
  • Needleman-Wunsch: abd_distances.strings.needleman_wunsch
  • Smith-Waterman
  • Pairwise Distances
  • ...
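
For readers unfamiliar with the first two string distances, here is a minimal pure-Python sketch of their standard definitions. It is not the package's implementation, and details such as how unequal-length inputs are handled by abd_distances.strings.hamming are not stated in this document; raising an error is an assumption made for the sketch.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption for this sketch: unequal lengths are rejected.
        raise ValueError("hamming distance requires equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b (unit cost for each edit)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete ca
                    curr[j - 1] + 1,           # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


print(hamming("karolin", "kathrin"))
# 3
print(levenshtein("kitten", "sitting"))
# 3
```

Needleman-Wunsch uses the same kind of dynamic-programming table, but with configurable penalties for gaps and mismatches rather than unit costs.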

Benchmarks

SIMD-Accelerated Vector Distance Benchmarks

These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.

The "Min", "Max", and "Mean" columns show the minimum, maximum, and mean times (in seconds), respectively, taken to compute the distances using the functions from scipy.spatial.distance. The "Min (+)", "Max (+)", and "Mean (+)" columns show the corresponding times for this package's functions, with the speedup over the scipy functions in parentheses. All pairwise distances (cdist and pdist) were computed for 200x200 vectors of 500 dimensions, and the average time was taken over 100 runs. All individual distances were computed for 20x20 vectors of 500 dimensions, and the average time was taken over 100 runs.

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| cdist, euclidean, f32 | 2.560 | 2.576 | 2.566 | 0.185 (13.9x) | 0.196 (13.2x) | 0.188 (13.7x) |
| cdist, euclidean, f64 | 2.398 | 2.406 | 2.401 | 0.292 (8.2x) | 0.307 (7.8x) | 0.298 (8.0x) |
| cdist, sqeuclidean, f32 | 2.519 | 2.527 | 2.523 | 0.182 (13.9x) | 0.197 (12.8x) | 0.187 (13.5x) |
| cdist, sqeuclidean, f64 | 2.381 | 2.393 | 2.389 | 0.293 (8.1x) | 0.318 (7.5x) | 0.301 (7.9x) |
| cdist, cosine, f32 | 4.011 | 4.021 | 4.016 | 0.625 (6.4x) | 0.637 (6.3x) | 0.632 (6.4x) |
| cdist, cosine, f64 | 3.978 | 4.009 | 3.992 | 0.626 (6.4x) | 0.666 (6.0x) | 0.638 (6.3x) |
| pdist, euclidean, f32 | 1.235 | 1.249 | 1.241 | 0.252 (4.9x) | 0.263 (4.7x) | 0.257 (4.8x) |
| pdist, euclidean, f64 | 1.216 | 1.262 | 1.234 | 0.302 (4.0x) | 0.312 (4.0x) | 0.308 (4.0x) |
| pdist, sqeuclidean, f32 | 1.229 | 1.250 | 1.237 | 0.251 (4.9x) | 0.303 (4.1x) | 0.265 (4.7x) |
| pdist, sqeuclidean, f64 | 1.209 | 1.213 | 1.211 | 0.306 (3.9x) | 0.313 (3.9x) | 0.310 (3.9x) |
| pdist, cosine, f32 | 2.001 | 2.017 | 2.006 | 0.468 (4.3x) | 0.484 (4.2x) | 0.478 (4.2x) |
| pdist, cosine, f64 | 1.991 | 2.004 | 1.996 | 0.461 (4.3x) | 0.476 (4.2x) | 0.471 (4.2x) |
| euclidean, f32 | 0.644 | 0.670 | 0.654 | 0.076 (8.5x) | 0.080 (8.4x) | 0.078 (8.3x) |
| euclidean, f64 | 0.672 | 0.701 | 0.682 | 0.097 (6.9x) | 0.102 (6.9x) | 0.100 (6.8x) |
| sqeuclidean, f32 | 0.506 | 0.512 | 0.508 | 0.076 (6.6x) | 0.079 (6.5x) | 0.078 (6.5x) |
| sqeuclidean, f64 | 0.515 | 0.519 | 0.518 | 0.100 (5.1x) | 0.104 (5.0x) | 0.103 (5.0x) |
| cosine, f32 | 0.668 | 0.687 | 0.677 | 0.110 (6.1x) | 0.113 (6.1x) | 0.111 (6.1x) |
| cosine, f64 | 0.465 | 0.472 | 0.469 | 0.127 (3.7x) | 0.130 (3.6x) | 0.129 (3.6x) |
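
The minimum, maximum, and mean times and the speedup factors reported here could be gathered along the following lines. This is a generic standard-library sketch, not the harness used to produce the numbers above; `time_runs` and `speedup` are hypothetical helper names, and the speedup is assumed to be the ratio of the scipy time to this package's time.

```python
import statistics
import time
from typing import Callable, Dict


def time_runs(func: Callable[[], object], runs: int = 100) -> Dict[str, float]:
    """Call func `runs` times and summarize the wall-clock time per call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.fmean(samples),
    }


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in workloads; in a real comparison these would call the scipy
# function and this package's function on the same inputs.
baseline = time_runs(lambda: sum(i * i for i in range(20_000)), runs=20)
candidate = time_runs(lambda: sum(i * i for i in range(2_000)), runs=20)
print(f"mean speedup: {speedup(baseline['mean'], candidate['mean']):.1f}x")
```

Reporting the minimum and maximum alongside the mean, as the tables do, makes it easier to spot runs that were disturbed by other activity on the machine.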
[Figures: benchmark plots for Euclidean, Squared Euclidean, and Cosine distances, for f32 and f64.]

Vector Distance Benchmarks (No SIMD)

These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.

The "Min", "Max", and "Mean" columns show the minimum, maximum, and mean times (in seconds), respectively, taken to compute the distances using the functions from scipy.spatial.distance. The "Min (+)", "Max (+)", and "Mean (+)" columns show the corresponding times for this package's functions, with the speedup over the scipy functions in parentheses. All pairwise distances (cdist and pdist) were computed for 200x200 vectors of 500 dimensions, and the average time was taken over 100 runs. All individual distances were computed for 20x20 vectors of 500 dimensions, and the average time was taken over 100 runs.

These benchmarks were run using the richbench package.

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| braycurtis, f32 | 1.103 | 1.134 | 1.114 | 0.323 (3.4x) | 0.324 (3.5x) | 0.323 (3.4x) |
| braycurtis, f64 | 0.834 | 0.843 | 0.838 | 0.170 (4.9x) | 0.173 (4.9x) | 0.171 (4.9x) |
| canberra, f32 | 2.524 | 2.529 | 2.526 | 0.153 (16.5x) | 0.155 (16.3x) | 0.154 (16.4x) |
| canberra, f64 | 2.216 | 2.260 | 2.235 | 0.168 (13.2x) | 0.170 (13.3x) | 0.169 (13.2x) |
| chebyshev, f32 | 2.738 | 2.774 | 2.753 | 0.149 (18.3x) | 0.151 (18.4x) | 0.150 (18.4x) |
| chebyshev, f64 | 2.777 | 2.784 | 2.781 | 0.165 (16.8x) | 0.166 (16.8x) | 0.165 (16.8x) |
| euclidean, f32 | 0.641 | 0.641 | 0.641 | 0.150 (4.3x) | 0.150 (4.3x) | 0.150 (4.3x) |
| euclidean, f64 | 0.657 | 0.662 | 0.660 | 0.167 (3.9x) | 0.168 (3.9x) | 0.168 (3.9x) |
| sqeuclidean, f32 | 0.506 | 0.509 | 0.507 | 0.149 (3.4x) | 0.149 (3.4x) | 0.149 (3.4x) |
| sqeuclidean, f64 | 0.514 | 0.518 | 0.516 | 0.165 (3.1x) | 0.173 (3.0x) | 0.170 (3.0x) |
| cityblock, f32 | 0.437 | 0.443 | 0.440 | 0.150 (2.9x) | 0.150 (2.9x) | 0.150 (2.9x) |
| cityblock, f64 | 0.444 | 0.451 | 0.447 | 0.167 (2.7x) | 0.168 (2.7x) | 0.168 (2.7x) |
| cosine, f32 | 0.659 | 0.668 | 0.664 | 0.314 (2.1x) | 0.315 (2.1x) | 0.314 (2.1x) |
| cosine, f64 | 0.459 | 0.471 | 0.465 | 0.321 (1.4x) | 0.325 (1.4x) | 0.324 (1.4x) |
| cdist, braycurtis, f32 | 4.902 | 4.906 | 4.904 | 1.802 (2.7x) | 1.875 (2.6x) | 1.833 (2.7x) |
| cdist, braycurtis, f64 | 4.765 | 4.775 | 4.768 | 0.710 (6.7x) | 0.735 (6.5x) | 0.725 (6.6x) |
| cdist, canberra, f32 | 6.914 | 6.943 | 6.930 | 1.356 (5.1x) | 1.385 (5.0x) | 1.367 (5.1x) |
| cdist, canberra, f64 | 6.782 | 6.813 | 6.797 | 0.684 (9.9x) | 0.701 (9.7x) | 0.692 (9.8x) |
| cdist, chebyshev, f32 | 2.763 | 2.768 | 2.765 | 0.640 (4.3x) | 0.663 (4.2x) | 0.649 (4.3x) |
| cdist, chebyshev, f64 | 2.659 | 2.677 | 2.664 | 0.647 (4.1x) | 0.662 (4.0x) | 0.655 (4.1x) |
| cdist, euclidean, f32 | 2.563 | 2.570 | 2.564 | 0.644 (4.0x) | 0.658 (3.9x) | 0.653 (3.9x) |
| cdist, euclidean, f64 | 2.378 | 2.400 | 2.388 | 0.630 (3.8x) | 0.649 (3.7x) | 0.640 (3.7x) |
| cdist, sqeuclidean, f32 | 2.516 | 2.523 | 2.519 | 0.648 (3.9x) | 0.660 (3.8x) | 0.652 (3.9x) |
| cdist, sqeuclidean, f64 | 2.412 | 2.423 | 2.417 | 0.631 (3.8x) | 0.645 (3.8x) | 0.638 (3.8x) |
| cdist, cityblock, f32 | 4.545 | 4.552 | 4.548 | 0.647 (7.0x) | 0.671 (6.8x) | 0.658 (6.9x) |
| cdist, cityblock, f64 | 4.406 | 4.407 | 4.407 | 0.633 (7.0x) | 0.657 (6.7x) | 0.647 (6.8x) |
| cdist, cosine, f32 | 4.010 | 4.020 | 4.013 | 2.254 (1.8x) | 2.292 (1.8x) | 2.270 (1.8x) |
| cdist, cosine, f64 | 3.987 | 3.992 | 3.990 | 2.241 (1.8x) | 2.288 (1.7x) | 2.258 (1.8x) |
| pdist, braycurtis, f32 | 2.382 | 2.387 | 2.385 | 1.062 (2.2x) | 1.074 (2.2x) | 1.069 (2.2x) |
| pdist, braycurtis, f64 | 2.368 | 2.378 | 2.371 | 0.510 (4.6x) | 0.523 (4.5x) | 0.516 (4.6x) |
| pdist, canberra, f32 | 3.374 | 3.411 | 3.389 | 0.831 (4.1x) | 0.841 (4.1x) | 0.835 (4.1x) |
| pdist, canberra, f64 | 3.369 | 3.411 | 3.396 | 0.504 (6.7x) | 0.515 (6.6x) | 0.509 (6.7x) |
| pdist, chebyshev, f32 | 1.362 | 1.364 | 1.363 | 0.478 (2.9x) | 0.488 (2.8x) | 0.484 (2.8x) |
| pdist, chebyshev, f64 | 1.338 | 1.343 | 1.341 | 0.476 (2.8x) | 0.485 (2.8x) | 0.481 (2.8x) |
| pdist, euclidean, f32 | 1.241 | 1.250 | 1.246 | 0.482 (2.6x) | 0.487 (2.6x) | 0.484 (2.6x) |
| pdist, euclidean, f64 | 1.222 | 1.228 | 1.225 | 0.474 (2.6x) | 0.503 (2.4x) | 0.482 (2.5x) |
| pdist, sqeuclidean, f32 | 1.224 | 1.247 | 1.233 | 0.477 (2.6x) | 0.490 (2.5x) | 0.481 (2.6x) |
| pdist, sqeuclidean, f64 | 1.211 | 1.214 | 1.213 | 0.470 (2.6x) | 0.478 (2.5x) | 0.475 (2.6x) |
| pdist, cityblock, f32 | 2.204 | 2.207 | 2.205 | 0.483 (4.6x) | 0.491 (4.5x) | 0.486 (4.5x) |
| pdist, cityblock, f64 | 2.189 | 2.198 | 2.192 | 0.476 (4.6x) | 0.483 (4.5x) | 0.481 (4.6x) |
| pdist, cosine, f32 | 2.000 | 2.004 | 2.002 | 1.292 (1.5x) | 1.302 (1.5x) | 1.296 (1.5x) |
| pdist, cosine, f64 | 1.988 | 1.992 | 1.990 | 1.288 (1.5x) | 1.296 (1.5x) | 1.292 (1.5x) |
[Figures: benchmark plots for Chebyshev, Euclidean, Squared Euclidean, Manhattan, Canberra, and Cosine distances (f32 and f64), and for Bray-Curtis distance (u32 and u64).]

String Distance Benchmarks

These benchmarks were run on an Intel Core i7-11700KF CPU @ 4.900GHz, using a single thread. The OS was Arch Linux, with kernel version 6.7.4-arch1-1.

All string distances were computed 100 times each, among different pairs of strings, and the average time was taken.

[Figures: benchmark plots for Hamming, Levenshtein, and Needleman-Wunsch distances.]

License

This package is licensed under the MIT license.

