Compute distances in numpy arrays with nans

Nandist: Calculating distances in arrays with missing values

The python library nandist enables (fast) computation of various distances in numpy arrays containing missing (NaN) values. These distances are implemented as a drop-in replacement for distance functions in the scipy.spatial.distance module.

Currently, nandist offers the following distance functions:

  • chebyshev
  • cityblock
  • cosine
  • euclidean
  • minkowski

It also provides drop-in replacements for pdist and cdist, which can be used for fast calculation of pairwise distances between the row vectors of matrices.

  • cdist
  • pdist

These functions accept a distance metric (metric) and optional parameters, such as a weight vector (w) and metric-specific parameters like Minkowski's p.
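The call shape for these parameters can be sketched with scipy itself. The snippet below uses scipy.spatial.distance on NaN-free data; because nandist is described as a drop-in replacement, the same keyword arguments are expected to be accepted by nandist.cdist and nandist.pdist, but that is an assumption here rather than something this example verifies. The array X and the weights w are made-up illustration data.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

# Illustration data (hypothetical): three 2-D points and a weight per component.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
w = np.array([1.0, 0.5])

# Weighted Minkowski distance with p=1 (a weighted city-block distance):
# each entry is sum_i w[i] * |u[i] - v[i]|.
D = cdist(X, X, metric="minkowski", p=1, w=w)
print(D)

# Unweighted Minkowski distance with p=2 (identical to Euclidean),
# returned by pdist in condensed form: (0,1), (0,2), (1,2).
condensed = pdist(X, metric="minkowski", p=2)
print(condensed)
```

Swapping the scipy imports for the nandist equivalents should leave the results unchanged for NaN-free input, given the equivalence the project describes.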

Examples

A simple example for calculating the cityblock distance between (0, 1) and (NaN, 0) is shown below.

>>> import nandist
>>> import scipy.spatial.distance
>>> import numpy as np
>>>
>>> # City-block distance between (0, 1) and (NaN, 0)
>>> u, v = np.array([0, 1]), np.array([np.nan, 0])
>>> scipy.spatial.distance.cityblock(u, v)
nan
>>> nandist.cityblock(u, v)
1.0

You can replace the function cityblock with any of the supported distance functions.

You can get pairwise distances between arrays in two matrices using cdist. The NaNs do not need to be in the same component.

>>> import nandist
>>> import numpy as np

>>> # City-block distances between vectors XA = [(0, 0), (1, NaN)] and vectors XB = [(1, NaN), (1, 1)]
>>> XA, XB = np.array([[0, 0], [1, np.nan]]), np.array([[1, np.nan], [1, 1]])
>>> Y = nandist.cdist(XA, XB, metric="cityblock")
>>> Y
array([[1., 2.],
       [0., 0.]])
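The matrix in the cdist example can be cross-checked with a plain numpy reference that simply skips every component in which either vector is NaN. This is only a sketch of the intended semantics, not how nandist computes the result; the helper name masked_cityblock is hypothetical.

```python
import numpy as np

def masked_cityblock(u, v):
    """City-block distance over the components where neither u nor v is NaN."""
    valid = ~(np.isnan(u) | np.isnan(v))
    return np.abs(u[valid] - v[valid]).sum()

XA = np.array([[0, 0], [1, np.nan]])
XB = np.array([[1, np.nan], [1, 1]])

# Entry (i, j) is the distance between row i of XA and row j of XB.
reference = np.array([[masked_cityblock(a, b) for b in XB] for a in XA])
print(reference)
```

The nested loop is slow for large matrices, so it is useful only as a readable reference for small inputs.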

How to install

Using pip:

pip install nandist

Supported metrics

Each supported distance metric is available as a standalone function and as a metric string for cdist and pdist:

  • Chebyshev: chebyshev, metric="chebyshev"
  • Cityblock: cityblock, metric="cityblock"
  • Cosine: cosine, metric="cosine"
  • Euclidean: euclidean, metric="euclidean"
  • Minkowski: minkowski, metric="minkowski"

If you require support for additional distance metrics, please submit an Issue or Merge Request.

How does it work

In nandist, components where a vector is NaN are ignored (interpreted as "any number") in the distance metric. This is achieved by replacing NaN values with zeros and then correcting the resulting overestimate of the distance. Under the hood, nandist calls functions from scipy.spatial.distance and applies the corrections using numpy linear algebra. As a result, nandist functions return the same values as their scipy.spatial.distance counterparts when the input arrays contain no NaNs. In addition, the heavy computational lifting is done by scipy, so the only extra cost is that of applying the corrections.
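For the cityblock metric, the zero-imputation-and-correction idea described above can be sketched as follows. This is an illustrative reconstruction under the stated description, not nandist's actual source code; the function name cityblock_ignoring_nans is hypothetical. After imputing zeros, a component where only u is NaN contributes |v_i| to the scipy result, and a component where only v is NaN contributes |u_i|, so those contributions are subtracted with two dot products.

```python
import numpy as np
from scipy.spatial.distance import cityblock

def cityblock_ignoring_nans(u, v):
    """Sketch: zero-impute NaNs, call scipy, then subtract the overestimate."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    nan_u, nan_v = np.isnan(u), np.isnan(v)

    # Replace NaNs with zeros so scipy can compute a (too large) distance.
    u0 = np.where(nan_u, 0.0, u)
    v0 = np.where(nan_v, 0.0, v)
    overestimate = cityblock(u0, v0)

    # Where u is NaN the imputed pair contributes |v0_i|; where v is NaN it
    # contributes |u0_i|. Where both are NaN both values are zero already.
    correction = nan_u.astype(float) @ np.abs(v0) + nan_v.astype(float) @ np.abs(u0)
    return overestimate - correction

print(cityblock_ignoring_nans([0, 1], [np.nan, 0]))          # 1.0
print(cityblock_ignoring_nans([3, np.nan, 1], [np.nan, 2, 4]))  # 3.0
```

Other metrics would need their own correction terms; only the cityblock case is shown here.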

Does it always work?

No. The package nandist corrects the overestimated distance that results from imputing missing values as zero. This correction can run into the limits of floating point arithmetic, in which case nandist will raise an appropriate error. These edge cases are rare in typical usage.
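One way such a limit can arise is cancellation: if a very large value sits opposite a NaN, the zero-imputed distance is dominated by that value, and subtracting it again loses the small true distance. The snippet below reproduces this effect with plain numpy for the cityblock case. It is a sketch of the numerical problem only and does not show how nandist detects or reports it; the input values are made up for illustration.

```python
import numpy as np

u = np.array([1e20, 1.0])
v = np.array([np.nan, 0.0])
nan_u, nan_v = np.isnan(u), np.isnan(v)

# Zero-impute the NaNs, as in the approach described above.
u0 = np.where(nan_u, 0.0, u)
v0 = np.where(nan_v, 0.0, v)

# 1e20 + 1.0 is not representable in float64, so it rounds to 1e20.
overestimate = np.abs(u0 - v0).sum()

# Contribution of the components where the other vector was NaN: 1e20.
correction = np.abs(u0[nan_v]).sum() + np.abs(v0[nan_u]).sum()

corrected = overestimate - correction

# Direct computation over the components where both vectors are valid.
exact = np.abs(u - v)[~(nan_u | nan_v)].sum()

print(corrected, exact)  # 0.0 1.0
```

The corrected value is 0.0 although the distance over the valid component is 1.0, which is the kind of result a check on the correction would need to catch.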
