
A package for fast nanquantile calculation



fastnanquantile

An alternative implementation of numpy's nanquantile function. It is faster in many cases, especially for 2D and 3D arrays. Note that np.quantile is much faster than np.nanquantile, but it does not skip NaN values: any slice that contains a NaN produces NaN. This package is intended for data that contains NaN values.
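The following NumPy-only snippet makes the difference concrete. It is not part of fastnanquantile. It shows how np.quantile and np.nanquantile each handle a single missing value.

```python
import numpy as np

# A small 2D array with one missing value in the first column.
data = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, 30.0],
])

# np.quantile propagates NaN: column 0 (which contains a NaN) yields nan.
propagated = np.quantile(data, q=0.5, axis=0)

# np.nanquantile ignores NaN and uses only the valid values of each column:
# column 0 -> median of [1, 3] = 2.0, column 1 -> median of [10, 20, 30] = 20.0
skipped = np.nanquantile(data, q=0.5, axis=0)
```
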

Installation

To install the package, run the command below:

pip install fastnanquantile

Usage

The function was designed to be very similar to numpy's nanquantile function. Example:

import time

import numpy as np
import fastnanquantile as fnq

sample_data = np.random.random((50, 100, 100))

start = time.time()
np_result = np.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for np.nanquantile: {time.time() - start}s')
# Printed: Time for np.nanquantile: 0.2658s

start = time.time()
fnq_result = fnq.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for fnq.nanquantile: {time.time() - start}s')
# Printed: Time for fnq.nanquantile: 0.0099s
# Disclaimer: the first call to fnq.nanquantile is slower than subsequent
# calls, because the function is compiled on its first use.
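fastnanquantile aims to reproduce np.nanquantile's results, so it helps to spell out what that reduction computes. The sketch below is a deliberately slow, pure-NumPy reference that you can use to sanity-check results. It is not built for speed. The name `reference_nanquantile` is hypothetical and is not part of the package. The sketch handles a scalar `q` only and assumes NumPy's default "linear" interpolation method.

```python
import numpy as np


def reference_nanquantile(arr, q, axis=0):
    """Hypothetical reference: quantile q of the non-NaN values of every
    1D slice along `axis`, using linear interpolation (NumPy's default)."""
    # Move the reduction axis to the end so each output element is one 1D slice.
    moved = np.moveaxis(np.asarray(arr, dtype=float), axis, -1)
    out = np.empty(moved.shape[:-1])
    for idx in np.ndindex(out.shape):
        values = moved[idx]
        values = np.sort(values[~np.isnan(values)])  # drop NaNs, then sort
        if values.size == 0:
            out[idx] = np.nan  # an all-NaN slice has no defined quantile
            continue
        pos = q * (values.size - 1)  # fractional rank of the quantile
        lo = int(np.floor(pos))
        hi = min(lo + 1, values.size - 1)
        frac = pos - lo
        # Interpolate linearly between the two neighbouring sorted values.
        out[idx] = values[lo] + frac * (values[hi] - values[lo])
    return out
```

This version uses an explicit loop over slices so that the definition stays readable. Packages like fastnanquantile exist because that per-slice work is expensive when done naively.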

Xarray compatible function

Xarray is a powerful library for working with labeled multidimensional arrays. It can compute quantiles along a given dimension of a DataArray, using numpy's nanquantile function under the hood. To extend fastnanquantile to xarray, the package provides a function that computes quantiles for a DataArray, with behavior similar to xarray's own quantile implementation. Example:

import numpy as np
import xarray as xr
from fastnanquantile import xrcompat

da = xr.DataArray(
    np.random.rand(10, 1000, 1000),
    dims=("time", "x", "y"),
    coords={"time": np.arange(10), "x": np.arange(1000), "y": np.arange(1000)},
)

# Xarray quantile (time to run: ~25s)
result_xr = da.quantile(q=0.6, dim="time")
# fastnanquantile (time to run: <1s)
result_fnq = xrcompat.xr_apply_nanquantile(da, q=0.6, dim="time")
# Check that the results agree (an AssertionError is raised if they differ)
np.testing.assert_almost_equal(result_fnq.values, result_xr.values, decimal=4)
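The internals of `xrcompat.xr_apply_nanquantile` are not shown here. One common way to wrap any NumPy-style quantile function for xarray is `xr.apply_ufunc`, and the sketch below takes that approach. The helper name `apply_nanquantile` is hypothetical. Passing `fnq.nanquantile` as `func` should work the same way, assuming it accepts the `q=` and `axis=` keywords shown in the usage example above.

```python
import numpy as np
import xarray as xr


def apply_nanquantile(da, q, dim, func=np.nanquantile):
    """Hypothetical helper: reduce `da` along `dim` with a NumPy-style
    quantile function called as func(array, q=..., axis=...)."""
    return xr.apply_ufunc(
        func,
        da,
        input_core_dims=[[dim]],      # apply_ufunc moves `dim` to the last axis...
        kwargs={"q": q, "axis": -1},  # ...so the wrapped function reduces axis -1
        dask="parallelized",          # blockwise for dask inputs; `dim` must be a single chunk
        output_dtypes=[np.float64],
    )


da = xr.DataArray(np.random.rand(10, 50, 40), dims=("time", "x", "y"))
result = apply_nanquantile(da, q=0.6, dim="time")  # dims: ("x", "y")
```

With `dask="parallelized"`, the same helper runs lazily on chunked arrays, which suits time-composite workflows. The reduction dimension must not be split across chunks.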

A case study using Xarray + Dask to create time composites from satellite images can be found in this notebook: examples/example_xarray.ipynb.

Benchmarks

Benchmarks comparing the performance of fastnanquantile with numpy's nanquantile function are available in this notebook: examples/example.ipynb.

Benchmarks conclusions

The performance gains offered by fastnanquantile depend on the shape of the input array. Based on the benchmark results, we can conclude:

  • 1D arrays: numpy is faster.
  • 2D arrays: fastnanquantile is faster when the axis sizes differ noticeably from each other (example: shape (50, 1000)).
  • 3D arrays: fastnanquantile is generally faster, especially when the reduction axis is smaller than the others. For example, with shape=(50, 1000, 1000) and reduction axis=0, fastnanquantile is much faster than numpy.
  • Overall, fastnanquantile can be a great alternative in many cases, especially for 2D and 3D arrays, with the potential to greatly speed up quantile computation.
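You can check this shape dependence on your own hardware. The harness below is a hypothetical sketch, not the notebook's code. It injects a fraction of NaNs into the data, since the package targets NaN-containing inputs. It then makes one untimed warm-up call per function so that one-off costs such as compilation are excluded, and it reports the best of several runs. The function name `benchmark` and its parameters are illustrative assumptions.

```python
import time

import numpy as np


def benchmark(funcs, shape, q=0.6, axis=0, repeat=3, nan_fraction=0.1, seed=0):
    """Hypothetical harness: best wall-clock time (seconds) of each function
    in `funcs` on a random array of `shape` containing some NaNs."""
    rng = np.random.default_rng(seed)
    data = rng.random(shape)
    data[rng.random(shape) < nan_fraction] = np.nan
    timings = {}
    for name, func in funcs.items():
        func(data, q=q, axis=axis)  # warm-up call, not timed (e.g. JIT compilation)
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            func(data, q=q, axis=axis)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


if __name__ == "__main__":
    funcs = {"np.nanquantile": np.nanquantile}
    # If fastnanquantile is installed, add it for comparison:
    # import fastnanquantile as fnq; funcs["fnq.nanquantile"] = fnq.nanquantile
    for shape in [(1000,), (50, 1000), (50, 100, 100)]:
        print(shape, benchmark(funcs, shape))
```

Reporting the best of several runs reduces noise from other processes. The warm-up call keeps compilation time out of the comparison.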

Acknowledgements

This library was developed as part of my research work in the GCER lab, under the supervision of Vitor Martins, at Mississippi State University (MSU).

This research is funded by USDA NIFA (award #2023-67019-39169), supporting Lucas Ferreira and Vitor Martins at MSU.
