A package for fast nanquantile calculation

fastnanquantile

An alternative implementation of numpy's nanquantile function. It's faster in many cases, especially for 2D and 3D arrays. Note that np.quantile is much faster than np.nanquantile, but it doesn't ignore NaN values: any NaN in the reduced data propagates to the result. This package is intended for data that contains NaN values.
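To make the difference between the two numpy functions concrete, here is a minimal sketch that uses only numpy (the array values are made up for illustration). It shows why np.quantile is not a drop-in option when NaN values are present:

```python
import numpy as np

# Small illustrative array; the last column contains a NaN.
data = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])

# np.quantile propagates NaN: a NaN in a reduced slice makes that result NaN.
plain = np.quantile(data, q=0.5, axis=0)

# np.nanquantile ignores NaN and uses only the remaining values.
nan_aware = np.nanquantile(data, q=0.5, axis=0)

print(plain)
print(nan_aware)
```

The first two columns agree between the two calls; only the column holding the NaN differs.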

Installation

To install the package, run the command below:

pip install git+https://github.com/lbferreira/fastnanquantile

Usage

The function was designed to be very similar to numpy's nanquantile function. Example:

import time

import numpy as np
import fastnanquantile as fnq

sample_data = np.random.random((50, 100, 100))
start = time.time()
np_result = np.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for np.nanquantile: {time.time() - start}s')
# Printed: Time for np.nanquantile: 0.2658s
start = time.time()
fnq_result = fnq.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for fnq.nanquantile: {time.time() - start}s')
# Printed: Time for fnq.nanquantile: 0.0099s
# Disclaimer: The first call to fnq.nanquantile is slower than subsequent
# calls, because the function is compiled on first use.
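Because the first call includes compilation time, a fair comparison should discard a warm-up call before timing. The sketch below is one possible way to do that; time_call is a hypothetical helper written for this example, not part of fastnanquantile, and it is demonstrated with np.nanquantile so that it runs with numpy alone:

```python
import time

import numpy as np


def time_call(func, *args, warmup=1, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` timed runs.

    Hypothetical helper for illustration only.
    """
    for _ in range(warmup):
        # Discarded call: absorbs one-time costs such as compilation.
        func(*args, **kwargs)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


sample_data = np.random.random((50, 100, 100))
elapsed = time_call(np.nanquantile, sample_data, q=0.6, axis=0)
print(f"np.nanquantile, best of 3: {elapsed:.4f}s")
```

The same helper can be pointed at fnq.nanquantile once the package is installed, for example time_call(fnq.nanquantile, sample_data, q=0.6, axis=0).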

Xarray compatible function

Xarray is a powerful library for working with multidimensional arrays. It can compute quantiles along a given dimension of a DataArray, using numpy's nanquantile function under the hood. To extend the use of fastnanquantile to xarray, a function is provided that computes quantiles for a DataArray, with behavior similar to xarray's quantile implementation. Example:

import numpy as np
import xarray as xr
from fastnanquantile import xrcompat

da = xr.DataArray(
    np.random.rand(10, 1000, 1000),
    coords={"time": np.arange(10), "x": np.arange(1000), "y": np.arange(1000)},
)

# Xarray quantile (time to run: ~25s)
result_xr = da.quantile(q=0.6, dim="time")
# fastnanquantile (time to run: <1s)
result_fnq = xrcompat.xr_apply_nanquantile(da, q=0.6, dim="time")
# Check that the results are equal (an error is raised if they differ)
np.testing.assert_almost_equal(result_fnq.values, result_xr.values, decimal=4)

A case study using Xarray + Dask to create time composites from satellite images can be found in this notebook: examples/example_xarray.ipynb.

Benchmarks

Benchmarks were run to compare the performance of fastnanquantile with numpy's nanquantile function. More information can be found in this notebook: examples/example.ipynb.

Benchmark conclusions

The performance gains offered by the fastnanquantile implementation depend on the shape of the input array. Based on the benchmark results, we can conclude:

  • 1D arrays: numpy is faster.
  • 2D arrays: fastnanquantile is faster for arrays whose axes have noticeably different sizes (example: (50, 1000)).
  • 3D arrays: fastnanquantile is generally faster, especially when the reduction axis is smaller than the other ones. For example, with shape=(50, 1000, 1000) and reduction axis=0, fastnanquantile is a lot faster than numpy.
  • Finally, fastnanquantile can be a great alternative in many cases, especially for 2D and 3D arrays, with the potential to greatly speed up quantile computation.
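The conclusions above suggest picking the implementation from the shape of the input. A rough sketch of that idea follows. pick_nanquantile is a hypothetical helper, not part of fastnanquantile, and the rule it encodes (numpy for 1D, the faster implementation otherwise) is a simplification that ignores the 2D caveat about axis sizes. The stand_in_fast function is a placeholder so the sketch runs without the package installed; in practice fnq.nanquantile would be passed instead:

```python
import numpy as np


def pick_nanquantile(arr, fast_impl):
    """Hypothetical helper: choose an implementation from the array's ndim."""
    if np.ndim(arr) <= 1:
        # 1D arrays: the benchmarks found numpy to be faster.
        return np.nanquantile
    # 2D and 3D arrays: the alternative implementation is often faster.
    return fast_impl


def stand_in_fast(a, q, axis=None):
    # Placeholder for fnq.nanquantile (assumption: same call pattern).
    return np.nanquantile(a, q, axis=axis)


vec = np.array([1.0, np.nan, 3.0])
cube = np.random.random((5, 20, 20))

impl_1d = pick_nanquantile(vec, stand_in_fast)
impl_3d = pick_nanquantile(cube, stand_in_fast)
result = impl_3d(cube, q=0.6, axis=0)
print(result.shape)
```

For real workloads it is safer to benchmark both implementations on a representative array than to rely on a fixed rule.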

Acknowledgements

This library was developed as part of my research work in the GCER lab, under the supervision of Vitor Martins, at Mississippi State University (MSU).

This research is funded by USDA NIFA (award #2023-67019-39169), supporting Lucas Ferreira and Vitor Martins at MSU.
