A package for fast nanquantile calculation
fastnanquantile
An alternative implementation of numpy's nanquantile function. It's faster in many cases, especially for 2D and 3D arrays. Note that np.quantile is much faster than np.nanquantile, but it does not ignore NaN values: any NaN in the input propagates to the result. This package is intended to be used when NaN values are present in the data.
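As a small illustration of why a NaN-aware function is needed, the sketch below uses only numpy and a made-up four-element array (the values are an assumption chosen for the example). It compares np.quantile and np.nanquantile on data that contains a NaN.

```python
import numpy as np

# Hypothetical sample with one missing value.
data = np.array([1.0, np.nan, 3.0, 5.0])

# np.quantile does not skip NaN: a NaN in the input makes the result NaN.
plain = np.quantile(data, 0.5)

# np.nanquantile ignores NaN entries and uses only the valid values (1, 3, 5).
nan_aware = np.nanquantile(data, 0.5)

print(plain)
print(nan_aware)
```

The NaN-aware result is the median of the three valid values, while the plain quantile is not usable on this input.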
Installation
To install the package, run the command below:
pip install fastnanquantile
Usage
The function was designed to be very similar to numpy's nanquantile function. Example:
import time
import numpy as np
import fastnanquantile as fnq
sample_data = np.random.random((50, 100, 100))
start = time.time()
np_result = np.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for np.nanquantile: {time.time() - start}s')
# Printed: Time for np.nanquantile: 0.2658s
start = time.time()
fnq_result = fnq.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for fnq.nanquantile: {time.time() - start}s')
# Printed: Time for fnq.nanquantile: 0.0099s
# Note: the first call to fnq.nanquantile is slower than subsequent calls,
# because the function is compiled on first use.
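Because the first call includes compilation time, a fairer comparison times a function only after a warm-up call. The helper below is a minimal sketch of that idea; `time_call` and its `warmup` and `repeats` parameters are hypothetical names introduced for this example, not part of fastnanquantile. It is shown with np.nanquantile so it runs with numpy alone.

```python
import time
import numpy as np

def time_call(func, *args, warmup=1, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` timed calls,
    after `warmup` untimed calls (hypothetical helper for illustration)."""
    # Untimed warm-up calls absorb one-off costs such as compilation.
    for _ in range(warmup):
        func(*args, **kwargs)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

sample_data = np.random.random((50, 100, 100))
np_time = time_call(np.nanquantile, sample_data, q=0.6, axis=0)
print(f"np.nanquantile (best of 3): {np_time:.4f}s")
```

With the package installed, the same helper can be called as `time_call(fnq.nanquantile, sample_data, q=0.6, axis=0)` so that both functions are measured under the same conditions.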
Xarray compatible function
Xarray is a powerful library for working with multidimensional arrays. It can be used to compute quantiles along a given dimension of a DataArray, and it uses numpy's nanquantile function under the hood. To extend the use of fastnanquantile to xarray, a function is provided to compute quantiles for a DataArray, with behavior similar to xarray's quantile implementation. Example:
import numpy as np
import xarray as xr
from fastnanquantile import xrcompat
da = xr.DataArray(
    np.random.rand(10, 1000, 1000),
    coords={"time": np.arange(10), "x": np.arange(1000), "y": np.arange(1000)},
)
# Xarray quantile (time to run: ~25s)
result_xr = da.quantile(q=0.6, dim="time")
# fastnanquantile (time to run: <1s)
result_fnq = xrcompat.xr_apply_nanquantile(da, q=0.6, dim="time")
# Check if results are equal (If results are different, an error will be raised)
np.testing.assert_almost_equal(result_fnq.values, result_xr.values, decimal=4)
A case study using Xarray + Dask to create time composites from satellite images can be found in this notebook: examples/example_xarray.ipynb.
Benchmarks
Benchmarks were run to compare the performance of fastnanquantile with numpy's nanquantile function. More information can be found in this notebook: examples/example.ipynb.
Benchmarks conclusions
The performance gains offered by the fastnanquantile implementation depend on the shape of the input array. Based on the benchmark results, we can conclude:
- 1D arrays: numpy is faster.
- 2D arrays: fastnanquantile is faster when the two axes have noticeably different sizes (example: (50, 1000)).
- 3D arrays: fastnanquantile is generally faster, especially when the reduction axis is smaller than the other ones. For example, with shape=(50, 1000, 1000) and reduction axis=0, fastnanquantile is a lot faster than numpy.
- Overall, fastnanquantile can be a great alternative in many cases, especially for 2D and 3D arrays, with the potential to greatly speed up quantile computation.
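One way to act on these conclusions is a thin wrapper that picks a backend from the array's dimensionality. The sketch below is only an illustration: `nanquantile_auto` is a hypothetical name, the rule (numpy for 1D input or when no axis is given, fastnanquantile otherwise) is a simplification of the benchmark summary, and it falls back to numpy when fastnanquantile is not installed.

```python
import numpy as np

try:
    import fastnanquantile as fnq
except ImportError:
    # Assumption for this sketch: fall back to numpy if the package is missing.
    fnq = None

def nanquantile_auto(a, q, axis=None):
    """Hypothetical dispatcher: numpy for 1D input (or axis=None),
    fastnanquantile for 2D and higher-dimensional arrays."""
    a = np.asarray(a)
    if fnq is None or a.ndim < 2 or axis is None:
        return np.nanquantile(a, q=q, axis=axis)
    return fnq.nanquantile(a, q=q, axis=axis)

# 1D input, so this call is routed to numpy.
values = np.array([1.0, np.nan, 3.0, 5.0])
print(nanquantile_auto(values, q=0.5, axis=0))
```

For 2D arrays the benchmark results suggest that the gain depends on how different the axis sizes are, so a real dispatcher would probably want to measure both backends on representative shapes before fixing a rule.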
Acknowledgements
This library was developed as part of my research work in the GCER lab, under the supervision of Vitor Martins, at Mississippi State University (MSU).
This research is funded by USDA NIFA (award #2023-67019-39169), supporting Lucas Ferreira and Vitor Martins at MSU.