Reference-based standardization framework for hydroclimate drought indices

rsd - Reference-based standardization framework for drought indices under distribution shift

Standardized drought indices (SPI, SSI, SDI, SPEI) that are comparable across model runs, scenarios, and reanalysis products.

Full documentation: https://hydro-rsd.readthedocs.io

rsd computes standardized hydroclimate indices by fitting the CDF from a fixed reference dataset rather than from the target itself, so that values from different model runs, scenarios, or observational products can be compared on the same scale.
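
To illustrate the idea, here is a minimal sketch that uses only NumPy and SciPy and does not call rsd. The helper ecdf_z and the synthetic gamma samples are assumptions made for this example, and the plain empirical CDF stands in for rsd's actual fitting procedure. It scores a drier target once against itself and once against a fixed reference.

```python
import numpy as np
from scipy import stats


def ecdf_z(values, sample):
    """Map `values` to standard-normal scores via the empirical CDF of `sample`.

    Hypothetical helper for illustration; not part of the rsd API.
    """
    sample = np.sort(np.asarray(sample, dtype=float))
    n = sample.size
    # Count of sample points <= each value, turned into the plotting position
    # r / (n + 1) and clipped so the probability stays strictly inside (0, 1).
    ranks = np.searchsorted(sample, values, side="right")
    p = np.clip(ranks / (n + 1), 1 / (n + 1), n / (n + 1))
    return stats.norm.ppf(p)


rng = np.random.default_rng(0)
ref = rng.gamma(shape=2.0, scale=5.0, size=600)  # baseline climate
tgt = rng.gamma(shape=2.0, scale=3.5, size=600)  # drier scenario (30% smaller scale)

z_self = ecdf_z(tgt, tgt)  # target standardized against itself
z_ref = ecdf_z(tgt, ref)   # target standardized against the fixed reference

print(f"self-standardized mean z:      {z_self.mean():+.2f}")
print(f"reference-standardized mean z: {z_ref.mean():+.2f}")
```

The self-standardized scores average to zero by construction, so the drying signal disappears. The reference-standardized scores keep it as a negative mean.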

Why RSD?

Standard implementations of SPI/SSI standardize each series against itself, which removes the cross-series differences you want to measure. RSD addresses this with three interdependent components:

  1. Fixed reference - fit the CDF once from a reference period or dataset; evaluate target values against it.
  2. GPD tail extension - an empirical CDF cannot resolve values beyond the observed range. RSD fits Generalized Pareto Distribution (GPD) tails for smooth extrapolation beyond the reference sample.
  3. Pooled deseasonalization - per-month samples are too sparse to fit extreme-value (EVT) tails. A 50-year record gives ~50 values per calendar month; the top or bottom 10% (one tail) is only ~5 exceedances per month, too few for a stable GPD fit. RSD removes the per-month location and scale, then pools all 12 months into one sample (~600 values, ~60 exceedances per tail), where the fit becomes feasible.
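
The three steps above can be sketched with NumPy and SciPy alone. This is an illustrative approximation, not rsd's implementation. The choice of mean and standard deviation as the per-month location and scale, the 90% threshold, the synthetic data, and the helper upper_tail_cdf are all assumptions of this example. Only the upper tail is shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
years = 50
months = np.tile(np.arange(1, 13), years)  # 600 monthly steps

# Synthetic reference with a seasonal cycle in both location and scale.
season = 1.0 + 0.5 * np.sin(2 * np.pi * (months - 1) / 12)
ref = season * rng.gamma(shape=2.0, scale=5.0, size=months.size)

# Step 1: remove the per-month location and scale
# (mean and standard deviation here; an assumption of this sketch).
resid = np.empty_like(ref)
for m in range(1, 13):
    sel = months == m
    resid[sel] = (ref[sel] - ref[sel].mean()) / ref[sel].std(ddof=1)

# Step 2: pool all 12 months and take the top 10% as the upper tail.
u_hi = np.quantile(resid, 0.90)
exceed = resid[resid > u_hi] - u_hi
print(f"pooled sample: {resid.size} values, {exceed.size} upper-tail exceedances")

# Step 3: fit a GPD to the excesses over the threshold (location fixed at 0).
xi, _, sigma = stats.genpareto.fit(exceed, floc=0)


def upper_tail_cdf(x):
    """CDF for x >= u_hi: 0.90 plus 0.10 times the GPD CDF of the excess."""
    return 0.90 + 0.10 * stats.genpareto.cdf(x - u_hi, xi, loc=0, scale=sigma)
```

Below the threshold an empirical CDF of the pooled residuals would be used. The GPD piece takes over at u_hi, where both pieces equal 0.90.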

This pooling is what keeps RSD usable on short records. With 20 years of monthly data (typical for satellite-era datasets) the pooled tail still has ~24 exceedances - enough to fit a single GPD - while a per-month tail fit would have only ~2 exceedances per month per tail and is infeasible.
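
The exceedance counts quoted above follow from simple arithmetic, reproduced here as a short check. The function name tail_exceedances is hypothetical, and the 10% tail fraction is the one used in the text.

```python
def tail_exceedances(n_years, tail_fraction=0.10):
    """Return (per-month, pooled) exceedance counts for one tail."""
    per_month = round(n_years * tail_fraction)    # one calendar month only
    pooled = round(12 * n_years * tail_fraction)  # all 12 months pooled
    return per_month, pooled


for n_years in (50, 20):
    per_month, pooled = tail_exceedances(n_years)
    print(f"{n_years} years: {per_month} per month, {pooled} pooled")
```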

Monthwise ECDF and fully parametric (e.g. gamma) methods are also included as baselines.

Requirements

  • Python ≥ 3.10
  • NumPy ≥ 1.24
  • SciPy ≥ 1.10

Optional extras:

  • [xarray] - xarray ≥ 2023.1, dask ≥ 2023.1 (N-D + parallel computation)
  • [diagnostics] - matplotlib ≥ 3.7 (rsd.diagnose plots)

Installation

pip install rsd                    # NumPy/SciPy only
pip install "rsd[xarray]"          # adds xarray + dask support
pip install "rsd[diagnostics]"     # adds matplotlib for rsd.diagnose
pip install "rsd[all]"             # everything above

Quick start

In RSD vocabulary, the reference defines what "normal" looks like (e.g. observed climate over a baseline period) and the target is the series you want to score against that normal (e.g. a future projection or a different scenario). Output z is in standard-normal units: z ≈ 0 corresponds to the reference climatology and |z| > 2 is extreme.
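
To make the z scale concrete, the snippet below converts the |z| > 2 threshold into probabilities under the standard normal distribution. It uses only scipy.stats.norm and does not call rsd. The variable names are chosen for this example.

```python
from scipy import stats

# Probability of falling below z = -2 under the reference climatology.
p_below = stats.norm.cdf(-2.0)          # about 0.023

# Probability of |z| > 2 (either tail).
p_two_sided = 2.0 * p_below             # about 0.046

# Average number of monthly steps between events with z < -2.
mean_interval_months = 1.0 / p_below    # about 44

print(f"P(z < -2)  = {p_below:.4f}")
print(f"P(|z| > 2) = {p_two_sided:.4f}")
print(f"mean interval = {mean_interval_months:.0f} months")
```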

import numpy as np
import rsd

# 1-D: standardize a 1200-month target against a 600-month reference
rng = np.random.default_rng(0)
months_ref = np.tile(np.arange(1, 13), 50)    # 600 months
months_tgt = np.tile(np.arange(1, 13), 100)   # 1200 months
ref = rng.gamma(shape=2, scale=5, size=600)
tgt = rng.gamma(shape=2, scale=5, size=1200)

z = rsd.standardize(
    target=tgt,
    reference=ref,
    months_target=months_tgt,
    months_reference=months_ref,
    scale=3,                                   # 3-month accumulation (e.g. SPI-3)
)

# N-D: xarray wrapper (dask-parallelized for large grids).
# Months are extracted automatically from the time coordinate, so you do
# not pass months_target / months_reference here.
import rsd

z = rsd.standardize_xr(
    target=target_da,        # xr.DataArray with a "time" dimension
    reference=ref_da,        # xr.DataArray with a "time" dimension
    method="rsd",            # or "monthwise_ecdf" / "monthwise_parametric"
    scale=3,
    parallel=True,
)
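
The scale=3 argument in the calls above requests a 3-month accumulation. As a rough illustration of what such an accumulation conventionally means for SPI-style indices, the sketch below computes a trailing rolling sum with NumPy. The helper accumulate is hypothetical, and rsd's internal convention (sum versus mean, handling of the first scale - 1 steps) is not confirmed here and may differ.

```python
import numpy as np


def accumulate(values, scale):
    """Trailing rolling sum over `scale` steps.

    The first scale - 1 entries have no full window and are set to NaN.
    Hypothetical helper for illustration; not part of the rsd API.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if 1 <= scale <= values.size:
        out[scale - 1:] = np.convolve(values, np.ones(scale), mode="valid")
    return out


monthly = np.array([10.0, 0.0, 5.0, 20.0, 15.0])
acc3 = accumulate(monthly, 3)
print(acc3)  # [nan nan 15. 25. 40.]
```

Each accumulated value is labeled by the month that ends its window, so the month labels of the original series carry over unchanged.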

Diagnostics

rsd.diagnose(values, months, name, ...) is the one-call entry point for checking the exchangeability assumption underlying RSD pooling. It:

  • prints an overview block (configuration plus extracted seasonal location and scale);
  • renders a combined summary figure (before / after deseasonalization KDEs);
  • prints an Anderson-Darling omnibus test plus a per-month Kolmogorov-Smirnov leave-one-out report.

Pass bounds=(L, U) to add a logit-bounded pathway alongside the baseline; add auto_bounds=True to also see a heuristic data-driven bound via rsd.estimate_bounds. Use quiet=True for batch / CI runs. See the diagnostics_showcase.ipynb notebook for a worked example.
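
For readers who want to see what such a check involves, here is a minimal sketch of the two tests using scipy.stats directly. It is not the implementation of rsd.diagnose. The synthetic data, the mean and standard deviation deseasonalization, and the 5% decision level are assumptions of this example.

```python
import warnings

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 50)
season = 1.0 + 0.5 * np.sin(2 * np.pi * (months - 1) / 12)
values = season * rng.gamma(shape=2.0, scale=5.0, size=months.size)

# Deseasonalize: remove per-month mean and standard deviation (assumed estimators).
resid = np.empty_like(values)
for m in range(1, 13):
    sel = months == m
    resid[sel] = (values[sel] - values[sel].mean()) / values[sel].std(ddof=1)

# Omnibus test: do the 12 monthly residual samples share one distribution?
groups = [resid[months == m] for m in range(1, 13)]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # SciPy may warn when the p-value is capped or floored
    ad = stats.anderson_ksamp(groups)
ad_reject = ad.statistic > ad.critical_values[2]  # index 2 is the 5% level
print(f"Anderson-Darling k-sample statistic: {ad.statistic:.2f}, reject at 5%: {ad_reject}")

# Leave-one-out: compare each month with the pool of the other 11 months.
loo_pvalues = {}
for m in range(1, 13):
    result = stats.ks_2samp(resid[months == m], resid[months != m])
    loo_pvalues[m] = result.pvalue
    print(f"month {m:2d}: KS p-value = {result.pvalue:.3f}")
```

A month with a small leave-one-out p-value is one whose deseasonalized residuals look different from the rest of the pool, which would argue against pooling it.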

Methods

method                    description
"rsd"                     Deseasonalize -> pool -> ECDF core + GPD tails
"monthwise_ecdf"          Per-month empirical CDF (classical SPI-style)
"monthwise_parametric"    Per-month parametric fit (gamma, norm, …)

The monthwise_ecdf baseline matches the SDAT framework of Farahmand & AghaKouchak (2015). The monthwise_parametric path defaults to floc=0 (the canonical SPI convention of Stagge et al. (2015)); pass floc=None to recover scipy's free-location 3-parameter fit.
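
The difference between the two location conventions can be seen with scipy.stats.gamma alone. The sketch below is an illustration under stated assumptions, not rsd's code path: the synthetic sample stands in for one calendar month of a 50-year reference, and the variable names are chosen for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=5.0, size=50)  # e.g. 50 values for one calendar month

# Two-parameter fit: location fixed at zero (the floc=0 convention).
a2, loc2, scale2 = stats.gamma.fit(sample, floc=0)

# Three-parameter fit: SciPy also estimates the location.
a3, loc3, scale3 = stats.gamma.fit(sample)

print(f"floc=0   : shape={a2:.2f}, loc={loc2:.2f}, scale={scale2:.2f}")
print(f"free loc : shape={a3:.2f}, loc={loc3:.2f}, scale={scale3:.2f}")

# Standardize through the fitted two-parameter CDF.
z2 = stats.norm.ppf(stats.gamma.cdf(sample, a2, loc=loc2, scale=scale2))
```

With only a few dozen values per month, the free-location fit has one more parameter to estimate from the same data, which is one reason the fixed-location form is the common SPI choice.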

Contributing & issues

Bug reports and questions are welcome at https://github.com/thchilly/rsd/issues. Contributions follow the workflow in CONTRIBUTING.md.

How to cite

If you use this package in your research, please cite the methodology paper:

Tsilimigkras, A., Grillakis, M., & Koutroulis, A. (2026). A reference-based standardization framework for hydroclimate drought indices under distribution shift. Manuscript submitted to Water Resources Research. DOI pending acceptance.

For reproducibility, you may additionally cite the specific software version:

Tsilimigkras, A. (2026). rsd: Reference-based standardization framework for hydroclimate drought indices (Version 1.0.0) [Computer software]. Zenodo DOI: pending.

License

BSD 3-Clause
