package for metocean statistics
Project description
SeaStats
seastats is a simple package to compare and analyse 2 time series. We use the following convention in this repo:
sim: modelled surge time seriesmod: observed surge time series
The main function is:
def get_stats(
sim: Series,
obs: Series,
metrics: Sequence[str] = SUGGESTED_METRICS,
quantile: float = 0,
cluster: int = 24,
round: int = -1
) -> dict[str, float]
Which calculates various statistical metrics between the simulated and observed time series data.
Easy API - for both general and storm-specific metocean metrics
get all metrics in a 3 liner:
from seastats import get_stats, GENERAL_METRICS, STORM_METRICS
general = get_stats(sim, obs, metrics = GENERAL_METRICS)
storm = get_stats(sim, obs, quantile = 0.99, metrics = STORM_METRICS) # ! we use a different quantile for PoT selection
pd.DataFrame(dict(general, **storm), index=['abed'])
this returns:
| station | bias | rms | rmse | cr | nse | kge | R1 | R3 | error |
|---|---|---|---|---|---|---|---|---|---|
| abed | -0.007 | 0.086 | 0.086 | 0.817 | 0.677 | 0.81 | 0.237364 | 0.147163 | 0.0938142 |
Install
PiPy
pip install seastats
Conda / Mamba
mamba install -c conda-forge seastats
Parameters:
- sim (pd.Series). The simulated time series data.
- obs (pd.Series). The observed time series data.
- metrics (list[str]). (Optional) The list of statistical metrics to calculate. If metrics = ["all"], all items in
SUPPORTED_METRICSwill be calculated. Default is all items inSUGGESTED_METRICS. - quantile (float). (Optional) Quantile used to calculate the metrics. Default is
0(no selection) - cluster (int). (Optional) Cluster duration for grouping storm events. Default is
72hours. - round (int). (Optional) Apply rounding to the results to. Default is no rounding (value is
-1)
Returns a dictionary containing the calculated metrics and their corresponding values. With 2 types of metrics:
- The "general" metrics: All the basic metrics needed for signal comparison (RMSE, RMS, Correlation etc..). See details below
bias: Biasrmse: Root Mean Square Errorrms: Root Mean Squarerms_95: Root Mean Square for data points above 95th percentilesim_mean: Mean of simulated valuesobs_mean: Mean of observed valuessim_std: Standard deviation of simulated valuesobs_std: Standard deviation of observed valuesmae: Mean Absolute Errormse: Mean Square Errornse: Nash-Sutcliffe Efficiencylambda: Lambda indexcr: Pearson Correlation coefficientcr_95: Pearson Correlation coefficient for data points above 95th percentileslope: Slope of Model/Obs correlationintercept: Intercept of Model/Obs correlationslope_pp: Slope of Model/Obs correlation of percentilesintercept_pp: Intercept of Model/Obs correlation of percentilesmad: Mean Absolute Deviationmadp: Mean Absolute Deviation of percentilesmadc:mad + madpkge: Kling–Gupta Efficiency
- The storm metrics: a PoT selection is done on the observed signal (using the
match_extremes()function). Function returns the decreasing extreme event peak values for observed and modeled signals (and time lag between events).R1: Difference between observed and modelled for the biggest stormR1_abs: Absolute difference between observed and modelled for the biggest stormR1_abs_norm: Absolute normalized difference between observed and modelled for the biggest stormR3: Averaged difference between observed and modelled for the three biggest stormsR3_abs: Averaged absolute difference between observed and modelled for the three biggest stormsR3_abs_norm: Average of the normalized absolute difference between observed and modelled for the three biggest stormserror: Averaged difference between modelled values and observed detected stormsabs_error: Averaged absolute difference between modelled values and observed detected stormsabs_error_norm: Average of the normalized absolute difference between modelled values and observed detected storms
General metrics
A. Dimensional Statistics:
Mean Error (or Bias)
$$\langle x_c - x_m \rangle = \langle x_c \rangle - \langle x_m \rangle$$
RMSE (Root Mean Squared Error)
$$\sqrt{\langle(x_c - x_m)^2\rangle}$$
Mean-Absolute Error (MAE):
$$\langle |x_c - x_m| \rangle$$
B. Dimentionless Statistics (best closer to 1)
Performance Scores (PS) or Nash-Sutcliffe Eff (NSE): $$1 - \frac{\langle (x_c - x_m)^2 \rangle}{\langle (x_m - x_R)^2 \rangle}$$
Correlation Coefficient (R):
$$\frac {\langle x_{m}x_{c}\rangle -\langle x_{m}\rangle \langle x_{c}\rangle }{{\sqrt {\langle x_{m}^{2}\rangle -\langle x_{m}\rangle ^{2}}}{\sqrt {\langle x_{c}^{2}\rangle -\langle x_{c}\rangle ^{2}}}}$$
Kling–Gupta Efficiency (KGE):
$$1 - \sqrt{(r-1)^2 + b^2 + (g-1)^2}$$ with :
rthe correlationbthe modified bias term (see ref) $$\frac{\langle x_c \rangle - \langle x_m \rangle}{\sigma_m}$$gthe std dev term $$\frac{\sigma_c}{\sigma_m}$$
Lambda index ($\lambda$), values closer to 1 indicate better agreement:
$$\lambda = 1 - \frac{\sum{(x_c - x_m)^2}}{\sum{(x_m - \overline{x}_m)^2} + \sum{(x_c - \overline{x}_c)^2} + n(\overline{x}_m - \overline{x}_c)^2 + \kappa}$$
- with
kappa$$2 \cdot \left| \sum{((x_m - \overline{x}_m) \cdot (x_c - \overline{x}_c))} \right|$$
case of NaNs
The storm_metrics() might return:
{'R1': np.nan,
'R1_abs': np.nan,
'R1_abs_norm': np.nan,
'R3': np.nan,
'R3_abs': np.nan,
'R3_abs_norm': np.nan,
'error': np.nan,
'abs_error': np.nan,
'abs_error_norm': np.nan}
Extreme events
Example of implementation:
from seastats.storms import match_extremes
extremes_df = match_extremes(sim, obs, 0.99, cluster = 72)
extremes_df
The modeled peaks are matched with the observed peaks. Function returns a pd.DataFrame of the decreasing observed storm peaks as follows:
| time observed | observed | time observed | model | time model | error | abs_error | abs_error_norm | tdiff |
|---|---|---|---|---|---|---|---|---|
| 2022-01-29 19:30:00 | 0.803 | 2022-01-29 19:30:00 | 0.565 | 2022-01-29 17:00:00 | -0.237 | 0.237 | 0.296 | -2.5 |
| 2022-02-20 20:30:00 | 0.639 | 2022-02-20 20:30:00 | 0.577 | 2022-02-20 20:00:00 | -0.062 | 0.062 | 0.0963 | -0.5 |
| ... | ||||||||
| 2022-11-27 15:30:00 | 0.386 | 2022-11-27 15:30:00 | 0.400 | 2022-11-27 17:00:00 | 0.014 | 0.014 | 0.036 | 1.5 |
with:
diffthe difference between modeled and observed peakserrorthe absolute difference between modeled and observed peakstdiffthe time difference between modeled and observed peaks
NB: the function uses pyextremes in the background, with PoT method, using the quantile value of the observed signal as physical threshold and passes the cluster_duration argument.
this happens when the function storms/match_extremes.py couldn't find concomitent storms for the observed and modeled time series.
Usage
see notebook for details
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file seastats-0.1.2.tar.gz.
File metadata
- Download URL: seastats-0.1.2.tar.gz
- Upload date:
- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.10 Linux/6.8.0-85-generic
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ca244cb8e014e31082bdfd7d424a09af1bf118225927686f4b7ac1f198ab103a
|
|
| MD5 |
5fa5381c14e9e03f1fc6a06e45597793
|
|
| BLAKE2b-256 |
95e7126fdcca15ccecb7ada7784ca0a374285b802dddcf05290bc50db3478f22
|
File details
Details for the file seastats-0.1.2-py3-none-any.whl.
File metadata
- Download URL: seastats-0.1.2-py3-none-any.whl
- Upload date:
- Size: 14.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.10 Linux/6.8.0-85-generic
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1165461e734e8bd0854e34c6a90e03e6cd39948183088a287feda586f52d1a7a
|
|
| MD5 |
397692db3f379a734d47d3a74ffceccb
|
|
| BLAKE2b-256 |
36b389e1bdbd6aef6dd9c3a5f4603af81615f6096ca28c7ab13de642bea80485
|