Oceanum library for computing gridded statistics on oceanographic datasets
gridstats
gridstats is an Oceanum library for computing gridded statistics over large oceanographic and climate datasets. Pipelines are defined in YAML and run with a single CLI command. Computation is lazy and out-of-core via xarray and dask, so datasets larger than available memory can be processed without loading them in full.
Features
- YAML-driven pipelines — source, operations, and output are all declared in one config file
- Out-of-core — processes arbitrarily large grids lazily; the spatial tiles: option keeps peak memory bounded
- Rich stat library — aggregations, quantiles, exceedance, return period values, directional stats, distributions, and more
- Multiple output formats — NetCDF or Zarr
- CF-compliant — output variables are automatically annotated with standard names, units, and long names
- Extensible — register custom stat functions and loaders via decorator or package entry point
Installation
Requires Python ≥ 3.10.
pip install gridstats
For loading data from an intake catalog:
pip install "gridstats[intake]"
Quick start
1. Write a config file
# stats.yml
source:
  type: xarray
  urlpath: /data/hindcast/waves.zarr
  engine: zarr
  sel:
    time: {start: "2000-01-01", stop: "2020-12-31"}
    latitude: {start: -50, stop: -30}
    longitude: {start: 160, stop: 180}

output:
  outfile: ./wave_stats.zarr

calls:
  - func: mean
    dim: time
    data_vars: [hs, tp]

  - func: quantile
    dim: time
    data_vars: [hs]
    q: [0.5, 0.90, 0.95, 0.99]

  - func: rpv
    dim: time
    data_vars: [hs]
    return_periods: [10, 50, 100]
    distribution: gumbel_r
2. Run it
gridstats run stats.yml
The output dataset will contain variables like hs_mean, tp_mean, hs_quantile, and hs_rpv, each with CF-standard attributes.
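To make these reductions concrete, here is a minimal plain-Python sketch of what a mean and a quantile over the time dimension amount to for a single grid cell. It is not gridstats code: the sample values and the linear-interpolation quantile helper are assumptions for illustration, and the library itself works on whole xarray datasets.

```python
from statistics import fmean


def quantile(values, q):
    """Linear-interpolation quantile of a sequence, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted sample
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# Hypothetical hs time series (metres) at one grid cell
hs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Reduce along "time": one mean, and one value per requested probability level
hs_mean = fmean(hs)
hs_quantile = {q: quantile(hs, q) for q in (0.5, 0.90, 0.95, 0.99)}
```

The quantile call yields one value per entry in q, which is why a quantile output would be expected to carry an extra dimension compared with the mean.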
3. Use the result
import xarray as xr
ds = xr.open_zarr("wave_stats.zarr")
print(ds)
Available stat functions
| Function | Description |
|---|---|
| mean, max, min, std, count | Basic aggregations |
| quantile | Quantiles at arbitrary probability levels |
| pcount | Count of non-NaN values per grid cell |
| exceedance / nonexceedance | Probability of exceeding a threshold |
| range_probability | Probability of a value falling in a range |
| rpv | Return period values via extreme value fitting |
| distribution2 / distribution3 | 2- and 3-parameter distribution fitting |
| statdir | Directional statistics (sector-binned) |
| hmo | Significant wave height from spectral moments |
| winpow | Wind power density |
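Two of the table entries, exceedance and rpv, reduce to short formulas that are easy to check by hand. The sketch below is illustrative only and makes several assumptions: the sample values are made up, the series passed to the Gumbel fit is taken to be one maximum per year, and the fit uses the method of moments rather than whatever fitting routine gridstats applies internally.

```python
import math
from statistics import fmean, stdev

EULER_GAMMA = 0.5772156649015329


def exceedance(values, threshold):
    """Empirical probability that a value exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)


def gumbel_fit_moments(maxima):
    """Method-of-moments estimate of Gumbel (location, scale)."""
    scale = stdev(maxima) * math.sqrt(6) / math.pi
    loc = fmean(maxima) - EULER_GAMMA * scale
    return loc, scale


def return_value(loc, scale, period_years):
    """Gumbel quantile at non-exceedance probability 1 - 1/T."""
    p_non_exceed = 1.0 - 1.0 / period_years
    return loc - scale * math.log(-math.log(p_non_exceed))


# Hypothetical hs samples (metres) at one grid cell
hs = [0.8, 1.5, 2.2, 3.1, 1.9, 2.7, 0.6, 4.0]
p_exceed_2m = exceedance(hs, 2.0)

# Hypothetical annual maxima of hs (metres) at the same cell
annual_max_hs = [6.1, 7.4, 5.8, 8.2, 6.9, 7.7, 6.4, 9.0, 7.1, 6.6]
loc, scale = gumbel_fit_moments(annual_max_hs)
rpv = {T: return_value(loc, scale, T) for T in (10, 50, 100)}
```

Return values grow with the return period, so rpv[10] < rpv[50] < rpv[100] is a quick sanity check on any fit.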
All calls accept a group: key (month, season, hour, …) to compute statistics per calendar period.
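As a rough illustration of what grouping by calendar period means, the following plain-Python sketch computes a mean per calendar month across several years. The timestamps and values are hypothetical, and this is not how gridstats implements group:; in xarray terms the idea is similar in spirit to `ds.groupby("time.month").mean("time")`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

# Hypothetical (timestamp, hs) samples at one grid cell
samples = [
    (datetime(2000, 1, 5), 2.0),
    (datetime(2000, 1, 20), 4.0),
    (datetime(2000, 7, 3), 1.0),
    (datetime(2001, 1, 11), 3.0),
    (datetime(2001, 7, 9), 2.0),
]

# Bucket values by calendar month, pooling the same month from every year
by_month = defaultdict(list)
for when, hs in samples:
    by_month[when.month].append(hs)

# One statistic per group instead of one for the whole series
monthly_mean = {month: fmean(vals) for month, vals in sorted(by_month.items())}
```

The result has one entry per month present in the data, so the time dimension is replaced by a month dimension rather than removed.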
Grouping and spatial tiling
calls:
  # Monthly mean
  - func: mean
    dim: time
    data_vars: [hs]
    group: month

  # Quantile with spatial tiling to control memory
  - func: quantile
    dim: time
    data_vars: [hs]
    q: [0.95, 0.99]
    chunks: {time: -1, latitude: 50, longitude: 50}
    tiles: {latitude: 10, longitude: 10}
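The idea behind spatial tiling is to process the grid one block at a time so that only a single block needs to be in memory. The sketch below shows one way to enumerate such blocks. The helper names are hypothetical, and it assumes the tile numbers are block sizes in grid points; the source does not say whether tiles: gives a size or a count, so treat this as an illustration of the technique rather than of the option's semantics.

```python
from itertools import product


def tile_slices(size, tile):
    """Split range(size) into consecutive slices of at most `tile` points."""
    return [slice(start, min(start + tile, size)) for start in range(0, size, tile)]


def iter_tiles(shape, tile_shape):
    """Yield one tuple of slices per tile, covering the grid exactly once."""
    per_dim = [tile_slices(n, t) for n, t in zip(shape, tile_shape)]
    yield from product(*per_dim)


# Hypothetical 25 x 30 (latitude x longitude) grid processed in 10 x 10 tiles
tiles = list(iter_tiles((25, 30), (10, 10)))

# Each tile can then be loaded, reduced along time, and written independently
covered = sum(
    (lat.stop - lat.start) * (lon.stop - lon.start) for lat, lon in tiles
)
```

Because a time-wise quantile needs the full time series for each cell, limiting the spatial extent per step is the natural way to bound memory for that kind of statistic.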
Plugin system
Register a custom stat function in your own package:
from gridstats.registry import register_stat
import xarray as xr

@register_stat("my_stat")
def my_stat(data: xr.Dataset, *, dim: str = "time", **kwargs) -> xr.Dataset:
    ...
Or declare it as a package entry point so it is discovered automatically:
[project.entry-points."gridstats.stats"]
my_stat = "my_package.stats:my_stat"
CLI
Usage: gridstats [OPTIONS] COMMAND [ARGS]...

Commands:
  run         Run a stats pipeline from a YAML configuration file.
  list-stats  List all registered stat functions.
License
MIT — see LICENSE.