
Africa-first library for data, forecasts, and benchmarking.


Sheerwater

A weather forecast and data benchmarking library. The Sheerwater project is working to benchmark ML- and physics-based weather and climate forecasts regionally and globally with a focus on model performance on the African continent.

Sheerwater contains a set of data accessors to fetch common forecasts and ground-truth data sources, a library of common evaluation metrics, and a metrics interface to validate forecasts against data products and station data.

Getting started

To run this code, you need read access to the Sheerwater forecasts and ground-truth data stored in our cloud bucket. Some of this data, including CHIRPS, IMERG, ERA5, and ECMWF IFS ER, is in a public bucket that requires no additional credentials, so all you have to do is:

  1. Install sheerwater in your environment:
pip install sheerwater
  2. Use sheerwater to access forecasts or data:
from sheerwater.reanalysis import era5
from sheerwater.data import ghcn, chirps_v3
from sheerwater.metrics import metric

# Get ERA5 as an xarray Dataset
ds_era5 = era5("2020-01-01", "2022-01-01", agg_days=1, variable="precip", grid="global1_5")

# Get gridded GHCN weather station data
ds_ghcn = ghcn("2020-01-01", "2022-01-01", agg_days=7, variable="precip", grid="global0_25")

# Get CHIRPS v3 data with default parameters
ds_chirps = chirps_v3()
  3. Run evaluation metrics on public forecasts or data:
# Run an evaluation metric - this might take some time!
val = metric("2016-01-01", "2022-12-31", forecast="era5", truth="ghcn", variable="precip", 
             metric_name="mae", region="country", grid="global1_5")
print(val)
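The metric call above reports a mean absolute error between a forecast and a truth dataset. As a rough illustration of that arithmetic only, here is a plain-Python sketch that averages absolute errors over paired cells and skips any pair where either value is missing (NaN), which matters for sparse gridded station data such as GHCN. The function name masked_mae is hypothetical; sheerwater's own implementation works on xarray objects and may weight, mask, or group values (for example by region) differently.

```python
import math
from typing import Optional, Sequence


def masked_mae(forecast: Sequence[float], truth: Sequence[float]) -> Optional[float]:
    """Mean absolute error over pairs where both values are present.

    Missing values are represented as NaN, similar to empty grid cells in
    gridded station data. Returns None if no pair is valid.
    """
    if len(forecast) != len(truth):
        raise ValueError("forecast and truth must have the same length")
    errors = [
        abs(f - t)
        for f, t in zip(forecast, truth)
        if not (math.isnan(f) or math.isnan(t))  # skip cells with no data
    ]
    if not errors:
        return None
    return sum(errors) / len(errors)


nan = float("nan")
print(masked_mae([1.0, 2.0, nan, 4.0], [1.5, nan, 3.0, 2.0]))
```

Only the first and last cells have both values, so the result is (0.5 + 2.0) / 2 = 1.25.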

Available data

| Dataset | Variations | Grids | Aggregations (days) | Available date range | Notes |
|---|---|---|---|---|---|
| IMERG | imerg_late, imerg_final | imerg (native), global0_25, global1_5 | 1, 5, 7, 10 | 1998-01-01 to 2024-12-31 | |
| CHIRPS | chirps_v2, chirps_v3, chirp_v2, chirp_v3 | chirps, global0_25, global1_5 | 1, 5, 7, 10 | 2000-06-01 to 2024-12-31 | Some variations extend back to 1998 |
| ERA5 | era5 | global0_25, global1_5 | 1, 5, 7, 10 | 1998-01-01 to 2024-12-31 | From Google ARCO; only tmp2m and precip regridded |
| GHCN | ghcn, ghcn_avg | global0_25, global1_5 | 1, 5, 7, 10, 14, 30 | 1998-01-01 to 2024-12-31 | ghcn picks a random station in a grid cell; ghcn_avg averages all stations in a grid cell |
| TAHMO | tahmo, tahmo_avg | global0_25, global1_5 | 1, 5, 7, 10, 14, 30 | 2016-01-01 to 2025-06-01 | Requires TAHMO Data Agreement; tahmo picks a random station in a grid cell; tahmo_avg averages all stations in a grid cell |
| ECMWF IFS ER | ecmwf_ifs_er | global1_5 | 1, 7, 14 | 2016-01-04 to 2023-02-12 | From the WeatherBench archive, known version |
| FuXi S2S | fuxi | global1_5 | 7 | 2016-01-03 to 2022-02-02 | Only precip and tmp2m |
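The GHCN and TAHMO notes describe two ways of turning point stations into gridded data: pick one station per grid cell at random, or average all stations in the cell. The sketch below illustrates both strategies in plain Python, with one value per station for simplicity. It assumes grid names such as global1_5 denote a regular 1.5° grid and that cells are aligned to multiples of the resolution from (-90°, -180°); both are assumptions, and cell_index and grid_stations are hypothetical names, not part of sheerwater.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Station = Tuple[float, float, float]  # (lat, lon, value)
Cell = Tuple[int, int]                # (row, column) index on the grid


def cell_index(lat: float, lon: float, res: float) -> Cell:
    """Map a lat/lon to integer cell indices on a regular grid of `res` degrees.

    Assumes cells are aligned to multiples of `res` from (-90, -180); this is an
    illustrative convention, not necessarily the one sheerwater uses.
    """
    return (math.floor((lat + 90.0) / res), math.floor((lon + 180.0) / res))


def grid_stations(stations: List[Station], res: float, method: str = "avg",
                  seed: int = 0) -> Dict[Cell, float]:
    """Reduce station values to one value per grid cell.

    method="avg":    average all stations in the cell (like ghcn_avg / tahmo_avg)
    method="random": pick one station in the cell at random (like ghcn / tahmo)
    """
    by_cell: Dict[Cell, List[float]] = defaultdict(list)
    for lat, lon, value in stations:
        by_cell[cell_index(lat, lon, res)].append(value)

    rng = random.Random(seed)  # seeded so the random choice is reproducible
    gridded: Dict[Cell, float] = {}
    for cell, values in by_cell.items():
        if method == "avg":
            gridded[cell] = sum(values) / len(values)
        elif method == "random":
            gridded[cell] = rng.choice(values)
        else:
            raise ValueError(f"unknown method: {method!r}")
    return gridded


stations = [(-1.30, 36.80, 4.0), (-1.20, 36.90, 6.0), (5.60, -0.20, 10.0)]
print(grid_stations(stations, res=1.5, method="avg"))
```

At 1.5° resolution the first two stations share a cell and average to 5.0, while the third station has a cell to itself, so this prints {(59, 144): 5.0, (63, 119): 10.0}.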

Additional data accessors may exist in the code base. If you find one that is not listed here, please reach out.

Accessing sheerwater private data

Some data requires access to the Sheerwater private bucket. Please email us to request access so we can discuss use cases and collaboration. Once we have added you to the bucket, run the following commands to install the Google Cloud CLI and authenticate:

curl https://sdk.cloud.google.com | bash
gcloud auth application-default login

Evaluating your own forecasts against your own data

If you have a forecast you would like to evaluate, register it with the sheerwater @forecast decorator so that sheerwater can find it by name during evaluation. Register your own ground-truth data the same way with the @data decorator.

from sheerwater.forecasts import forecast
from sheerwater.data import data
from sheerwater.metrics import metric

# Forecasts must be xarray Datasets with coordinates lat, lon, init_time, and
# prediction_timedelta, holding the requested variable on the requested grid
@forecast
def my_forecast(start_time, end_time, agg_days, variable, grid, **kwargs):
    # fetch_forecast is a placeholder for your own loading code
    ds = fetch_forecast(start_time, end_time, agg_days, variable, grid)
    # Rename dimensions and variables to the names sheerwater expects
    ds = ds.rename({'start_time': 'init_time', 
                    'timestep': 'prediction_timedelta',
                    'latitude': 'lat',
                    'longitude': 'lon'})
    ds = ds.rename_vars({'precipitation_mm': 'precip'})
    return ds

# Data must be xarray Datasets with coordinates lat, lon, and time, holding
# the requested variable on the requested grid
@data
def my_station_data(start_time, end_time, agg_days, variable, grid, **kwargs):
    # fetch_data is a placeholder for your own loading code
    ds = fetch_data(start_time, end_time, agg_days, variable, grid)
    return ds

# Evaluate the forecast
metric("2015-01-01", "2022-01-01", forecast="my_forecast", truth="my_station_data", 
        agg_days=1, variable='precip', grid='global1_5', metric_name="bias", 
        region="country", time_grouping="month_of_year")
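Because the evaluation call refers to my_forecast and my_station_data by name, the decorators presumably register each function somewhere sheerwater can look it up. The sketch below shows that general registry pattern in plain Python. It is an assumption about the mechanism, not sheerwater's code; register_forecast, register_data, and load_pair are hypothetical names, and the stand-in functions return strings instead of xarray Datasets so the example runs on its own.

```python
from typing import Callable, Dict

# Hypothetical name-keyed registries; sheerwater's real decorators may
# store and look up functions differently.
FORECASTS: Dict[str, Callable] = {}
DATA: Dict[str, Callable] = {}


def register_forecast(func: Callable) -> Callable:
    """Record a forecast function under its name and return it unchanged."""
    FORECASTS[func.__name__] = func
    return func


def register_data(func: Callable) -> Callable:
    """Record a ground-truth data function under its name and return it unchanged."""
    DATA[func.__name__] = func
    return func


def load_pair(start_time, end_time, forecast_name, truth_name, **kwargs):
    """Resolve both names and call each function with the same arguments."""
    fcst = FORECASTS[forecast_name](start_time, end_time, **kwargs)
    obs = DATA[truth_name](start_time, end_time, **kwargs)
    return fcst, obs


@register_forecast
def my_forecast(start_time, end_time, agg_days, variable, grid, **kwargs):
    return f"forecast {variable} on {grid}"  # stand-in for an xarray Dataset


@register_data
def my_station_data(start_time, end_time, agg_days, variable, grid, **kwargs):
    return f"truth {variable} on {grid}"  # stand-in for an xarray Dataset


print(load_pair("2015-01-01", "2022-01-01", "my_forecast", "my_station_data",
                agg_days=1, variable="precip", grid="global1_5"))
```

Returning the function unchanged from the decorator means it can still be called directly; the registry only adds a name-based lookup.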

To support data fetching, sheerwater depends on Nuthatch.

Developing on sheerwater

  1. Install uv:
curl -Ls https://astral.sh/uv/install.sh | sh
  2. Install the Google Cloud CLI and log in:
curl https://sdk.cloud.google.com | bash
gcloud auth application-default login
  3. Install non-Python dependencies (on macOS, with Homebrew):
brew install hdf5 netcdf
  4. Install Python dependencies:
uv sync
  5. Run commands with uv:
uv run python ...

or

uv run jupyter lab

Deployment and Infrastructure

This repository is integrated with the Rhiza infrastructure, which deploys metrics to databases and connects those databases to Grafana dashboards for visualization. If you are deploying this code on backend infrastructure with Grafana and Terraform, please read the Infrastructure README.


Download files


Source Distribution

sheerwater-0.2.4.tar.gz (81.9 kB)

Uploaded Source

Built Distribution


sheerwater-0.2.4-py3-none-any.whl (115.4 kB)

Uploaded Python 3

File details

Details for the file sheerwater-0.2.4.tar.gz.

File metadata

  • Download URL: sheerwater-0.2.4.tar.gz
  • Upload date:
  • Size: 81.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.21 {"installer":{"name":"uv","version":"0.9.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for sheerwater-0.2.4.tar.gz
Algorithm Hash digest
SHA256 1646071ea2d4114e6cf86dac78dd53df8bc59865ddcd5a2baa6fb1e79e7338ec
MD5 e856e8f40a7dc7a3df79e8b55847f488
BLAKE2b-256 89415448217f0b348df30df63780f89a38274459ff7237e6af31e688661d2cbd
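The digests above let you check that a downloaded file is intact. Here is a minimal sketch using Python's standard hashlib module: it hashes the file in chunks and compares the result against the published SHA256 for sheerwater-0.2.4.tar.gz. The local file path and the helper names (digest_chunks, read_chunks, matches) are assumptions for illustration.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Iterable, Iterator

# Published SHA256 for sheerwater-0.2.4.tar.gz (from the table above)
EXPECTED_SHA256 = "1646071ea2d4114e6cf86dac78dd53df8bc59865ddcd5a2baa6fb1e79e7338ec"


def digest_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA256 digest of a stream of byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def read_chunks(path: Path, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks to keep memory use flat."""
    with path.open("rb") as fh:
        while chunk := fh.read(size):
            yield chunk


def matches(actual: str, expected: str) -> bool:
    """Case-insensitive, constant-time comparison of two hex digests."""
    return hmac.compare_digest(actual.lower(), expected.lower())


if __name__ == "__main__":
    archive = Path("sheerwater-0.2.4.tar.gz")  # assumed download location
    if archive.exists():
        ok = matches(digest_chunks(read_chunks(archive)), EXPECTED_SHA256)
        print("SHA256 OK" if ok else "SHA256 MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

pip can enforce the same check at install time with hash-checking mode (--require-hashes). In that mode every requirement, including dependencies, must be pinned with --hash=sha256:... entries; listing both the sdist and wheel digests lets pip accept whichever file it downloads.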


File details

Details for the file sheerwater-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: sheerwater-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 115.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.21 {"installer":{"name":"uv","version":"0.9.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for sheerwater-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 014efdb2ea7f724ab4f1cdbe2841339e16e0eab9935f2ba5c20287027718fa54
MD5 f5e4db736c145f9fca33025120417e0f
BLAKE2b-256 d688e352be9e07392a0b2136b708d5c3b0356f5f2da5cb1bcccf28cace404cac

