
A CliMetLab dataset plugin for the maelstrom-downscaling dataset.


maelstrom-downscaling-ap5

A CliMetLab dataset plugin for the datasets used in application 5 (AP5) of the MAELSTROM project.

Features

This README briefly describes the datasets provided for statistical downscaling of meteorological fields, the target of application 5 (AP5) within the scope of MAELSTROM. Two different datasets, referred to as Tier-1 and Tier-2 in the following, can be downloaded with this CliMetLab plugin from an AWS S3 bucket provided by ECMWF. Both datasets are distributed under the Apache License, version 2.0, and are thus open access.

Using CliMetLab to access the data

The CliMetLab Python package provides easy access to the data with a few lines of code.
The following examples demonstrate how to obtain the two provided datasets. A more detailed description of both datasets is provided afterwards.

Download the Tier-1 data

The training data of the Tier-1 dataset can be downloaded as follows:

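```shell
pip install climetlab climetlab_maelstrom_downscaling
```

```python
import climetlab as cml

# Download (and cache) the Tier-1 training data and open it as an xarray object
ds = cml.load_dataset("maelstrom-downscaling", dataset="training")
data = ds.to_xarray()
```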

Setting the dataset argument to "validation" or "testing" retrieves the validation or testing data instead. In addition, an augmented variant of the dataset is available, which can be downloaded by appending the suffix _augmented to the dataset argument (e.g. dataset="training_augmented").
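As a minimal sketch, the snippet below loads all three Tier-1 splits, optionally in their augmented variant, by building the dataset argument from the split name and the _augmented suffix. It assumes the suffix applies to each of the three splits. The helpers tier1_dataset_name and load_tier1_splits are hypothetical, and climetlab is imported inside the loader so that the name helper can be tried without the plugin installed.

```python
SPLITS = ("training", "validation", "testing")


def tier1_dataset_name(split: str, augmented: bool = False) -> str:
    """Build the value of the `dataset` argument for a Tier-1 split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}, expected one of {SPLITS}")
    # Assumption: the augmented variant exists for every split
    return f"{split}_augmented" if augmented else split


def load_tier1_splits(augmented: bool = False) -> dict:
    """Download all Tier-1 splits and return them as xarray objects keyed by split name."""
    import climetlab as cml  # imported lazily; requires climetlab_maelstrom_downscaling

    return {
        split: cml.load_dataset(
            "maelstrom-downscaling", dataset=tier1_dataset_name(split, augmented)
        ).to_xarray()
        for split in SPLITS
    }
```

For example, `data = load_tier1_splits(augmented=True)` would return a dictionary whose "training" entry holds the augmented training data.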

Download the Tier-2 data

The Tier-2 dataset is obtained by changing the first argument of cml.load_dataset to "maelstrom-downscaling-tier2". For example, the following snippet downloads the Tier-2 training data:

```python
import climetlab as cml

ds = cml.load_dataset("maelstrom-downscaling-tier2", dataset="training")
```

Note that the Tier-2 training dataset comprises about 250 GB of data, so the download may take from several minutes to several hours depending on the Internet connection. Owing to this size, no augmented variant is provided.
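To put the 250 GB figure in perspective, the back-of-the-envelope calculation below converts data size and bandwidth into a transfer time. It is a sketch assuming a constant sustained bandwidth and no protocol overhead; download_hours is a hypothetical helper.

```python
def download_hours(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Idealised download time in hours for `size_gb` gigabytes (10**9 bytes)
    at a sustained `bandwidth_mbit_s` megabits (10**6 bits) per second."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (bandwidth_mbit_s * 1e6)
    return seconds / 3600


for mbit in (50, 100, 1000):
    print(f"{mbit:>5} Mbit/s: {download_hours(250, mbit):5.1f} h")
```

At 100 Mbit/s this gives about 5.6 hours for the Tier-2 training data, at 1 Gbit/s roughly 33 minutes.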

Saving the data on disk

By default, CliMetLab only caches the data. To store the data persistently in the user's file system, persist=True must be passed to the to_xarray method, together with the path of a directory in which the file(s) will be saved via data_dir. The following commands save the large Tier-2 training dataset:

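```python
import climetlab as cml

ds = cml.load_dataset("maelstrom-downscaling-tier2", dataset="training")
# Persist the files under the given directory instead of keeping them only in the cache
ds.to_xarray(persist=True, data_dir="/my/local/path")
```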
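Given the size of the Tier-2 data, it can help to create the target directory and check the free disk space before starting the download. The wrapper below is a hypothetical sketch: it only uses the load_dataset/to_xarray call documented above, and it assumes (not stated by the plugin) that pre-creating data_dir is harmless. The 250 GB default threshold is taken from the size quoted for the training split.

```python
import shutil
from pathlib import Path


def prepare_data_dir(path: str) -> Path:
    """Create the target directory (including parents) if it does not exist yet."""
    data_dir = Path(path).expanduser().resolve()
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


def save_tier2_split(split: str, path: str, min_free_gb: float = 250.0):
    """Download a Tier-2 split and persist it under `path` (hypothetical wrapper)."""
    data_dir = prepare_data_dir(path)
    free_gb = shutil.disk_usage(data_dir).free / 1e9
    if free_gb < min_free_gb:
        raise OSError(
            f"only {free_gb:.0f} GB free in {data_dir}, about {min_free_gb:.0f} GB needed"
        )

    import climetlab as cml  # imported lazily; requires climetlab_maelstrom_downscaling

    ds = cml.load_dataset("maelstrom-downscaling-tier2", dataset=split)
    return ds.to_xarray(persist=True, data_dir=str(data_dir))
```

Usage: `save_tier2_split("training", "/my/local/path")`.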

Tutorial for the Tier-1 dataset

A tutorial is available in the form of a Jupyter Notebook, in which the Tier-1 dataset is used to train a simple U-Net for downscaling adapted from [1].

Dataset description

Within the MAELSTROM project, two different datasets are provided that are used to construct statistical downscaling tasks with deep neural networks. The first dataset, the Tier-1 dataset, serves as the starting point and provides the data for a pure downscaling task similar to the super-resolution task in computer vision.
The Tier-2 dataset provides the data for a real downscaling task in meteorology where the super-resolution task is complemented by bias correction.
Both datasets are described in more detail below.

The Tier-1 dataset

The Tier-1 dataset contains 2m temperature and surface elevation data from the IFS HRES model at its initialization times 00 and 12 UTC between 2016 and 2020. The data is restricted to the summer half-year (April to September inclusive). Spatially, the data is limited to a domain over Central Europe that includes complex topography, with 128x96 grid points in the zonal and meridional directions. For convenience, the data has been remapped onto a regular spherical grid with a spacing (dx) of 0.1°.

Since only one set of model data is used, the Tier-1 dataset constitutes an artificial downscaling task: the input data is coarsened and the downscaling model is trained to revert this coarsening. This makes the task very similar to super-resolution in computer vision, since no systematic bias between the input and the target data has to be removed. Note that this simplified downscaling task has also been the subject of other studies on statistical downscaling with deep neural networks, e.g. [1] and [2].

For the target variable of the Tier-1 dataset, the 2m temperature t2m_tar, the coarsened input data has been generated by the following preprocessing chain.
First, the data is conservatively remapped onto a coarse grid with dx = 0.8°, which effectively removes fine-scale information. Second, the coarse data is (naively) interpolated back onto the high-resolution grid (dx = 0.1°) via bilinear interpolation; note that this step does not recover the information lost in the first step. To ensure energetic consistency, all calculations are performed on the dry static energy, which is a function of temperature and elevation only.
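The preprocessing chain above can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the code used to build the dataset: the dry static energy is taken as s = c_p T + g z with the fine-grid surface elevation z; an unweighted 8x8 block mean stands in for conservative remapping (it ignores the latitude dependence of cell areas on the spherical grid); and arrays are laid out as (meridional, zonal) = (96, 128), so the coarse grid has 12x16 cells. All function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

CP_DRY = 1004.6  # specific heat of dry air at constant pressure [J kg-1 K-1]
G = 9.80665      # standard gravity [m s-2]


def dry_static_energy(t2m, z_sfc):
    """s = cp * T + g * z, with T in K and surface elevation z in m."""
    return CP_DRY * t2m + G * z_sfc


def temperature_from_dse(s, z_sfc):
    return (s - G * z_sfc) / CP_DRY


def block_mean(field, factor):
    """Unweighted average over factor x factor blocks (stand-in for conservative remapping)."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the coarsening factor")
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


def bilinear_to_fine(coarse, factor):
    """Bilinearly interpolate coarse cell values back onto the fine grid."""
    ny_c, nx_c = coarse.shape
    # Coarse cell centres expressed in fine-grid index coordinates
    y_c = np.arange(ny_c) * factor + (factor - 1) / 2
    x_c = np.arange(nx_c) * factor + (factor - 1) / 2
    interp = RegularGridInterpolator(
        (y_c, x_c), coarse, method="linear", bounds_error=False, fill_value=None
    )  # fill_value=None: linear extrapolation towards the domain edges
    y_f, x_f = np.meshgrid(
        np.arange(ny_c * factor, dtype=float),
        np.arange(nx_c * factor, dtype=float),
        indexing="ij",
    )
    return interp(np.stack([y_f, x_f], axis=-1))


def make_tier1_input(t2m, z_sfc, factor=8):
    """Coarsen 2m temperature via dry static energy (0.1 deg -> 0.8 deg -> 0.1 deg)."""
    s = dry_static_energy(t2m, z_sfc)
    s_in = bilinear_to_fine(block_mean(s, factor), factor)
    return temperature_from_dse(s_in, z_sfc)
```

A field that varies linearly in space passes through this chain unchanged, whereas a grid-scale checkerboard pattern is averaged out completely, which illustrates the information loss of the first step.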

The dataset is subdivided into training, validation and testing subsets. The training subset comprises the data from 2016 to 2019, while the validation and testing subsets are made up of monthly data from 2020.
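From the sampling described above (two initialization times per day, April to September), the nominal number of Tier-1 samples per year follows directly: 183 days times 2 gives 366 time steps. The sketch below counts them with the standard library, assuming that no time step is missing; the actual files may differ, and nominal_time_steps is a hypothetical helper.

```python
from datetime import date, timedelta

INIT_HOURS = (0, 12)   # IFS HRES initialization times [UTC]
MONTHS = range(4, 10)  # April to September inclusive


def nominal_time_steps(years) -> int:
    """Count (day, initialization time) pairs in the summer half-year, assuming no gaps."""
    count = 0
    for year in years:
        day = date(year, 1, 1)
        while day.year == year:
            if day.month in MONTHS:
                count += len(INIT_HOURS)
            day += timedelta(days=1)
    return count


print(nominal_time_steps(range(2016, 2020)))  # training: 2016-2019
print(nominal_time_steps([2020]))             # pool shared by validation and testing
```

This gives 1464 nominal training samples and 366 samples in 2020 that are divided between validation and testing.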

The Tier-2 dataset

The Tier-2 dataset provides data for a real downscaling task in which coarse-grained ERA5 reanalysis data [3] are downscaled onto the high-resolution grid of the COSMO REA6 dataset [4]. Since data from two different models are now used, and COSMO REA6 provides added value over complex terrain owing to its higher spatial resolution, the downscaling task is twofold: the data has to be both super-resolved and bias-corrected.

As with the Tier-1 dataset, we target the 2m temperature, but include more predictor variables:

  • 2m temperature
  • temperature from model levels 137, 135, 131, 127, 122 and 115
  • surface pressure
  • surface latent and sensible heat fluxes
  • 10m horizontal wind (u,v)
  • boundary layer height

The surface topographies of the ERA5 and the COSMO REA6 datasets are also included as static predictor variables.
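With one input channel per field, the predictors listed above amount to 13 time-varying channels (2m temperature, six model-level temperatures, surface pressure, two heat fluxes, two wind components, boundary layer height) plus 2 static topography channels, i.e. 15 channels for the input layer of a network such as a U-Net. The short names below are hypothetical and only modelled on ERA5 conventions; the variable names in the actual Tier-2 files may differ.

```python
# Hypothetical short names; the names in the Tier-2 files may differ.
MODEL_LEVELS = (137, 135, 131, 127, 122, 115)

DYNAMIC_PREDICTORS = (
    ["t2m"]                                        # 2m temperature
    + [f"t_ml{level}" for level in MODEL_LEVELS]   # temperature on model levels
    + ["sp", "slhf", "sshf", "u10", "v10", "blh"]  # surface and boundary-layer fields
)
STATIC_PREDICTORS = ["z_era5", "z_cosmo"]  # surface topography of ERA5 and COSMO REA6

INPUT_CHANNELS = DYNAMIC_PREDICTORS + STATIC_PREDICTORS

print(len(DYNAMIC_PREDICTORS), len(STATIC_PREDICTORS), len(INPUT_CHANNELS))
```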

As a necessary prerequisite, the grids of both reanalysis datasets had to be unified. To this end, we have remapped the ERA5 data onto the rotated pole grid used by the COSMO model [5]. With a grid spacing of 0.275° in rotated coordinates, the ERA5 data is five times coarser than the target COSMO REA6 data (dx = 0.055°). As for the Tier-1 dataset, the ERA5 data is then bilinearly interpolated onto the high-resolution target grid to serve as input for the downscaling neural networks.
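For orientation, the angular grid spacings can be converted into approximate physical distances. This sketch assumes a spherical Earth with a mean radius of 6371 km and a domain close to the rotated equator (the usual reason for using a rotated pole grid), where one degree corresponds to roughly 111 km in both directions; spacing_km is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def spacing_km(dx_deg: float) -> float:
    """Great-circle arc length of dx_deg, e.g. along the rotated equator."""
    return math.radians(dx_deg) * EARTH_RADIUS_KM


era5_dx, cosmo_dx = 0.275, 0.055
print(round(era5_dx / cosmo_dx))  # resolution ratio between ERA5 and COSMO REA6 grids
print(f"{spacing_km(era5_dx):.1f} km vs {spacing_km(cosmo_dx):.1f} km")
```

This corresponds to roughly 30.6 km for the remapped ERA5 data and about 6.1 km for the COSMO REA6 target grid.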

Currently, the target domain of the Tier-2 dataset comprises 120x96 grid points (dx = 0.055°) covering large parts of Central Europe, including the complex terrain of the Alps and the German low mountain range. Hourly data is provided for the years 1995 to 2018. By default, the years 1995-2016 constitute the training dataset, while 2017 and 2018 are used for validation and testing, respectively.
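The default Tier-2 split can be expressed as a simple mapping from year to subset. The sketch below also counts the nominal number of hourly time steps per subset, assuming complete hourly coverage without gaps; SPLIT_YEARS, tier2_split and hourly_steps are hypothetical names.

```python
from datetime import datetime

SPLIT_YEARS = {
    "training": range(1995, 2017),    # 1995-2016
    "validation": range(2017, 2018),  # 2017
    "testing": range(2018, 2019),     # 2018
}


def tier2_split(year: int) -> str:
    """Return the default Tier-2 subset a given year belongs to."""
    for split, years in SPLIT_YEARS.items():
        if year in years:
            return split
    raise ValueError(f"year {year} is not covered by the Tier-2 dataset")


def hourly_steps(years) -> int:
    """Nominal number of hourly time steps in the given calendar years."""
    return sum(
        int((datetime(y + 1, 1, 1) - datetime(y, 1, 1)).total_seconds()) // 3600
        for y in years
    )


for split, years in SPLIT_YEARS.items():
    print(split, hourly_steps(years))
```

The training subset thus nominally holds 192864 hourly samples (22 years including six leap years), while validation and testing hold 8760 samples each.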

This dataset constitutes the final dataset used in Application 5 of the MAELSTROM project as described in this report.

References

[1] Sha, Yingkai, et al. "Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature." Journal of Applied Meteorology and Climatology 59.12 (2020): 2057-2073.
[2] Leinonen, Jussi, et al. "Stochastic Super-Resolution for Downscaling Time-Evolving Atmospheric Fields With a Generative Adversarial Network." IEEE Transactions on Geoscience and Remote Sensing 59.9 (2021): 7211-7223.
[3] Hersbach, Hans, et al. "The ERA5 global reanalysis." Quarterly Journal of the Royal Meteorological Society 146.730 (2020): 1999-2049.
[4] Bollmeyer, Christoph, et al. "Towards a high-resolution regional reanalysis for the European CORDEX domain." Quarterly Journal of the Royal Meteorological Society 141.686 (2015): 1-15.
[5] Hans-Ertel-Center for Weather Research - Climate Monitoring and Diagnostics. "COSMO Regional Reanalysis - COSMO-REA6."
