maelstrom-downscaling-ap5
A CliMetLab dataset plugin for the maelstrom-downscaling datasets used in Application 5 (AP5) of the MAELSTROM project.
Features
This README provides a brief description of the datasets for statistical downscaling of meteorological fields, which is the target of Application 5 (AP5) within the scope of the MAELSTROM project.
Two different datasets, named Tier-1 and Tier-2 in the following, can be downloaded with this CliMetLab plugin from an AWS S3 bucket provided by ECMWF. Both datasets are distributed under the Apache License, version 2.0, and are therefore open-access.
Using CliMetLab to access the data
The CliMetLab Python package allows easy access to the data with a few lines of code.
The following examples demonstrate how to obtain the two provided datasets.
A more detailed description of both datasets is provided afterwards.
Download the Tier-1 data
The training data of the Tier-1 dataset can be downloaded as follows:
!pip install climetlab climetlab_maelstrom_downscaling
import climetlab as cml
ds = cml.load_dataset("maelstrom-downscaling", dataset="training")
ds.to_xarray()
By changing the dataset-argument to "validation" or "testing", the validation and testing data can be retrieved.
Furthermore, an augmented variant of the dataset is available, which can be downloaded by appending the suffix _augmented to the dataset-argument, as shown below.
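For example, the augmented training data can be obtained as follows (the value training_augmented simply follows the naming scheme described above):
ds = cml.load_dataset("maelstrom-downscaling", dataset="training_augmented")
ds.to_xarray()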
Download the Tier-2 data
The Tier-2 dataset can be downloaded by replacing the value of the first argument of cml.load_dataset. The following code snippet downloads the training dataset as an example:
ds = cml.load_dataset("maelstrom-downscaling-tier2", dataset="training")
Note that the training dataset comprises about 250 GB of data; downloading it can therefore take from several minutes up to hours, depending on the internet connection. Due to the large size of the dataset, no augmented variant is provided.
Saving the data on disk
By default, CliMetLab only caches the data. To save the data persistently to disk in the user's filesystem, persist=True must be added when calling the to_xarray-method. Furthermore, a directory path under which the file(s) will be saved must be passed via data_dir.
The following commands exemplify saving the Tier-2 training dataset.
ds = cml.load_dataset("maelstrom-downscaling-tier2", dataset="training")
ds.to_xarray(persist=True, data_dir="/my/local/path")
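If you only need to control where CliMetLab stores its cache (rather than persisting the files yourself), the cache directory can also be redirected via CliMetLab's settings. This is a minimal sketch assuming a recent CliMetLab version; the path is a placeholder:
import climetlab as cml
# Redirect the CliMetLab cache to a directory with sufficient free space
cml.settings.set("cache-directory", "/my/local/cache")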
Tutorial for the Tier-1 dataset
A tutorial is available in the form of a Jupyter Notebook, in which the Tier-1 dataset is used to train a simple U-Net for downscaling adapted from [1].
Dataset description
Within the MAELSTROM project, two different datasets are provided that are
used to construct statistical downscaling tasks with deep neural networks.
The first dataset, the Tier-1 dataset, serves as the starting point and provides the data for a pure downscaling task
similar to the super-resolution task in computer vision.
The Tier-2 dataset provides the data for a real downscaling task in meteorology where the super-resolution task
is complemented by bias correction.
Both datasets will be described in more detail in the following.
The Tier-1 dataset
The Tier-1 dataset contains 2m temperature and surface elevation data obtained from the IFS HRES model at its initialization times 00 and 12 UTC between 2016 and 2020.
The data is temporally restricted to the months of the summer half-year (April to September, inclusive).
Spatially, the data is limited to a domain covering Central Europe, including complex topography, with 128x96 grid points in the zonal and meridional directions.
For convenience, the data has been remapped onto a regular spherical grid with a spacing (dx) of 0.1°.
Since only one set of model data is used, the Tier-1 dataset constitutes an artificial downscaling task
where the input data is coarsened and the downscaling model is trained to revert this coarsening process.
This makes the downscaling task very similar to the super-resolution task in computer vision,
since no systematic bias has to be removed between the input and the target data. Note that this (simplified)
downscaling task has been subject to other studies on statistical downscaling with deep neural networks as well,
e.g. [1] or [2].
For the target variable of the Tier-1 dataset, the 2m temperature t2m_tar, the coarsened input data has undergone the following preprocessing chain:
- First, a conservative remapping onto a coarse grid with dx = 0.8° is applied. This step effectively removes fine-grained information from the data.
- Second, the data is (naively) interpolated back onto the highly resolved grid (dx = 0.1°) via bi-linear interpolation. Note that this step does not recover the information lost in the first step.
- Finally, to obtain energetic consistency, all calculations have been performed on the dry static energy, which is a pure function of the temperature and the elevation.
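As an illustration of the last step, the conversion between temperature and dry static energy can be sketched as follows. This is a minimal sketch; the variable names t2m and z_sfc as well as the constant values are illustrative assumptions and not taken from the plugin:
import numpy as np

CP = 1004.7    # specific heat of dry air at constant pressure [J kg-1 K-1] (approximate)
G = 9.80665    # gravitational acceleration [m s-2]

def temperature_to_dry_static_energy(t2m, z_sfc):
    """Dry static energy s = cp * T + g * z, with temperature in K and elevation in m."""
    return CP * np.asarray(t2m) + G * np.asarray(z_sfc)

def dry_static_energy_to_temperature(s, z_sfc):
    """Invert the relation to recover the temperature after remapping/interpolation."""
    return (np.asarray(s) - G * np.asarray(z_sfc)) / CP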
The dataset is subdivided into subsets for training, validation and testing. The former comprises the data from 2016 to 2019, while the latter two consist of monthly data from 2020.
The Tier-2 dataset
The Tier-2 dataset provides data for a real downscaling task where coarse-grained ERA5 reanalysis data [3] are downscaled onto the high-resolved grid of the COSMO REA6 dataset [4]. Since data from two different models are now used, where COSMO REA6 provides added value over complex terrain due to its higher spatial resolution, the downscaling task is now twofold: The data has to be super-resolved and bias-corrected.
Here, we still target the 2m temperature as with the Tier-1 dataset, but include more predictor variables:
- 2m temperature
- temperature from model levels 137, 135, 131, 127, 122 and 115
- surface pressure
- surface latent and sensible heat fluxes
- 10m horizontal wind (u,v)
- boundary layer height
The surface topography from the ERA5 and the COSMO REA6 datasets are also included as static predictor variables.
As a necessary prerequisite, the underlying grids of both reanalysis datasets had to be unified. Here, we have remapped the ERA5 data onto the rotated-pole grid deployed by the COSMO model [5]. With a grid spacing of 0.275° in rotated coordinates, the spatial resolution of the ERA5 data is five times coarser than that of the target data, the COSMO REA6 data (dx = 0.055°). Similar to the Tier-1 dataset, the ERA5 data is bi-linearly interpolated onto the highly resolved target grid to serve as input for the downscaling neural networks.
Currently, the target domain of the Tier-2 dataset comprises 120x96 grid points (with dx = 0.055°), covering large parts of Central Europe to include the complex terrain of the Alps and the German low mountain ranges. Hourly data is provided for the years 1995 to 2018. By default, the years 1995-2016 constitute the training dataset, while 2017 and 2018 are used for validation and testing, respectively.
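Following the same pattern as for the Tier-1 data, the validation (2017) and testing (2018) subsets can be retrieved by changing the dataset-argument (a short sketch, assuming the same keyword values as for Tier-1):
ds_val = cml.load_dataset("maelstrom-downscaling-tier2", dataset="validation")
ds_test = cml.load_dataset("maelstrom-downscaling-tier2", dataset="testing")
da_val, da_test = ds_val.to_xarray(), ds_test.to_xarray()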
This dataset constitutes the final dataset used in Application 5 of the MAELSTROM project as described in this report.
References
[1] Sha, Yingkai, et al. "Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature." Journal of Applied Meteorology and Climatology 59.12 (2020): 2057-2073.
[2] Leinonen, Jussi, et al. "Stochastic Super-Resolution for Downscaling Time-Evolving Atmospheric Fields With a Generative Adversarial Network." IEEE Transactions on Geoscience and Remote Sensing 59.9 (2021): 7211-7223.
[3] Hersbach, Hans, et al. "The ERA5 global reanalysis." Quarterly Journal of the Royal Meteorological Society 146.730 (2020): 1999-2049.
[4] Bollmeyer, Christoph, et al. "Towards a high-resolution regional reanalysis for the European CORDEX domain." Quarterly Journal of the Royal Meteorological Society 141.686 (2015): 1-15.
[5] Hans-Ertel-Center for Weather Research - Climate Monitoring and Diagnostics. COSMO Regional Reanalysis - COSMO-REA6.