Skip to main content

A set of utilities for lisflood users.define_waterregions, verify_waterregions;cutmaps: cut netCDFs;catchstats: get statistics;compare: compare netCDFs;decumulate: daily into 6h grids;gridding: interpolate meteo vars;lfcoords: finds in grids;mctrivers: MCT routing mask;nc2pcr, pcr2nc: Converts netCDF-PCRaster;ncextract: extract netCDF;download_timeseries: from WISKI API;rainbomb: Correct rainbombs;thresholds: discharge thresholds;generate_neighbours: rainbomb correction neighbours;

Project description

DOI

Lisflood Utilities

This repository hosts source code of LISFLOOD utilities. Go to Lisflood OS page for more information.

Other useful resources

Project Documentation Source code
Lisflood Model docs https://github.com/ec-jrc/lisflood-code
User guide
Lisvap Docs https://github.com/ec-jrc/lisflood-lisvap
Calibration tool Docs https://github.com/ec-jrc/lisflood-calibration
Lisflood Utilities https://github.com/ec-jrc/lisflood-utilities (this repository)
Lisflood Usecases https://github.com/ec-jrc/lisflood-usecases

Intro and documentation

Lisflood Utilities is a set of tools to help LISFLOOD users (or any users of PCRaster/NetCDF files) to execute some mundane tasks that are necessary to operate LISFLOOD.

The documentation about the available tools and how to use them can be found in the Wiki of this repository.

Here's a list of utilities you can find in lisflood-utilities package.

  • pcr2nc is a tool to convert PCRaster maps to NetCDF4 files.

    • convert a single map into a NetCDF4 file
    • convert a time series of maps into a NetCDF4 mapstack
    • support for WGS84 and ETRS89 (LAEA) reference systems
    • fine tuning of output files (compression, significant digits, etc.)
  • nc2pcr is a tool to convert a NetCDF file into PCRaster maps.

    • convert 2D variables in single PCRaster maps
    • NetCDF4 mapstacks are not supported yet
  • cutmaps is a tool to cut NetCDF files in order to reduce size, using either

    • a bounding box of coordinates
    • a bounding box of matrix indices
    • an existing boolean area mask
    • a list of stations and a LDD ("local drain direction" in NetCDF format)
  • compare is a package containing a set of simple Python classes that helps to compare NetCDF, PCRaster and TSS files.

  • thresholds is a tool to compute the discharge return period thresholds from NetCDF4 file containing a discharge time series.

  • water-demand-historic is a package allowing to generate sectoral (domestic, livestock, industry, and thermoelectric) water demand maps with monthly to yearly temporal steps for a range of past years, at the users’ defined spatial resolution and geographical extent. These maps are required by the LISFLOOD OS water use module

  • waterregions is a package containing two scripts that allow to create and verify a water regions map, respectively.

  • gridding is a tool to interpolate meteo variables observations stored in text files containing (lon, lat, value) into grids.

    • uses by default angular distance weighting (ADW) interpolation with the option to use other interpolation schemes
    • input file names should use format: <var>YYYYMMDDHHMI_YYYYMMDDHHMISS.txt but it can be changed in the configuration files
    • option to store all interpolated grids in a single NetCDF4 file
    • option to store each interpolated grid in a GeoTIFF file
    • output files are compressed
    • grids are setup in the configuration folder and are defined by a dem.nc file
    • meteo variables parameters are defined in the same configuration folder
  • decumulate is a tool that performs the decumulation of daily kiwis files into the respective 4 grids of 6hourly precipitation kiwis files in order to increase the station density or to fill in 6hourly gaps.

    • While processing the 4 grids of 6hourly precipitation Day1 12:00, 18:00 and Day2 00:00, 06:00 we will use the daily precipitation of the corresponding period meaning Day2 06:00 to decumulate its values where there is missing 6hourly precipitation.
    • For each daily precipitation observation from one of the providers above, if there is no 6h observation in a radius of 1.5 km, insert the daily value divided by 4 in all 4 6hourly datasets.
    • If there is one or more 6hourly stations in the radius and the station have less then 4 values, insert the missing values in the corresponding 6hourly dataset by and changing its value using the formula (PR - Sum(PR6)) / (number of missing values), if and only if the resulting value is positive (>=0).
    • Daily stations that have a real 6h station in the radius (1.5km) that is complete, meaning it has all four 6hourly values. The decumulation is NOT done.
    • Do NOT decumulate the MARS stations without neighbors that have in the grid, a DWDSynop station with the same WMO Number, because they are actually the same station even though they might not be exactly within the defined radius.
    • If there are more than one 6h station in the radius, select according to the following rules by order:
      1. get the one with the highest number of slots
      2. get the one with the lowest positive difference to the 24h value
      3. get the one on the top
  • download_timeseries is a tool to download timeseries of meteo variables observations from KIsters API and store them in text files in KIWI format containing the stations metadata and observations to be later used as input to generate meteo grids.

  • cddmap is a tool to generate correlation decay distance (CDD) maps starting from station timeseries

  • ncextract is a tool to extract values from NetCDF4 (or GRIB) file(s) at specific coordinates.

  • catchstats calculates catchment statistics (mean, sum, std, min, max...) from NetCDF4 files given masks created with cutmaps.

  • lfcoords is a tool that finds the appropriate coordinates in the LISFLOOD river network for any point (gauging station, dam...).

  • mctrivers creates a river mask for MCT diffusive river routing in LISFLOOD.

  • rainbomb is a tool to correct rainbomb artifacts in ERA5 daily precipitation data. A rainbomb is an unrealistically high rainfall value at a single grid point that is not supported by surrounding points. This utility identifies and corrects such artifacts by comparing each grid point against its neighbours and applying threshold-based corrections. The input file must be in the standard ERA5 resolution of approximately 31 km or 0.25 arc-degrees.

  • generate_neighbours is a tool to generate neighbour indices for each grid point. It is used to reduce the computation time for the rainbomb correction by pre-computing and storing the indices of neighbours for each grid point. The resulting file is used as a parameter for the rainbomb tool.

The package contains convenient classes for reading/writing:

  • PCRasterMap
  • PCRasterReader
  • NetCDFMap
  • NetCDFWriter

Installation

Requisites

The easy way is to use conda environment as they incapsulate C dependencies as well, so you wouldn't need to install libraries.

Otherwise, ensure you have properly installed the following software:

  • Python 3.10+
  • GDAL C library and software
  • NetCDF4 C library

Install

If you use conda, create a new env (or use an existing one) and install gdal and lisflood-utilities:

conda create --name myenv "python>=3.10" -c conda-forge
conda activate myenv
conda install -c conda-forge eccodes gdal
pip install lisflood-utilities

If you don't use conda but a straight Python3 virtualenv:

source /path/myenv/bin/activate
pip install lisflood-utilities

If GDAL library fails to install, ensure to install the same package version of the C library you have on your system. You may also need to setup paths to GDAL headers.

To check which version of GDAL libraries you have installed on your computer, use gdal-config

sudo apt-get install libgdal-dev libgdal
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
gdal-config --version  # 3.0.1
pip install GDAL==3.0.1

Note: if you previously installed an older version of lisflood-utilities, it is highly recommended to remove it before installing the newest version:

pip uninstall lisflood-utilities
pip install -e./

pcr2nc

Usage

Note: This guide assumes you have installed the program with pip tool. If you cloned the source code instead, just substitute the executable pcr2nc with python bin/pcr2nc from the root folder of the cloned project.

The tool takes three command line input arguments:

  • -i, --input: It can be a path to a single file, a folder or a unix-like widlcard expression like /path/to/files/dis00*
  • -o, --output_file: Path to the output nc file
  • -m, --metadata: Path to a yaml or json file containing configuration for the NetCDF4 output file.

Unless the input is a single file, the resulting NetCDF4 file will be a mapstack according to a time dimension.

Input as a folder containing PCRaster maps. In this case, the folder must contain ONLY PCraster files and the output will be a mapstack.

pcr2nc -i /path/to/input/ -o /path/to/output/out.nc -m ./nc_metadata.yaml

Input as a path to a single map. In this case, the output won't be a mapstack.

pcr2nc -i /path/to/input/pcr.map -o /path/to/output/out.nc -m ./nc_metadata.yaml

Input as a Unix style pathname pattern expansion. The output will be a mapstack. Note that in this case the input argument must be contained in double quotes!

pcr2nc -i "/path/to/input/pcr00*" -o /path/to/output/out.nc -m ./nc_metadata.json

Writing metadata configuration file

Format of resulting NetCDF4 file is configured into a metadata configuration file. This file can be written in YAML or JSON format.

An example of a metadata configuration file is the following

variable:
  shortname: dis
  description: Discharge
  longname: discharge
  units: m3/s
  compression: 9
  least_significant_digit: 2
source: JRC Space, Security, Migration
reference: JRC Space, Security, Migration
geographical:
  datum: WGS84
  variable_x_name: lon
  variable_y_name: lat
time:
  calendar: proleptic_gregorian
  units: days since 1996-01-01

Variable section

In variable section you can configure metadata for the main variable:

  • shortname: A short name for the variable
  • longname: The long name version
  • description: A description for humans
  • units: The units of the variable
  • compression: Optional, integer number between 1 and 9, default 0 (no compression). If present the output nc file will be compressed at this level.
  • least_significant_digit: Optional, integer number, default 2. From NetCDF4 documentation:

If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of 0.1 degrees), you can dramatically improve zlib compression by quantizing (or truncating) the data using the least_significant_digit keyword argument to createVariable. The least significant digit is the power of ten of the smallest decimal place in the data that is a reliable value. For example if the data has a precision of 0.1, then setting least_significant_digit=1 will cause data the data to be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Effectively, this makes the compression 'lossy' instead of 'lossless', that is some precision in the data is sacrificed for the sake of disk space.

Source and reference

source and reference add information for the institution that is providing the NetCDF4 file.

Geographical section

In the geographical section you can configure datum and name of the x and y variables. As variable_x_name and variable_y_name you should use 'lon' and 'lat' for geographical coordinates (e.g. WGS84) and 'x' and 'y' for projected coordinates (e.g. ETRS89).

Currently, pcr2nc supports the following list for datum:

  • WGS84
  • ETRS89
  • GISCO

Time section

This section is optional and is only required if the output file is a mapstack (a timeseries of georeferenced 2D arrays) In this section you have to configure units and calendar.

  • units: Can be one of the following strings (replacing placeholders with the actual date):
    • hours since YYYY-MM-DD HH:MM:SS
    • days since YYYY-MM-DD
  • calendar: A recognized calendar identifier, like proleptic_gregorian, gregorian etc.

nc2pcr

This tool converts single maps NetCDF (time dimension is not supported yet) to PCRaster format.

Usage

nc2pcr -i /path/to/input/file.nc -o /path/to/output/out.map [-c /path/to/clone.map optional]

If input file is a LDD map, you must add the -l flag:

nc2pcr -i /path/to/input/ldd.nc -o /path/to/output/ldd.map  -l [-c /path/to/clone.map optional]

cutmaps

This tool cuts NetCDF files using either a mask, a bounding box, or a list of stations along with a LDD (local drain direction) map.

Usage

The tool requires a series of arguments:

  • The area to be extracted can be defined in one of the following ways:
    • -m, --mask: a mask map (in NetCDF format).
    • -i, --cuts_indices: a bounding box defined by matrix indices in the form -i imin imax jmin jmax (the indices must be integers).
    • -c, --cuts: a bounding box defined by coordinates in the form -c xmin xmax ymin ymax (the coordinates can be integer or floating point numbers; x = longitude, y = latitude).
    • -N, -stations: a list of stations included in a tab separated text file. This approach requires a LDD (local drain direction) map as an extra input, defined with the argument -l (-ldd).
  • The files to be cut may be defined in one of the following ways:
    • -f, --folder: a folder containing NetCDF files.
    • -F, --file: a single netCDF file to be cut.
    • -S, --subdir: a directory containing a number of folders
  • The resulting files will be saved in the folder defined by the argument -o ( --outpath).

There are additional optional arguments

  • -W, --overwrite: it allows to overwrite results.
  • -C, --clonemap: it can be used to define a clone map when the LDD input map (argument -l) is in NetCDF format.

Examples

Using a mask

The following command will cut all NetCDF files inside a specific folder (argument -f) using a mask (argument -m). The mask is a boolean map (1 only in the area of interes) that cutmaps uses to create a bounding box. The resulting files will be written in the current folder (argument -o).

cutmaps -m /workarea/Madeira/maps/MaskMap/Bacia_madeira.nc -f /workarea/Madeira/lai/ -o ./

Using indices

The following command cuts all the maps in an input folder (argument -f) using a bounding box defined by matrix indices (argument -i). Knowing your area of interest from your NetCDF files, you can determine indices of the array and pass them in the form -i imin imax jmin jmax (integer numbers).

cutmaps -i "150 350 80 180" -f /workarea/Madeira/lai/ -o ./

Using coordinates

The following command cuts all the maps in an input directory containing several folders (argument -S) using a bounding box defined by coordinates (argument -c). The argument -W allows to overwrite pre-existing files in the output directory (argument -o):

cutmaps -S /home/projects/lisflood-eu -c "4078546.12 4463723.85 811206.57 1587655.50" -o /Work/Tunisia/cutmaps -W

Using station coordinates and a local drain direction map

The TXT file with stations must have a specific format as in the example below. Each row represents a stations, and it contains three columns separated by tabs that indicated the X and Y coordinates (or lon and lat) and the ID of the station.

4297500	1572500	1
4292500	1557500	2
4237500	1537500	3
4312500	1482500	4
4187500	1492500	5

The following command will cut all the maps in a specific folder (-f argument) given a LDD map (-l argument) and the previous text file (-N argument), and save the results in a folder defined by the argument -o.

cutmaps -f /home/projects/lisflood-eu -l ldd.nc -N stations.txt -o /Work/Tunisia/cutmaps

Output

Apart from the cut files in the output folder specified in the command prompt, cutmaps produces other outputs in the folder where the LDD map is stored:

  • mask.map and mask.nc for your area of interest, which may be needed in subsequent LISFLOOD/LISVAP executions.
  • outlets.map and outlets.nc based on stations.txt, which will let you produce gauges TSS if configured in LISFLOOD.

compare

This tool compares two NetCDF datasets. You can configure it with tolerances (absolute --atol, relative --rtol, thresholds for percentage of tolerated different values --max-diff-percentage). You can also set the option --save-diffs to write files with the diffences, so that you can inspect maps and differences with tools like Panoply.

usage: compare [-h] -a DATASET_A -b DATASET_B -m MASKAREA [-M SUBMASKAREA]
               [-e] [-s] [-D] [-r RTOL] [-t ATOL] [-p MAX_DIFF_PERCENTAGE]
               [-l MAX_LARGEDIFF_PERCENTAGE]

Compare NetCDF outputs: 0.12.12

optional arguments:
  -h, --help            show this help message and exit
  -a DATASET_A, --dataset_a DATASET_A
                        path to dataset version A
  -b DATASET_B, --dataset_b DATASET_B
                        path to dataset version B
  -m MASKAREA, --maskarea MASKAREA
                        path to mask
  -e, --array-equal     flag to compare files to be identical
  -s, --skip-missing    flag to skip missing files in comparison
  -D, --save-diffs      flag to save diffs in NetCDF files for visual
                        comparisons. Files are saved in ./diffs folder of
                        current directory.For each file presenting
                        differences, you will find files diffs, original A and
                        B (only for timesteps where differences are found).
  -r RTOL, --rtol RTOL  rtol
  -t ATOL, --atol ATOL  atol
  -p MAX_DIFF_PERCENTAGE, --max-diff-percentage MAX_DIFF_PERCENTAGE
                        threshold for diffs percentage
  -l MAX_LARGEDIFF_PERCENTAGE, --max-largediff-percentage MAX_LARGEDIFF_PERCENTAGE
                        threshold for large diffs percentage

thresholds

The thresholds tool computes the discharge return period thresholds using the method of L-moments. It is used to post-process the discharge from the LISFLOOD long term run. The resulting thresholds can be used in a flood forecasting system to define the flood warning levels.

Usage:

The tool takes as input a Netcdf file containing the annual maxima of the discharge signal. LISFLOOD computes time series of discharge values (average value over the selected computational time step). The users are therefore required to compute the annual maxima. As an example, this step can be achieved by using CDO (cdo yearmax), for all the details please refer to https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-190001.2.5

The output NetCDF file contains the following return period thresholds [1.5, 2, 5, 10, 20, 50, 100, 200, 500], together with the Gumbel parameters (sigma and mu).

usage: thresholds [-h] [-d DISCHARGE] [-o OUTPUT]

Utility to compute the discharge return period thresholds using the method of L-moments.
Thresholds computed: [1.5, 2, 5, 10, 20, 50, 100, 200, 500]

options:
  -h, --help            show this help message and exit
  -d DISCHARGE, --discharge DISCHARGE
                        Input discharge files (annual maxima)
  -o OUTPUT, --output OUTPUT
                        Output thresholds file

water-demand-historic

This utility allows to create water demand maps at the desired resolution and for the desired geographical areas. The maps indicate, for each pixel, the time-varying water demand map to supply for domestic, livestock, industrial, and thermoelectric water consumption. The temporal discretization is monthly for domestic and energy demand, yearly for industrial and livestock demand. The maps of sectoral water demand are required by the LISFLOOD OS water use module. Clearly, the sectoral water demand maps and the scripts of this utility can be used also for other applications, as well as for stand-alone analysis of historical water demand for anthropogenic use.

Input

The creation of the sectoral water demand maps requires a template map that defines the desired geographical area and spatial resolution. The generation of the maps relies on a number of external datasets (examples are the Global Human Settlement - Datasets - European Commission (europa.eu) and FAO AQUASTAT Dissemination System). The locations of the template map, of the input datasets and files, of the output folder, and other users’ choices (e.g. start year and end year) are specified in a configuration file. The syntax of the configuration file is pre-defined and an example is provided to the users. The complete list of external datasets, the instructions on how to prepare (i) the external dataset, (ii) the template map, (iii) the input folder, (iv) the output folder, and (v) the configuration file are explained into details here

Output

Four sectoral water demand maps in netCDF-4 format. The geographical extent and the spatial resolution are defined by the template map (users-defined input file). Each netCDF-4 file has 12 months, for each year included in the temporal time span identified by the user. Sectoral water demand data with lower (yearly) temporal resolution are repeated 12 times.

Usage

The methodology includes five main steps. The instructions on how to retrieve the scrips, create the environment including all the required packages, and use the utility are provided here

Important notes on documentation and data availability

The complete list of external datasets, the instructions on how to retrieve the external datasets, the methodology, and the usage of the scripts are explained into details here. The README file provides detailed technical information about the input datasets and the usage of this utility. The methodology is explained in the manuscript: Choulga, M., Moschini, F., Mazzetti, C., Grimaldi, S., Disperati, J., Beck, H., Salamon, P., and Prudhomme, C.: Technical note: Surface fields for global environmental modelling, EGUsphere, 2023 (preprint).

The global sectoral water demand maps at 3 arcmin (or 0.05 degrees) resolution, 1979-2019, produced using the scripts of this utility can be downloaded from Joint Research Centre Data Catalogue - LISFLOOD static and parameter maps for GloFAS - European Commission (europa.eu)

waterregions

The modelling of water abstraction for domestic, industrial, energetic, agricoltural and livestock use can require a map of the water regions. The concept of water regions and information for their definition are explained here. Since groundwater and surface water resources demand and abstraction are spatially distributed inside each water region, each model set-up must include all the pixels of the water region. This requirement is crucial for the succes of the calibration of the model. This utility allows the user to meet this requirement. More specifically, this utility can be used to:

  1. create a water region map which is consistent with a set of calibration points: this purpose is achieved by using the script define_waterregions.
  2. verify the consistency between an existing water region map and an exixting map of calibration catchments: this purpose is achieved by using the script verify_waterregions It is here reminded that when calibrating a catchment which is a subset of a larger computational domain, and the option wateruse is switched on, then the option groudwatersmooth must be switched off. The explanation of this requirement is provided in the chapter Water use of the LISFLOOD documentation.

Requirements

python3, The protocol was tested on Linux.

define_waterregions

This utility allows to create a water region map which is consistent with a set of calibration points. The protocol was created by Ad De Roo (Unit D2, Joint Research Centre).

Input

  • List of the coordinates of the calibration points. This list must be provided in a .txt file with three columns: LONGITUDE(or x), LATITUDE(or y), point ID.
  • LDD map is in NetCDF format.
  • Countries map in NetCDF format. This map shows the political boundaries of the Countries, each Coutry is identified by using a unique ID. This map is used to ensure that the water regions are not split accross different Countries.
  • Map of the initial definition of the water regions in NetCDF format. This map is used to attribute a water region to areas not included in the calibration catchments. In order to create this map, the user can follow the guidelines provided here.
  • file .yaml or .json to define the metadata of the output water regions map in NetCDF format. An example of the structure of these files is provided here
Input data provided by this utility:

This utility provides three maps of Countries IDs: 1arcmin map of Europe (EFAS computational domain), 0.1 degree and 3arcmin maps of the Globe. ACKNOWLEDGEMENTS: both the rasters were retrieved by upsampling the original of the World Borders Datase provided by http://thematicmapping.org/ (the dataset is available under a Creative Commons Attribution-Share Alike License).

Output

Map of the water regions which is consistent with the calibration catchments. In other words, each water region is entirely included in one calibration catchment. The test to check the consistency between the newly created water regions map and the calibration catchments is implemented internally by the code and the outcome of the test is printed on the screen. In the output map, each water region is identified by a unique ID. The format of the output map is NetCDF.

Usage

The following command lines allow to produce a water region map which is consistent with the calibration points (only one commad line is required: each one of the command lines below shows a different combination of input files format):

python define_waterregions.py -p calib_points_test.txt -l ldd_test.nc -C countries_id_test.nc -w waterregions_initial_test.nc -o my_new_waterregions.nc

python define_waterregions.py -p calib_points_test.txt -l ldd_test.nc -C countries_id_test.nc -w waterregions_initial_test.nc -o my_new_waterregions.nc -m metadata.test.json

python define_waterregions.py -p calib_points_test.txt -l ldd_test.nc -C countries_id_test.nc -w waterregions_initial_test.nc -o my_new_waterregions.nc -m metadata.test.yaml

The input maps are in nectdf format. It is imperative to write the file name in full, that is including the extension.
The utility returns a NetCDF file.
The metadata file in .yaml format can be provided to allow specific metadata for the output file in NetCDF format.

The code internally verifies that the each one of the newly created water regions is entirely included within one calibration catchments. If this condition is satisfied, the follwing message in printed out: “OK! Each water region is completely included inside one calibration catchment”. If the condition is not satisfied, the error message is “ERROR: The water regions WR are included in more than one calibration catchment”. Moreover, the code provides the list of the water regions WR and the calibration catchments that do not meet the requirment. This error highlight a problem in the input data: the user is recommended to check (and correct) the list of calibration points and the input maps.

The input and output arguments are listed below.

usage: define_waterregions.py [-h] -p CALIB_POINTS -l LDD -C COUNTRIES_ID -w
                              WATERREGIONS_INITIAL -o OUTPUT_WR

Define Water Regions consistent with calibration points: {}

optional arguments:
  -h, --help            show this help message and exit
  -p CALIB_POINTS, --calib_points CALIB_POINTS
                        list of calibration points: lon or x, lat or y, point id. File extension: .txt,
  -l LDD, --ldd LDD     LDD map, file extension: .nc or .map
  -C COUNTRIES_ID, --countries_id COUNTRIES_ID
                        map of Countries ID, fike extension .nc or .map 
  -w WATERREGIONS_INITIAL, --waterregions_initial WATERREGIONS_INITIAL
                        initial map of water regions, file extension: .nc or .map
  -o OUTPUT_WR, --output_wr OUTPUT_WR
                        output map of water regions, file extension: .nc or .map 
  -m METADATA, --metadata_file METADATA
                        Path to metadata file for NetCDF, .yaml or .json format                     

verify_waterregions

This function allows to verify the consistency between a water region map and a map of calibration catchments. This function must be used when the water region map and the map of calibration catchments have been defined in an independent manner (i.e. not using the utility define_waterregions). The function verify_waterregions verifies that each water region map is entirely included in one calibration catchment. If this condition is not satisfied, an error message is printed on the screen.

Input

  • Map of calibration catchments in NetCDF format.
  • Water regions map in NetCDF format.

Output

The output is a message on the screen. There are two options:

  • 'OK! Each water region is completely included inside one calibration catchment.'
  • 'ERROR: The water regions WR are included in more than one calibration catchment’: this message is followed by the list of the water regions and of the catchment that raised the isuue. In case of error message, the user can implement the function define_waterregions.

Usage

The following command line allows to produce a water region map which is consistent with the calibration points:

python verify_waterregions.py -cc calib_catchments_test.nc -wr waterregions_test.nc

The input and output arguments are listed below. All the inputs are required.

usage: verify_waterregions.py [-h] -cc CALIB_CATCHMENTS -wr WATERREGIONS

Verify that the Water Regions map is consistent with the map of the
calibration catchments

optional arguments:
  -h, --help            show this help message and exit
  -cc CALIB_CATCHMENTS, --calib_catchments CALIB_CATCHMENTS
                        map of calibration catchments, NetCDF format
  -wr WATERREGIONS, --waterregions WATERREGIONS
                        map of water regions, NetCDF format

NOTE: The utility pcr2nc can be used to convert a map in pcraster format into NetCDF format.

gridding

This tool is used to interpolate meteo variables observations stored in text files containing (lon, lat, value) into grids. It uses by default angular distance weighting interpolation method from pyg2p.

A system of filters is provided to be used to parse and filter the input Kiwis files so they get prepared for processing. Below there's the current available filters but some more can be developed for specific necessities.

Available Filters

KiwisFilter: Base class to filter Kiwis files metadata and obtain a dataframe containing only the coordinates and values to be used for interpolation. It implements the basic rules for parsing the Kiwis format.

ObservationsKiwisFilter: Class to filter Kiwis files metadata for stations that contain another station in the vicinity. Expects to have in filter_args a dictionary containing the provider ID whose stations we want to filter (as key) and the radius (in decimal degrees) to find the vicinity station from other providers (as value).

ProvidersKiwisFilter: Class to filter Kiwis files metadata for stations that belong to a list of providers and inside a defined list of time intervals. Expects to have in filter_args a dictionary containing the provider ID whose stations we want to filter (as key) and an array of pairs of start and end dates defining the intervals to filter the station from. filter_args = {1121: [('1992-01-02 06:00:00', '1993-01-01 06:00:00'), ('1995-01-02 06:00:00', '1996-01-01 06:00:00')]}

SolarRadiationLimitsKiwisFilter: Class to filter Solar Radiation Kiwis files whose data coordinates are both less equal a defined latitude and values less equal a defined threshold. This was developed to avoid wrong values of zero daily solar radiation in EFAS domain bellow 66 degrees Latitude (empirical). Expects to have in filter_args a dictionary containing the definition of the limits using the keys EXCLUDE_BELLOW_LATITUDE and EXCLUDE_BELLOW_VALUE. In case any of the exclude values are not present or empty it will use the default values of 70.0 Latitude and 0.0 Daily Solar Radiation.

Requirements

python3, pyg2p

Usage

Note: This guide assumes you have installed the program with pip tool. If you cloned the source code instead, just substitute the executable gridding with python bin/gridding that is in the root folder of the cloned project.

The tool requires four mandatory command line input arguments:

  • -i, --in: Set input folder path with kiwis/point files
  • -o, --out: Set output folder base path for the tiff files or the NetCDF file path.
  • -c, --conf: Set the grid configuration type to use. Right now only 5x5km, 1arcmin are available.
  • -v, --var: Set the variable to be processed. Right now only variables pr,pd,tn,tx,ws,rg,pr6,ta6 are available.

The input folder must contain the meteo observations in text files in KIWIS format that will be parsed, filtered and the tool will generate another text file in TSV format with name pattern: <var>YYYYMMDDHHMI_YYYYMMDDHHMISS.txt These TSV files contain all the observation that will be used in the interpolation and have the columns longitude, latitude, observation_value separated by TAB and without the header. Not mandatory but could help to store the files in a folder structure like: ./YYYY/MM/DD/<var>YYYYMMDDHHMI_YYYYMMDDHHMISS.txt

Example of command that will generate a NetCDF file containing the precipitation (pr) grids for March 2023:

gridding -i /meteo/pr/2023/ -o /meteo/pr/pr_MARCH_2023.nc --pathconf /path/to/configuration/ -c 1arcmin -v pr -s 202303010600 -e 202304010600

The input and output arguments are listed below and can be seen by using the help flag.

gridding --help
usage: gridding [-h] -i /path/to/pr200102150600_all.kiwis
                [-o /path/to/pr2001.nc] -c {5x5km, 1arcmin,...}
                [-p /path/to/config] -v {pr,pd,tn,tx,ws,rg,...}
                [-d files2process.txt] [-s YYYYMMDDHHMISS] [-e YYYYMMDDHHMISS]
                [-q] [-t] [-n] [-f] [-u] [-g]
                [-m ['nearest', 'invdist', 'adw', 'cdd', 'bilinear', 'triangulation', 'bilinear_delaunay']]
                [-r ['0', '1', '2', '3', '4', '5']] [-b]

version v0.1 ($Mar 28, 2023 16:01:00$) This script interpolates meteo input
variables data into a single NETCDF4 file and, if selected, generates also a
GEOTIFF file per timestep. The resulting netCDF is CF-1.6 compliant.

optional arguments:
  -h, --help            show this help message and exit
  -i /path/to/pr200102150600_all.kiwis, --in /path/to/pr200102150600_all.kiwis
                        Set a single input kiwis file or folder path
                        containing all the kiwis files.
  -o /path/to/pr2001.nc, --out /path/to/pr2001.nc
                        Set the output netCDF file path containing all the
                        timesteps between start and end dates.
  -c {5x5km, 1arcmin,...}, --conf {5x5km, 1arcmin,...}
                        Set the grid configuration type to use.
  -p /path/to/config, --pathconf /path/to/config
                        Overrides the base path where the configurations are
                        stored.
  -v {pr,pd,tn,tx,ws,rg,...}, --var {pr,pd,tn,tx,ws,rg,...}
                        Set the variable to be processed.
  -d files2process.txt, --dates files2process.txt
                        Set file containing a list of filenames to be
                        processed in the form of
                        <var>YYYYMMDDHHMI_YYYYMMDDHHMISS.txt
  -s YYYYMMDDHHMISS, --start YYYYMMDDHHMISS
                        Set the start date and time from which data is
                        imported [default: date defining the time units inside
                        the config file]
  -e YYYYMMDDHHMISS, --end YYYYMMDDHHMISS
                        Set the end date and time until which data is imported
                        [default: 20240611060000]
  -q, --quiet           Set script output into quiet mode [default: False]
  -t, --tiff            Outputs a tiff file per timestep [default: False]
  -n, --netcdf          Outputs a single netCDF with all the timesteps
                        [default: False]
  -f, --force           Force write to existing file. TIFF files will be
                        overwritten and netCDF file will be appended.
                        [default: False]
  -u, --useexisting     Force to use existing point/txt filenames, so these
                        files will be used for gridding. [default: False]
  -g, --getexistingtiff
                        Force to use existing tiff files instead of
                        interpolating the values again. [default: False]
  -m ['nearest', 'invdist', 'adw', 'cdd', 'bilinear', 'triangulation', 'bilinear_delaunay'], --mode ['nearest', 'invdist', 'adw', 'cdd', 'bilinear', 'triangulation', 'bilinear_delaunay']
                        Set interpolation mode. [default: adw]
  -r ['0', '1', '2', '3', '4', '5'], --ramsavemode ['0', '1', '2', '3', '4', '5']
                        Set memory save mode level. Used to reduce memory
                        usage [default: 0]
  -b, --broadcast       When set, computations will run faster in full
                        broadcasting mode but require more memory. [default:
                        False]

decumulate

This tool performs the decumulation of daily KiWIS files into the respective 4 slots of 6-hourly precipitation KiWIS files to increase station density or fill 6-hourly gaps. The program is independent of the new interpolation tool, as it will be applied in the pre-processing of gridding and will be used only for historical grids, not in near-real-time (NRT) processing.

Basic functionality includes:

  1. Reading 1 daily KiWIS file and 4 corresponding 6-hourly KiWIS files
  2. Applying filters defined in the configuration files to the daily and 6-hourly files
  3. Calculating decumulated values from the daily file
  4. Writing 4 new 6-hourly KiWIS files, including decumulated values

Explanation of the algorithm

The algorithm works like it is described in the following steps:

  1. While processing the 4 grids of 6-hourly precipitation (Day 1, 12:00 and 18:00, and Day 2, 00:00 and 06:00), the script uses the daily precipitation of the corresponding period (Day 2, 06:00) to decumulate its values where there is missing 6-hourly precipitation.
  2. The script applies filters defined in the configuration files to the daily and 6-hourly files, ensuring that only values with good or suspicious quality are used.
  3. For each daily precipitation observation from one of the providers to be decumulated, if there is no 6-hourly observation within a 1.5 km radius, the daily value is divided by 4 and inserted into all 4 6-hourly datasets.
  4. If there is one or more 6-hourly stations within the radius and the station has fewer than 4 values, the missing values are inserted into the corresponding 6-hourly dataset by changing its value using the formula (PR - Sum(PR6)) / (number of missing values), but only if the resulting value is positive (≥ 0).
  5. Decumulation is not performed for daily stations that have a real 6-hourly station within the 1.5 km radius that is complete, meaning it has all four 6-hourly values.
  6. Decumulation is not performed for MARS stations without neighbors within the radius that have a DWDSynop station with the same WMO number in the grid, as they are actually the same station, even if they are not exactly within the defined radius.
  7. If there are multiple 6-hourly stations within the radius, the script selects one according to the following rules, in order: a. The station with the highest number of slots (up to 3 slots) b. The station with the lowest positive difference to the 24-hour value c. The top station

Configuration

A new property, KIWIS_FILTER_DECUMULATION_CONFIG, is required in the daily precipitation configuration file (config_pr.txt). This enables configuration of decumulation intervals, associated providers, and a radius around each station. If another complete 6-hourly station is within this radius, decumulation will be avoided.

Requirements

python3, gridding

Usage

Note: This guide assumes you have installed the program with pip tool. If you cloned the source code instead, just substitute the executable decumulate with python bin/decumulate that is in the root folder of the cloned project.

The tool requires five mandatory command line input arguments:

  • -d, --pr24h: Set the input kiwis file folder containing daily precipitation.
  • -g, --pr6h": Set the input kiwis file folder containing 6 hourly precipitation.
  • -c, --conf: Set the grid configuration type to use (5x5km, 1arcmin,...).
  • -v, --var24h: Set the daily variable to be processed (pr, ta,...).
  • -6, --var6h: Set the 6hourly variable to be processed (pr6, ta6,...).

The input folder must contain the meteo observation in KIWIS files with file name format: <var>YYYYMMDDHHMI_all.kiwis The file name pattern can be changed in the configuration files. Not mandatory but could help to store the files in a folder structure like: ./YYYY/MM/DD/<var>YYYYMMDDHHMI_all.kiwis

Example of command that decumulates the daily precipitation (pr) grids for 2005-12-26 06:00 and inserts the decumulated values into the respective 6hourly files (pr6) (2005-12-25 12:00, 2005-12-25 18:00, 2005-12-26 00:00, 2005-12-26 06:00):

decumulate --pathconf /path/to/configuration/ --conf 1arcmin --var24h pr --var6h pr6 --out /path/to/output/pr6 --pr24h /path/to/input/pr/ --pr6h /path/to/input/pr6/ -s 20051226000000 -e 20051226060001

The input and output arguments are listed below and can be seen by using the help flag.

decumulate --help
usage: decumulate [-h] -d /path/to/pr -g /path/to/pr6
                  [-o /path/to/output/folder] -c {5x5km, 1arcmin,...}
                  [-p /path/to/config] -v {pr,ta,...} -6 {pr6,ta6,...}
                  [-s YYYYMMDDHHMISS] [-e YYYYMMDDHHMISS] [-b] [-q]

version v0.1 ($Mar 14, 2024 16:01:00$) This script performs the decumulation
of daily kiwis files into the respective 6hourly precipitation kiwis files in
order to increase the station density or to fill in 6hourly gaps. JIRA Issue:
EMDCC-1484

optional arguments:
  -h, --help            show this help message and exit
  -d /path/to/pr, --pr24h /path/to/pr
                        Set the input kiwis file folder containing daily
                        precipitation.
  -g /path/to/pr6, --pr6h /path/to/pr6
                        Set the input kiwis file folder containing 6 hourly
                        precipitation.
  -o /path/to/output/folder, --out /path/to/output/folder
                        Set the output folder where the resulting 6h kiwis
                        will be stored. If this folder is not set, it will
                        change the input kiwis.
  -c {5x5km, 1arcmin,...}, --conf {5x5km, 1arcmin,...}
                        Set the grid configuration type to use.
  -p /path/to/config, --pathconf /path/to/config
                        Overrides the base path where the configurations are
                        stored.
  -v {pr,ta,...}, --var24h {pr,ta,...}
                        Set the daily variable to be processed.
  -6 {pr6,ta6,...}, --var6h {pr6,ta6,...}
                        Set the 6hourly variable to be processed.
  -s YYYYMMDDHHMISS, --start YYYYMMDDHHMISS
                        Set the start date and time from which data is
                        imported [default: date defining the time units inside
                        the config file]
  -e YYYYMMDDHHMISS, --end YYYYMMDDHHMISS
                        Set the end date and time until which data is imported
                        [default: None]
  -b, --boi             Indicate that the daily timesteps are at the beginning
                        of the interval [default: False]
  -q, --quiet           Set script output into quiet mode [default: False]

IMPORTANT: The output folder will contain the same folder structure as the 6h input folder. Just a reminder that If you don't set the output folder it will overwrite the input files.

download_timeseries

This tool downloads timeseries of meteo variables observations from the WISKI API and stores them in text files in KIWI format. The downloaded data contains station metadata and observations, which can later be used as input to generate meteo grids using the gridding tool.

The download process consists of four main steps:

  1. Download metadata from the API (station information, coordinates, quality codes)
  2. Download the list of stations and filter them to keep only those in the EFAS domain
  3. Download timeseries data for each station for the specified time period
  4. Merge the timeseries data with metadata to create KIWI format files

Requirements

python3, configuration files in the gridding configuration folder

Usage

Note: This guide assumes you have installed the program with pip tool. If you cloned the source code instead, just substitute the executable download_timeseries with python bin/download_timeseries that is in the root folder of the cloned project.

The tool requires two mandatory command line input arguments:

  • variable: The variable to download (e.g., pr6, ta6, pr, tx, tn, pd, ws, rg, wx)
  • -c, --conf: Set the grid configuration type to use (5x5km, 1arcmin,...)

Optional arguments:

  • -p, --pathconf: Overrides the base path where the configurations are stored
  • --base-path: Base path for output files (default: /tmp/download_timeseries)
  • --start: Start date (YYYY-MM-DD, default: 1989-12-31)
  • --end: End date (YYYY-MM-DD, default: current date)
  • --no-metadata: Skip downloading metadata
  • --no-stations: Skip downloading station list
  • --no-data: Skip downloading station data
  • --no-merge: Skip merging timeseries with metadata
  • --verbose: Enable verbose logging
  • --api-key: API key for authentication (if not provided, uses the environment variable KIWI_API_KEY)

Example of command that will download precipitation timeseries (pr6) for January 2024:

download_timeseries pr6 -c 1arcmin --start 2024-01-01 --end 2024-01-31 --api-key YOUR_API_KEY

Example of command that will download temperature timeseries (ta6) for a specific date range:

download_timeseries ta6 -c 1arcmin --start 2024-12-31 --end 2026-01-02 --base-path /path/to/output

The input and output arguments are listed below and can be seen by using the help flag:

download_timeseries --help
usage: download_timeseries.py [-h] variable -c {5x5km, 1arcmin,...}
                               [-p /path/to/config] [--base-path BASE_PATH]
                               [--start START] [--end END] [--no-metadata]
                               [--no-stations] [--no-data] [--no-merge]
                               [--verbose] [--api-key API_KEY]

Download timeseries data from WISKI API

positional arguments:
  variable              Variable to download (e.g., pr6, ta6, pr, tx, tn, pd,
                        ws, rg, wx)

optional arguments:
  -h, --help            show this help message and exit
  -c {5x5km, 1arcmin,...}, --conf {5x5km, 1arcmin,...}
                        Set the grid configuration type to use.
  -p /path/to/config, --pathconf /path/to/config
                        Overrides the base path where the configurations are
                        stored.
  --base-path BASE_PATH
                        Base path for output files (default: /tmp/download_timeseries)
  --start START         Start date (YYYY-MM-DD, default: 1989-12-31)
  --end END             End date (YYYY-MM-DD, default: current date)
  --no-metadata         Skip downloading metadata
  --no-stations         Skip downloading station list
  --no-data             Skip downloading station data
  --no-merge            Skip merging timeseries with metadata
  --verbose             Enable verbose logging
  --api-key API_KEY     API key for authentication (if not provided, uses
                        the environment variable KIWI_API_KEY)

Output

The tool creates the following output files in the specified base path:

  • {variable}_timeseries_metadata.tsv: Metadata file with station information
  • {variable}_stations_to_download.tsv: Filtered list of stations in the EFAS domain
  • timeseries/{variable}/: Folder containing individual station timeseries files
  • meteo/{variable}/: Folder containing the final KIWI format files organized by year/month/day

The KIWI format files can be used directly as input to the gridding tool for interpolation.

cddmap

This tool is used to generate correlation decay distance (CDD) maps starting from station timeseries

Requirements

python3, pyg2p

Usage

cddmap [directory]/[--analyze]/[--merge-and-filter-jsons]/--generatemap] [--start first_station] [--end last_station] [--parallel] [--only-extract-timeseries timeseries_keys_file] [--maxdistance max_distance_in_km]

The tool requires an input argument indicating the station timeseries main folder, and calculates the CDD for each stations as well as correlations and distances files. Outputs the results in a txt file containing station coordinates and CDD values. After creating the CDD txt file, it can be used with one of the following commands:

  • --analyze: read cdd file previously created for postprocessing
  • --merge-and-filter-jsons: merge all cdd files in a folder and filters out a list of stations.
  • --generatemap: generate a NetCDF CDD map file using CDD txt file and angular distance weighted interpolation between station points
  • --start and --end arguments are used to split the task in many sub tesks, evaluating only the stations between "start" and "end", since the CDD evaluation can be very time-demanding.
  • --only-extract-timeseries: in combination with path of the station's main folder, extracts the timeseries specified in the timeseries_keys_file txt list of keys
  • --parallel: enable CDD evaluation in parallel on multiple cores. It will require more memory
  • --maxdistance: evaluates only station that are clores then maxdistance in km

The input folder must contain the meteo observation in text files

Example of command that will generate txt files for the CDD of precipitation (pr), in parallel mode, for station that are closer then 500 kms:

cddmap /meteo/pr --parallel --maxdistance 500

ncextract

The ncextract tool extracts time series from (multiple) NetCDF or GRIB file(s) at user defined coordinates.

Usage

The tool takes as input a CSV file containing point coordinates and a directory containing one or more NetCDF or GRIB files. The CSV files must contain only three columns: point identifier, and its two coordinates. The name of the coordinate fields must match those in the NetCDF or GRIB files. For example:

ID,lat,lon
0010,40.6083,-4.2250
0025,37.5250,-6.2750
0033,40.5257,-6.4753

The output is a file containing the time series at the pixels corresponding to the provided coordinates, in chronological order. The function supports two otput formats: CSV or NetCDF.

usage: ncextract.py [-h] -i INPUT -d DIRECTORY -o OUTPUT [-nc]

Utility to extract time series of values from (multiple) NetCDF files at specific coordinates.
Coordinates of points of interest must be included in a CSV file with at least 3 columns named id,
lat, lon.

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input CSV file (id, lat, lon)
  -d DIRECTORY, --directory DIRECTORY
                        Input directory with .nc files
  -o OUTPUT, --output OUTPUT
                        Output file. Two extensions are supported: .csv or .nc

Use in the command prompt

The following command extracts the discharge time series from EFAS simulations (NetCDF files in the directory EFAS5/discharge/maps) in a series of points where gauging stations are located (file gauging_stations.csv), and saves the extraction as a CSV file.

ncextract -i ./gauging_stations.csv -d ./EFAS5/discharge/maps/ -o ./EFAS5/discharge/timeseries/results_ncextract.csv

Use programmatically

The function can be imported in a Python script. It takes as inputs two xarray.Dataset: one defining the input maps and the other the points of interest. The result of the extraction can be another xarray.Dataset, or saved as a file either in CSV or NetCDF format.

from lisfloodutilities.ncextract import extract_timeseries

# load desired input maps and points
# maps: xarray.Dataset
# points: xarray.Dataset

# extract time series and save in a xarray.Dataset
ds = extract_timeseries(maps, points, output=None)

catchstats

The catchstats tool calculates catchment statistics given a set of input NetCDF files and a set of mask NetCDF files.

Usage

In the command prompt

The tool takes as input a directory containing the NetCDF files from which the statistics will be computed, and another directory containing the NetCDF files that define the catchment boundaries, which can be any of the outputs of cutmaps (not necessarily the file my_mask.nc). The input files can be the LISFLOOD static maps (no temporal dimension) or stacks of maps with a temporal dimension. The mask NetCDF files must be named after the catchment ID, as this name will be used to identify the catchment in the output NetCDF. For instance, 0142.nc would correspond to the mask of catchment 142. Optionally, an extra NetCDF file can be passed to the tool to account for different pixel area; in this case, the statistics will be weighted by this pixel area map.

Only some statistics are currently available: mean, sum, std (standard deviation), var (variance), min, max, median and count. The weighing based on pixel area does not affect the statistics min, max, median nor count.

The output are NetCDF files (as many as catchments in the mask directory) containing the resulting statistics.

usage: catchstats.py [-h] -i INPUT -m MASK -s STATISTIC -o OUTPUT -a AREA [-W]

Utility to compute catchment statistics from (multiple) NetCDF files.
The mask map is a NetCDF file with values in the area of interest and NaN elsewhere.
The area map is optional and accounts for varying pixel area with latitude.

options:
  -h, --help
                          show this help message and exit
  -i INPUT, --input INPUT
                          directory containing the input NetCDF files
  -m MASK, --mask MASK
                          directory containing the mask NetCDF files
  -s STATISTIC, --statistic STATISTIC
                          list of statistics to be computed. Possible values: mean, sum, std, var, min, max, median, count
  -o OUTPUT, --output OUTPUT
                          directory where the output NetCDF files will be saved
  -a AREA, --area AREA
                          NetCDF file of pixel area used to weigh the statistics
  -W, --overwrite
                          overwrite existing output files

Example

The following command calculates the average and total precipitation for a set of catchemtns from the dataset EMO-1. The static map pixarea.nc is used to account for varying pixel area.

catchstats -i ./EMO1/pr/ -m ./masks/ -s mean sum -o ./areal_precipitation/ -a ./EFAS5/static_maps/pixarea.nc

In a Python script

The tool can be imported in a Python script to be able to save in memory the output. This function takes in a xarray.Dataset with the input maps from which statistics will be computed, a dictionary of xarray.DataArray with the catchment masks, and optionally the weighing map. By default, the result is a xarray.Dataset, but NetCDF files could be written, instead, if a directory is provided in the output attribute.

# import function
from lisfloodutilities.catchstats import catchment_statistics

# load desired input maps and catchment masks
# maps: xarray.Dataset
# masks: Dict[int, xarray.DataArray]

# compute statistics and save in a xarray.Dataset
ds = catchment_statistics(maps, masks, statistic=['mean', 'sum'], weight=None, output=None)

Ouput

The structure of the output depends on whether the input files include a temporal dimension or not:

  • If the input files DO NOT have a time dimension, the output has a single dimension: the catchment ID. It contains as many variables as the combinations of input variables and statistics. For instance, if the input variables are "elevation" and "gradient" and three statistics are required ("mean", "max", "min"), the output will contain 6 variables: "elevation_mean", "elevation_max", "elevation_min", "gradient_mean", "gradient_max" and "gradient_min".
  • If the input files DO have a time dimension, the output has two dimensions: the catchment ID and time. The number of variables follows the same structure explained in the previous point. For instance, if the input files are daily maps of precipitation (variable name "pr") and we calculate the mean and total precipitation over the catchment, the output will contain two dimensions ("ID", "time") and two variables ("pr_mean", "pr_sum").

lfcoords

This tool finds the appropriate coordinates in the LISFLOOD river network of any point, provided that the catchment area is known. A thourough explanation of the method can be found in Burek and Smilovic (2023).

First, it uses the original coordinates and catchment area to find the most accurate pixel in a high-resolution map. Burek and Smilovic (2023) use MERIT (Yamazaki et al., 2019), which has a spatial resolution of 3 arc-seconds. The result of this first step is, for every point, a new value of coordinates and area, and two shapefiles: the catchment polygons and the new points in the high-resolution grid.

Second, it finds the pixel in the low-resolution grid (LISFLOOD static maps) that better matches the catchment shape derived in the previous step. As a result, for each point we obtain a new value of coordinates and area, and two new shapefiles: the catchment polygons and the points in the low-resolution grid.

Inputs

The tool requires 5 inputs:

  • A CSV file of the stations to be located in the LISFLOOD grid. This file must contain four columns with four specific names: 'ID', 'area' in km2, 'lat', 'lon'. Below you find an example of the stations CSV file:
ID,area,lat,lon
429,35399,49.018,12.144
436,26448,48.947,12.015
439,37687,48.88,12.747
  • A map of the local drainage directions in high-resolution, e.g., MERIT.
  • A map of the upstream area in high-resolution, e.g., MERIT. The units of this map must be km2, same units as the area field in the CSV file.
  • A map of the local drainage directions in low-resolution, i.e., the LISFLOOD static map.
  • A map of the upstream area in low-resolution, i.e., the LISFLOOD static map. The units of this map are m2 (instead of km2), as these are the units used in LISFLOOD; the code converts internally this map into km2.

All maps can be provided either in TIFF or NetCDF format.

Outputs

The tool saves the outputs in the folder specified in the configuration file (output_folder). Within this folder, it will create a series of shapefiles:

  • Intermediate results:

    • A point shapefile with the original location of the points. In the example, it will be named stations.shp.
    • A point shapefile with the updated location of the points in the finer grid. In the example, it will be named stations_3sec.shp.
    • A polygon shapefile with the catchments delineated in the finer grid for each of the points. In the example, it will be named catchments_3sec.shp.
  • Final results for the LISFLOOD grid:

    • A point shapefile with the updated location of the points in the LISFLOOD grid. In the example, it will be named stations_3min.shp.
    • A polygon shapefile with the catchment delineated in the LISFLOOD grid for each of the points. In the example, it will be named catchments_3min.shp.

The attribute table of the final point layer contains 6 new columns defining the coordinates and catchment area in both the high-resolution (3sec in the example) and low-resolution grids (3min in the example). Example:

ID area area_3min area_3sec lat lat_3min lat_3sec lon lon_3min lon_3sec
429 35399 35216 35344 49.018 49.025 49.022083 12.144 12.125 12.143750
436 26448 26334 26394 48.947 48.925 48.946250 12.015 12.025 12.014583
439 37687 37540 37605 48.880 48.925 48.879583 12.747 12.675 12.746250

The tool checks for conflicts in the relocation of the points both in the finer and coarser grids. If two or more points are in the same location, or if the absolute/relative error are beyond the limits specified in the configuration file, the tool will create another shapefile (conflicts_3min.shp in the example) with only the conflicting points, so that the user can fix the issue manually.

Configuration file

The configuration file defines the input files, the folder where the resulting shapefiles will be saved, and some thresholds used in the process. A template of the configuration file can be found here. Below you find an example:

input:
    points: points.csv # ID, lat, lon, area
    ldd_fine: ldd_MERIT.tif
    upstream_fine: uparea_MERIT.tif # km2
    ldd_coarse: ldd_LISFLOOD.tif
    upstream_coarse: uparea_LISFLOOD.nc # m2
            
output_folder: ./shapefiles/

conditions:
    min_area: 200 # km2
    abs_error: 50 # km2
    pct_error: 1 # %

Usage

In the command prompt

The tool can be executed from the command prompt by indicating a configuration file.

usage: lfcoords.py [-h] -c CONFIG_FILE

Correct the coordinates of a set of stations to match the river network in the
LISFLOOD static map.
First, it uses a reference value of catchment area to find the most accurate
pixel in a high-resolution map.
Second, it finds the pixel in the low-resolution map that better matches the
catchment shape derived from the high-resolution map.

options:
  -h, --help
                          show this help message and exit
  -c CONFIG_FILE, --config-file CONFIG FILE
                          path to the YML configuration file

Examples

The tool can be executed from the command prompt by indicating a configuration file:

lfcoords --config-file config.yml

In a Python script

The functions that compose the lfcoords tool can be imported in a Python script:

# import functions
from lisfloodutilities.lfcoords import Config, read_input_files
from lisfloodutilities.lfcoords.finer_grid import coordinates_fine
from lisfloodutilities.lfcoords.coarser_grid import coordinates_coarse

# read the configuration file
cfg = Config(CONFIG_FILE)

# read the input files
# inputs = read_input_files(cfg)
# or any other option to read the inputs

# find coordinates in high-resolution (HR) grid
points_HR, polygons_HR = coordinates_fine(
    cfg,
    points,
    ldd_fine,
    upstream_fine,
    save=False
)

# find coordinates in LISFLOOD
points_LF, polygons_LF = coordinates_coarse(
    cfg,
    points_HR,
    polygons_HR,
    ldd_lisflood,
    upstream_lisflod,
    reservoirs=False,
    save=False
)

mctrivers

This tool builds a mask of mild sloping rivers for use in LISFLOOD with MCT diffusive river routing. It takes LISFLOOD channels slope map (changrad.nc), the LDD (ldd.nc), the upstream drained area map (upArea.nc) and the catchment/domain mask (mask.nc), and outputs a bolean mask (chanmct.nc). Pixels where riverbed gradient < threshold (--slope) are added to the mask if their drainage area is large enough (--minuparea) and they also have at least --nloops consecutive downstream pixels that meet the same condition for slope (drainage area will be met as downstream the area increases).

Usage

The tool requires the following mandatory input arguments:

  • -i, --changradfile: LISFLOOD channels gradient map (changrad.nc)
  • -l, --LDDfile: LISFLOOD local drain direction file (ldd.nc)
  • -u, --uparea: LISFLOOD Uustream area file (upArea.nc)

The tool can take the following additional input arguments:

  • -m, --maskfile: LISFLOOD mask or domain file (mask.nc; if not given, all domain is considered valid)
  • -S, --slope: Riverbed slope threshold to use MCT diffusive wave routing (default: 0.001)
  • -N, --nloops: Number of consecutive downstream grid cells that also need to comply with the slope requirement for including a grid cell in the MCT rivers mask (default: 5)
  • -U, --minuparea: Minimum upstream drainage area for a pixel to be included in the MCT rivers mask (uses the same units as in the -u file) (default: 0)
  • -E, --coordsnames: Coordinates names for lat, lon (in this order with space!) used in the netcdf files. The function checks for 3 commonly used names (x, lon, rlon for longitudes, and y, lat, rlat for latitudes). Therefere, it is recommended to keep the default value.

The tool generates the following outputs (when called from command line as main script):

  • -O, --outputfilename: Output file containing the rivers mask where LISFLOOD can use the MCT diffusive wave routing (default: chanmct.nc)

It can be used either from command line, or also as a python function. Below follow some examples:

Example of command that will generate an MCT rivers mask with pixels where riverbed slope < 0.001 (same as default), drainage area > 500 kms and at least 5 (same as default) downstream pixels meet the same two conditions, considering the units of the upArea.nc file are given in kms:

mctrivers -i changrad.nc -l ldd.nc -u upArea.nc -O chanmct.nc -U 500
# no data are saved when called inside python
from lisfloodutilities.mctrivers.mctrivers import mct_mask
mct_mask_ds = mct_mask(channels_slope_file='changrad.nc', ldd_file='ldd.nc', uparea_file='upArea.nc', minuparea=500)

Example of command that will generate an MCT rivers mask with pixels where riverbed slope < 0.0005, drainage area > 0 (same as default) kms and at least 3 downstream pixels meet the same two conditions. Also a mask (mask.nc) will be used, and the coords names in the nc files are "Lat1", "Lon1" for lat, lon respectively:

mctrivers -i changrad.nc -l ldd.nc -u upArea.nc -O chanmct.nc -m mask.nc -E Lat1 Lon1 -S 0.0005 -N 3 
from lisfloodutilities.mctrivers.mctrivers import mct_mask
mct_mask_ds = mct_mask(channels_slope_file='changrad.nc', ldd_file='ldd.nc', uparea_file='upArea.nc', mask_file='mask.nc', slp_threshold=0.0005, nloops=3, coords=["Lat1", "Lon1"])

rainbomb

This tool corrects rainbomb artifacts in ERA5 daily precipitation data. A rainbomb is an unrealistically high rainfall value at a single grid point that is not supported by surrounding points. This utility identifies and corrects such artifacts by comparing each grid point against its neighbours and applying threshold-based corrections.

The correction algorithm works as follows:

  1. For each grid point, the maximum precipitation value from its closest neighbours is computed
  2. If the central point's value exceeds the maximum neighbour value by more than the upper buffer threshold (c_max), the value is replaced entirely with the neighbour maximum
  3. If the excess is between the intermediate buffer (c_interm) and upper buffer (c_max), the value is linearly interpolated based on the relative difference
  4. Otherwise, the original value is kept unchanged

The buffer thresholds (c_interm and c_max) are derived from long-term SEAS5 data and vary based on the rainfall intensity bin assigned to each point.

Requirements

python3, climetlab, xarray, pandas, numpy

Usage

Note: This guide assumes you have installed the program with pip tool. If you cloned the source code instead, just substitute the executable rainbomb with python bin/rainbomb that is in the root folder of the cloned project.

The tool requires the following mandatory input arguments:

  • -i, --input_file: Input file for correction (raw ERA5 in NetCDF or GRIB format)
  • -o, --output_file: Output file (corrected ERA5 in the same format as the input: NetCDF or GRIB)

The tool requires either:

  • -d, --parent_dir: Parent directory where the auxiliary data for the rainbomb correction are located

Or individual auxiliary files:

  • -n, --neighbours_file: Path to the neighbours data file (NetCDF)
  • -t, --thresholds_file: Path to the thresholds CSV file

Additional options:

  • --set-grib-date: Set the correct date in the GRIB output file based on the input NetCDF or GRIB time coordinate
  • --skip-conversion: Skip the unit conversion from m to mm (assumes input data is already in mm)
  • -p, --precipitation-variable: Name of the precipitation variable in the input dataset (defaults to 'tp')
  • -v, --verbose: Print progress information

Example of command that will correct rainbombs in an ERA5 NetCDF file:

rainbomb -i /path/to/input/era5_precip.nc -o /path/to/output/era5_precip_corrected.nc -d /path/to/auxiliary/data/

Example with GRIB input and output:

rainbomb -i /path/to/input/era5_precip.grb -o /path/to/output/era5_precip_corrected.grb -d /path/to/auxiliary/data/

Example with individual auxiliary files:

rainbomb -i /path/to/input/era5_precip.nc -o /path/to/output/era5_precip_corrected.nc -n neighbours.nc -t thresholds.csv

The input and output arguments are listed below and can be seen by using the help flag:

rainbomb --help
usage: rainbomb [-h] -i INPUT_FILE -o OUTPUT_FILE [-d PARENT_DIR]
                [-n NEIGHBOURS_FILE] [-t THRESHOLDS_FILE]
                [--set-grib-date] [--skip-conversion]
                [-p PRECIPITATION_VARIABLE] [-v]

Script for correcting ERA5 single-grid rainbombs for daily fields.

A rainbomb is an unrealistically high rainfall value at a single grid point
that is not supported by surrounding points. This utility identifies and
corrects such artifacts by comparing each grid point against its neighbours
and applying threshold-based corrections.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        Input file for correction (raw ERA5 in NetCDF or GRIB
                        format)
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file (corrected ERA5 in same format as input:
                        NetCDF or GRIB)
  -d PARENT_DIR, --parent_dir PARENT_DIR
                        Parent directory where the auxiliary data for the
                        rainbomb correction are located. Ignored if individual
                        auxiliary files are specified.
  -n NEIGHBOURS_FILE, --neighbours_file NEIGHBOURS_FILE
                        Path to the neighbours data file (NetCDF). If not
                        provided, defaults to 'neighbours_era5_closest.nc' in
                        parent_dir.
  -t THRESHOLDS_FILE, --thresholds_file THRESHOLDS_FILE
                        Path to the thresholds CSV file. If not provided,
                        defaults to 'thresholds.csv' in parent_dir.
  --set-grib-date       Set the correct date in the GRIB output file based
                        on the input NetCDF or GRIB time coordinate
  --skip-conversion     Skip the unit conversion from m to mm (assumes input
                        data is already in mm) and skip the conversion from
                        mm back to m at the end
  -p PRECIPITATION_VARIABLE, --precipitation-variable PRECIPITATION_VARIABLE
                        Name of the precipitation variable in the input
                        dataset. If not provided, defaults to 'tp'
  -v, --verbose         Print progress information

It can also be used as a Python function:

from lisfloodutilities.rainbomb import correct_rainbomb_dataset

correct_rainbomb_dataset(
    input_file='/path/to/input/era5_precip.nc',
    output_file='/path/to/output/era5_precip_corrected.nc',
    parent_dir='/path/to/auxiliary/data/',
    verbose=True,
    set_grib_date_flag=True,
    precipitation_variable='tp'  # Optional: defaults to 'tp'
)

Recent Changes

The rainbomb tool has been updated with the following improvements:

  • NetCDF Support: The tool now supports both NetCDF and GRIB input/output formats. The output format is automatically determined from the input file extension.
  • Removed Template File: The separate GRIB template file is no longer required. When the input is GRIB, the tool uses the input file itself as the template.
  • Improved GRIB Metadata: Added proper stepRange handling for GRIB output files.
  • Skip Conversion Option: Added --skip-conversion flag for cases where input data is already in millimetres.
  • Optimized Processing: Improved rainbomb detection and correction logic using numpy operations for better performance.
  • Custom Precipitation Variable: Added -p/--precipitation-variable option to allow overriding the default precipitation variable name ('tp') in the input dataset.

generate_neighbours

This script generates neighbour indices for each grid point. It is used to reduce the computation time for the rainbomb correction by pre-computing and storing the indices of neighbours for each grid point. The neighbours on east/west are always the closest points in the same latitude. For north and south, there are two methods available:

  • Rectangle method: Keeps all points that lie within the longitudinal range of west-east neighbours
  • Closest method: Keeps only the closest north and south neighbour (can be 1 or 2 if same distance north/south-west and north/south-east of the point)

The output consists of two NetCDF files:

  • neighbours_<product>_rectangle.nc: Neighbours using the rectangle method (up to 3 north/south neighbours)
  • neighbours_<product>_closest.nc: Neighbours using the closest method (up to 2 north/south neighbours)

Requirements

python3, xarray, pandas, numpy, tqdm

Usage

Note: This guide assumes you have installed the program with pip tool. If you cloned the source code instead, just substitute the executable generate_neighbours with python bin/generate_neighbours that is in the root folder of the cloned project.

The tool requires the following arguments:

  • -d, --directory: Work directory containing the auxiliary data files and the output files. The directory should contain the sample file <product>_sample.grb depending on the product used.
  • -p, --product: Product to be used: era5, seasonal (default: era5)

Example command:

generate_neighbours -d /path/to/auxiliary/data/ -p era5

The help flag shows all available options:

generate_neighbours --help
usage: generate_neighbours [-h] -d DIRECTORY [-p PRODUCT]

Script for generating neighbour indices for each grid point.

For reducing the time of the computations for the rainbomb correction,
this script keeps the indices of the neighbours for each grid point.
Thus, the correction script can iterate directly over each point and its
neighbours. The neighbours on east/west are always the closest points in
the same latitude. For the north and south, the search is done to the next
upper/lower latitude and there are 3 methods used:
- Keep all points that lie within the longitudinal range of west-east based
  on the west & east neighbours
- Keep the closest north and south neighbour (can be 1 or even 2 if same
  distance north/south-west of the point and north/south-east)
- Keep all points that have smaller distance than the maximum distance from
  the reference point to its west/east neighbour

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Work directory containing the auxiliary data files
                        and the output files. The directory should contain
                        the sample file '<product>_sample.grb' depending on
                        the product used.
  -p PRODUCT, --product PRODUCT
                        Product to be used: era5,seasonal; the default is
                        era5

Using lisfloodutilities programmatically

You can use lisflood utilities in your python programs. As an example, the script below creates the mask map for a set of stations (stations.txt). The mask map is a boolean map with 1 and 0. 1 is used for all (and only) the pixels hydrologically connected to one of the stations. The resulting mask map is in netCDF format.

from lisfloodutilities.cutmaps.cutlib import mask_from_ldd
from lisfloodutilities.nc2pcr import convert
from lisfloodutilities.readers import PCRasterMap

ldd = 'tests/data/cutmaps/ldd_eu.nc'
stations = 'tests/data/cutmaps/stations.txt'

mask, outlets_nc, maskmap_nc = mask_from_ldd(ldd, stations)
mask_pcr = convert(mask, 'tests/data/cutmaps/mask.map', is_ldd=False)[0]
mask_map = PCRasterMap(mask_pcr)
print(mask_map.data)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lisflood_utilities-1.0.1.tar.gz (220.4 kB view details)

Uploaded Source

File details

Details for the file lisflood_utilities-1.0.1.tar.gz.

File metadata

  • Download URL: lisflood_utilities-1.0.1.tar.gz
  • Upload date:
  • Size: 220.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for lisflood_utilities-1.0.1.tar.gz
Algorithm Hash digest
SHA256 13e1e40cfe4c875e5bb330928a45dac62dd6894c82d7ca7c5165cf6fcc1175b2
MD5 cd975a39714fddd1d6f60f4e6c194208
BLAKE2b-256 4bb19c20b883b9ce7eaa57978d53950c9308c265e2ed8af110313e1962e9c6b0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page