
Quality control for rainfall data

Project description


RainfallQC - Quality control for rainfall data


RainfallQC provides methods for running quality control (QC) checks on rainfall data.

Installation

RainfallQC can be installed from PyPI:

pip install rainfallqc

Example use

Example 1. - Running individual checks on a single rain gauge

Let’s say you have data for a single rain gauge stored in “hourly_rain_gauge_data.csv” which looks like this:

Example data 1. Single rain gauge

time               rain_mm
2020-01-01 00:00   0.0
2020-01-01 01:00   0.1
2020-01-01 02:00   0.0
2020-01-01 03:00   105.0
2020-01-01 04:00   0.6
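To try the examples without your own data, you could write the sample table above to hourly_rain_gauge_data.csv (the file name the code below reads). This is a minimal sketch using only the Python standard library; any other way of producing the same CSV works equally well.

```python
import csv

# Sample values from Example data 1: hourly timestamps and rainfall in mm
rows = [
    ("2020-01-01 00:00", 0.0),
    ("2020-01-01 01:00", 0.1),
    ("2020-01-01 02:00", 0.0),
    ("2020-01-01 03:00", 105.0),
    ("2020-01-01 04:00", 0.6),
]

with open("hourly_rain_gauge_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "rain_mm"])  # header matching the column names used below
    writer.writerows(rows)
```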

For most of the checks in RainfallQC, you can load your data with polars and run the checks directly. Below, we run two example QC checks:

    1. check_intermittency - to flag years where there are periods of non-zero bounded by 0 (see Figure 1.),

    2. check_daily_accumulations - to flag hourly values that have been accumulated into daily totals.

https://thomasjkeel.github.io/UK-Rain-Gauge-Network/example_images/intermittency.png

Figure 1. Example of an intermittency issue within the rainfall record

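import polars as pl
from rainfallqc import gauge_checks, timeseries_checks

# try_parse_dates=True reads the "time" column as datetimes rather than strings
data = pl.read_csv("hourly_rain_gauge_data.csv", try_parse_dates=True)

# Flag years with intermittency issues
intermittent_years = gauge_checks.check_intermittency(data, target_gauge_col="rain_mm")

# Flag hourly values accumulated into daily totals (this check also needs gauge metadata)
daily_accumulation_flags = timeseries_checks.check_daily_accumulations(
    data,
    target_gauge_col="rain_mm",
    gauge_lat=52.0,
    gauge_lon=2.0,
    smallest_measurable_rainfall_amount=0.1,
)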

Some checks require additional metadata, such as the gauge location (latitude and longitude) or the smallest measurable rainfall amount (e.g. 0.1 mm). A metadata file could look like:

Example metadata 1. Rain gauge metadata

station_id        latitude   longitude   start_datetime     end_datetime       path
rain_mm_gauge_1   53.0       2.0         2020-01-01 00:00   2024-01-01 00:00   path/to/gauge_1.csv
rain_mm_gauge_2   54.1       -0.5        2018-01-01 00:00   2023-01-01 00:00   path/to/gauge_2.csv
rain_mm_gauge_3   56.9       1.9         2015-01-01 00:00   2025-01-01 00:00   path/to/gauge_3.csv
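The Rx1day example that follows selects one row of this table by station_id with polars. The same lookup can be sketched with the standard library, which makes it explicit that a check needs a single latitude and longitude per gauge. Here the CSV text stands in for rain_gauge_metadata.csv, and get_gauge_location is a hypothetical helper, not part of RainfallQC.

```python
import csv
import io

# Example metadata 1 as CSV text (normally read from rain_gauge_metadata.csv)
METADATA_CSV = """station_id,latitude,longitude,start_datetime,end_datetime,path
rain_mm_gauge_1,53.0,2.0,2020-01-01 00:00,2024-01-01 00:00,path/to/gauge_1.csv
rain_mm_gauge_2,54.1,-0.5,2018-01-01 00:00,2023-01-01 00:00,path/to/gauge_2.csv
rain_mm_gauge_3,56.9,1.9,2015-01-01 00:00,2025-01-01 00:00,path/to/gauge_3.csv
"""


def get_gauge_location(metadata_csv: str, station_id: str) -> tuple[float, float]:
    """Hypothetical helper: return (latitude, longitude) for one station_id."""
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        if row["station_id"] == station_id:
            return float(row["latitude"]), float(row["longitude"])
    raise KeyError(f"station_id {station_id!r} not found in metadata")


lat, lon = get_gauge_location(METADATA_CSV, "rain_mm_gauge_1")
print(lat, lon)  # 53.0 2.0
```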

You could then run checks that require metadata, e.g. the check_hourly_exceedance_etccdi_rx1day QC check, which flags hourly rainfall values exceeding the ETCCDI Rx1day (maximum 1-day rainfall) record at a given location (see Figure 2):

https://thomasjkeel.github.io/UK-Rain-Gauge-Network/example_images/rx1day_check.png

Figure 2. Example of an Rx1day check from the IntenseQC framework

The code for that check looks like:

import polars as pl
from rainfallqc import comparison_checks

# try_parse_dates=True reads the "time" column as datetimes rather than strings
data = pl.read_csv("hourly_rain_gauge_data_gauge_1.csv", try_parse_dates=True)
metadata = pl.read_csv("rain_gauge_metadata.csv")

# Assumes the rainfall column in `data` is named after the gauge's station_id
target_gauge_col = "rain_mm_gauge_1"
target_metadata = metadata.filter(pl.col("station_id") == target_gauge_col)

rx1day_check = comparison_checks.check_hourly_exceedance_etccdi_rx1day(
    data,
    target_gauge_col=target_gauge_col,
    gauge_lat=target_metadata["latitude"].item(),  # .item() extracts the single value
    gauge_lon=target_metadata["longitude"].item(),
)

Output flags will then look like:

Example flag outputs for the Rx1day QC check

time               rx1day_check
2020-01-01 00:00   0
2020-01-01 01:00   0
2020-01-01 02:00   0
2020-01-01 03:00   1
2020-01-01 04:00   0
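Conceptually, each flag is 1 where an hourly value exceeds the Rx1day value for the gauge's location and 0 otherwise. The toy sketch below reproduces the flags above from the values in Example data 1. The 100 mm threshold is a made-up placeholder (the real check works from the gauge's location, hence the gauge_lat and gauge_lon arguments), and flag_rx1day_exceedance is a hypothetical function, not RainfallQC's implementation.

```python
def flag_rx1day_exceedance(values: list[float], rx1day_mm: float) -> list[int]:
    """Toy sketch: 1 where an hourly value exceeds rx1day_mm, else 0."""
    return [1 if v > rx1day_mm else 0 for v in values]


hourly_rain_mm = [0.0, 0.1, 0.0, 105.0, 0.6]  # values from Example data 1
hypothetical_rx1day_mm = 100.0  # placeholder, not a real ETCCDI value
print(flag_rx1day_exceedance(hourly_rain_mm, hypothetical_rx1day_mm))  # [0, 0, 0, 1, 0]
```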

Example 2. - Running multiple QC checks on a single target gauge

To run multiple QC checks, you can use run_qc_framework() from the apply_qc_framework module to run QC methods from a given framework (e.g. IntenseQC).

Let’s say you have hourly rainfall values from a rain gauge network like:

Example data 2. Rain gauge network

time               rain_mm_gauge_1   rain_mm_gauge_2   rain_mm_gauge_3
2020-01-01 00:00   0.0               0.5               0.0
2020-01-01 01:00   0.5               0.0               1.0
2020-01-01 02:00   0.0               1.0               0.0
2020-01-01 03:00   105.0             0.0               0.5
2020-01-01 04:00   0.0               0.5               0.0

… and metadata like Example metadata 1. You can then run multiple QC checks at once by defining a QC framework, the QC methods to run, and the parameters for those methods.

As of RainfallQC v0.3.0, there are three QC frameworks:

  1. “intenseqc” - All 25 checks from IntenseQC/GSDR-QC with names like: “QC1”, “QC2” … “QC25”,

  2. “pypwsqc” - 2 checks from pyPWSQC with the names: “FZ” and “SO”,

  3. “custom” - Allows the user to select a custom set of checks (see Example 8 in Tutorials).
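Because each framework uses its own check names, a misspelt name in qc_methods_to_run is easy to miss. A small pre-flight check could catch this before running a framework. The lookup table below only restates the names listed above (the "custom" framework is left out because its checks are user-defined), and validate_qc_methods is a hypothetical helper, not part of RainfallQC.

```python
# Check names per framework, as listed above (hypothetical lookup table)
FRAMEWORK_CHECKS = {
    "intenseqc": {f"QC{i}" for i in range(1, 26)},  # QC1 ... QC25
    "pypwsqc": {"FZ", "SO"},
}


def validate_qc_methods(qc_framework: str, qc_methods_to_run: list[str]) -> None:
    """Raise ValueError if any requested check is not part of the framework."""
    known = FRAMEWORK_CHECKS.get(qc_framework.lower())
    if known is None:
        raise ValueError(f"unknown or user-defined framework: {qc_framework!r}")
    unknown = [m for m in qc_methods_to_run if m not in known]
    if unknown:
        raise ValueError(f"not in {qc_framework!r}: {unknown}")


validate_qc_methods("intenseqc", ["QC1", "QC8", "QC16"])  # passes silently
```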

Let’s run some QC checks from the “intenseqc” framework:

import polars as pl
from rainfallqc.qc_frameworks import apply_qc_framework

# try_parse_dates=True reads the "time" column as datetimes rather than strings
network_data = pl.read_csv("hourly_rain_gauge_network.csv", try_parse_dates=True)
metadata = pl.read_csv("rain_gauge_metadata.csv")

# 1. Decide which QC framework and which of its QC methods will be run
qc_framework = "intenseqc"
qc_methods_to_run = ["QC1", "QC8", "QC9", "QC10", "QC11", "QC12", "QC14", "QC15", "QC16"]

# 2. Get the target gauge's location and its nearest neighbouring gauges for neighbourhood checks
target_gauge_col = "rain_mm_gauge_1"
target_metadata = metadata.filter(pl.col("station_id") == target_gauge_col)
gauge_lat = target_metadata["latitude"].item()  # .item() extracts the single value
gauge_lon = target_metadata["longitude"].item()
nearest_neighbours = ["rain_mm_gauge_2", "rain_mm_gauge_3", ...]  # or see Example 3 if not determined

# 3. Decide which parameters to use for each QC method
qc_kwargs = {
    "QC1": {"quantile": 5},
    "QC14": {"wet_day_threshold": 1.0, "accumulation_multiplying_factor": 2.0},
    "QC16": {
        "list_of_nearest_stations": nearest_neighbours,
        "wet_threshold": 1.0,
        "min_n_neighbours": 5,
        "n_neighbours_ignored": 0,
    },
    # Parameters used by several QC methods
    "shared": {
        "target_gauge_col": target_gauge_col,
        "gauge_lat": gauge_lat,
        "gauge_lon": gauge_lon,
        "time_res": "hourly",  # matches the hourly network data
        "smallest_measurable_rainfall_amount": 0.1,
    },
}

# 4. Run the QC methods on the network data
qc_result = apply_qc_framework.run_qc_framework(
    network_data, qc_framework=qc_framework, qc_methods_to_run=qc_methods_to_run, qc_kwargs=qc_kwargs
)
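Step 2 in the code above lists the neighbouring gauges by hand. If you only have gauge coordinates, one simple way to rank candidates is by great-circle (haversine) distance from the target gauge. This is an illustrative sketch, not RainfallQC's own neighbour selection (the code comment points to Example 3 in the Tutorials for that); the coordinates come from Example metadata 1, and nearest_gauges is a hypothetical helper.

```python
import math

# Coordinates from Example metadata 1: station_id -> (latitude, longitude)
GAUGES = {
    "rain_mm_gauge_1": (53.0, 2.0),
    "rain_mm_gauge_2": (54.1, -0.5),
    "rain_mm_gauge_3": (56.9, 1.9),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_gauges(target: str, gauges: dict, n: int, max_km: float = float("inf")) -> list[str]:
    """Hypothetical helper: up to n other gauges within max_km, nearest first."""
    t_lat, t_lon = gauges[target]
    distances = [
        (haversine_km(t_lat, t_lon, lat, lon), station_id)
        for station_id, (lat, lon) in gauges.items()
        if station_id != target
    ]
    return [sid for d, sid in sorted(distances) if d <= max_km][:n]


print(nearest_gauges("rain_mm_gauge_1", GAUGES, n=10))  # ['rain_mm_gauge_2', 'rain_mm_gauge_3']
```

A real network would usually need many more candidate gauges than this, since QC16 above asks for at least 5 neighbours.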

Because many of the checks share parameters with the same names, you can set those once in the “shared” entry of the qc_kwargs dictionary.
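One way to picture the “shared” entry is as a set of defaults combined with each check's own parameters. The sketch below assumes that check-specific values take precedence over shared ones; that precedence is an assumption for illustration, not documented RainfallQC behaviour, and kwargs_for_check is a hypothetical helper.

```python
def kwargs_for_check(qc_kwargs: dict, check: str) -> dict:
    """Hypothetical: merge shared parameters with one check's own parameters.

    Assumption: values set for the specific check override shared values.
    """
    return {**qc_kwargs.get("shared", {}), **qc_kwargs.get(check, {})}


qc_kwargs = {
    "QC1": {"quantile": 5},
    "shared": {"target_gauge_col": "rain_mm_gauge_1", "smallest_measurable_rainfall_amount": 0.1},
}
print(kwargs_for_check(qc_kwargs, "QC1"))
# {'target_gauge_col': 'rain_mm_gauge_1', 'smallest_measurable_rainfall_amount': 0.1, 'quantile': 5}
print(kwargs_for_check(qc_kwargs, "QC8"))  # only the shared parameters
```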

Other examples

Your data may not be tabular, or may not be stored in a single file; for those cases, please see our other Tutorials.

There is also a demo notebook.

Finally, different QC methods suit different temporal resolutions; see our “Which checks are suitable for my data’s temporal resolution?” page for more information.
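If you are unsure of your data's temporal resolution (for example, which value to pass as time_res), you could infer it from the most common gap between timestamps. This is a minimal standard-library sketch: infer_time_res is a hypothetical helper, and its "hourly"/"daily" labels are assumptions modelled on the time_res parameter in Example 2.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_time_res(timestamps: list[str], fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Hypothetical helper: classify the most common step between timestamps."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    steps = Counter(b - a for a, b in zip(times, times[1:]))
    if not steps:
        raise ValueError("need at least two timestamps")
    most_common_step, _ = steps.most_common(1)[0]
    labels = {timedelta(hours=1): "hourly", timedelta(days=1): "daily"}
    return labels.get(most_common_step, f"other ({most_common_step})")


print(infer_time_res(["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"]))  # hourly
```

Using the most common step rather than the first one means a few missing records do not change the result.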

Documentation and License

Features

  • 27 rainfall QC methods (25 from IntenseQC, 2 from pyPWSQC)

  • polars DataFrame support for fast data processing

  • modular structure so you can pick and choose which checks to run

  • support for single gauges or networks of gauges

  • editable parameters so you can tweak thresholds, streak or accumulation lengths, and distances to neighbouring gauges

Credits



Download files


Source Distribution

rainfallqc-0.3.1.tar.gz (1.3 MB)

Uploaded Source

Built Distribution


rainfallqc-0.3.1-py3-none-any.whl (1.3 MB)

Uploaded Python 3

File details

Details for the file rainfallqc-0.3.1.tar.gz.

File metadata

  • Download URL: rainfallqc-0.3.1.tar.gz
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rainfallqc-0.3.1.tar.gz
Algorithm Hash digest
SHA256 4e1967a2c0456a52c4eb6e7c5ecc9640821f3268c862c74a54716a9b0f11181c
MD5 2fdbe818fe874229a7c1b5ed0b10c51c
BLAKE2b-256 930ad15d234e0763f8cee93fb74b8646a04f5ab9e8744aaf5d847c8e21ead9d8


Provenance

The following attestation bundles were made for rainfallqc-0.3.1.tar.gz:

Publisher: publish-to-pypi.yml on NERC-CEH/RainfallQC

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rainfallqc-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: rainfallqc-0.3.1-py3-none-any.whl
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rainfallqc-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c3aa5bedda1d0cb343a537eeb9b6404b12e28c8bdea76e268830d32f39213db0
MD5 765553a7fc2bf4e755d44ccc219f4eaa
BLAKE2b-256 4af83234acd4c531e88604d979279669f32916d7e4c784414bc53c7933f25532


Provenance

The following attestation bundles were made for rainfallqc-0.3.1-py3-none-any.whl:

Publisher: publish-to-pypi.yml on NERC-CEH/RainfallQC

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
