Skip to main content

Quality control for rainfall data

Project description

RainfallQC

RainfallQC - Quality control for rainfall data

https://img.shields.io/pypi/v/rainfallqc.svg

Provides methods for running rainfall quality control.

Installation

RainfallQC can be installed from PyPi:

pip install rainfallqc

Example use

Example 1. - Running individual checks on a single rain gauge

Let’s say you have data for a single rain gauge stored in “hourly_rain_gauge_data.csv” which looks like this:

Example data 1. Single rain gauge

time

rain_mm

2020-01-01 00:00

0.0

2020-01-01 01:00

0.1

2020-01-01 02:00

0.0

2020-01-01 03:00

105.0

2020-01-01 04:00

0.6

For the majority of the checks in RainfallQC, you can load in your data using polars and run the checks directly. Below, we run 2 example QC checks:

    1. check_intermittency - to flag years where there are periods of non-zero bounded by 0 (see Figure 1.),

    1. daily_accumulations - to flag accumulations of hourly values into daily.

https://thomasjkeel.github.io/UK-Rain-Gauge-Network/example_images/intermittency.png

Figure 1. Example of an intermittency issue within the rainfall record

import polars as pl
from rainfallqc import gauge_checks, timeseries_checks

data = pl.read_csv("hourly_rain_gauge_data.csv")

intermittent_years = gauge_checks.check_intermittency(data, target_gauge_col="rain_mm")

daily_accumulation_flags = timeseries_checks.check_daily_accumulations(
    data,
    target_gauge_col="rain_mm",
    gauge_lat=52.0,
    gauge_lon=2.0,
    smallest_measurable_rainfall_amount=0.1,
)

Please note that some checks may require additional metadata, such as gauge location (latitude and longitude) or smallest measurable rainfall amount (e.g. 0.1 mm). This could look like:

Example metadata 1. Rain gauge metadata

station_id

latitude

longitude

start_datetime

end_datetime

path

rain_mm_gauge_1

53.0

2.0

2020-01-01 00:00

2024-01-01 00:00

path/to/gauge_1.csv

rain_mm_gauge_2

54.1

-0.5

2018-01-01 00:00

2023-01-01 00:00

path/to/gauge_2.csv

rain_mm_gauge_3

56.9

1.9

2015-01-01 00:00

2025-01-01 00:00

path/to/gauge_3.csv

You could then run checks that require metadata i.e. the check_hourly_exceedance_etccdi_rx1day QC check which flags rainfall values exceeding the hourly day rainfall 1-day record at a given location (see Figure 2):

https://thomasjkeel.github.io/UK-Rain-Gauge-Network/example_images/rx1day_check.png

Figure 2. Example of an Rx1day check from the IntenseQC framework

The code for that check looks like:

import polars as pl
from rainfallqc import comparison_checks

data = pl.read_csv("hourly_rain_gauge_data_gauge_1.csv")
metadata = pl.read_csv("rain_gauge_metadata.csv")

target_gauge_id = "rain_mm_gauge_1"
target_metadata = metadata.filter(pl.col("station_id") == target_gauge_id)

rx1day_check = comparison_checks.check_hourly_exceedance_etccdi_rx1day(
     data,
     target_gauge_col=target_gauge_col,
     gauge_lat=target_metadata["latitude"],
     gauge_lon=target_metadata["longitude"]
)

Output flags will then look like:

Example flag outputs for the Rx1day QC check

time

rx1day_check

2020-01-01 00:00

0

2020-01-01 01:00

0

2020-01-01 02:00

0

2020-01-01 03:00

1

2020-01-01 04:00

0

Example 2. - Running multiple QC checks on a single target gauge

To run multiple QC checks, you can use the apply_qc_framework() method to run QC methods from a given framework (e.g. IntenseQC).

Let’s say you have hourly rainfall values from a rain gauge network data like:

Example data 2. Rain gauge network

time

rain_mm_gauge_1

rain_mm_gauge_2

rain_mm_gauge_3

2020-01-01 00:00

0.0

0.5

0.0

2020-01-01 01:00

0.5

0.0

1.0

2020-01-01 02:00

0.0

1.0

0.0

2020-01-01 03:00

105.0

0.0

0.5

2020-01-01 04:00

0.0

0.5

0.0

… and metadata like example metdata 1. You can then run multiple QC checks at once by defining a QC framework, the methods to run and parameters for those methods.

As of RainfallQC v0.3.0, there are three QC frameworks:

  1. “intenseqc” - All 25 checks from IntenseQC/GSDR-QC with names like: “QC1”, “QC2” … “QC25”,

  2. “pypwsqc” - 2 checks from pyPWSQC with the names: “FZ” and “SO”,

  3. “custom” - Allows the user to select a custom set of checks (see Example 8 in Tutorials).

Let’s run some QC checks from intenseqc framework below:

import polars as pl
from rainfallqc.qc_frameworks import apply_qc_framework

network_data = pl.read_csv("hourly_rain_gauge_network.csv")
metadata = pl.read_csv("rain_gauge_metadata.csv")

# 1. Decide which QC methods of IntenseQC will be run
qc_framework = "IntenseQC"
qc_methods_to_run = ["QC1", "QC8", "QC9", "QC10", "QC11", "QC12", "QC14", "QC15", "QC16"]

# 2. Determine nearest neighbouring gauges for neighbourhood checks
gauge_lat = gpcc_metadata["latitude"]
gauge_lon = gpcc_metadata["longitude"]
nearest_neighbourhours = ["rain_mm_gauge_2", "rain_mm_gauge_3", ...] # or see Example 3 if not determined

# 2 Decide which parameters for QC
qc_kwargs = {
    "QC1": {"quantile": 5},
    "QC14": {"wet_day_threshold": 1.0, "accumulation_multiplying_factor": 2.0},
    "QC16": {
        "list_of_nearest_stations": nearest_neighbourhours,
        "wet_threshold": 1.0,
        "min_n_neighbours": 5,
        "n_neighbours_ignored": 0,
    },
    "shared": {
        "target_gauge_col": "rain_mm_gauge_1",
        "gauge_lat": gauge_lat,
        "gauge_lon": gauge_lon,
        "time_res": "daily",
        "smallest_measurable_rainfall_amount": 0.1,
    },
}

# 3. Run QC methods on network data
qc_result = apply_qc_framework.run_qc_framework(
    daily_rain_gauge_network, qc_framework=qc_framework, qc_methods_to_run=qc_methods_to_run, qc_kwargs=qc_kwargs
)

Because lots of the checks share the same parameters with a standard vocabulary, you can use the “shared” part of the qc_kwargs dictionary to set those.

Other examples

Of course, your data may not be tabular, or may not be stored in a single file. Therefore, please see our other Tutorials.

There is also a demo notebook.

Finally, different QC methods are suitable for different temporal resolutions, see our Which checks are suitable for my data’s temporal resolution? for more information.

Documentation and License

Features

  • 27 rainfall QC methods (25 from IntenseQC, 2 from pyPWSQC)

  • polars DataFrame support for fast data processing

  • modular structure so you can pick and choose which checks to run

  • support for single gauges or networks of gauges

  • editable parameters so you can tweak thresholds, streak or accumulation lengths, and distances to neighbouring gauges

Credits

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rainfallqc-0.3.0.tar.gz (1.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rainfallqc-0.3.0-py3-none-any.whl (1.3 MB view details)

Uploaded Python 3

File details

Details for the file rainfallqc-0.3.0.tar.gz.

File metadata

  • Download URL: rainfallqc-0.3.0.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rainfallqc-0.3.0.tar.gz
Algorithm Hash digest
SHA256 11f09d799d31fb0dab1c4ae1c3f6410a07ca80ea7e520fa7688280cdaa7d6973
MD5 2fd7ae2bdf2a117c9608cb31004a966a
BLAKE2b-256 3c0018ad3badbc76048e5391b34e4b181b6336f432f160d5207a7b3ecf038862

See more details on using hashes here.

Provenance

The following attestation bundles were made for rainfallqc-0.3.0.tar.gz:

Publisher: publish-to-pypi.yml on NERC-CEH/RainfallQC

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rainfallqc-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: rainfallqc-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rainfallqc-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 083fd36b0027aad3b03b3263e4506b36919aa809753a69f1d64943c851869cf2
MD5 b0385a1743402783faf5deb165b017d4
BLAKE2b-256 9b65d68fbe9a9fb22aadd1abf140a22b0b48c534642fe6a31c32c9414b93464c

See more details on using hashes here.

Provenance

The following attestation bundles were made for rainfallqc-0.3.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on NERC-CEH/RainfallQC

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page