PeakWeather: MeteoSwiss Weather Station Measurements for Spatiotemporal Deep Learning.

PeakWeather

This repository contains the code to load and preprocess the PeakWeather dataset. The dataset is hosted on Hugging Face

https://huggingface.co/datasets/MeteoSwiss/PeakWeather

and presented in the paper

PeakWeather: MeteoSwiss Weather Station Measurements for Spatiotemporal Deep Learning
Daniele Zambon², Michele Cattaneo¹, Ivan Marisca², Jonas Bhend¹, Daniele Nerini¹, Cesare Alippi² ³
¹ MeteoSwiss, ² USI, IDSIA, ³ PoliMi

Refer to peakweather.readthedocs.io for the documentation.

[Figures: Stations; Station graph]

Quickstart:

Install

Option 1: Clone and install locally

git clone https://github.com/MeteoSwiss/PeakWeather.git 
cd PeakWeather
pip install .                # Without extras
pip install ".[topography]"  # With extras (quoted so shells such as zsh do not expand the brackets)

Option 2: Install via pip as a package

pip install git+https://github.com/MeteoSwiss/PeakWeather.git # Install normal package
pip install "peakweather[topography] @ git+https://github.com/MeteoSwiss/PeakWeather@main" # Install with extras

Download the data from Hugging Face

from peakweather.dataset import PeakWeatherDataset
# Download the data in the current working directory
ds = PeakWeatherDataset(root=None)

Load pre-downloaded data

from peakweather.dataset import PeakWeatherDataset
# Replace <PATH_TO_DATA> with the directory that contains the downloaded data
ds = PeakWeatherDataset(root="<PATH_TO_DATA>")

Get observations

# For a single station, all parameters
ds.get_observations(stations='KLO') 
# For two stations, all parameters
ds.get_observations(stations=['KLO', 'GRO']) 
# For specific parameters
ds.get_observations(stations='KLO', parameters=['pressure', 'temperature']) 
datetime                   ('KLO', 'pressure')  ('KLO', 'temperature')
2017-01-01 00:00:00+00:00                977.8                    -3.3
2017-01-01 00:10:00+00:00                977.7                    -3.5
2017-01-01 00:20:00+00:00                977.6                    -3.5
2017-01-01 00:30:00+00:00                977.5                    -3.6
2017-01-01 00:40:00+00:00                977.3                    -3.5
...
# Get observations for a specific time frame
ds.get_observations(stations='KLO', 
                    parameters=['wind_speed', 'wind_direction'], 
                    first_date='2024-08-01 16:32',
                    last_date='2024-08-01 17:26')
datetime                   ('KLO', 'wind_speed')  ('KLO', 'wind_direction')
2024-08-01 16:40:00+00:00                    3.9                        219
2024-08-01 16:50:00+00:00                    2.5                        225
2024-08-01 17:00:00+00:00                    2.9                        231
2024-08-01 17:10:00+00:00                    3.1                        259
2024-08-01 17:20:00+00:00                    2.8                        237
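
The time-frame query above returns only the 10-minute timestamps lying between first_date and last_date. The following is a minimal standard-library sketch of that selection, assuming both bounds are inclusive and the grid is anchored at midnight UTC. This is consistent with the output shown but has not been checked against the library's implementation. The helper timestamps_in_range is hypothetical and not part of peakweather.

```python
from datetime import datetime, timedelta, timezone


def timestamps_in_range(first_date, last_date, step_minutes=10):
    """Return grid timestamps t with first_date <= t <= last_date.

    Hypothetical helper for illustration only; the grid is assumed
    to start at midnight (UTC) of the day of first_date.
    """
    step = timedelta(minutes=step_minutes)
    # Anchor the 10-minute grid at midnight of the first day.
    t = first_date.replace(hour=0, minute=0, second=0, microsecond=0)
    # Advance to the first grid point that is not before first_date.
    while t < first_date:
        t += step
    selected = []
    # Collect grid points up to and including last_date.
    while t <= last_date:
        selected.append(t)
        t += step
    return selected


first = datetime(2024, 8, 1, 16, 32, tzinfo=timezone.utc)
last = datetime(2024, 8, 1, 17, 26, tzinfo=timezone.utc)
stamps = timestamps_in_range(first, last)
print([t.strftime("%H:%M") for t in stamps])
# ['16:40', '16:50', '17:00', '17:10', '17:20']
```

The requested bounds need not coincide with measurement times: 16:32 and 17:26 simply bracket the five observations from 16:40 to 17:20.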

Detailed Usage

For detailed usage and parameter descriptions, refer to the docstring of the PeakWeatherDataset class.

Re-sampling

ds = PeakWeatherDataset(
        root="data",  # Path to the dataset
        pad_missing_values=True,  # Pad missing values with NaN
        years=None,  # Years to include in the dataset (None for all)
        parameters=None,  # Parameters to include in the dataset (None for all)
        extended_topo_vars="none",  # Optional extended topographic variables
        extended_nwp_pars="none",  # Optional extended NWP model (ICON) variables
        imputation_method="zero",  # Method for imputing missing values
        freq="h",  # Frequency of the data (e.g., "h" for hourly)
        compute_uv=True,  # Compute u and v components of wind
        station_type="meteo_station",  # Which station type to load (None for all)
        aggregation_methods={'temperature': 'mean'} # Use specific aggregation
    )

ds.parameters_table["aggregation"]

The dataset above is initialized with hourly frequency, so the 10-minute values are aggregated to hourly values. The default aggregation methods are:

name            aggregation
humidity        last
precipitation   sum
pressure        last
sunshine        sum
temperature     last
wind_direction  circ_mean
wind_gust       max
wind_speed      mean
wind_u          mean
wind_v          mean

Note, however, that the aggregation method can be overridden per parameter with the aggregation_methods argument. In the example above, the temperature is averaged over the previous hour instead of taking the last value.
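
To make the method names in the table concrete, here is a small sketch of how six 10-minute readings could be reduced to one hourly value. It is an illustration under assumptions, not the library's code: the AGGREGATIONS mapping and circ_mean_deg are hypothetical names, the readings are made up, and circ_mean is assumed to be the standard circular mean of angles in degrees.

```python
import math


def circ_mean_deg(angles_deg):
    """Circular mean of angles in degrees, returned in [0, 360)."""
    # Average the unit vectors and take the direction of the resultant.
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


# Hypothetical registry mirroring the method names in the table above.
AGGREGATIONS = {
    "last": lambda values: values[-1],
    "sum": sum,
    "max": max,
    "mean": lambda values: sum(values) / len(values),
    "circ_mean": circ_mean_deg,
}

# Six made-up 10-minute readings covering one hour.
temperature = [-3.3, -3.5, -3.5, -3.6, -3.5, -3.4]
wind_direction = [350.0, 0.0, 10.0, 10.0, 20.0, 30.0]

print(AGGREGATIONS["last"](temperature))            # -3.4
print(round(AGGREGATIONS["mean"](temperature), 2))  # -3.47

# Directions wrap around at 360 degrees, so a plain mean is misleading:
print(round(AGGREGATIONS["mean"](wind_direction), 1))       # 70.0
print(round(AGGREGATIONS["circ_mean"](wind_direction), 1))  # 10.0
```

The wind directions are spread symmetrically around 10 degrees, which the circular mean recovers, while the arithmetic mean is pulled to 70 degrees by the single 350-degree reading.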

Basic information

We can obtain some basic information about the content of the dataset as follows:

# Get printable representation of the dataset
print(ds)

# Show dataset information
print(f"Number of time steps: {ds.num_time_steps}")

print(f"Number of stations: {ds.num_stations}")
print(ds.stations_table.head(10))

print(f"Number of parameters: {ds.num_parameters}")
print("Parameters")
ds.show_parameters_description()

# Show data
print(f"Observations shape: {ds.observations.shape}")
print(ds.observations.head(10))

# Show the amount of missing values considering stations 
# equipped with the respective sensor
print(ds.missing_values)

We can get observations for a specific station and parameter as arrays:

# Get wind gust and direction for station KLO
klo_data = ds.get_observations(stations="KLO",
                               parameters=["wind_gust", "wind_direction"],
                               as_numpy=True)

print(f"KLO data shape: {klo_data.shape}")
print(f"KLO maximum wind gust: {klo_data[..., 0].max():.2f} m/s")

Time series windowing

We can obtain the data as sliding windows of size $W$ (the input window) and horizon $H$ (the future time steps to predict).

window_size = 12
lead_times = 3
sub_windows = ds.get_windows(window_size=window_size,
                             horizon_size=lead_times,
                             stations=ds.stations[:10],
                             parameters=["wind_speed", "wind_direction"],
                             first_date="2020-01-01",
                             last_date="2022-01-01")
print(f"Windows x shape: {sub_windows.x.shape}")
print(f"Windows mask_x shape: {sub_windows.mask_x.shape}")
print(f"Windows y shape: {sub_windows.y.shape}")
print(f"Windows mask_y shape: {sub_windows.mask_y.shape}")

The object returned contains x of shape $[\text{windows}, W,\text{stations}, \text{params}]$ and mask_x of the same shape, representing the input windows. Associated with them, there are y and mask_y of shape $[\text{windows}, H,\text{stations}, \text{params}]$ representing the future quantities.
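
The shapes above can be reproduced with a plain-Python sketch of sliding-window extraction. It assumes a stride of one time step and no gap between the input window and the horizon, which the text does not confirm for get_windows. The function sliding_windows is a hypothetical helper, and masks are left out for brevity.

```python
def sliding_windows(series, window_size, horizon_size):
    """Split a time-ordered list into input windows x and targets y.

    Hypothetical helper: stride 1, horizon directly after the window.
    """
    xs, ys = [], []
    last_start = len(series) - window_size - horizon_size
    for start in range(last_start + 1):
        split = start + window_size
        xs.append(series[start:split])                 # W past steps
        ys.append(series[split:split + horizon_size])  # H future steps
    return xs, ys


# Toy data indexed as [time][station][parameter]: 20 steps, 2 stations, 1 parameter.
num_steps, num_stations, num_params = 20, 2, 1
series = [
    [[float(t)] * num_params for _ in range(num_stations)]
    for t in range(num_steps)
]

x, y = sliding_windows(series, window_size=12, horizon_size=3)
print(len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]))  # 6 12 2 1
print(len(y), len(y[0]), len(y[0][0]), len(y[0][0][0]))  # 6 3 2 1
```

Under these assumptions, a series of $T$ time steps yields $T - W - H + 1$ windows, here $20 - 12 - 3 + 1 = 6$.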
