
Benchmark dataset for learning dynamical systems from data

Project description

Dynabench: A benchmark dataset for learning dynamical systems from low-resolution data

This is the repository containing the data generation algorithms as well as all baseline models for the paper Dynabench: A benchmark dataset for learning dynamical systems from low-resolution data (accepted at ECML-PKDD 2023).

DynaBench is a benchmark dataset for learning dynamical systems from data. Dynamical systems are physical systems that are typically modelled by partial differential equations (e.g. numerical weather prediction, climate models, fluid simulation, electromagnetic field simulation). The main challenges in learning to predict the evolution of these systems from data are the chaotic behaviour these systems exhibit (a small deviation in the initial conditions leads to highly different predictions) and data availability: in real-world settings only low-resolution data is available, with measurements sparsely scattered across the simulation domain (see the following figure illustrating the distribution of weather monitoring stations in Europe).

[Animation: distribution of weather monitoring stations in Europe]

In this benchmark we simulate this setting using synthetic data, which makes evaluation and training of different machine learning models easier. To this end, we generated simulation data by solving six different PDE systems and postprocessed it to create low-resolution snapshots of the simulations.

The main task for which the dataset has been generated is forecasting: predicting the next state(s) of the system.

The six included equations were selected to be both sufficiently complex and sufficiently varied to cover different kinds of physical systems (first and second order, coupled equations, stationary and non-stationary).

An example (wave equation) of a simulated system is shown below:

[Animation: simulated wave equation]

Equations

There are six different equations in the dataset, each with different characteristics, summarized in the following table:

| Equation | Components | Time Order | Spatial Order |
|---|---|---|---|
| Advection | 1 | 1 | 1 |
| Burgers' | 2 | 1 | 2 |
| Gas Dynamics | 4 | 1 | 2 |
| Kuramoto-Sivashinsky | 1 | 1 | 4 |
| Reaction-Diffusion | 2 | 1 | 2 |
| Wave | 1 | 2 | 2 |
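To give a feel for the kind of dynamics these equations produce, the simplest one, advection ($u_t + c\,u_x = 0$), can be stepped with a basic first-order upwind scheme. This is a toy illustration only, not the solver used to generate the benchmark data:

```python
import numpy as np

def advect(u0, c=1.0, dx=0.01, dt=0.005, steps=100):
    """Toy 1D advection solver (first-order upwind, periodic boundary).

    Illustrative only -- NOT the repository's data-generation solver.
    Stable for c * dt / dx <= 1 (here the CFL number is 0.5).
    """
    u = u0.copy()
    for _ in range(steps):
        # upwind difference for c > 0; np.roll implements periodicity
        u = u - c * dt / dx * (u - np.roll(u, 1))
    return u

x = np.linspace(0, 1, 100, endpoint=False)
u0 = np.exp(-100 * (x - 0.3) ** 2)  # Gaussian bump centered at x = 0.3
u = advect(u0)                      # bump transported to roughly x = 0.8
```

The bump travels a distance of c · steps · dt = 0.5, so its peak ends up near x = 0.8 (slightly smeared by the scheme's numerical diffusion).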

Setup

Automated setup

If needed, create a virtual environment and activate it. You can then install all dependencies by running

sh scripts/install_requirements.sh

from the main project directory.

Manual installation

Alternatively, you can manually install the dependencies from the requirements.txt file:

pip install -r requirements.txt

It is recommended to first create a virtual environment, for example:

python -m venv venv
source venv/bin/activate

Additionally, you need to install PyTorch Geometric, following the instructions on their website.

Generation

To generate the data for a specific equation run

python generate.py num_simulations=NUM_SIMULATIONS equation=EQUATION split=DATASET_SPLIT

where NUM_SIMULATIONS indicates how many times each equation is simulated, EQUATION is one of (advection, burgers, gas_dynamics, kuramoto_sivashinsky, reaction_diffusion, wave), and DATASET_SPLIT is one of (train, test, val).

The full benchmark dataset contains 7000 simulations for the training set, 1000 for the validation set and 1000 for the test set, all divided into chunks of 500 simulations.
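Producing the full dataset means looping over all equations and splits. The small helper below just prints the corresponding generate.py invocations as a dry run (a sketch assuming the 7000/1000/1000 simulation counts above; run the printed commands from the project root, and expect long runtimes):

```python
# Dry run: build the generate.py invocations for the full benchmark.
# Assumes 7000 train / 1000 val / 1000 test simulations, as described above.
EQUATIONS = ["advection", "burgers", "gas_dynamics",
             "kuramoto_sivashinsky", "reaction_diffusion", "wave"]
SPLITS = {"train": 7000, "val": 1000, "test": 1000}

commands = [
    f"python generate.py num_simulations={n} equation={eq} split={split}"
    for split, n in SPLITS.items()
    for eq in EQUATIONS
]
for cmd in commands:
    print(cmd)
```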

Warning: this can take a long time.

Data format

The data is stored in .tar archives in chunks of 500 simulations. Each simulation consists of one file called XXXXXXX.data containing the simulation values for the given setting (cloud/grid + number of points), as well as a companion file containing the coordinates of the points at which the measurements were recorded.
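A chunk can be inspected with Python's standard tarfile module. The sketch below builds a tiny dummy archive in memory just to mimic the naming scheme described above (the member names and contents are illustrative, not the real files):

```python
import io
import tarfile

# Build a dummy chunk mimicking the described layout: one values file per
# simulation (names and payloads here are illustrative placeholders).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ["0000001.data", "0000002.data"]:
        payload = b"placeholder"
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Listing the members works the same way on a real downloaded chunk.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    members = tar.getnames()
print(members)
```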

Usage

To reproduce the experiments from our paper, run

python main.py equation=EQUATION model=MODEL support=cloud num_points=NUM_POINTS

This will start the training for a specific setting. The parameters specify which model, task, support structure, number of points etc. should be run. The available choices of parameters are:

EQUATION = [brusselator, gas_dynamics, kuramoto_sivashinsky, wave, advection]

MODEL = [persistence, point_gnn, point_net, point_transformer, gat, gcn, feast, kernelNN, graphpde]

To run the experiments for the grid models, run

python main.py equation=EQUATION model=MODEL support=grid num_points=NUM_POINTS datamodule=torch lightningmodule=gridmodule

with MODEL selected from [neuralpde, resnet, cnn].

Additionally, to use the benchmark for your own research, use the included datasets. The repository contains two dataset classes to handle the generated data.

  1. A pytorch dataset class, where each sample has the form $X\in\mathbb{R}^{L\times N\times D}$, where L is the lookback, N is the number of points and D is the number of target variables. See documentation of the dataset for details. To initialize the dataset class:
DynaBenchBase(
    mode: str = 'train',
    equation: str = 'gas_dynamics',
    task: str = 'forecast',
    support: str = 'grid',
    num_points: str = 'high',
    base_path: str = 'data',
    lookback: int = 1,
    rollout: int = 1,
    test_ratio: float = 0.1,
    val_ratio: float = 0.1,
    merge_lookback: bool = True,
    *args,
    **kwargs
)

Initializes a pytorch dataset with selected parameters. The data is loaded lazily.

Args:

  • mode (str, optional): the selection of data to use (train/val/test). Defaults to "train".
  • equation (str, optional): the equation to use. Defaults to "gas_dynamics".
  • task (str, optional): Which task to use as targets. Defaults to "forecast".
  • support (str, optional): Structure of the points at which the measurements are recorded. Defaults to "grid".
  • num_points (str, optional): Number of points at which measurements are available. Defaults to "high".
  • base_path (str, optional): location where the data is stored. Defaults to "data".
  • lookback (int, optional): How many past states are used to make the prediction. The additional states can be concatenated along the channel dimension if merge_lookback is set to True. Defaults to 1.
  • rollout (int, optional): How many steps should be predicted in a closed loop setting. Only used for forecast task. Defaults to 1.
  • test_ratio (float, optional): What fraction of simulations to set aside for testing. Defaults to 0.1.
  • val_ratio (float, optional): What fraction of simulations to set aside for validation. Defaults to 0.1.
  • merge_lookback (bool, optional): Whether to merge the additional lookback information into the channel dimension. Defaults to True.
  2. A graph dataset, specifically used for message passing neural networks implemented using the PyTorch Geometric module. It has a similar structure to the base DynaBench dataset.
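To make the sample layout concrete: with merge_lookback=True, the L past states are folded into the channel dimension, turning an $(L, N, D)$ sample into an $(N, L \cdot D)$ one. A shape-only NumPy sketch (an assumption about the merged layout, not the library's actual code):

```python
import numpy as np

L, N, D = 4, 225, 2          # lookback, number of points, target variables
x = np.random.rand(L, N, D)  # one sample as served with merge_lookback=False

# merge_lookback=True: concatenate the lookback states along the channel
# dimension, so each point carries all L past measurements at once.
x_merged = np.transpose(x, (1, 0, 2)).reshape(N, L * D)
print(x_merged.shape)  # (225, 8)
```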

Benchmark Results

The following tables show the results of our experiments:

  • forecast task, 900 points (1-step MSE):
| Model | Advection | Burgers | Gas Dynamics | Kuramoto-Sivashinsky | Reaction-Diffusion | Wave |
|---|---|---|---|---|---|---|
| CNN | 5.30848e-05 | 0.0110988 | 0.00420368 | 0.000669837 | 0.00036918 | 0.00143387 |
| FeaSt | 0.000130351 | 0.0116155 | 0.0162 | 0.0117867 | 0.000488848 | 0.00523298 |
| GAT | 0.00960113 | 0.0439986 | 0.037483 | 0.0667057 | 0.00915208 | 0.0151498 |
| GCN | 0.026397 | 0.13899 | 0.0842611 | 0.436563 | 0.164678 | 0.0382004 |
| GraphPDE | 0.000137098 | 0.0107391 | 0.0194755 | 0.00719822 | 0.000142114 | 0.00207144 |
| KernelNN | 6.31157e-05 | 0.0106146 | 0.013354 | 0.00668698 | 0.000187019 | 0.00542925 |
| NeuralPDE | 8.24453e-07 | 0.0112373 | 0.00373416 | 0.000536958 | 0.000303176 | 0.00169871 |
| Persistence | 0.0812081 | 0.0367688 | 0.186985 | 0.142243 | 0.147124 | 0.113805 |
| Point Transformer | 4.41633e-05 | 0.0103098 | 0.00724899 | 0.00489711 | 0.000141248 | 0.00238447 |
| PointGNN | 2.82496e-05 | 0.00882528 | 0.00901649 | 0.00673036 | 0.000136059 | 0.00138772 |
| ResNet | 2.15721e-06 | 0.0148052 | 0.00321235 | 0.000490104 | 0.000156752 | 0.00145884 |
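The Persistence row is the trivial baseline that predicts the next state to be identical to the current one; its 1-step MSE is simply the mean squared difference between consecutive snapshots. A minimal sketch:

```python
import numpy as np

def persistence_mse(trajectory):
    """1-step MSE of the persistence baseline: prediction(t+1) = state(t).

    trajectory: array of shape (T, N, D) -- T snapshots, N points, D channels.
    Sketch of the baseline's definition, not the repository's evaluation code.
    """
    pred = trajectory[:-1]   # "predicted" next states = current states
    target = trajectory[1:]  # actual next states
    return float(np.mean((pred - target) ** 2))

# A constant trajectory gives zero persistence error.
traj = np.ones((10, 225, 1))
print(persistence_mse(traj))  # 0.0
```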
  • forecast task, 900 points (16-step rollout MSE):
| Model | Advection | Burgers | Gas Dynamics | Kuramoto-Sivashinsky | Reaction-Diffusion | Wave |
|---|---|---|---|---|---|---|
| CNN | 0.00161331 | 0.554554 | 0.995382 | 1.26011 | 0.0183483 | 0.561433 |
| FeaSt | 1.48288 | 0.561197 | 0.819594 | 3.74448 | 0.130149 | 1.61066 |
| GAT | 41364.1 | 0.833353 | 1.21436 | 5.68925 | 3.85506 | 2.38418 |
| GCN | 3.51453e+13 | 13.0876 | 7.20633 | 1.70612e+24 | 1.75955e+07 | 7.89253 |
| GraphPDE | 1.07953 | 0.729879 | 0.969208 | 2.1044 | 0.0800235 | 1.02586 |
| KernelNN | 0.897431 | 0.72716 | 0.854015 | 2.00334 | 0.0635278 | 1.57885 |
| NeuralPDE | 0.000270308 | 0.659789 | 0.443498 | 1.05564 | 0.0224155 | 0.247704 |
| Persistence | 2.39393 | 0.679261 | 1.457 | 1.89752 | 0.275678 | 2.61281 |
| Point Transformer | 0.617025 | 0.503865 | 0.642879 | 2.09746 | 0.0564399 | 1.27343 |
| PointGNN | 0.660665 | 1.04342 | 0.759257 | 2.82063 | 0.0582293 | 1.30743 |
| ResNet | 8.64621e-05 | 1.86352 | 0.480284 | 1.0697 | 0.00704612 | 0.299457 |
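The 16-step numbers come from closed-loop rollout: the model's own prediction is fed back as input for the next step, so errors compound over the rollout (hence the blow-up of the graph models on some equations). A sketch of such an evaluation loop, with a hypothetical one-step predictor standing in for a trained model:

```python
import numpy as np

def rollout_mse(step_fn, x0, targets):
    """Closed-loop rollout MSE: each prediction is fed back as the next input.

    step_fn: hypothetical one-step predictor, state -> next state.
    x0: initial state; targets: array of the true next k states.
    Sketch of the evaluation scheme, not the repository's code.
    """
    preds, state = [], x0
    for _ in range(len(targets)):
        state = step_fn(state)  # the model's own output becomes the input
        preds.append(state)
    return float(np.mean((np.asarray(preds) - targets) ** 2))

# Toy example: the true dynamics halve the state; an exact "model" has
# zero rollout error, while any mismatch compounds over the 16 steps.
x0 = np.ones(4)
true = np.array([x0 * 0.5 ** (i + 1) for i in range(16)])
print(rollout_mse(lambda s: s * 0.5, x0, true))  # 0.0
```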

License

The content of this project itself, including the data and pretrained models, is licensed under the Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0). The underlying source code used to generate the data and train the models is licensed under the MIT license.
