
dataset preparation for data-driven weather models


mllam-data-prep

This package provides a declarative way to prepare training data for data-driven (i.e. machine learning) weather forecasting models. A training dataset is constructed by declaring, in a YAML configuration file (for example example.danra.yaml), the data sources, the variables to extract, the transformations to apply to the data, and the target variable(s) of the model architecture that the data should be mapped to.
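As a rough illustration of the kind of information such a configuration declares, the sketch below writes one input out as a Python dictionary and checks that each part listed above is present. All key names (inputs, path, variables, transformations, target_variable) and the example path are hypothetical placeholders chosen for readability; they are not the real schema, which is defined in mllam_data_prep/config/spec.py.

```python
from typing import Any

# HYPOTHETICAL key names: illustrative only, NOT the schema in
# mllam_data_prep/config/spec.py.
REQUIRED_INPUT_KEYS = ("path", "variables", "transformations", "target_variable")

example_config: dict[str, Any] = {
    "inputs": {
        "height_levels": {
            # hypothetical path to a source dataset
            "path": "data/height_levels.zarr",
            # variables to extract, with the levels to keep for each
            "variables": {"u": {"altitude": [100]}, "v": {"altitude": [100]}},
            # transformations to apply before mapping (here: a time selection)
            "transformations": [{"select": {"time": ["1990-09-03", "1990-09-09"]}}],
            # model-architecture variable this input is mapped to
            "target_variable": "state",
        }
    }
}


def missing_keys(config: dict[str, Any]) -> dict[str, list[str]]:
    """Return, for each input that is incomplete, the required keys it lacks."""
    return {
        name: [key for key in REQUIRED_INPUT_KEYS if key not in spec]
        for name, spec in config.get("inputs", {}).items()
        if any(key not in spec for key in REQUIRED_INPUT_KEYS)
    }


print(missing_keys(example_config))  # {}
```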

The configuration is principally a means to represent how the dimensions of a given variable in a source dataset should be mapped to the dimensions and input variables of the model architecture to be trained.
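To make the idea of a dimension mapping concrete, the pure-Python sketch below maps a source variable with dimensions (time, x, y, altitude) onto model dimensions: time is kept as-is, the spatial dimensions x and y are stacked (flattened) into one grid dimension, and the altitude levels become separate features. The function map_dims, the mapping format and the target names grid_index and feature are assumptions made for this illustration, not the package's API.

```python
from itertools import product
from math import prod

# Hypothetical source variable: dimension name -> coordinate values
source_dims = {
    "time": ["t0", "t1"],
    "x": [0, 1, 2],
    "y": [0, 1],
    "altitude": [50, 100],
}

# Hypothetical mapping: target dimension -> source dimensions feeding it.
# One source dim is effectively a rename; several source dims are stacked.
dim_mapping = {
    "time": ["time"],
    "grid_index": ["x", "y"],
    "feature": ["altitude"],
}


def map_dims(source, mapping):
    """Return target dim -> list of source-coordinate tuples, in stacking order."""
    used = [d for dims in mapping.values() for d in dims]
    if sorted(used) != sorted(source):
        raise ValueError("every source dimension must be mapped exactly once")
    return {
        target: list(product(*(source[d] for d in dims)))
        for target, dims in mapping.items()
    }


mapped = map_dims(source_dims, dim_mapping)
shape = {target: len(coords) for target, coords in mapped.items()}
print(shape)  # {'time': 2, 'grid_index': 6, 'feature': 2}
# Stacking only reshapes: the total number of values is unchanged.
assert prod(shape.values()) == prod(len(v) for v in source_dims.values())
print(mapped["grid_index"][:3])  # [(0, 0), (0, 1), (1, 0)]
```

Requiring every source dimension to be mapped exactly once is a design choice of this sketch: it guarantees that no data is silently dropped or duplicated by the reshaping.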

The full configuration file specification is given in mllam_data_prep/config/spec.py.

Installation

If you only want to use mllam-data-prep, you can install the most recent tagged version from PyPI with pip:

python -m pip install mllam-data-prep
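To confirm which release ended up installed, the standard library's importlib.metadata can report the installed version of a distribution. The helper name installed_version is hypothetical; it returns None instead of raising when the package is absent.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mllam-data-prep") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # e.g. "0.1.0" after installing this release
```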

Developing mllam-data-prep

To work on developing mllam-data-prep, it is easiest to install and manage the dependencies with pdm. To get started, clone your fork of the main repository locally:

git clone https://github.com/<your-github-username>/mllam-data-prep
cd mllam-data-prep

Then use pdm to create a virtualenv inside the project, select it, and install the dependencies:

pdm venv create
pdm use --venv in-project
pdm install

All linting is handled by pre-commit, which can be set up to run automatically on each git commit by installing the git commit hook:

pdm run pre-commit install

Then branch, commit, push and make a pull request :)

Usage

The package is designed to be used as a command-line tool. The main command is mllam-data-prep, which takes a configuration file as input and outputs a training dataset in the form of a .zarr dataset named after the config file (e.g. example.danra.yaml produces example.danra.zarr):
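The naming rule just described can be sketched with pathlib, which replaces only the last suffix of a file name, so the dot inside example.danra is preserved. Two assumptions are made here: that the output is written next to the config file (the directory is kept), and the helper names expected_output_path and cli_command, which are hypothetical. cli_command only builds the argument list for the python -m mllam_data_prep invocation; it does not run it.

```python
import sys
from pathlib import Path


def expected_output_path(config_path: str) -> Path:
    """Zarr path derived from the config file name (assumed: same directory)."""
    # with_suffix swaps only the final suffix: example.danra.yaml -> example.danra.zarr
    return Path(config_path).with_suffix(".zarr")


def cli_command(config_path: str) -> list[str]:
    """Argument list equivalent to `python -m mllam_data_prep <config>`."""
    return [sys.executable, "-m", "mllam_data_prep", config_path]


print(expected_output_path("example.danra.yaml"))  # example.danra.zarr
# To actually run it (requires the package and the config file):
# import subprocess
# subprocess.run(cli_command("example.danra.yaml"), check=True)
```

Once written, the .zarr output can be opened for inspection with xarray.open_zarr.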

python -m mllam_data_prep example.danra.yaml

Example output:

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

mllam_data_prep-0.1.0-py3-none-any.whl (13.2 kB, Python 3)

File details

Hashes for mllam_data_prep-0.1.0-py3-none-any.whl:

Algorithm     Hash digest
SHA256        533802a9834c4c40d9d8160895244125e80e04bbe8a40e76242d8b7f6e2b9aa6
MD5           ee27b3316cfab8ce730f188b79060455
BLAKE2b-256   e95df31faf73201f5cc6786b86f0950bd3df8a5f04046325b2f5b32dc2a9ddc7
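A downloaded wheel can be checked against the published SHA256 digest with the standard library's hashlib. The sketch below hashes the file in chunks so it never has to be read into memory at once; the helper names sha256_of and verify_wheel are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for mllam_data_prep-0.1.0-py3-none-any.whl
EXPECTED_SHA256 = "533802a9834c4c40d9d8160895244125e80e04bbe8a40e76242d8b7f6e2b9aa6"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage, once the wheel has been downloaded:
# print(verify_wheel(Path("mllam_data_prep-0.1.0-py3-none-any.whl")))
```

Alternatively, pip can enforce the digest itself in hash-checking mode, using a requirements line such as mllam-data-prep==0.1.0 --hash=sha256:<digest>.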

