Nowcasting Dataset

nowcasting_dataset

Pre-prepare batches of data for use in machine learning training.

This code combines several data sources including:

  • Satellite imagery (EUMETSAT SEVIRI RSS 5-minutely data of UK)
  • Numerical Weather Predictions (NWPs: UK Met Office UKV model from CEDA)
  • Solar PV power timeseries data (from PVOutput.org, downloaded using our pvoutput Python code)
  • Estimated total solar PV generation for each of the ~350 "grid supply points" (GSPs) in Britain from Sheffield Solar's PV Live Regional API.
  • Topographic data.
  • The Sun's azimuth and elevation angles.

This repo doesn't contain the ML models themselves. Please see this page for an overview of the Open Climate Fix solar PV nowcasting project, and how our code repositories fit together.

User manual

Installation

conda

From within the cloned nowcasting_dataset directory:

conda env create -f environment.yml
conda activate nowcasting_dataset
pip install -e .

pip

A (probably older) version is also available from PyPI: pip install nowcasting-dataset
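
Because the PyPI release may lag behind the repository, it can help to check which version is actually installed. The following is a minimal sketch using only the Python standard library; the helper name installed_version is ours and is not part of nowcasting_dataset, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "nowcasting-dataset") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("nowcasting-dataset is not installed in this environment")
    else:
        print(f"nowcasting-dataset {found} is installed")
```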

PV Live API

If you want to also install PVLive then use pip install git+https://github.com/SheffieldSolar/PV_Live-API

Pre-commit

A pre-commit hook is configured that runs black on every commit. You need black and pre-commit (both are installed by conda or pip when installing nowcasting_dataset); then run pre-commit install in this repo to enable the hook.

Testing

To test using the small amount of data stored in this repo: py.test -s

To test using the full dataset on Google Cloud, add the --use_cloud_data switch.

Downloading data

Satellite data

Use Satip to download native EUMETSAT SEVIRI RSS data from EUMETSAT's API and then convert to an intermediate file format.

PV data from PVOutput.org

Download PV timeseries data from PVOutput.org using our PVOutput code.

Numerical weather predictions from the UK Met Office

Request access to the UK Met Office data on CEDA.

Once you have a username and password, download using scripts/download_UK_Met_Office_NWPs_from_CEDA.sh. Please see the comments at the top of the script for instructions.

Then convert the grib files to Zarr using scripts/convert_NWP_grib_to_zarr.py. Run that script with --help to see how to operate it. See the comments at the top of the script to learn how the script works.

Detailed docs of the Met Office data are available here.

GSP-level estimates of PV outturn from PV Live Regional

TODO - GSP

Topographical data

  1. Make an account at the USGS EarthExplorer website
  2. Create a region of the world to download data for, in our case, the spatial extent of the SEVIRI RSS image
  3. Select the data products you want, in this case SRTM elevation maps
  4. Download all the SRTM files that cover that area

There does not seem to be an automated way to do this selecting and downloading, so this might take a while.
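
To check that every tile covering the region has been downloaded, it can help to list the tiles the region should contain. The sketch below assumes the common SRTM convention of 1° x 1° tiles named after their south-west corner (for example N50W003); EarthExplorer may show longer entity IDs that embed these names. The function name and the bounding box in the usage example are illustrative only and are not the actual SEVIRI RSS extent.

```python
import math
from typing import List


def srtm_tile_names(south: float, west: float, north: float, east: float) -> List[str]:
    """List the 1-degree tile names (e.g. 'N50W003') that intersect a bounding box.

    Assumption: each tile is named after the latitude and longitude of its
    south-west corner. The box must not cross the antimeridian.
    """
    names = []
    for lat in range(math.floor(south), math.ceil(north)):
        for lon in range(math.floor(west), math.ceil(east)):
            ns = "N" if lat >= 0 else "S"
            ew = "E" if lon >= 0 else "W"
            names.append(f"{ns}{abs(lat):02d}{ew}{abs(lon):03d}")
    return names


if __name__ == "__main__":
    # Hypothetical bounding box, in degrees: south, west, north, east.
    for name in srtm_tile_names(50.2, -3.5, 50.8, -2.5):
        print(name)
```

Comparing this list against the downloaded file names is a quick way to spot a missing tile.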

Configure nowcasting_dataset to point to the downloaded data

Copy and modify one of the config yaml files in nowcasting_dataset/config/.
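
The copy-and-modify step can also be scripted. This is a minimal sketch using only the standard library: it copies a template and applies plain-text substitutions, so it makes no assumptions about the config schema. The helper name copy_config and the file names and paths in the usage example are hypothetical; substitute the real template and the paths found inside it.

```python
import shutil
from pathlib import Path
from typing import Dict


def copy_config(template: Path, destination: Path, replacements: Dict[str, str]) -> Path:
    """Copy a YAML config and apply plain-text substitutions to the copy.

    Raises ValueError if a string to be replaced is not present, so that a
    mistyped path does not go unnoticed. The template itself is not modified.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, destination)
    text = destination.read_text()
    for old, new in replacements.items():
        if old not in text:
            raise ValueError(f"{old!r} not found in {template}")
        text = text.replace(old, new)
    destination.write_text(text)
    return destination


if __name__ == "__main__":
    # Hypothetical file names and paths, for illustration only.
    template = Path("nowcasting_dataset/config/example.yaml")
    if template.exists():
        copy_config(
            template,
            Path("my_configs/local.yaml"),
            {"/path/in/template": "/path/to/my/downloaded/data"},
        )
```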

Prepare ML batches

Run scripts/prepare_ml_data.py --help to learn how to run the prepare_ml_data.py script.

What exactly is in each batch?

Please see the data_sources/<modality>/<modality>_model.py files (where <modality> is one of {datetime, metadata, gsp, nwp, pv, satellite, sun, topographic}) for documentation about the different data fields in each example / batch.
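
The path pattern above expands to one file per modality. A small sketch that spells out the eight paths follows; the helper name model_file_paths is ours, and the paths are relative to wherever the data_sources directory sits in your checkout.

```python
from pathlib import PurePosixPath
from typing import List

# The modalities named in the text above.
MODALITIES = ["datetime", "metadata", "gsp", "nwp", "pv", "satellite", "sun", "topographic"]


def model_file_paths(root: str = "data_sources") -> List[str]:
    """Expand <root>/<modality>/<modality>_model.py for every modality."""
    return [str(PurePosixPath(root) / m / f"{m}_model.py") for m in MODALITIES]


if __name__ == "__main__":
    for path in model_file_paths():
        print(path)
```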

History of nowcasting_dataset

When we first started writing nowcasting_dataset, our intention was to load and align data from these three datasets on-the-fly during ML training. But it just isn't quite fast enough to keep a modern GPU constantly fed with data when loading multiple satellite channels and multiple NWP parameters. So, now, this code is used to pre-prepare thousands of batches, and save these batches to disk, each as a separate NetCDF file. These files can then be loaded super-quickly at training time. The end result is a 12x speedup in training.

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Jack Kelly 💻
  • Jacob Bieker 💻
  • Peter Dudfield 💻
  • Flo 💻
  • Rohan Nuttall 💻

This project follows the all-contributors specification. Contributions of any kind welcome!
