ocf-data-sampler

ocf-data-sampler contains all the tools needed to create samples and feed them to our models, such as PVNet. The data we work with, typically energy data, satellite imagery, and numerical weather predictions (NWPs), is usually too heavy to process on the fly. This repo handles the steps involved: opening the data, selecting the right samples, normalising and reshaping, and saving to and reading from disk.

We are currently migrating to this repo from ocf_datapipes, which performs the same functions but is built around PyTorch DataPipes, which are quite cumbersome to work with and are no longer maintained by PyTorch. ocf-data-sampler uses PyTorch Datasets, and we've taken the opportunity to make the code much cleaner and more manageable.

[!Note] This repository is still in early development, and large changes to the user-facing functions may still occur.

Licence

This project is primarily licensed under the MIT License (see LICENSE).

It includes and adapts internal functions from the Google xarray-tensorstore project, licensed under the Apache License, Version 2.0.

Documentation

ocf-data-sampler doesn't have external documentation yet; you can read a bit about how our torch datasets work in the README here.

FAQ

If you have any questions about this or any other of our repos, don't hesitate to hop to our Discussions Page!

How does ocf-data-sampler deal with data sources that use different projections (e.g. some are in latitude-longitude, and some in OSGB)?

When creating samples, we make a spatial crop of a preset size centred around a point of interest (POI, usually a solar or wind farm). The size of the crop is set not in miles or kilometres, but in 'pixels', which would be different for different data sources, depending on their spatial resolution, projections they use, and where the POI is. For example, a latitude-longitude source with a 1° resolution will have pixel sizes corresponding to very different 'surface' distances (that you might measure in, e.g., kilometres) from a source with 0.1° resolution. The pixel size will even be different for the same source depending on how close the POI is to the equator!
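To make the resolution and latitude effect concrete, here is a rough sketch of how the ground size of one latitude-longitude pixel can be estimated. It is not part of ocf-data-sampler: the function name `pixel_size_km` and the constant of about 111.32 km per degree (a spherical-Earth approximation) are assumptions chosen for illustration.

```python
import math

# Assumption: roughly 111.32 km per degree on a spherical Earth.
KM_PER_DEGREE = 111.32


def pixel_size_km(resolution_deg, latitude_deg):
    """Approximate (north-south, east-west) size in km of one lat-lon pixel.

    Hypothetical helper for illustration only.
    """
    north_south = resolution_deg * KM_PER_DEGREE
    # Meridians converge towards the poles, so east-west distance shrinks
    # with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Same 1-degree source, POI at the equator versus at 60 degrees north.
print(pixel_size_km(1.0, 0.0))
print(pixel_size_km(1.0, 60.0))
# A finer 0.1-degree source at 60 degrees north.
print(pixel_size_km(0.1, 60.0))
```

At 60° latitude the east-west extent of a pixel is about half of what it is at the equator, so a crop with the same number of pixels covers a noticeably narrower area.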

Instead of trying to accommodate all these differences and make all the sources use the same spatial grid, we translate the POI's position into the source's coordinate system and select the crop using the source's original grid. This 'snapshot' is then passed to the model with no additional information on what specific coordinates it represents. Since the crop size is always the same and the POI is always in the centre, the model gets consistent information on the measurements at a location near the POI and how they affect the target, without any explicit knowledge of where that location is in coordinate system terms.
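The crop selection described above can be sketched in a few lines. This is a minimal illustration, not the library's actual implementation: the names `nearest_index` and `crop_around_poi` are hypothetical, and the sketch assumes the POI has already been translated into the source's own coordinate system (for instance with a projection library such as pyproj).

```python
def nearest_index(coords, value):
    """Index of the grid coordinate closest to `value`."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


def crop_around_poi(x_coords, y_coords, poi_x, poi_y, width_px, height_px):
    """Return (x_slice, y_slice) for a fixed-size pixel window around the POI.

    Hypothetical helper: the window size is given in pixels of the source's
    own grid, and the POI is expressed in that grid's coordinates.
    """
    centre_x = nearest_index(x_coords, poi_x)
    centre_y = nearest_index(y_coords, poi_y)
    x_start = centre_x - width_px // 2
    y_start = centre_y - height_px // 2
    if (
        x_start < 0
        or y_start < 0
        or x_start + width_px > len(x_coords)
        or y_start + height_px > len(y_coords)
    ):
        raise ValueError("crop window falls outside the source grid")
    return slice(x_start, x_start + width_px), slice(y_start, y_start + height_px)


# A coarse 1-degree grid and a finer 0.1-degree grid over the same region.
coarse = [float(i) for i in range(-10, 11)]
fine = [i / 10 for i in range(-100, 101)]

# The same POI and the same 4x4 pixel crop, taken from each source's own grid.
xs, ys = crop_around_poi(coarse, coarse, -1.3, 2.6, 4, 4)
print(coarse[xs], coarse[ys])
xs, ys = crop_around_poi(fine, fine, -1.3, 2.6, 4, 4)
print(fine[xs], fine[ys])
```

Both crops have the same shape in pixels, but the one taken from the fine grid spans a much smaller area around the POI, which is the behaviour the answer above describes.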

Development

You can install ocf-data-sampler for development as follows:

pip install git+https://github.com/openclimatefix/ocf-data-sampler.git

Running the test suite

The tests in this project use pytest. Once you have it installed, you can run it from the project's directory:

cd ocf-data-sampler
pytest

Contributing and community

Contributors

Thanks goes to these wonderful people (emoji key):

James Fulton 💻
Alexandra Udaltsova 💻
Sukhil Patel 💻 🐛
Peter Dudfield 💻
Vikram Pande 💻
Unnati Bhardwaj 📖
Ali Rashid 💻
Felix 💻 ⚠️
Ajani Timothy 💻
Rupesh Mangalam 💻
Siddharth 💻
Sachin-G13 💻
Dorna Raj Gyawali 💻
Adnan Hashmi 💻
utsav-pal 💻
zaryab-ali 💻

This project follows the all-contributors specification. Contributions of any kind welcome!


Part of the Open Climate Fix community.
