A preprocessing toolbox for converting datasets into assets ready for ML based workloads

# Preprocessing Toolbox

![GitHub issues](https://img.shields.io/github/issues/environmental-forecasting/preprocess-toolbox?style=plastic) ![GitHub closed issues](https://img.shields.io/github/issues-closed/environmental-forecasting/preprocess-toolbox?style=plastic) ![GitHub](https://img.shields.io/github/license/environmental-forecasting/preprocess-toolbox) ![GitHub forks](https://img.shields.io/github/forks/environmental-forecasting/preprocess-toolbox?style=social) ![GitHub stars](https://img.shields.io/github/stars/environmental-forecasting/preprocess-toolbox?style=social)

This is the preprocessing library that takes download-toolbox datasets and composes them into multi-source data loaders, which can be used to cache data for, or supply data to, downstream applications.

This project is only just getting started; more information will appear soon.

Contact jambyr <at> bas <dot> ac <dot> uk if you want further information.

## Table of contents

  • [Installation](#installation)

  • [Implementation](#implementation)

  • [Limitations](#limitations)

  • [Contributing](#contributing)

  • [Credits](#credits)

  • [License](#license)

## Installation

Not currently released to pip.

Please refer to [the contribution guidelines](CONTRIBUTING.md) for more information.

## Implementation

When installed, the library provides a series of CLI commands. For initial information, run a command with the `--help` switch or consult the documentation.

### Basic principles

The library preprocesses download-toolbox datasets and creates a single configuration for reading the data out in a multi-channel format for dataset construction:

1. Preprocess datasets from download-toolbox so that the dataset is continuous and normalised for the downstream application
2. Generate a loader configuration, applying additional metadata (arbitrary channels and masks) providing initial access to the collected data
3. Use this data loader to produce usable application datasets for downstream applications (testing with IceNet and another internal application)
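
To make "continuous and normalised" in the first step concrete, here is a minimal, standard-library-only sketch of the idea. It is not the library's implementation: the function names `fill_missing_days` and `normalise`, the linear interpolation, and the min-max scaling are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def fill_missing_days(series):
    """Return a gap-free daily series, linearly interpolating missing days.

    Hypothetical helper for illustration; expects a non-empty dict of date -> float.
    """
    days = sorted(series)
    filled = {}
    for start, end in zip(days, days[1:]):
        span = (end - start).days
        for offset in range(span):
            # Weight moves from 0 at `start` towards 1 at `end`.
            weight = offset / span
            filled[start + timedelta(days=offset)] = (
                series[start] * (1 - weight) + series[end] * weight
            )
    filled[days[-1]] = series[days[-1]]
    return filled


def normalise(series):
    """Scale values linearly into the range [0, 1] (min-max normalisation)."""
    low, high = min(series.values()), max(series.values())
    scale = (high - low) or 1.0  # avoid dividing by zero for constant data
    return {day: (value - low) / scale for day, value in series.items()}


raw = {
    date(2020, 1, 1): 10.0,
    date(2020, 1, 2): 12.0,
    date(2020, 1, 4): 20.0,  # 3 January is missing
}
continuous = fill_missing_days(raw)
ready = normalise(continuous)
```

Here `continuous` gains an interpolated entry for 3 January, and `ready` holds the same days rescaled to between 0 and 1. Real datasets are gridded and multi-dimensional, so treat this only as a picture of the concept.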

This is a base library upon which application-specific processing is built. It lowers the implementation overhead of creating multi-source datasets for environmental applications that need to integrate data from the sources that download-toolbox provides.

This library doesn’t have knowledge of those datasets. Instead, it forms the basis for application-specific processing by importing application-specific logic dynamically. See [this issue](https://github.com/environmental-forecasting/preprocess-toolbox/issues/1) for a quick idea of how this works with the [IceNet workflow](https://github.com/icenet-ai/icenet).
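
As a general illustration of importing logic dynamically, the sketch below resolves a `module:attribute` string at runtime using the standard library's `importlib`. The `load_callable` helper and the string format are assumptions made for this example, not this library's actual mechanism; the linked issue describes how it really works.

```python
from importlib import import_module


def load_callable(spec):
    """Resolve a 'package.module:attribute' string to the callable it names.

    Hypothetical helper for illustration only.
    """
    module_name, _, attribute = spec.partition(":")
    if not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = getattr(import_module(module_name), attribute)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not name a callable")
    return obj


# A standard-library target stands in for application-specific code here.
processor = load_callable("statistics:mean")
result = processor([1, 2, 3])
```

Because the target is named by a string, the base library never needs to import the application package itself; the application supplies the name at runtime.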

## Limitations

There are some major limitations to this as a general-purpose tool; these will hopefully be dealt with in time. I’m raising issues as I go.

The functionality is currently under very heavy development, but the following commands already work:

  • preprocess_missing_spatial (works poorly at present because the mask backref implementation is missing)

  • preprocess_missing_time

  • preprocess_regrid

  • preprocess_rotate

  • preprocess_dataset

  • preprocess_loader_init

  • preprocess_add_mask

  • preprocess_add_channel

Other stubs probably don’t work, unless I forgot to update these docs!
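
To check which of these commands are available in the current environment, a small sketch like the following may help. It uses only `shutil.which` from the standard library; the `installed_commands` helper is hypothetical, and the command names are copied from the list above.

```python
import shutil

# Command names copied from the list of working commands in this README.
COMMANDS = [
    "preprocess_missing_spatial",
    "preprocess_missing_time",
    "preprocess_regrid",
    "preprocess_rotate",
    "preprocess_dataset",
    "preprocess_loader_init",
    "preprocess_add_mask",
    "preprocess_add_channel",
]


def installed_commands(names=COMMANDS):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in installed_commands().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Any command that resolves to a path can then be run with the `--help` switch to see its options.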

## Contributing

Please refer to [the contribution guidelines](CONTRIBUTING.md) for more information.

## Credits

<a href="https://github.com/environmental-forecasting/preprocess-toolbox/graphs/contributors"><img src="https://contrib.rocks/image?repo=environmental-forecasting/preprocess-toolbox" /></a>

## License

This is licensed under the [MIT License](LICENSE).
