A preprocessing toolbox for converting datasets into assets ready for ML-based workloads

# Preprocessing Toolbox

![GitHub issues](https://img.shields.io/github/issues/environmental-forecasting/preprocess-toolbox?style=plastic) ![GitHub closed issues](https://img.shields.io/github/issues-closed/environmental-forecasting/preprocess-toolbox?style=plastic) ![GitHub](https://img.shields.io/github/license/environmental-forecasting/preprocess-toolbox) ![GitHub forks](https://img.shields.io/github/forks/environmental-forecasting/preprocess-toolbox?style=social) ![GitHub stars](https://img.shields.io/github/stars/environmental-forecasting/preprocess-toolbox?style=social)

This is the preprocessing library for taking download-toolbox datasets and combining / composing multi-source data loaders that can be used to cache or supply downstream applications.

This project is only just getting started; more information will appear soon.

Contact jambyr <at> bas <dot> ac <dot> uk if you want further information.

## Table of contents

  • [Overview](#overview)

  • [Installation](#installation)

  • [Implementation](#implementation)

  • [Limitations](#limitations)

  • [Contributing](#contributing)

  • [Credits](#credits)

  • [License](#license)

## Installation

Not currently released to pip.

Please refer to [the contribution guidelines](CONTRIBUTING.rst) for more information.

## Implementation

When installed, the library provides a series of CLI commands. Use the `--help` switch on any command for initial information, or consult the documentation.
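As a rough illustration, the help text of the installed commands can also be collected from Python. This is a minimal sketch that assumes only what is stated here: the command names listed under Limitations and the `--help` switch. The `command_help` helper is hypothetical and not part of the library.

```python
import shutil
import subprocess


def command_help(command):
    """Return the --help text of a CLI command, or None if it is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--help"], capture_output=True, text=True, check=False
    )
    # Some tools print help to stderr, so fall back to it when stdout is empty.
    return result.stdout or result.stderr


# Command names taken from the list in the Limitations section.
for name in ("preprocess_dataset", "preprocess_loader_init"):
    text = command_help(name)
    print(name, "->", "not installed" if text is None else text.splitlines()[:1])
```

If the package is not installed, each command is simply reported as missing rather than raising an error.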

### Basic principles

The library provides the ability to preprocess download-toolbox datasets and create singular configurations for reading out the data in a multi-channel format for dataset construction:

1. Preprocess datasets from download-toolbox so that the dataset is continuous and normalised for the downstream application
2. Generate a loader configuration, applying additional metadata (arbitrary channels and masks) providing initial access to the collected data
3. Use this data loader to produce usable application datasets for downstream applications (testing with IceNet and another internal application)
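To make the first step more concrete, here is a small sketch of what "continuous and normalised" can mean for a daily time series. It is purely illustrative: linear interpolation and min-max scaling are assumptions chosen for the example, not the algorithms the library uses, and `fill_missing_days` and `normalise` are hypothetical names.

```python
from datetime import date, timedelta


def fill_missing_days(series):
    """Linearly interpolate values for days absent from a {date: value} mapping."""
    days = sorted(series)
    filled = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        for offset in range(gap):
            # Weight moves from 0 at `start` towards 1 at `end`.
            weight = offset / gap
            filled[start + timedelta(days=offset)] = (
                series[start] * (1 - weight) + series[end] * weight
            )
    filled[days[-1]] = series[days[-1]]
    return filled


def normalise(series):
    """Min-max scale values into the range [0, 1]."""
    low, high = min(series.values()), max(series.values())
    span = (high - low) or 1.0  # avoid dividing by zero for constant data
    return {day: (value - low) / span for day, value in series.items()}


raw = {
    date(2020, 1, 1): 10.0,
    date(2020, 1, 2): 12.0,
    date(2020, 1, 5): 18.0,  # 3 and 4 January are missing
}
prepared = normalise(fill_missing_days(raw))
```

After these two passes every day between the first and last observation has a value, and all values lie between 0 and 1.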

This is a base library upon which application-specific processing is built. It lowers the implementation overhead of creating multi-source datasets for environmental applications that need to integrate data from the sources that download-toolbox provides.

This library has no knowledge of those datasets. Instead, it forms the basis for application-specific processing by importing application-specific logic dynamically. See [this issue](https://github.com/environmental-forecasting/preprocess-toolbox/issues/1) for a quick idea of how this works with the [IceNet workflow](https://github.com/icenet-ai/icenet).
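The general technique of importing logic dynamically can be sketched with the standard library's `importlib`. This is an illustration of the pattern only, not the library's actual mechanism: the `module:attribute` reference format and the `resolve` helper are assumptions, and `math:sqrt` stands in for an application-specific processor.

```python
import importlib


def resolve(reference):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attribute = reference.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Standard-library stand-in for an application-specific processor.
processor = resolve("math:sqrt")
print(processor(16.0))  # 4.0
```

Because the base library only ever sees a string reference, it never needs to import or know about the application's own modules ahead of time.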

## Limitations

There are some major limitations to this as a general-purpose tool. These will hopefully be dealt with in time! I'm raising issues as I go.

The library is currently under very heavy development, but the following commands already work:

  • preprocess_missing_spatial - works poorly at present because the mask backref implementation is missing

  • preprocess_missing_time

  • preprocess_regrid

  • preprocess_rotate

  • preprocess_dataset

  • preprocess_loader_init

  • preprocess_add_mask

  • preprocess_add_channel

Other stubs probably don’t work, unless I forgot to update these docs!

## Contributing

Please refer to [the contribution guidelines](CONTRIBUTING.rst) for more information.

## Credits

<a href="https://github.com/environmental-forecasting/preprocess-toolbox/graphs/contributors"><img src="https://contrib.rocks/image?repo=environmental-forecasting/preprocess-toolbox" /></a>

## License

This project is licensed under the [MIT License](LICENSE).
