A preprocessing toolbox for converting datasets into assets ready for ML based workloads
Project description
# Preprocessing Toolbox
![GitHub issues](https://img.shields.io/github/issues/environmental-forecasting/preprocess-toolbox?style=plastic) ![GitHub closed issues](https://img.shields.io/github/issues-closed/environmental-forecasting/preprocess-toolbox?style=plastic) ![GitHub](https://img.shields.io/github/license/environmental-forecasting/preprocess-toolbox) ![GitHub forks](https://img.shields.io/github/forks/environmental-forecasting/preprocess-toolbox?style=social) ![GitHub forks](https://img.shields.io/github/stars/environmental-forecasting/preprocess-toolbox?style=social)
This is the preprocessing library for taking download-toolbox datasets and combining / composing multi-source data loaders that can be used to cache or supply downstream applications.
This is only just getting started, more info will appear soon.
Contact jambyr <at> bas <dot> ac <dot> uk if you want further information.
## Table of contents
[Overview](#overview)
[Installation](#installation)
[Implementation](#implementation)
[Contributing](#contributing)
[Credits](#credits)
[License](#license)
## Installation
Not currently released to pip.
Please refer to [the contribution guidelines for more information.](CONTRIBUTING.rst)
## Implementation
When installed, the library will provide a series of CLI commands. Please use the –help switch for more initial information, or the documentation.
### Basic principles
The library provides the ability to preprocess download-toolbox datasets and create singular configurations for reading out the data in a multi-channel format for dataset construction:
1. Preprocess datasets from download-toolbox so that the dataset is continuous and normalised for the downstream application 1. Generate a loader configuration, applying additional metadata (arbitrary channels and masks) providing initial access to the collected data 1. Use this data loader to produce usable application datasets for downstream applications (testing with IceNet and another internal application)
This is a base library upon which application specific processing is based, lowering the implementation overhead for creating multi-source datasets for environmental applications that require integration of data from sources that download-toolbox provide.
This library doesn’t have knowledge of those datasets, it forms the basis for processing things specific to an application by importing application-specific logic dynamically. See [this issue](https://github.com/environmental-forecasting/preprocess-toolbox/issues/1) for a quick idea of how this works with the [IceNet workflow](https://github.com/icenet-ai/icenet).
## Limitations
There are some major limitations to this as a general purpose tool, these will hopefully be dealt with in time! I’m raising issues as I go
This is currently very heavy development functionality, but the following commands already work:
preprocess_missing_spatial - poorly at present due to missing mask backref implementation
preprocess_missing_time
preprocess_regrid
preprocess_rotate
preprocess_dataset
preprocess_loader_init
preprocess_add_mask
preprocess_add_channel
Other stubs probably don’t work, unless I forgot to update these docs!
## Contributing
Please refer to [the contribution guidelines for more information.](CONTRIBUTING.rst)
## Credits
<a href=”https://github.com/environmental-forecasting/preprocess-toolbox/graphs/contributors”><img src=”https://contrib.rocks/image?repo=environmental-forecasting/preprocess-toolbox” /></a>
## License
This is licensed using the [MIT License](LICENSE)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file preprocess_toolbox-0.0.2.tar.gz
.
File metadata
- Download URL: preprocess_toolbox-0.0.2.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 627116c903e9f8771fc155bde5ac7d2c7528999e856e7e46f3ee61ea9e552ba1 |
|
MD5 | 4ef3bbe74b6a627ca6673fd89a1beb03 |
|
BLAKE2b-256 | 28bfc65bb3d6533f51ac5c65a440d863731fae5ae6226474306003dff7cee58c |
File details
Details for the file preprocess_toolbox-0.0.2-py3-none-any.whl
.
File metadata
- Download URL: preprocess_toolbox-0.0.2-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | fcaffc4cda2462c862edc74ad3b8201fb67f7a9ad516d8880e21b81cac70b042 |
|
MD5 | fe2a3276cbbacc0d25b4d650db6db57a |
|
BLAKE2b-256 | cf180b1c33799293a4cc64e76a142490c4a3880a3e85a69dd60b47b488fb42eb |