Library for downloading and preprocessing data
# BAS Download Toolbox
    
This is a Python library providing CLI operations that let users download common environmental datasets for use in data pipelines. We use it in our optimisation and machine learning pipelines at BAS, and it should be flexible enough to adapt to many different use cases.
Contact digitalinnovation <at> bas <dot> ac <dot> uk if you want further information.
## Table of contents
* [Installation](#installation)
* [Implementation](#implementation)
    * [Basic Principles](#basic-principles)
* [Limitations](#limitations)
* [Contributing](#contributing)
* [Credits](#credits)
* [License](#license)
## Installation
`pip install download-toolbox`
Please refer to [the contribution guidelines for more information][1].
## Implementation
When installed, the library provides a series of CLI commands. Use the `--help` switch on any of them for initial information, or consult the documentation.
### Basic principles
The library sets up downloaders that will go through the following steps, for a variety of different data sources:
1. Set up a data store or, if it exists, read the provenance config
1. Naively optimise the requested download
1. Download from the source in parallel
1. Transform the dataset into convenient-to-use files, ready for processing
That last step is important, as it might produce a dataset that differs from what comes from the source. The tool is intended to record this in the provenance configuration, which is why that configuration might already exist in step (1). This keeps newly downloaded data consistent with what's already there, and records the differences from the source data for consistency (you should not be able to screw up existing datasets), posterity and reproducibility.
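The four steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: the function names, the `provenance.json` file name, the `uppercase` setting and the stand-in `fetch` function are all hypothetical.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_or_create_config(store: Path, defaults: dict) -> dict:
    """Step 1: create the data store, or read its existing provenance config."""
    store.mkdir(parents=True, exist_ok=True)
    config_path = store / "provenance.json"  # hypothetical file name
    if config_path.exists():
        return json.loads(config_path.read_text())
    config_path.write_text(json.dumps(defaults, indent=2))
    return defaults


def plan_requests(dates: list) -> list:
    """Step 2: naive optimisation, here one request per calendar month."""
    return sorted({d[:7] for d in dates})


def fetch(month: str) -> tuple:
    """Step 3 worker: stand-in for a real network request."""
    return month, f"raw data for {month}"


def transform(store: Path, month: str, payload: str, config: dict) -> Path:
    """Step 4: write a convenient file, applying the recorded transformation."""
    out = store / f"{month}.txt"
    out.write_text(payload.upper() if config["uppercase"] else payload)
    return out


def run(store: Path, dates: list) -> list:
    config = load_or_create_config(store, {"uppercase": True})
    months = plan_requests(dates)
    # Downloads run in parallel; pool.map preserves the request order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fetch, months))
    return [transform(store, m, p, config) for m, p in results]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = run(Path(tmp) / "era5", ["2020-01-01", "2020-01-02", "2020-02-01"])
        print([f.name for f in files])
```

Because the configuration is read back when the store already exists, a second run against the same store applies the same transformation settings as the first, whatever defaults the caller passes.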
## Limitations
There are some major limitations to this as a general-purpose tool; these will hopefully be dealt with in time! They likely don't have related issues filed yet.
* Works only for hemisphere-level downloading: north or south. The planned overhaul intends to use identifiers, so that someone can specify “north” or “south” but equally specify “Norway” or “The Shops” and then provide a geolocation that identifies the dataset within the filesystem.
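One way to picture the identifier idea is a small record pairing a name with a geolocation, from which a filesystem path is derived. The sketch below is purely hypothetical: the `Location` class, its bounding-box layout, the path scheme and the coordinates are illustrative assumptions, not part of the library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class Location:
    """Hypothetical named area: an identifier plus a bounding box in degrees."""

    name: str
    bounds: Tuple[float, float, float, float]  # (north, west, south, east)

    def dataset_path(self, root: Path, dataset: str) -> Path:
        # Slugify the name so "The Shops" becomes a safe directory name.
        slug = "_".join(self.name.lower().split())
        return root / dataset / slug


north = Location("north", (90.0, -180.0, 0.0, 180.0))
shops = Location("The Shops", (54.1, -2.1, 54.0, -2.0))  # made-up coordinates

print(north.dataset_path(Path("data"), "era5").as_posix())  # data/era5/north
print(shops.dataset_path(Path("data"), "era5").as_posix())  # data/era5/the_shops
```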
The library is currently under very heavy development, but the following downloaders should work:
* `download_amsr2`
* `download_cmip`
* `download_era5`
* `download_osisaf`
Other stubs might not work, but there is a chance I’ll forget to update these docs!
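To see which of these commands are actually available in the current environment, a quick check with the standard library's `shutil.which` can help. This is a small convenience sketch, not part of the library; the `find_downloaders` helper is a hypothetical name.

```python
import shutil

# Command names as listed above.
DOWNLOADERS = ["download_amsr2", "download_cmip", "download_era5", "download_osisaf"]


def find_downloaders(names=DOWNLOADERS) -> dict:
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_downloaders().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Any command reported as found can then be run with `--help` for its options.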
## Contributing
Please refer to [the contribution guidelines for more information][1].
## Credits
<a href="https://github.com/environmental-forecasting/download-toolbox/graphs/contributors"><img src="https://contrib.rocks/image?repo=environmental-forecasting/download-toolbox" /></a>
## License
This is licensed using the [MIT License][2].
[1]: https://github.com/environmental-forecasting/download-toolbox/CONTRIBUTING.md
[2]: https://github.com/environmental-forecasting/download-toolbox/LICENSE