Datasets, samplers, transforms, and pre-trained models for hydrology and water resources.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Natural Language
- English

Torchhydro

License: BSD license
Documentation: https://OuyangWenyu.github.io/torchhydro

📜 Chinese documentation

Note: This repository is still under development.

Installation

We provide a pip package for installation:

pip install torchhydro

To contribute as a developer, fork the repository, clone your fork, and create the development environment as follows:

# Fork this repository to your GitHub account first; xxxx below stands for your username
git clone git@github.com:xxxx/torchhydro.git
cd torchhydro
# If you find it slow, you can install with mamba
# conda install mamba -c conda-forge
# mamba env create -f env-dev.yml
conda env create -f env-dev.yml
conda activate torchhydro

Usage

Currently, we provide an example of training an LSTM on the CAMELS dataset. The functions for reading CAMELS live in hydrodataset, so first read its README to download the data and place it in the expected folder. For the folder configuration, check whether a hydro_setting.yml file exists in your user directory. If it does not, create one manually, following the referenced example, and make sure local_data_path is set correctly. If you cannot download the CAMELS data, you can use the copy we uploaded to Kaggle: kaggle CAMELS
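Before running the example, the folder configuration can be checked with a short script. This is a minimal sketch using only the standard library. It assumes that hydro_setting.yml sits directly in the user's home directory and that local_data_path appears as a top-level key; the helper names find_hydro_setting and declares_local_data_path are hypothetical and are not part of torchhydro or hydrodataset.

```python
from pathlib import Path
from typing import Optional


def find_hydro_setting(home: Optional[Path] = None) -> Optional[Path]:
    """Return the path to hydro_setting.yml in the user directory, or None."""
    home = Path.home() if home is None else home
    candidate = home / "hydro_setting.yml"
    return candidate if candidate.is_file() else None


def declares_local_data_path(setting_file: Path) -> bool:
    """Check, by plain text scanning, that a top-level local_data_path key exists."""
    for line in setting_file.read_text(encoding="utf-8").splitlines():
        # Drop trailing comments; a top-level key starts at column 0.
        if line.split("#", 1)[0].rstrip().startswith("local_data_path:"):
            return True
    return False


if __name__ == "__main__":
    setting = find_hydro_setting()
    if setting is None:
        print("hydro_setting.yml not found in the user directory; create it first.")
    elif not declares_local_data_path(setting):
        print(f"{setting} exists but does not declare local_data_path.")
    else:
        print(f"{setting} declares local_data_path.")
```

The script only confirms that the file and the key are present. It does not validate that the configured folders exist or contain the CAMELS data.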

Then you can try running the files under the experiments folder, such as:

cd experiments
python run_camelslstm_experiments.py

Main Modules

The package consists mainly of trainers, models, datasets, and configs, plus an explainer module for model interpretation.

Trainers: Handle the various training modes. The main one is the DeepHydro class in the deep_hydro module (a .py file). It sets up its data sources, reads the configuration for the model, data, training, and testing (see Configs below), then initializes the model (load_model function) and the data (make_dataset function), and runs training (model_train function) and testing (model_evaluate function). The transfer learning, multitask learning, and federated learning modes inherit from this class and override the mode-specific code.
Models: Declared mainly through a model_dict that lists the models available for configuration, together with the choice of loss. The remaining model modules implement the models themselves, such as lstm or differentiable models coupled with physical mechanisms.
Datasets: We first set up several data source tools, including hydrodataset for public datasets (such as CAMELS) and hydrodatasource (for data you organize yourself). These tools mainly provide data access; in torchhydro, specific torch datasets are then written to match the data type each model expects. As with models, a dict records the available datasets, and the dataset classes live in their own modules.
Configs: The overall configuration, loaded when the DeepHydro class is initialized. It lives in the config module and has four parts: model (currently mode and model together), data (time range of the data, modeling object, etc.), training (training epochs, batch size, etc.), and testing (performance metrics).
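The way these modules fit together can be illustrated with a toy sketch. It is not torchhydro's actual code: only the method names load_model, make_dataset, model_train, and model_evaluate and the four config parts come from the description above, while ToyDeepHydro, ToyTransferLearning, MODEL_DICT, the config keys, and the two toy "models" are hypothetical stand-ins written in plain Python so the example runs without torch.

```python
from typing import Callable, Dict, List, Tuple


def mean_model(history: List[float]) -> float:
    """Toy model: predict the mean of the input window."""
    return sum(history) / len(history)


def persistence_model(history: List[float]) -> float:
    """Toy model: predict the last observed value."""
    return history[-1]


# Hypothetical registry playing the role of a model_dict.
MODEL_DICT: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean_model,
    "persistence": persistence_model,
}

# Hypothetical config with the four parts named above; all keys are assumptions.
CONFIG = {
    "model": {"name": "persistence"},
    "data": {"series": [1.0, 2.0, 4.0, 3.0, 5.0], "window": 2},
    "training": {"epochs": 1},
    "testing": {"metrics": ["mae"]},
}


class ToyDeepHydro:
    """Reads the config, then builds the model and the data on initialization."""

    def __init__(self, cfgs: dict):
        self.cfgs = cfgs
        self.model = self.load_model()
        self.samples = self.make_dataset()

    def load_model(self) -> Callable[[List[float]], float]:
        return MODEL_DICT[self.cfgs["model"]["name"]]

    def make_dataset(self) -> List[Tuple[List[float], float]]:
        series = self.cfgs["data"]["series"]
        window = self.cfgs["data"]["window"]
        # Each sample pairs a window of past values with the next value.
        return [
            (series[i : i + window], series[i + window])
            for i in range(len(series) - window)
        ]

    def model_train(self) -> int:
        # The toy models have no parameters, so this only counts epochs.
        return sum(1 for _ in range(self.cfgs["training"]["epochs"]))

    def model_evaluate(self) -> Dict[str, float]:
        errors = [abs(self.model(x) - y) for x, y in self.samples]
        return {"mae": sum(errors) / len(errors)}


class ToyTransferLearning(ToyDeepHydro):
    """A different mode inherits the workflow and overrides one step."""

    def load_model(self) -> Callable[[List[float]], float]:
        # Stand-in for reusing a model trained elsewhere.
        return MODEL_DICT["mean"]


if __name__ == "__main__":
    for trainer in (ToyDeepHydro(CONFIG), ToyTransferLearning(CONFIG)):
        trainer.model_train()
        print(type(trainer).__name__, trainer.model_evaluate())
```

The point of the design is that a new mode changes only the step it needs to, while config loading, data construction, and evaluation stay shared in the base class.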

Why Torchhydro?

Although there are relatively mature tools like NeuralHydrology, we chose not to use it directly for several reasons:

Our model-building workflow is not limited to a fixed dataset tied to a fixed Dataset class that is then connected to the model. Data sources are complex, especially where data are not public, as is often the case in China, so we use a separate Datasource module to handle the data sources and build a torch Dataset on top of it. This extra layer of abstraction makes code reuse easier. Moreover, not everyone needs deep learning, so a separate Datasource module is useful to more hydrologists. We created hydrodataset and hydrodatasource for this reason.
Deep learning modes are not limited to single-variable supervised learning of runoff. Commonly used modes include transfer learning, multitask learning, and federated learning. These modes may use the same models as conventional training, but the program logic differs significantly, so the overall design has to account for them.
Data traversal, normalization methods, data sampling during batch generation, and dropout during model training sometimes need extra configuration, which calls for a more flexible design that accommodates different settings.
For historical reasons, we developed torchhydro independently and in parallel from the beginning, and it has continued that way. The main idea is to expose as much as possible through configuration so that data and models can be matched and called flexibly.
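The Datasource and Dataset layering described in the first reason can be sketched as follows. This is an illustration under stated assumptions, not the API of hydrodataset, hydrodatasource, or torchhydro: InMemoryDataSource, read_streamflow, and WindowDataset are hypothetical names, and the dataset is written as a plain Python class with __len__ and __getitem__ (the map-style dataset protocol) so that the example runs without torch. In real use such a class would typically subclass torch.utils.data.Dataset.

```python
from typing import Dict, List, Sequence, Tuple


class InMemoryDataSource:
    """Hypothetical data source: it only knows how to read raw series per basin."""

    def __init__(self, streamflow: Dict[str, List[float]]):
        self._streamflow = streamflow

    def read_streamflow(self, basin_id: str) -> List[float]:
        return list(self._streamflow[basin_id])


class WindowDataset:
    """Map-style dataset that turns data source series into (window, target) pairs."""

    def __init__(self, source: InMemoryDataSource, basin_ids: Sequence[str], seq_len: int):
        self.samples: List[Tuple[List[float], float]] = []
        for basin_id in basin_ids:
            series = source.read_streamflow(basin_id)
            # Slide a window of length seq_len; the next value is the target.
            for start in range(len(series) - seq_len):
                window = series[start : start + seq_len]
                self.samples.append((window, series[start + seq_len]))

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Tuple[List[float], float]:
        return self.samples[idx]


if __name__ == "__main__":
    source = InMemoryDataSource({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0]})
    dataset = WindowDataset(source, ["A", "B"], seq_len=2)
    print(len(dataset), dataset[0], dataset[2])
```

Because the data source has no dependency on the dataset class, it can be reused for analyses that involve no deep learning, and a different dataset class can be built on the same source when a model needs another sample layout.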

Additional Information

This package was inspired by:

This package was created using Cookiecutter and the giswqs/pypackage project template.

Release history

- 0.0.5 (this version): May 31, 2024
- 0.0.4: Nov 29, 2023
- 0.0.2: Sep 24, 2023
- 0.0.1: Aug 15, 2023

Download files

Source Distribution

torchhydro-0.0.5.tar.gz (106.8 kB)

Uploaded May 31, 2024 Source

Built Distribution

torchhydro-0.0.5-py2.py3-none-any.whl (100.0 kB)

Uploaded May 31, 2024 Python 2 Python 3

Hashes for torchhydro-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`7181d05d4f994433eda827904e05c72a8b81b7ca63f6c6ab1ea4796ea544c70d`
MD5	`0e5742b9ebd56163bd295b1ea650f187`
BLAKE2b-256	`ce354b248c07f2e5c9765427c1722c7e8987c7f56603b1ffa36b7b0fae5b31d2`

Hashes for torchhydro-0.0.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`d07cce33cd4f23cf2ee0257c659abfba10a7b23ef493b3d8c59f0b3849fcffa7`
MD5	`0723a7c47df453899b1ea955eb1c0a36`
BLAKE2b-256	`a16373a71db1644bfc29ff3655fbc771554d97797a39919f48a81be58024bb72`