# ADCIRC ROM

ADCIRC ROM (Reduced-Order Modeling) is a suite of tools for developing surrogate machine learning models of storm surge. The tools can be used in both HPC and single-threaded environments.
## DesignSafe Quickstart
1. Start a JupyterLab session.
2. Use JupyterLab to launch a terminal, and in the terminal run the following:
```
pip install netCDF4 scikit-learn global-land-mask xgboost geopandas scipy h5py fire
git clone https://github.com/UT-CHG/adcirc-rom.git
cd adcirc-rom
python3 dataset.py setup
```
This will create a data folder with the subdirectories datasets, storms and models. These subdirectories are needed for the model development workflow. The datasets directory stores machine-learning-ready datasets. The storms directory contains the raw ADCIRC input (note that when run within the DesignSafe environment, this directory will be prepopulated with a dataset of 446 synthetic ADCIRC simulations). Finally, the models directory stores saved ML models and predictions.
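To make the layout concrete, here is a minimal sketch that creates and checks the same three subdirectories. It only mirrors the description above and is not the code used by `dataset.py setup`; the helper names `make_data_layout` and `missing_subdirs` are hypothetical.

```python
import tempfile
from pathlib import Path

# Subdirectory names taken from the description above.
SUBDIRS = ("datasets", "storms", "models")


def make_data_layout(root):
    """Create root/data with the three workflow subdirectories (hypothetical helper)."""
    data_dir = Path(root) / "data"
    for name in SUBDIRS:
        (data_dir / name).mkdir(parents=True, exist_ok=True)
    return data_dir


def missing_subdirs(data_dir):
    """Return the expected subdirectory names that are absent from data_dir."""
    return [name for name in SUBDIRS if not (Path(data_dir) / name).is_dir()]


# Demonstrate in a temporary directory so nothing is written to the repository.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = make_data_layout(tmp)
    print(sorted(p.name for p in data_dir.iterdir()))
    print("missing:", missing_subdirs(data_dir))
```

A check like `missing_subdirs("data")` can be handy before starting a long dataset build, since all later steps assume these folders exist.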
3. To generate a dataset, run the command
```
python3 dataset.py create default
```
This will create a dataset named ‘default’ in the directory data/datasets. This dataset can be used to train machine learning models. The dataset.py script takes a number of options that control the size and scope of the generated dataset, as well as the included features.
Note: with the default settings, dataset creation in the DesignSafe environment will take a few hours due to the lack of MPI support and the size of the data to be processed. The dataset generation script supports parallelization with MPI and is significantly faster when run on HPC resources such as TACC.
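As an illustration of why MPI helps here, the sketch below shows one common way to divide independent storms among processes: a round-robin split by rank. This is an assumption about the general technique, not a description of how `dataset.py` actually distributes work. The function name `storms_for_rank` and the storm names are hypothetical. In a real MPI program the rank and size would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()` in mpi4py.

```python
def storms_for_rank(storms, rank, size):
    """Return the storms assigned to process `rank` out of `size` (round-robin)."""
    if size < 1 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Every size-th storm, starting at this rank's offset.
    return storms[rank::size]


# Placeholder storm names, for illustration only.
storm_ids = [f"storm_{i:03d}" for i in range(10)]
size = 4  # number of processes (assumed)

for rank in range(size):
    print(rank, storms_for_rank(storm_ids, rank, size))
```

Each storm lands on exactly one rank, so with N processes each one handles roughly 1/N of the simulations.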
4. To train and save a new model named ‘xgb_base’, using the dataset named ‘default’, run the command
```
python3 model.py train xgb_base --dataset=default
```
This will create a new model named ‘xgb_base’. During training, a portion of the dataset is set aside for testing purposes. Predictions are generated for this test set and saved alongside the model binary. Additional model training parameters can be passed to the script.
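The hold-out idea described above can be sketched in a few lines. This is a generic illustration, not the split implemented in `model.py`: the 20% test fraction, the fixed seed, and the choice to split by storm index are all assumptions, and `holdout_split` is a hypothetical helper.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n_samples-1 and return (train_indices, test_indices)."""
    if n_samples < 2:
        raise ValueError("need at least two samples to split")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for a reproducible split
    # Keep at least one sample on each side of the split.
    n_test = min(n_samples - 1, max(1, round(n_samples * test_fraction)))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# 446 matches the number of synthetic simulations mentioned earlier.
train_idx, test_idx = holdout_split(446)
print(len(train_idx), len(test_idx))
```

The model is fit only on the training indices. The test indices are used solely to generate the saved predictions, which gives an estimate of performance on unseen data.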
To perform cross-validation, run the command
```
python3 model.py cv --dataset=default
```
This will also print out feature importances. Note that all model training parameters supported by the train command are supported by the cv command as well. This makes it possible to try out parameters before training a model.
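For readers unfamiliar with cross-validation, the following sketch shows how k-fold splitting partitions sample indices so that every sample is used for validation exactly once. It illustrates the general technique only. The number of folds, the contiguous fold layout, and the helper name `kfold_indices` are assumptions and may differ from what the cv command does.

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_indices, val_indices) for contiguous, near-equal folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, val
        start = stop


for train, val in kfold_indices(10, n_folds=3):
    print("train:", train, "val:", val)
```

A model is trained once per fold and scored on that fold's validation indices. Averaging the scores gives a more stable estimate than a single hold-out split.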
Finally, to generate predictions on a new dataset using a saved model, run
```
python3 model.py predict [modelname] --dataset=[datasetname]
```
All predictions can be accessed in the folder data/datasets/[modelname].
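The create, train and predict steps can also be chained from a small Python driver. The sketch below uses only the commands shown in this quickstart. The helper names `quickstart_commands` and `run_commands` are hypothetical, as is the dataset name `new_storms`. It defaults to a dry run that only prints the commands. To execute them, call it with `dry_run=False` from inside the adcirc-rom directory.

```python
import shlex
import subprocess


def quickstart_commands(model_name="xgb_base", train_dataset="default",
                        predict_dataset="default"):
    """Return the quickstart commands as argument lists, in workflow order."""
    return [
        ["python3", "dataset.py", "create", train_dataset],
        ["python3", "model.py", "train", model_name, f"--dataset={train_dataset}"],
        ["python3", "model.py", "predict", model_name, f"--dataset={predict_dataset}"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(argv, check=True)


# Dry run: prints the three commands without executing anything.
run_commands(quickstart_commands(predict_dataset="new_storms"))
```

Passing the arguments as lists rather than a single shell string avoids quoting problems if a model or dataset name contains spaces.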
Please reach out with any questions or bug reports to Benjamin Pachev <benjamin.pachev@gmail.com>.