Retrieve National Water Model data from Google Cloud Platform.
HydroTools :: GCP Client
This subpackage implements an interface to retrieve National Water Model (NWM) data from Google Cloud Platform. The primary use for this tool is to populate pandas.DataFrame objects with NWM streamflow data. See the GCP Client Documentation for a complete list and description of the currently available methods. To report bugs or request new features, submit an issue through the HydroTools Issue Tracker on GitHub.
Installation
In keeping with common practice in the Python community, we recommend using virtual
environments in any workflow that uses Python. The following installation guide uses
Python's built-in venv module to create a virtual environment in which the tool will
be installed. This is only a preference; any Python virtual environment manager
(conda, pipenv, etc.) should work just as well.
# Create and activate python environment, requires python >= 3.8
$ python3 -m venv venv
$ source venv/bin/activate
$ python3 -m pip install --upgrade pip
# Install gcp_client
$ python3 -m pip install hydrotools.gcp_client
Usage
The following example demonstrates how one might use hydrotools.gcp_client to retrieve NWM streamflow forecasts.
Code
# Import the GCP Client
from hydrotools.gcp_client import gcp
# Instantiate model data service
model_data_service = gcp.NWMDataService()
# Retrieve forecast data
# By default, only retrieves data at USGS gaging sites in
# CONUS that are used for model assimilation
forecast_data = model_data_service.get(
    configuration="short_range",
    reference_time="20210101T01Z"
)
# Look at the data
print(forecast_data.info(memory_usage='deep'))
print(forecast_data[['valid_time', 'value']].head())
Output
<class 'pandas.core.frame.DataFrame'>
Int64Index: 135738 entries, 0 to 135737
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 nwm_feature_id 135738 non-null category
1 reference_time 135738 non-null datetime64[ns]
2 valid_time 135738 non-null datetime64[ns]
3 value 135720 non-null float32
4 usgs_site_code 135738 non-null category
5 configuration 135738 non-null category
6 measurement_unit 135738 non-null category
7 variable_name 135738 non-null category
dtypes: category(5), datetime64[ns](2), float32(1)
memory usage: 6.0 MB
None
valid_time value
0 2021-01-01 02:00:00 16.940001
1 2021-01-01 03:00:00 25.570000
2 2021-01-01 04:00:00 37.590000
3 2021-01-01 05:00:00 52.279999
4 2021-01-01 06:00:00 67.869995
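Once the data is in a pandas.DataFrame, ordinary pandas operations apply. The sketch below shows a few common follow-up steps on a small synthetic frame, so it runs without any call to Google Cloud Platform. The column names follow the example output above; the site codes and streamflow values are made-up placeholders, not real NWM data.

```python
import pandas as pd

# Synthetic stand-in for the DataFrame returned by the client.
# Column names follow the example output; the site codes and values
# below are hypothetical placeholders for illustration only.
forecast_data = pd.DataFrame({
    "usgs_site_code": pd.Categorical(
        ["01234567", "01234567", "07654321", "07654321"]
    ),
    "reference_time": pd.to_datetime(["2021-01-01 01:00"] * 4),
    "valid_time": pd.to_datetime([
        "2021-01-01 02:00", "2021-01-01 03:00",
        "2021-01-01 02:00", "2021-01-01 03:00",
    ]),
    "value": pd.Series([16.94, 25.57, 0.50, 0.75], dtype="float32"),
})

# Select the forecast for a single site
site = forecast_data[forecast_data["usgs_site_code"] == "01234567"]

# Reshape to wide form: one row per valid_time, one column per site
wide = forecast_data.pivot(
    index="valid_time", columns="usgs_site_code", values="value"
)

# Peak forecast value at each site
peaks = forecast_data.groupby("usgs_site_code", observed=True)["value"].max()

print(site[["valid_time", "value"]])
print(wide)
print(peaks)
```

The wide form is convenient for plotting several sites against a shared time axis, while the long form returned by the client is usually easier to filter and group.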
System Requirements
We employ several methods to make sure the pandas.DataFrame objects produced by gcp_client are as efficient and manageable as possible. Nonetheless, this package can potentially use a large amount of memory.
The National Water Model generates multiple forecasts per day at over 2.7 million locations across the United States. A single forecast could be spread across hundreds of files and require repeated calls to Google Cloud Platform. The intermediate steps of retrieving and processing these files into leaner DataFrame objects may use several GB of memory. As such, the recommended minimum requirements to use this package are a 4-core consumer processor and 8 GB of RAM.
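The example output shows categorical columns and float32 values, which are two general pandas techniques for shrinking a DataFrame. The sketch below illustrates their effect on synthetic data. It is not a description of how gcp_client works internally, and the unit string and random values are made up for the demonstration.

```python
import numpy as np
import pandas as pd

n = 100_000
rng = np.random.default_rng(0)

# Naive layout: repeated strings stored per row, values as float64.
# The "m3/s" unit string and random values are illustrative assumptions.
naive = pd.DataFrame({
    "configuration": ["short_range"] * n,
    "measurement_unit": ["m3/s"] * n,
    "value": rng.random(n),
})

# Leaner layout: repeated strings become categories (small integer codes
# plus one copy of each label), and values are downcast to float32.
lean = naive.astype({
    "configuration": "category",
    "measurement_unit": "category",
    "value": "float32",
})

naive_bytes = naive.memory_usage(deep=True).sum()
lean_bytes = lean.memory_usage(deep=True).sum()

print(f"naive: {naive_bytes / 1e6:.1f} MB")
print(f"lean:  {lean_bytes / 1e6:.1f} MB")
```

Downcasting to float32 trades precision for space, so it suits values such as streamflow where roughly seven significant digits are sufficient.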