
A package to simplify the retrieval and parsing of NOAA NDBC data.


NDBC


Documentation

This repository represents my attempts to build out Python class(es) to facilitate the acquisition, analysis, and visualization of National Data Buoy Center (NDBC) data. The goal is to develop a set of APIs that enable rapid discovery of data resources, support exploratory data analysis, and integrate into automated data workflows.

NDBC.py

This file defines the DataBuoy class. The purpose of this class is to allow a user to define a specific data buoy they wish to gather data from and provide the user with methods to collect and analyze this data.

Usage

Installation

Install using pip from PyPI

pip install NDBC

Then you are ready to start using this module in exploratory data analyses and scripted workflows.

Methods of DataBuoy Class

.set_station_id()

If a DataBuoy has been instantiated without a station_id parameter, this method sets the station ID.

from NDBC.NDBC import DataBuoy
DB = DataBuoy()
DB.set_station_id('46042') # <- Either strings or numbers are acceptable
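Because set_station_id() accepts either strings or numbers, the ID has to be normalized to one form before it can be used in URLs or file names. The helper below is a hypothetical sketch of that step, not the package's code; lowercasing is an assumption based on NDBC data file names using lowercase letters for alphanumeric station IDs.

```python
def normalize_station_id(station_id):
    """Hypothetical helper: coerce a str or int station ID to a clean string.

    Assumption: IDs are used in lowercase, as in NDBC data file names.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(station_id, bool) or not isinstance(station_id, (str, int)):
        raise TypeError("station_id must be a str or int")
    text = str(station_id).strip().lower()
    if not text.isalnum():
        raise ValueError(f"invalid station id: {station_id!r}")
    return text


print(normalize_station_id(46042))      # 46042
print(normalize_station_id(" TPLM2 "))  # tplm2  (an alphanumeric ID)
```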

.get_station_metadata()

Scrapes the public webpage for the specified station and saves a dictionary of available metadata to the .station_info property. This method requires a valid station_id, set either when the DataBuoy is instantiated or with the set_station_id() method.

from NDBC.NDBC import DataBuoy
DB = DataBuoy(46042)
DB.get_station_metadata()
DB.station_info
{   'Air temp height': '4 m above site elevation',
    'Anemometer height': '5 m above site elevation',
    'Barometer elevation': 'sea level',
    'Sea temp depth': '0.6 m below water line',
    'Site elevation': 'sea level',
    'Watch circle radius': '1789 yards',
    'Water depth': '1645.9 m',
    'lat': '36.785 N',
    'lon': '122.398 W'}
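The metadata values are display strings, so coordinates such as '36.785 N' need converting before they can be plotted or compared numerically. A minimal sketch, using a hypothetical parse_coordinate() helper that is not part of the package and assuming the '<degrees> <hemisphere>' format shown above:

```python
def parse_coordinate(text):
    """Hypothetical helper: turn '36.785 N' or '122.398 W' into signed degrees."""
    value, hemisphere = text.split()
    degrees = float(value)
    hemisphere = hemisphere.upper()
    # South and west are negative in signed decimal degrees.
    if hemisphere in ("S", "W"):
        return -degrees
    if hemisphere in ("N", "E"):
        return degrees
    raise ValueError(f"unknown hemisphere in {text!r}")


station_info = {"lat": "36.785 N", "lon": "122.398 W"}  # values from the output above
lat = parse_coordinate(station_info["lat"])
lon = parse_coordinate(station_info["lon"])
print(lat, lon)  # 36.785 -122.398
```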
.get_data(datetime_index=False)

After importing, instantiate the DataBuoy class with the ID of the station whose historical data you want. Data can then be gathered for the specified years and months. If no time period is specified, the most recent full month available is retrieved.

By default, datetime values built from the date-part columns (YY, MM, DD, etc.) are added as a new 'datetime' column. Passing datetime_index=True instead uses these datetime values as the index of the returned DataFrame, which is often convenient for time series analysis.

from NDBC.NDBC import DataBuoy

n42 = DataBuoy(46042)  # <- String or numeric station ids are valid

n42.get_data(datetime_index=True)  # <- no year or month arguments, so the latest full month is retrieved. Default data type is 'stdmet'

Oct not available.   # <- Where data is missing, messages are returned to the terminal via a logger.warning() call
Sep not available.

n42.data  # <- anticipating additional data collection methods, the .data property returns a dictionary.
          # Individual data products are returned as pandas DataFrame objects.

# Datetime objects are compiled from individual year, month, day, hour, minute columns and used as the index to support
# slicing data by time frames.

{'stdmet':          WDIR WSPD  GST  WVHT    DPD   APD  MWD    PRES  ATMP  WTMP   DEWP   VIS   TIDE
2019-07-31 23:50:00  298  3.6  5.2  1.25   7.69  5.37  303  1015.1  13.4  15.2  999.0  99.0  99.00
2019-08-01 00:50:00  301  5.7  7.2  1.26   7.14  5.42  306  1014.8  13.4  15.3  999.0  99.0  99.00
2019-08-01 01:50:00  323  6.6  8.3  1.33   7.14  5.47  312  1014.5  13.2  15.1  999.0  99.0  99.00
2019-08-01 02:50:00  347  5.8  7.7  1.32   7.69  5.15  319  1014.5  12.7  15.1  999.0  99.0  99.00
2019-08-01 03:50:00  353  5.6  7.2  1.26   7.69  5.31  325  1014.9  12.6  15.0  999.0  99.0  99.00
...                  ...  ...  ...   ...    ...   ...  ...     ...   ...   ...    ...   ...    ...
2019-08-31 18:50:00  999  6.2  7.4  0.87  13.79  4.67  186  1014.6  17.0  17.2  999.0  99.0  99.00
2019-08-31 19:50:00  999  6.8  8.3  0.83  13.79  4.56  178  1014.2  17.2  17.3  999.0  99.0  99.00
2019-08-31 20:50:00  999  6.5  7.8  0.89  13.79  4.38  195  1013.8  17.5  17.4  999.0  99.0  99.00
2019-08-31 21:50:00  999  7.5  8.9  0.95  13.79  4.52  190  1013.1  17.5  17.3  999.0  99.0  99.00
2019-08-31 22:50:00  999  8.0  9.4  0.95  13.79  4.09  171  1012.7  17.7  17.1  999.0  99.0  99.00

[741 rows x 13 columns]}
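The datetime index shown above is assembled from the separate date-part columns. The sketch below illustrates the idea for a single row using only the standard library; the helper name is hypothetical, and it assumes a header of the form '#YY MM DD hh mm ...' followed by one whitespace-separated data row, rather than reproducing how the package parses whole files.

```python
from datetime import datetime

# A stdmet-style header and one data row (values taken from the table above).
# Assumption: the first five columns are year, month, day, hour, minute.
HEADER = "#YY  MM DD hh mm WDIR WSPD  GST"
ROW = "2019 08 01 00 50  301  5.7  7.2"
DATE_PARTS = ("YY", "MM", "DD", "hh", "mm")


def row_to_record(header, row):
    """Hypothetical helper: build a datetime from the date-part columns."""
    names = header.lstrip("#").split()
    fields = dict(zip(names, row.split()))
    stamp = datetime(*(int(fields[part]) for part in DATE_PARTS))
    # Keep the remaining measurement columns as floats.
    measurements = {k: float(v) for k, v in fields.items() if k not in DATE_PARTS}
    return stamp, measurements


stamp, values = row_to_record(HEADER, ROW)
print(stamp)   # 2019-08-01 00:50:00
print(values)  # {'WDIR': 301.0, 'WSPD': 5.7, 'GST': 7.2}
```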

By default, get_data() fetches the most recent full month of data. However, it also accepts lists of years and months ([int]) to specify a time frame.

from NDBC.NDBC import DataBuoy
n42 = DataBuoy('46042')
n42.get_data(months=[1, 2], years=range(2019, 2022), datetime_index=True, data_type='swden')
Year 2019 not available.
Year 2020 not available.

n42.data
{'swden': {'data':                      .0200  .0325  .0375  .0425  .0475  .0525  .0575  .0625  .0675  .0725  .0775  .0825  .0875  ...  .3000  .3100  .3200  .3300  .3400  .3500  .3650  .3850  .4050  .4250  .4450  .4650  .4850
2021-01-01 00:40:00    0.0    0.0    0.0   0.00   1.17   9.11  24.25  24.95  15.84  20.44  26.48  20.63  12.72  ...   0.28   0.31   0.19   0.20   0.13   0.07   0.06   0.05   0.03   0.01   0.01   0.00    0.0
2021-01-01 01:40:00    0.0    0.0    0.0   0.00   0.00  13.76  26.55  22.40  24.12  30.09  23.41  15.74  14.95  ...   0.25   0.16   0.12   0.16   0.06   0.16   0.06   0.03   0.05   0.02   0.01   0.00    0.0
2021-01-01 02:40:00    0.0    0.0    0.0   0.00   0.93   4.40  16.03  33.95  41.48  38.02  31.47  18.88  14.59  ...   0.21   0.15   0.18   0.14   0.14   0.10   0.07   0.05   0.03   0.02   0.01   0.00    0.0
2021-01-01 03:40:00    0.0    0.0    0.0   0.07   1.14   6.95  27.94  45.68  41.92  30.11  25.03  19.52  10.93  ...   0.22   0.20   0.16   0.09   0.08   0.15   0.09   0.04   0.02   0.01   0.00   0.01    0.0
2021-01-01 04:40:00    0.0    0.0    0.0   0.00   0.76   3.64  11.23  18.23  29.84  27.19  12.85  11.20   9.77  ...   0.13   0.17   0.14   0.16   0.08   0.08   0.07   0.08   0.05   0.01   0.01   0.00    0.0
...                    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...  ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
2021-02-28 19:40:00    0.0    0.0    0.0   0.00   0.00   0.00   0.06   0.25   1.42   2.50   9.48  11.48   8.46  ...   0.21   0.13   0.11   0.08   0.10   0.04   0.02   0.02   0.03   0.01   0.00   0.00    0.0
2021-02-28 20:40:00    0.0    0.0    0.0   0.02   0.05   0.08   0.24   1.02   3.97   4.97   4.99   8.31  10.09  ...   0.21   0.07   0.09   0.06   0.05   0.10   0.04   0.03   0.01   0.01   0.00   0.00    0.0
2021-02-28 21:40:00    0.0    0.0    0.0   0.00   0.00   0.15   0.30   0.36   1.63   4.18   6.85   7.82   7.98  ...   0.12   0.11   0.09   0.08   0.04   0.05   0.06   0.02   0.01   0.01   0.00   0.00    0.0
2021-02-28 22:40:00    0.0    0.0    0.0   0.00   0.01   0.09   0.10   0.32   2.84   3.82   3.91   4.92   5.17  ...   0.17   0.09   0.13   0.05   0.05   0.08   0.06   0.03   0.01   0.01   0.00   0.00    0.0
2021-02-28 23:40:00    0.0    0.0    0.0   0.00   0.00   0.00   0.18   0.25   1.78   3.97   5.08   4.98   5.40  ...   0.07   0.10   0.11   0.08   0.08   0.06   0.03   0.02   0.01   0.01   0.00   0.00    0.0

[1413 rows x 47 columns]}}
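A request spanning several years and months amounts to iterating over every (year, month) combination and reporting the gaps, as the warnings above show. A minimal sketch with hypothetical helpers: fetch stands in for the real download step, and the warning text is illustrative rather than the package's exact message.

```python
import logging
from itertools import product

logger = logging.getLogger("ndbc_sketch")


def requested_periods(years, months):
    """Hypothetical helper: every (year, month) pair implied by the two lists."""
    return sorted(product(years, months))


def collect(years, months, fetch):
    """Call fetch(year, month) per period; log a warning and skip when it returns None."""
    results = {}
    for year, month in requested_periods(years, months):
        data = fetch(year, month)
        if data is None:
            logger.warning("%d-%02d not available.", year, month)
            continue
        results[(year, month)] = data
    return results


# range() excludes its end point: range(2019, 2022) covers 2019, 2020 and 2021.
print(requested_periods(range(2019, 2022), [1, 2]))


def fake_fetch(year, month):
    # Stand-in downloader: pretend only 2021 has data.
    return f"rows for {year}-{month:02d}" if year == 2021 else None


found = collect(range(2019, 2022), [1, 2], fake_fetch)
print(sorted(found))  # [(2021, 1), (2021, 2)]
```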

Likely due to biases in my own research interests, get_data() defaults to fetching standard meteorological data. However, users can request other data packages, for example get_data(data_type='cwind'). To see which data packages are currently supported, examine the DataBuoy.DATA_PACKAGES attribute:

{'cwind': {'name': 'Continous Wind Data', 'url_char': 'c'},
 'srad': {'name': 'Solar radiation data', 'url_char': 'r'},
 'stdmet': {'name': 'Standard meteoroligcal data', 'url_char': 'h'},
 'swden': {'name': 'Spectral Wave Density data', 'url_char': 'w'},
 'swdir': {'name': 'Spectral wave (alpha1) direction data', 'url_char': 'd'},
 'swdir2': {'name': 'Spectral wave (alpha2) direction data', 'url_char': 'i'},
 'swr1': {'name': 'Spectral wave (r1) direction data', 'url_char': 'j'},
 'swr2': {'name': 'Spectral wave (r2) direction data', 'url_char': 'k'}}
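The url_char field suggests how each package maps onto NDBC's file names: a yearly archive file is named from the station ID, the package character, and the year. The sketch below assumes the layout of NDBC's public historical-data directory (https://www.ndbc.noaa.gov/data/historical/<data_type>/) and uses a hypothetical historical_url() helper; it illustrates the naming scheme and is not necessarily how the package builds its requests.

```python
# url_char values from DataBuoy.DATA_PACKAGES as listed above.
URL_CHARS = {"cwind": "c", "srad": "r", "stdmet": "h", "swden": "w",
             "swdir": "d", "swdir2": "i", "swr1": "j", "swr2": "k"}

# Assumption: NDBC's yearly historical archive layout.
BASE = "https://www.ndbc.noaa.gov/data/historical"


def historical_url(station_id, data_type, year):
    """Hypothetical helper: URL of one year's gzipped file for a data package."""
    try:
        char = URL_CHARS[data_type]
    except KeyError:
        raise ValueError(f"unsupported data_type {data_type!r}") from None
    return f"{BASE}/{data_type}/{str(station_id).lower()}{char}{year}.txt.gz"


print(historical_url(46042, "stdmet", 2019))
# https://www.ndbc.noaa.gov/data/historical/stdmet/46042h2019.txt.gz
```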

Because the returned data are stored in pandas DataFrames, the full range of pandas methods is available for analysis.
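For example, the sketch below rebuilds a few of the stdmet rows shown earlier, masks missing-value codes, and uses the datetime index for resampling and date slicing. The sentinel values (99.0 for WSPD, 999.0 for DEWP) are assumptions based on NDBC's usual conventions. Masking is done per column because a code like 99 can be a legitimate reading in another column, such as a wind direction of 99 degrees.

```python
import numpy as np
import pandas as pd

# A few stdmet rows copied from the example output above (subset of columns).
index = pd.to_datetime([
    "2019-07-31 23:50", "2019-08-01 00:50", "2019-08-01 01:50",
    "2019-08-01 02:50", "2019-08-01 03:50",
])
df = pd.DataFrame(
    {
        "WSPD": [3.6, 5.7, 6.6, 5.8, 5.6],
        "ATMP": [13.4, 13.4, 13.2, 12.7, 12.6],
        "DEWP": [999.0, 999.0, 999.0, 999.0, 999.0],
    },
    index=index,
)

# Assumption: column-specific missing-value codes; replace them with NaN per column.
MISSING = {"WSPD": {99.0: np.nan}, "DEWP": {999.0: np.nan}}
clean = df.replace(MISSING)

# Time-based operations work because the index holds datetimes.
daily = clean.resample("D").mean()   # daily means; all-NaN columns stay NaN
aug_first = clean.loc["2019-08-01"]  # partial-string slicing selects one day
print(daily)
```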

.save(filename (optional))

Saves an instantiated DataBuoy object to a file as JSON. If filename is not specified, the file name follows the databuoy_{station_id}.json convention.

from NDBC.NDBC import DataBuoy
db = DataBuoy(46042)
db.save('/path/to/file/my_filename.json')

.load(filename) (classmethod)

Instantiates a DataBuoy object from a file generated by the .save() method.

from NDBC.NDBC import DataBuoy
db = DataBuoy.load('/path/to/file.json')
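The .save()/.load() pair amounts to a JSON round trip. The sketch below shows that round trip with plain dictionaries and hypothetical save_state()/load_state() helpers; it is not the package's implementation, and it stores only JSON-friendly fields (station ID and metadata), since pandas DataFrames in .data would need their own conversion (for example via DataFrame.to_json()).

```python
import json
import os
import tempfile


def default_filename(station_id):
    # Follows the databuoy_{station_id}.json convention described above.
    return f"databuoy_{station_id}.json"


def save_state(state, filename=None):
    """Hypothetical: write JSON-serializable buoy state; return the path used."""
    path = filename or default_filename(state["station_id"])
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    return path


def load_state(filename):
    """Hypothetical counterpart to save_state()."""
    with open(filename, encoding="utf-8") as fh:
        return json.load(fh)


state = {"station_id": "46042", "station_info": {"lat": "36.785 N", "lon": "122.398 W"}}
with tempfile.TemporaryDirectory() as tmp:
    path = save_state(state, os.path.join(tmp, default_filename(state["station_id"])))
    restored = load_state(path)
print(restored == state)  # True
```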



Download files


Source Distribution

NDBC-1.2.0.tar.gz (301.4 kB)

Uploaded Source

Built Distribution

NDBC-1.2.0-py3-none-any.whl (16.8 kB)

Uploaded Python 3
