NDBC API
A Python API for the National Data Buoy Center
The National Oceanic and Atmospheric Administration's National Data Buoy Center (NDBC) maintains marine monitoring and observation stations around the world^1. These stations report atmospheric, oceanographic, and other meteorological data at regular intervals to the NDBC. Measurements are made available over HTTP through the NDBC's data service.
The ndbc-api is a Python library that makes this data more widely accessible.
The ndbc-api is primarily built to parse whitespace-delimited oceanographic and atmospheric data distributed as text files for available time ranges, on a station-by-station basis^2. Measurements are typically distributed as UTF-8 encoded, station-by-station, fixed-period text files. More information on the measurements and methodology is available on the NDBC website^3.
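To make "whitespace-delimited text files" concrete, the following is a minimal sketch of how such a file could be split into rows using only the standard library. It is not the parser used by ndbc-api. The function name `parse_whitespace_delimited`, the `SAMPLE` text (made up, not real observations), the assumption that header lines start with `#`, and the assumption that `MM` marks a missing value are all illustrative.

```python
from io import StringIO

# Made-up sample in the general shape of a whitespace-delimited measurement
# file. These are NOT real observations.
SAMPLE = """\
#YY  MM DD hh mm WDIR WSPD GST
#yr  mo dy hr mn degT m/s  m/s
2022 01 01 00 00 190  4.1  5.0
2022 01 01 00 10 200  MM   5.3
"""


def parse_whitespace_delimited(text, missing="MM"):
    """Return a list of row dicts; the first '#' line supplies column names.

    Assumes (for illustration) that comment lines start with '#' and that
    the `missing` token marks an absent value. Values are kept as strings.
    """
    columns = None
    rows = []
    for line in StringIO(text):
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the first comment line holds column names; later
            # comment lines (such as a units row) are skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            raise ValueError("data row found before any header line")
        values = [None if value == missing else value for value in line.split()]
        rows.append(dict(zip(columns, values)))
    return rows


rows = parse_whitespace_delimited(SAMPLE)
print(rows[1]["WSPD"])  # the missing wind speed is reported as None
```

In practice the library returns Pandas DataFrames, so this sketch only illustrates the shape of the parsing problem.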
Please see the included example notebook for a more detailed walkthrough of the API's capabilities.
Installation
The ndbc-api can be installed via pip:
pip install ndbc-api
Requirements
The ndbc-api has been tested on Python 3.6, 3.7, 3.8, 3.9, and 3.10. Python 2 support is not currently planned, but could be implemented based on the needs of the atmospheric research community.
The API uses synchronous HTTP requests to compile data matching the user-supplied parameters. The ndbc-api package depends on:
- requests>=2.10.0
- pandas
- bs4
- html5lib>=1.1
Development
If you would like to contribute to the growth and maintenance of the ndbc-api, please feel free to open a PR with tests covering your changes. The tests leverage pytest and depend on the above requirements, as well as:
- coveralls
- httpretty
- pytest
- pytest-cov
- pyyaml
- pyarrow
Breaking changes will be considered, especially in the current alpha state of the package on PyPI. As the API further matures, breaking changes will only be considered with new major versions (e.g. N.0.0).
Example
The ndbc-api exposes public methods through the NdbcApi class.
from ndbc_api import NdbcApi
api = NdbcApi()
The api is a singleton, such that the underlying RequestHandler and NDBC station-level RequestCaches are shared between instances. Both the singleton metaclass and RequestHandler are implemented to reduce the likelihood of repeat requests to the NDBC's data service, and to conserve NDBC resources. This is balanced by a station-level cache_limit, implemented as an LRU cache, which seeks to respect user resources.
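The two ideas in this paragraph, a singleton metaclass and a bounded least-recently-used cache per station, can be sketched as follows. This is a minimal illustration, not the library's implementation. The names `Singleton`, `LruCache`, `Handler`, and `cache_for` are hypothetical, and the cached keys and values are placeholders.

```python
from collections import OrderedDict


class Singleton(type):
    """Metaclass that hands back one shared instance per class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class LruCache:
    """Least-recently-used cache holding at most cache_limit entries."""

    def __init__(self, cache_limit=2):
        self.cache_limit = cache_limit
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.cache_limit:
            self._items.popitem(last=False)  # evict the least recently used


class Handler(metaclass=Singleton):
    """Hypothetical stand-in for a request handler with per-station caches."""

    def __init__(self, cache_limit=2):
        self.cache_limit = cache_limit
        self.station_caches = {}

    def cache_for(self, station_id):
        if station_id not in self.station_caches:
            self.station_caches[station_id] = LruCache(self.cache_limit)
        return self.station_caches[station_id]


first, second = Handler(), Handler()  # both names refer to the same object
cache = first.cache_for("tplm2")
cache.put("request-a", "A")
cache.put("request-b", "B")
cache.get("request-a")       # touch request-a so request-b becomes the oldest
cache.put("request-c", "C")  # exceeds the limit of 2 and evicts request-b
```

Sharing one handler means a repeated request can be answered from the cache instead of the remote service, while the per-station limit keeps memory use bounded.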
Data made available by the NDBC falls into two broad categories.
- Station metadata
- Station measurements
The api supports a range of public methods for accessing data from the above categories.
Station metadata
The api has five key public methods for accessing NDBC metadata.
- The stations method, which returns all NDBC stations.
- The nearest_station method, which returns the station ID of the nearest station.
- The station method, which returns station metadata from a given station ID.
- The available_realtime method, which returns hyperlinks and measurement names for realtime measurements captured by a given station.
- The available_historical method, which returns hyperlinks and measurement names for historical measurements captured by a given station.
stations
# get all stations and some metadata as a Pandas DataFrame
stations_df = api.stations()
# parse the response as a dictionary
stations_dict = api.stations(as_df=False)
nearest_station
# specify desired latitude and longitude
lat = '38.88N'
lon = '76.43W'
# find the station ID of the nearest NDBC station
nearest = api.nearest_station(lat=lat, lon=lon)
print(nearest)
'tplm2'
radial_search
# specify desired latitude, longitude, radius, and units
lat = '38.88N'
lon = '76.43W'
radius = 100
units = 'km'
# find the station IDs of all NDBC stations within the radius
nearby_stations_df = api.radial_search(lat=lat, lon=lon, radius=radius, units=units)
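For intuition about what a nearest-station lookup and a radial search compute, here is a minimal sketch based on great-circle (haversine) distance. It is an assumption that distance is measured this way; the library's own method is not described here. The station IDs and positions in `STATIONS` are hypothetical, and the `'mi'` unit is included only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
KM_PER_UNIT = {"km": 1.0, "mi": 1.609344}  # 'mi' support is an assumption

# Hypothetical station IDs and positions, for illustration only.
STATIONS = {
    "station_a": ("39.00N", "76.50W"),
    "station_b": ("36.00N", "75.00W"),
    "station_c": ("45.00N", "125.00W"),
}


def parse_coordinate(value):
    """Convert a string such as '38.88N' or '76.43W' to signed degrees."""
    magnitude, hemisphere = float(value[:-1]), value[-1].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere in {value!r}")
    return -magnitude if hemisphere in ("S", "W") else magnitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_km(lat, lon):
    """Distance from the query point to every station in STATIONS."""
    origin = (parse_coordinate(lat), parse_coordinate(lon))
    return {
        station_id: haversine_km(*origin, parse_coordinate(slat), parse_coordinate(slon))
        for station_id, (slat, slon) in STATIONS.items()
    }


def nearest_station(lat, lon):
    """ID of the station with the smallest distance to the query point."""
    distances = distances_km(lat, lon)
    return min(distances, key=distances.get)


def radial_search(lat, lon, radius, units="km"):
    """Sorted IDs of all stations within `radius` of the query point."""
    limit_km = radius * KM_PER_UNIT[units]
    return sorted(sid for sid, d in distances_km(lat, lon).items() if d <= limit_km)


closest = nearest_station("38.88N", "76.43W")
within_100_km = radial_search("38.88N", "76.43W", radius=100, units="km")
```

The sketch returns plain IDs, whereas the example above suggests that api.radial_search returns a DataFrame of nearby stations.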
station
# get station metadata
tplm2_meta = api.station(station_id='tplm2')
# parse the response as a Pandas DataFrame
tplm2_df = api.station(station_id='tplm2', as_df=True)
available_realtime
# get all available realtime measurements, periods, and hyperlinks
tplm2_realtime = api.available_realtime(station_id='tplm2')
# parse the response as a Pandas DataFrame
tplm2_realtime_df = api.available_realtime(station_id='tplm2', as_df=True)
available_historical
# get all available historical measurements, periods, and hyperlinks
tplm2_historical = api.available_historical(station_id='tplm2')
# parse the response as a Pandas DataFrame
tplm2_historical_df = api.available_historical(station_id='tplm2', as_df=True)
Station measurements
The api has two public methods for accessing supported NDBC station measurements.
- The get_modes method, which returns a list of supported modes, corresponding to the data formats provided by the NDBC data service.
- The get_data method, which returns measurements of a given type for a given station.
Note that not all stations provide the same set of measurements. The available_realtime and available_historical methods can be called on a station-by-station basis to ensure a station has the desired data available, before building and executing requests with get_data.
get_modes
# get the list of supported meteorological measurement modes
modes = api.get_modes()
print(modes)
[
    'adcp',
    'cwind',
    'ocean',
    'spec',
    'stdmet',
    'supl',
    'swden',
    'swdir',
    'swdir2',
    'swr1',
    'swr2'
]
get_data
# get all continuous wind measurements for station tplm2
cwind_df = api.get_data(
    station_id='tplm2',
    mode='cwind',
    start_time='2020-01-01',
    end_time='2022-09-15',
)
# return data as a dictionary
cwind_dict = api.get_data(
    station_id='tplm2',
    mode='cwind',
    start_time='2020-01-01',
    end_time='2022-09-15',
    as_df=False
)
# get only the wind speed measurements
wspd_df = api.get_data(
    station_id='tplm2',
    mode='cwind',
    start_time='2020-01-01',
    end_time='2022-09-15',
    as_df=True,
    cols=['WSPD']
)
# get all standard meteorological measurements for stations tplm2 and apam2
stdmet_df = api.get_data(
    station_ids=['tplm2', 'apam2'],
    mode='stdmet',
    start_time='2022-01-01',
    end_time='2023-01-01',
)
# get all (available) continuous wind and standard meteorological measurements for stations tplm2 and apam2
# for station apam2, this is unavailable and will log an error but not affect the rest of the results.
stdmet_df = api.get_data(
    station_ids=['tplm2', 'apam2'],
    modes=['stdmet', 'cwind'],
    start_time='2022-01-01',
    end_time='2023-01-01',
)
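The last example describes a request where one station and mode combination is unavailable, an error is logged, and the remaining results are still returned. That pattern can be sketched as below. This is an illustration under stated assumptions, not the library's code: `fetch`, `collect`, the `AVAILABLE` table, the station IDs, and the placeholder rows are all hypothetical.

```python
import logging
from itertools import product

logger = logging.getLogger("ndbc_sketch")

# Hypothetical availability table standing in for the remote data service.
# The rows are placeholders, not measurements.
AVAILABLE = {
    ("station_a", "stdmet"): [{"station_id": "station_a", "mode": "stdmet"}],
    ("station_a", "cwind"): [{"station_id": "station_a", "mode": "cwind"}],
    ("station_b", "stdmet"): [{"station_id": "station_b", "mode": "stdmet"}],
    # ("station_b", "cwind") is deliberately absent.
}


def fetch(station_id, mode):
    """Hypothetical fetch step; raises when a combination has no data."""
    try:
        return AVAILABLE[(station_id, mode)]
    except KeyError:
        raise LookupError(f"no {mode} data for station {station_id}") from None


def collect(station_ids, modes):
    """Gather rows for every station/mode pair, logging and skipping failures."""
    rows, failures = [], []
    for station_id, mode in product(station_ids, modes):
        try:
            rows.extend(fetch(station_id, mode))
        except LookupError as exc:
            # One unavailable combination should not discard the other results.
            logger.error("skipping %s/%s: %s", station_id, mode, exc)
            failures.append((station_id, mode))
    return rows, failures


rows, failures = collect(["station_a", "station_b"], ["stdmet", "cwind"])
```

Returning the failed pairs alongside the rows is one design choice among several; it lets the caller decide whether a partial result is acceptable.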
More Information
Please see the included example notebook for a more detailed walkthrough of the API's capabilities.
Questions
If you have questions regarding the library, please post them in the GitHub discussion forum.
Contributing
The ndbc-api is actively maintained. Please feel free to open a pull request if you have any suggested improvements; test coverage is strongly preferred.
As a reminder, breaking changes will be considered, especially in the current alpha state of the package on PyPI. As the API further matures, breaking changes will only be considered with new major versions (e.g. N.0.0).
Alternatively, if you have an idea for a new capability or improvement, feel free to open a feature request issue outlining your suggestion and the ways in which it will empower the atmospheric research community.