
A package to retrieve data from the NOAA Global Historical Climatology Network Daily (GHCN-D) dataset on AWS S3.


NOAA-GHCN


A package to search and retrieve data from the National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network Daily (GHCN-D) dataset hosted on AWS S3.

Installation

Install with pip

pip install noaa-ghcn

Usage

Import and initialize the GHCN class

>>> from noaa_ghcn import GHCN
>>> ghcn = GHCN()
>>> ghcn
NOAA Global Historical Climatology Network Daily (GHCN-D): 129,657 stations, 765,719 inventory records

When GHCN is initialized for the first time, the station and inventory data files are downloaded automatically to the package data directory. If the files already exist, their timestamps are checked and the user is asked whether to update them.
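The download-or-update behaviour described above can be sketched as follows. This is not the package's code: `ensure_data_file`, the `remote_mtime` argument and the `download`/`ask_user` callbacks are hypothetical names, and the sketch assumes the user is only prompted when the remote copy is newer than the cached file.

```python
from pathlib import Path
from typing import Callable


def ensure_data_file(
    path: Path,
    remote_mtime: float,
    download: Callable[[Path], None],
    ask_user: Callable[[str], bool],
) -> bool:
    """Hypothetical helper: fetch a metadata file once, then offer updates.

    Returns True if `download` was called.
    """
    if not path.exists():
        # First initialization: the file is missing, so download it unconditionally.
        download(path)
        return True

    # File already cached: compare the local modification time with the
    # remote timestamp (assumed to be a POSIX timestamp obtained elsewhere).
    local_mtime = path.stat().st_mtime
    if remote_mtime > local_mtime and ask_user(f"A newer {path.name} is available. Update? "):
        download(path)
        return True
    return False
```

In an interactive session, `ask_user` could be something like `lambda msg: input(msg).strip().lower() == "y"`; passing the prompt and download steps in as callables keeps the decision logic easy to test.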

Available functionality

The available attributes and methods are:

ghcn.elements
ghcn.stations
ghcn.inventory
ghcn.filter_stations()
ghcn.filter_inventory()
ghcn.load_data()

Note: Units have been standardized relative to the original data, so downloaded values use one of the following units: degrees, mm, degrees C, percent, minutes, hPa, or cm.
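To illustrate what such standardization involves, the sketch below converts raw GHCN-D integers for the five core elements, using the scales given in NOAA's GHCN-D documentation (PRCP, TMAX and TMIN in tenths of a unit; SNOW and SNWD already in mm; -9999 marking a missing value in the fixed-width `.dly` files). The names `RAW_DIVISOR` and `to_standard_units` are hypothetical, and noaa-ghcn's own conversion table covers more elements and may differ in detail.

```python
from typing import Optional

# Divisors that turn raw GHCN-D integers into standard units, following NOAA's
# GHCN-D documentation. Assumption: noaa-ghcn applies equivalent conversions;
# its real table covers more elements and may differ in detail.
RAW_DIVISOR = {
    "PRCP": 10,  # tenths of mm -> mm
    "TMAX": 10,  # tenths of degrees C -> degrees C
    "TMIN": 10,  # tenths of degrees C -> degrees C
    "SNOW": 1,   # already mm
    "SNWD": 1,   # already mm
}

MISSING = -9999  # sentinel for a missing value in the fixed-width .dly files


def to_standard_units(element: str, raw_value: int) -> Optional[float]:
    """Hypothetical helper: convert one raw GHCN-D value to standard units."""
    if raw_value == MISSING:
        return None
    try:
        divisor = RAW_DIVISOR[element]
    except KeyError:
        raise ValueError(f"no conversion defined for element {element!r}") from None
    return raw_value / divisor


print(to_standard_units("TMAX", 238))  # 23.8
```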

Parameters/Elements

All available parameters (elements) can be listed with:

>>> ghcn.elements
The five core elements are:
PRCP = Precipitation (mm)
SNOW = Snowfall (mm)
...

Stations

All stations are accessible as a pandas DataFrame:

>>> ghcn.stations.head(3)
            ID  LATITUDE  LONGITUDE ELEVATION                   NAME STATE GSN FLAG  HCN/CRN FLAG  WMO ID                  geometry
0  ACW00011604   17.1167   -61.7833      10.1  ST JOHNS COOLIDGE FLD   NaN      NaN           NaN     NaN  POINT (-61.7833 17.1167)
1  ACW00011647   17.1333   -61.7833      19.2               ST JOHNS   NaN      NaN           NaN     NaN  POINT (-61.7833 17.1333)
2  AE000041196   25.3330    55.5170      34.0    SHARJAH INTER. AIRP   NaN      GSN       41196.0     NaN     POINT (55.517 25.333)
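The station table is derived from NOAA's fixed-width `ghcnd-stations.txt` file. As a sketch of how one line maps onto the columns above, the parser below uses the character positions from NOAA's GHCN-D readme; `STATION_COLUMNS` and `parse_station_line` are hypothetical names, and the package may parse the file differently (for example with pandas).

```python
from typing import Dict, Optional, Union

# (column, start, end): 0-based, end-exclusive character slices of one line of
# ghcnd-stations.txt, following the fixed-width layout in NOAA's GHCN-D readme.
STATION_COLUMNS = [
    ("ID", 0, 11),
    ("LATITUDE", 12, 20),
    ("LONGITUDE", 21, 30),
    ("ELEVATION", 31, 37),
    ("STATE", 38, 40),
    ("NAME", 41, 71),
    ("GSN FLAG", 72, 75),
    ("HCN/CRN FLAG", 76, 79),
    ("WMO ID", 80, 85),
]
NUMERIC_COLUMNS = {"LATITUDE", "LONGITUDE", "ELEVATION"}


def parse_station_line(line: str) -> Dict[str, Optional[Union[str, float]]]:
    """Hypothetical helper: split one station line into named fields."""
    record: Dict[str, Optional[Union[str, float]]] = {}
    for name, start, end in STATION_COLUMNS:
        # Slicing past the end of a short line yields "", which becomes None.
        text = line[start:end].strip()
        if not text:
            record[name] = None
        elif name in NUMERIC_COLUMNS:
            record[name] = float(text)
        else:
            record[name] = text
    return record
```

Blank fields (no state, no flag) become `None`, which pandas would display as `NaN` once the records are loaded into a DataFrame.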

Inventory

The inventory, which lists the elements available for each station and time period, is also accessible as a pandas DataFrame:

>>> ghcn.inventory.head(3)
            ID  LATITUDE  LONGITUDE ELEMENT  FIRSTYEAR  LASTYEAR                  geometry
0  ACW00011604   17.1167   -61.7833    TMAX       1949      1949  POINT (-61.7833 17.1167)
1  ACW00011604   17.1167   -61.7833    TMIN       1949      1949  POINT (-61.7833 17.1167)
2  ACW00011604   17.1167   -61.7833    PRCP       1949      1949  POINT (-61.7833 17.1167)
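Each inventory row states that an element was observed at a station between FIRSTYEAR and LASTYEAR. The sketch below answers "which elements does this station have in a given year?" from such rows; `InventoryRecord` and `elements_in_year` are hypothetical names, and the years only bound the period of record, so data inside that period may still have gaps.

```python
from typing import Iterable, NamedTuple, Set


class InventoryRecord(NamedTuple):
    """One inventory row: `element` was observed at `station_id` between these years."""
    station_id: str
    element: str
    first_year: int
    last_year: int


def elements_in_year(records: Iterable[InventoryRecord], station_id: str, year: int) -> Set[str]:
    """Hypothetical helper: elements whose period of record covers `year`."""
    return {
        r.element
        for r in records
        if r.station_id == station_id and r.first_year <= year <= r.last_year
    }


# Rows taken from the inventory examples in this README.
records = [
    InventoryRecord("AFM00040938", "TMAX", 1973, 2020),
    InventoryRecord("AFM00040938", "TMIN", 1973, 2020),
    InventoryRecord("AFM00040938", "PRCP", 2014, 2021),
    InventoryRecord("AFM00040938", "SNWD", 1982, 2021),
]
print(sorted(elements_in_year(records, "AFM00040938", 2021)))  # ['PRCP', 'SNWD']
```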

Filter Stations

Stations can be filtered by a list of station IDs:

>>> ghcn.filter_stations(station_ids=['AE000041196', 'AFM00040938'])
            ID  LATITUDE  LONGITUDE ELEVATION                 NAME STATE GSN FLAG  HCN/CRN FLAG  WMO ID               geometry
2  AE000041196    25.333     55.517      34.0  SHARJAH INTER. AIRP   NaN      GSN       41196.0     NaN  POINT (55.517 25.333)
7  AFM00040938    34.210     62.228     977.2                HERAT   NaN      NaN       40938.0     NaN   POINT (62.228 34.21)
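Station IDs are structured: per NOAA's GHCN-D readme, the first two characters are a FIPS country code, the third is a network code identifying the numbering system (for example "M" for WMO-based IDs), and the last eight are the station number. The sketch below, using the hypothetical helper `split_station_id`, exploits this to collect IDs for one country before passing them as `station_ids`.

```python
from typing import NamedTuple


class StationId(NamedTuple):
    country: str   # two-character FIPS country code, e.g. "AE"
    network: str   # numbering-system code, e.g. "M" for WMO-based IDs
    local_id: str  # remaining eight characters


def split_station_id(station_id: str) -> StationId:
    """Hypothetical helper: decompose an 11-character GHCN-D station ID."""
    if len(station_id) != 11:
        raise ValueError(f"expected 11 characters, got {station_id!r}")
    return StationId(station_id[:2], station_id[2], station_id[3:])


ids = ["AE000041196", "AFM00040938", "ACW00011604"]
# Collect the IDs for one country, e.g. to pass as station_ids=... afterwards.
afghanistan = [sid for sid in ids if split_station_id(sid).country == "AF"]
print(afghanistan)  # ['AFM00040938']
```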

Stations can also be filtered by a shapely geometry (shapely.Geometry), which is assumed to be in EPSG:4326 (WGS 84) coordinates:

>>> import shapely
>>> my_stations = ghcn.filter_stations(geometry=shapely.box(-10.7, 51.3, -5.3, 55.6))
>>> my_stations.head(3)
                ID  LATITUDE  LONGITUDE ELEVATION                  NAME STATE GSN FLAG  HCN/CRN FLAG  WMO ID                  geometry
33480  EI000003953   51.9394   -10.2219       9.0  VALENTIA OBSERVATORY   NaN      GSN        3953.0     NaN  POINT (-10.2219 51.9394)
33481  EI000003965   53.0903    -7.8764      70.0                  BIRR   NaN      NaN        3965.0     NaN   POINT (-7.8764 53.0903)
33482  EI000003969   53.3639    -6.3192      49.0   DUBLIN PHOENIX PARK   NaN      NaN        3969.0     NaN   POINT (-6.3192 53.3639)
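For a rectangular region, the geometry filter reduces to a coordinate comparison. In `shapely.box(xmin, ymin, xmax, ymax)` the x axis is longitude and y is latitude, so the box above spans longitudes -10.7 to -5.3 and latitudes 51.3 to 55.6. The plain-Python sketch below uses hypothetical names (`Station`, `stations_in_box`), includes points on the edges, and ignores boxes that cross the antimeridian; the package itself presumably uses a spatial predicate on the geometry column instead.

```python
from typing import Iterable, List, NamedTuple


class Station(NamedTuple):
    station_id: str
    latitude: float
    longitude: float


def stations_in_box(
    stations: Iterable[Station],
    min_lon: float, min_lat: float, max_lon: float, max_lat: float,
) -> List[Station]:
    """Hypothetical helper: keep stations inside a lon/lat box (edges included).

    Argument order mirrors shapely.box(xmin, ymin, xmax, ymax), where x is
    longitude and y is latitude. Boxes crossing the antimeridian are not handled.
    """
    return [
        s for s in stations
        if min_lon <= s.longitude <= max_lon and min_lat <= s.latitude <= max_lat
    ]


stations = [
    Station("EI000003953", 51.9394, -10.2219),
    Station("EI000003965", 53.0903, -7.8764),
    Station("EI000003969", 53.3639, -6.3192),
    Station("AE000041196", 25.333, 55.517),
]
inside = stations_in_box(stations, -10.7, 51.3, -5.3, 55.6)
print([s.station_id for s in inside])  # ['EI000003953', 'EI000003965', 'EI000003969']
```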

Filter Inventory

The inventory can be filtered by station IDs, geometry (e.g. a bounding box), elements, and dates. The filtered DataFrame gains start_date and end_date columns giving the period of available data for each station and element within the requested dates.

>>> import datetime as dt
>>> inventory_subset = ghcn.filter_inventory(
...     station_ids=['AE000041196', 'AFM00040938'],
...     start_date=dt.datetime(2020, 2, 3),
...     end_date=dt.datetime(2024, 11, 27),
... )
>>> inventory_subset
             ID  LATITUDE  LONGITUDE ELEMENT  FIRSTYEAR  LASTYEAR               geometry start_date   end_date
18  AE000041196    25.333     55.517    TMAX       1944      2025  POINT (55.517 25.333) 2020-03-21 2024-11-27
19  AE000041196    25.333     55.517    TMIN       1944      2025  POINT (55.517 25.333) 2020-03-21 2024-11-27
20  AE000041196    25.333     55.517    PRCP       1944      2025  POINT (55.517 25.333) 2020-03-21 2024-11-27
39  AFM00040938    34.210     62.228    TMAX       1973      2020   POINT (62.228 34.21) 2020-03-21 2020-12-31
40  AFM00040938    34.210     62.228    TMIN       1973      2020   POINT (62.228 34.21) 2020-03-21 2020-12-31
41  AFM00040938    34.210     62.228    PRCP       2014      2021   POINT (62.228 34.21) 2020-03-21 2021-12-31
42  AFM00040938    34.210     62.228    SNWD       1982      2021   POINT (62.228 34.21) 2020-03-21 2021-12-31
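One simple way to derive such a period is to intersect the record's FIRSTYEAR to LASTYEAR span with the requested dates. The sketch below does this with a hypothetical helper, `clip_period`, assuming whole calendar years of coverage. The listing above reports 2020-03-21 rather than the requested 2020-02-03 as the start date, so the package's own rule evidently differs from this year-based approximation.

```python
import datetime as dt
from typing import Optional, Tuple, Union

DateLike = Union[dt.date, dt.datetime]


def clip_period(
    first_year: int, last_year: int, start: DateLike, end: DateLike
) -> Optional[Tuple[dt.date, dt.date]]:
    """Hypothetical helper: overlap of a FIRSTYEAR..LASTYEAR record with [start, end].

    Assumes the record covers whole calendar years, which the inventory does not
    guarantee. Returns None when the two periods do not overlap.
    """
    # datetime is a subclass of date, but the two cannot be compared directly.
    if isinstance(start, dt.datetime):
        start = start.date()
    if isinstance(end, dt.datetime):
        end = end.date()
    clipped_start = max(dt.date(first_year, 1, 1), start)
    clipped_end = min(dt.date(last_year, 12, 31), end)
    if clipped_start > clipped_end:
        return None
    return clipped_start, clipped_end


print(clip_period(1973, 2020, dt.datetime(2020, 2, 3), dt.datetime(2024, 11, 27)))
# (datetime.date(2020, 2, 3), datetime.date(2020, 12, 31))
```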

The inventory subset can be refined further before retrieving data, for example by removing stations or elements with too little data:

>>> # Keep only station/element pairs whose period spans more than 365 days
>>> idx = inventory_subset['end_date'].sub(inventory_subset['start_date']).dt.days > 365
>>> inventory_subset = inventory_subset.loc[idx]
>>> inventory_subset
             ID  LATITUDE  LONGITUDE ELEMENT  FIRSTYEAR  LASTYEAR               geometry start_date   end_date
18  AE000041196    25.333     55.517    TMAX       1944      2025  POINT (55.517 25.333) 2020-03-21 2024-11-27
19  AE000041196    25.333     55.517    TMIN       1944      2025  POINT (55.517 25.333) 2020-03-21 2024-11-27
20  AE000041196    25.333     55.517    PRCP       1944      2025  POINT (55.517 25.333) 2020-03-21 2024-11-27
41  AFM00040938    34.210     62.228    PRCP       2014      2021   POINT (62.228 34.21) 2020-03-21 2021-12-31
42  AFM00040938    34.210     62.228    SNWD       1982      2021   POINT (62.228 34.21) 2020-03-21 2021-12-31

Downloading Data

Using the filtered inventory DataFrame, the daily station data can then be downloaded:

>>> df = ghcn.load_data(inventory_subset)
Downloading data: 100%|█████████████████████████████| 5/5 [00:00<00:00,  9.18file/s]
>>> df.head()
            ID       DATE  DATA_VALUE M_FLAG Q_FLAG S_FLAG OBS_TIME ELEMENT
0  AE000041196 2020-03-24        23.8   None   None      S     None    TMAX
1  AE000041196 2020-03-25        25.1   None   None      S     None    TMAX
2  AE000041196 2020-03-26        27.7   None   None      S     None    TMAX
3  AE000041196 2020-03-27        31.5   None   None      S     None    TMAX
4  AE000041196 2020-03-28        34.5   None   None      S     None    TMAX
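The downloaded data is in long format, with one row per station, date and element. A common next step is to drop values that failed a quality check (in GHCN-D a non-blank Q_FLAG means the value failed one of NOAA's quality-assurance checks) and reshape the rest so each station and date holds all its elements. The stdlib sketch below, with hypothetical helpers `is_blank` and `pivot_clean`, could be fed `df.to_dict("records")`; it treats `None`, empty strings and NaN as blank flags.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, Mapping, Tuple


def is_blank(flag: Any) -> bool:
    """Treat None, "" and NaN (NaN != NaN) as an unset flag."""
    return flag is None or flag == "" or (isinstance(flag, float) and flag != flag)


def pivot_clean(
    rows: Iterable[Mapping[str, Any]],
) -> Dict[Tuple[Hashable, Hashable], Dict[str, Any]]:
    """Hypothetical helper: drop quality-flagged values, then key by (ID, DATE).

    Each result maps (station ID, date) to {element: value}.
    """
    table: Dict[Tuple[Hashable, Hashable], Dict[str, Any]] = defaultdict(dict)
    for row in rows:
        if not is_blank(row.get("Q_FLAG")):
            continue  # a set Q_FLAG means the value failed a NOAA quality check
        table[(row["ID"], row["DATE"])][row["ELEMENT"]] = row["DATA_VALUE"]
    return dict(table)
```

With pandas, assuming blank flags are stored as missing values, roughly the same result is `df[df["Q_FLAG"].isna()].pivot_table(index=["ID", "DATE"], columns="ELEMENT", values="DATA_VALUE")`.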

Development

This project uses Pixi to manage dependencies, so install Pixi first. Then clone the repository and install the package with the dev environment:

git clone https://github.com/colinahill/noaa-ghcn.git
cd noaa-ghcn
pixi install -e dev
pixi shell -e dev


Download files


Source Distribution

noaa_ghcn-0.3.0.tar.gz (35.1 kB)

Uploaded Source

Built Distribution


noaa_ghcn-0.3.0-py3-none-any.whl (33.7 kB)

Uploaded Python 3

File details

Details for the file noaa_ghcn-0.3.0.tar.gz.

File metadata

  • Download URL: noaa_ghcn-0.3.0.tar.gz
  • Upload date:
  • Size: 35.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for noaa_ghcn-0.3.0.tar.gz
Algorithm Hash digest
SHA256 8c7c6503056a702f543719365ee69f2bf3399ec559f74f6ab23b62ad1aed809f
MD5 4d94e41e51ae5429917d83f378c423b4
BLAKE2b-256 273ff6337fb27fd1a80c7c120594c9b9201ad1b43c502a22fb7fd3f84dfd7a4f


Provenance

The following attestation bundles were made for noaa_ghcn-0.3.0.tar.gz:

Publisher: publish_pypi.yml on colinahill/noaa-ghcn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file noaa_ghcn-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: noaa_ghcn-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 33.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for noaa_ghcn-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 107a55a80c43ee09e5b6f903d7903366562f1f6f0b5bb00826d54f6559108a77
MD5 2585f0b46d52ad5004c444c4f80c8875
BLAKE2b-256 c98be6a5dc26ed90fb9fff071bf8db88f0ed777cc1d939e9c8fc1707e60910c1


Provenance

The following attestation bundles were made for noaa_ghcn-0.3.0-py3-none-any.whl:

Publisher: publish_pypi.yml on colinahill/noaa-ghcn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
