A package to retrieve data from the NOAA Global Historical Climatology Network Daily (GHCN-D) dataset on AWS S3.
Project description
NOAA-GHCN
A package to search and retrieve data from the National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network Daily (GHCN-D) dataset hosted on Amazon AWS S3.
Installation
Install with pip
pip install noaa-ghcn
Usage
Import and initialize the GHCN class
>>> from noaa_ghcn import GHCN
>>> ghcn = GHCN()
>>> ghcn
NOAA Global Historical Climatology Network Daily (GHCN-D): 129,657 stations, 765,719 inventory records
When initialized for the first time, the station and inventory data files will automatically be downloaded to the package data directory. If the data files already exist, the timestamps are checked and the user will be asked if they optionally want to update the data.
Available functionality
The available attributes and methods are:
ghcn.elements
ghcn.stations
ghcn.inventory
ghcn.filter_stations()
ghcn.filter_inventory()
ghcn.load_data()
[!NOTE] The units have been standardized compared to the original data such that downloaded data has one of the following units: degrees, mm, degrees C, percent, minutes, hPa, cm
Parameters/Elements
All the available parameters (elements) can be listed with
>>> ghcn.elements
The five core elements are:
PRCP = Precipitation (mm)
SNOW = Snowfall (mm)
...
Stations
All stations are accessible as a Pandas.DataFrame
>>> ghcn.stations.head(3)
ID LATITUDE LONGITUDE ELEVATION NAME STATE GSN FLAG HCN/CRN FLAG WMO ID geometry
0 ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD NaN NaN NaN NaN POINT (-61.7833 17.1167)
1 ACW00011647 17.1333 -61.7833 19.2 ST JOHNS NaN NaN NaN NaN POINT (-61.7833 17.1333)
2 AE000041196 25.3330 55.5170 34.0 SHARJAH INTER. AIRP NaN GSN 41196.0 NaN POINT (55.517 25.333)
Inventory
The inventory (which elements are available for each station and timeframe) is also accessible as a Pandas.DataFrame
>>> ghcn.inventory.head(3)
ID LATITUDE LONGITUDE ELEMENT FIRSTYEAR LASTYEAR geometry
0 ACW00011604 17.1167 -61.7833 TMAX 1949 1949 POINT (-61.7833 17.1167)
1 ACW00011604 17.1167 -61.7833 TMIN 1949 1949 POINT (-61.7833 17.1167)
2 ACW00011604 17.1167 -61.7833 PRCP 1949 1949 POINT (-61.7833 17.1167)
Filter Stations
Stations can be filtered by a list of station 'ID'
>>> ghcn.filter_stations(station_ids=['AE000041196', 'AFM00040938'])
ID LATITUDE LONGITUDE ELEVATION NAME STATE GSN FLAG HCN/CRN FLAG WMO ID geometry
2 AE000041196 25.333 55.517 34.0 SHARJAH INTER. AIRP NaN GSN 41196.0 NaN POINT (55.517 25.333)
7 AFM00040938 34.210 62.228 977.2 HERAT NaN NaN 40938.0 NaN POINT (62.228 34.21)
Or filtered by a shapely.Geometry (assumes geometry is in EPSG:4326 / WGS 84 coordinates)
>>> import shapely
>>> my_stations = ghcn.filter_stations(geometry=shapely.box(-10.7, 51.3, -5.3, 55.6))
>>> my_stations.head(3)
ID LATITUDE LONGITUDE ELEVATION NAME STATE GSN FLAG HCN/CRN FLAG WMO ID geometry
33480 EI000003953 51.9394 -10.2219 9.0 VALENTIA OBSERVATORY NaN GSN 3953.0 NaN POINT (-10.2219 51.9394)
33481 EI000003965 53.0903 -7.8764 70.0 BIRR NaN NaN 3965.0 NaN POINT (-7.8764 53.0903)
33482 EI000003969 53.3639 -6.3192 49.0 DUBLIN PHOENIX PARK NaN NaN 3969.0 NaN POINT (-6.3192 53.3639)
Filter Inventory
The inventory can be filtered for given stations, geometry (bounding box), elements and dates. The filtered dataframe has columns of start_date and end_date that correspond to the available data for each station and element.
>>> import datetime as dt
>>> inventory_subset = ghcn.filter_inventory(station_ids=['AE000041196', 'AFM00040938'], start_date= dt.datetime(2020, 2, 3), end_date=dt.datetime(2024, 11, 27))
>>> inventory_subset
ID LATITUDE LONGITUDE ELEMENT FIRSTYEAR LASTYEAR geometry start_date end_date
18 AE000041196 25.333 55.517 TMAX 1944 2025 POINT (55.517 25.333) 2020-03-21 2024-11-27
19 AE000041196 25.333 55.517 TMIN 1944 2025 POINT (55.517 25.333) 2020-03-21 2024-11-27
20 AE000041196 25.333 55.517 PRCP 1944 2025 POINT (55.517 25.333) 2020-03-21 2024-11-27
39 AFM00040938 34.210 62.228 TMAX 1973 2020 POINT (62.228 34.21) 2020-03-21 2020-12-31
40 AFM00040938 34.210 62.228 TMIN 1973 2020 POINT (62.228 34.21) 2020-03-21 2020-12-31
41 AFM00040938 34.210 62.228 PRCP 2014 2021 POINT (62.228 34.21) 2020-03-21 2021-12-31
42 AFM00040938 34.210 62.228 SNWD 1982 2021 POINT (62.228 34.21) 2020-03-21 2021-12-31
This subset of the inventory can be further refined before retrieving data, e.g. removing stations/elements that have too few data
# Only include elements that have at least 1 year of data
>>> idx = inventory_subset['end_date'].sub(inventory_subset['start_date']).dt.days > 365
>>> inventory_subset = inventory_subset.loc[idx]
>>> inventory_subset
ID LATITUDE LONGITUDE ELEMENT FIRSTYEAR LASTYEAR geometry start_date end_date
18 AE000041196 25.333 55.517 TMAX 1944 2025 POINT (55.517 25.333) 2020-03-21 2024-11-27
19 AE000041196 25.333 55.517 TMIN 1944 2025 POINT (55.517 25.333) 2020-03-21 2024-11-27
20 AE000041196 25.333 55.517 PRCP 1944 2025 POINT (55.517 25.333) 2020-03-21 2024-11-27
41 AFM00040938 34.210 62.228 PRCP 2014 2021 POINT (62.228 34.21) 2020-03-21 2021-12-31
42 AFM00040938 34.210 62.228 SNWD 1982 2021 POINT (62.228 34.21) 2020-03-21 2021-12-31
Downloading Data
Using the inventory dataframe, the daily station data can be downloaded
>>> df = ghcn.load_data(inventory_subset)
Downloading data: 100%|█████████████████████████████| 5/5 [00:00<00:00, 9.18file/s]
>>> df.head()
ID DATE DATA_VALUE M_FLAG Q_FLAG S_FLAG OBS_TIME ELEMENT
0 AE000041196 2020-03-24 23.8 None None S None TMAX
1 AE000041196 2020-03-25 25.1 None None S None TMAX
2 AE000041196 2020-03-26 27.7 None None S None TMAX
3 AE000041196 2020-03-27 31.5 None None S None TMAX
4 AE000041196 2020-03-28 34.5 None None S None TMAX
Development
Clone the repo and install the package with the dev environment. This project uses Pixi to manage dependencies, which should be installed first.
git clone https://github.com/colinahill/noaa-ghcn.git
cd noaa-ghcn
pixi install -e dev
pixi shell -e dev
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file noaa_ghcn-0.3.0.tar.gz.
File metadata
- Download URL: noaa_ghcn-0.3.0.tar.gz
- Upload date:
- Size: 35.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8c7c6503056a702f543719365ee69f2bf3399ec559f74f6ab23b62ad1aed809f
|
|
| MD5 |
4d94e41e51ae5429917d83f378c423b4
|
|
| BLAKE2b-256 |
273ff6337fb27fd1a80c7c120594c9b9201ad1b43c502a22fb7fd3f84dfd7a4f
|
Provenance
The following attestation bundles were made for noaa_ghcn-0.3.0.tar.gz:
Publisher:
publish_pypi.yml on colinahill/noaa-ghcn
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
noaa_ghcn-0.3.0.tar.gz -
Subject digest:
8c7c6503056a702f543719365ee69f2bf3399ec559f74f6ab23b62ad1aed809f - Sigstore transparency entry: 190515200
- Sigstore integration time:
-
Permalink:
colinahill/noaa-ghcn@925204565ab28aaeb53b3a7a89d0d1543e5b9254 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/colinahill
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish_pypi.yml@925204565ab28aaeb53b3a7a89d0d1543e5b9254 -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file noaa_ghcn-0.3.0-py3-none-any.whl.
File metadata
- Download URL: noaa_ghcn-0.3.0-py3-none-any.whl
- Upload date:
- Size: 33.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
107a55a80c43ee09e5b6f903d7903366562f1f6f0b5bb00826d54f6559108a77
|
|
| MD5 |
2585f0b46d52ad5004c444c4f80c8875
|
|
| BLAKE2b-256 |
c98be6a5dc26ed90fb9fff071bf8db88f0ed777cc1d939e9c8fc1707e60910c1
|
Provenance
The following attestation bundles were made for noaa_ghcn-0.3.0-py3-none-any.whl:
Publisher:
publish_pypi.yml on colinahill/noaa-ghcn
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
noaa_ghcn-0.3.0-py3-none-any.whl -
Subject digest:
107a55a80c43ee09e5b6f903d7903366562f1f6f0b5bb00826d54f6559108a77 - Sigstore transparency entry: 190515206
- Sigstore integration time:
-
Permalink:
colinahill/noaa-ghcn@925204565ab28aaeb53b3a7a89d0d1543e5b9254 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/colinahill
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish_pypi.yml@925204565ab28aaeb53b3a7a89d0d1543e5b9254 -
Trigger Event:
workflow_dispatch
-
Statement type: