Library for downloading datasets from the NOAA site

Maintainers

Reinan_Br

noawclg · GFS Dataset Manager

Download, cache and analyse NOAA GFS forecast data in one line of Python.

noawclg wraps the NOAA NOMADS grib-filter endpoint and exposes a clean Python API that lets you:

Download GFS 0.25° GRIB2 files with a single method call — one HTTP request per forecast hour regardless of how many variables you need.
Cache raw GRIB2 files to disk so repeated runs cost nothing.
Extract any combination of surface and upper-air variables into analysis-ready xarray.Dataset objects.
Save output as compressed NetCDF4 or chunked Zarr for downstream processing.

Installation
Quick Start
How It Works
API Reference
Variable Catalogue
Pre-defined Hour Sequences
Region Subsetting
Logging
Examples
Contributing
License

Installation

pip install noawclg

System dependency — eccodes

cfgrib requires the eccodes C library to decode GRIB2 files.

Platform	Command
Ubuntu / Debian	`sudo apt install libeccodes-dev`
macOS (Homebrew)	`brew install eccodes`
Conda (any OS)	`conda install -c conda-forge eccodes`

Quick Start

from noawclg import GFSDatasetManager

# Create a manager for the 06 Z run of 2026-04-03
mgr = GFSDatasetManager(date="20260403", cycle="06")

# Download t2m + precipitation for the next 48 h (6-hourly)
# → only 9 HTTP requests (one per hour), not 18
ds = mgr.build_multi_dataset(
    var_keys=["t2m", "prate"],
    hours=list(range(0, 49, 6)),
)

print(ds)
# <xarray.Dataset>
# Dimensions:  (time: 9, latitude: 721, longitude: 1440)
# Data variables:
#     t2m      (time, latitude, longitude) float64 ...
#     prate    (time, latitude, longitude) float64 ...

mgr.save_netcdf(ds, "/tmp/gfs_48h.nc")

How It Works

Single-download architecture

Previous approaches sent one HTTP request per variable per forecast hour.
For 5 variables × 48 hours that means 240 requests.

noawclg exploits the NOMADS grib-filter's multi-variable syntax to bundle every requested variable into a single URL per hour:

https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25_1hr.pl
  ?dir=/gfs.20260403/06/atmos
  &file=gfs.t06z.pgrb2.0p25.f024
  &var_TMP=on&lev_2_m_above_ground=on       ← t2m
  &var_PRATE=on&lev_surface=on              ← prate
  &var_PRMSL=on&lev_mean_sea_level=on       ← prmsl
  &subregion=&toplat=5&bottomlat=-35&...    ← optional region

Result: 5 variables × 48 forecast hours = 48 requests (one per hour) instead of 240.
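The URL layout above can be sketched with the standard library alone. This is an illustration, not the library's own code: `build_filter_url` and the `GRIB_FLAGS` table are hypothetical names, and the table only covers the three variables annotated in the URL. The region parameter names are assumed to match the keys of the `region` dict.

```python
from urllib.parse import urlencode

BASE_URL = "https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25_1hr.pl"

# Hypothetical lookup table: variable key -> (variable flag, level flag),
# copied from the three annotated lines of the URL above.
GRIB_FLAGS = {
    "t2m": ("var_TMP", "lev_2_m_above_ground"),
    "prate": ("var_PRATE", "lev_surface"),
    "prmsl": ("var_PRMSL", "lev_mean_sea_level"),
}


def build_filter_url(date, cycle, hour, var_keys, region=None):
    """Return one grib-filter URL that requests every variable in var_keys."""
    params = [
        ("dir", f"/gfs.{date}/{cycle}/atmos"),
        ("file", f"gfs.t{cycle}z.pgrb2.0p25.f{hour:03d}"),
    ]
    seen = set()
    for key in var_keys:
        for flag in GRIB_FLAGS[key]:
            if flag not in seen:  # a flag shared by two variables is sent once
                seen.add(flag)
                params.append((flag, "on"))
    if region is not None:
        params.append(("subregion", ""))
        for name in ("toplat", "bottomlat", "leftlon", "rightlon"):
            params.append((name, str(region[name])))
    # safe="/" keeps the slashes of the dir parameter readable
    return f"{BASE_URL}?{urlencode(params, safe='/')}"


url = build_filter_url("20260403", "06", 24, ["t2m", "prate", "prmsl"])
print(url)
```

Adding a variable only appends two more flags to the same URL, which is why the request count depends on the number of forecast hours and not on the number of variables.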

Disk cache

Every downloaded GRIB2 file is saved under output_dir with a deterministic filename that encodes the date, cycle, variable set, region tag and forecast hour:

gfs_20260403_06z_prate_t2m_5N35S75W34E_f024.grib2

On subsequent runs the file is reused without any network I/O.
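A deterministic name of this shape could be produced as in the sketch below. `cache_filename` and `cached_path` are hypothetical helpers, not part of the noawclg API; sorting the variable keys is an assumption inferred from the sample name, where `prate` comes before `t2m`. The region tag is passed in as a ready-made string because the rule for deriving it is not described here.

```python
from pathlib import Path


def cache_filename(date, cycle, var_keys, hour, region_tag="global"):
    """Build a deterministic GRIB2 file name from the request parameters."""
    var_part = "_".join(sorted(var_keys))  # sorted so key order does not matter
    return f"gfs_{date}_{cycle}z_{var_part}_{region_tag}_f{hour:03d}.grib2"


def cached_path(output_dir, name):
    """Return the cached file path if it exists, otherwise None."""
    path = Path(output_dir) / name
    return path if path.is_file() else None


name = cache_filename("20260403", "06", ["t2m", "prate"], 24, "5N35S75W34E")
print(name)  # gfs_20260403_06z_prate_t2m_5N35S75W34E_f024.grib2
```

Because the name depends only on the request parameters, the same request always maps to the same file, and a changed variable set or region maps to a different one.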

cfgrib extraction

After downloading, each variable is extracted from the cached GRIB2 using cfgrib with a cascade of filter strategies (shortName → typeOfLevel → full scan) to handle the GRIB table inconsistencies that appear across GFS versions and sub-region files.
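The cascade idea can be illustrated without any GRIB dependency. In the sketch below, `first_successful` is a hypothetical helper and the three strategy functions are stand-ins; with cfgrib, each one would typically be an `xarray.open_dataset(path, engine="cfgrib", backend_kwargs={"filter_by_keys": {...}})` call with a progressively looser filter. How noawclg implements its own cascade is not shown here.

```python
def first_successful(strategies):
    """Try (name, callable) pairs in order and return the first usable result."""
    failures = []
    for name, attempt in strategies:
        try:
            result = attempt()
        except Exception as exc:  # a failed strategy falls through to the next
            failures.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        failures.append(f"{name}: no matching message")
    raise RuntimeError("all strategies failed (" + "; ".join(failures) + ")")


def by_short_name():
    raise KeyError("shortName not found")  # stand-in for a failed filter


def by_type_of_level():
    return {"t2m": "decoded field"}  # stand-in for a decoded dataset


def full_scan():
    return {"t2m": "decoded field (full scan)"}


name, data = first_successful([
    ("shortName", by_short_name),
    ("typeOfLevel", by_type_of_level),
    ("full scan", full_scan),
])
print(name)  # typeOfLevel
```

Ordering the strategies from strictest to loosest keeps the common case fast and reserves the expensive full scan for files that need it.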

API Reference

`GFSDatasetManager`

GFSDatasetManager(
    date: str,
    cycle: str = "00",
    output_dir: str = "./gfs_output",
    region: dict | None = None,
    pause: float = 1.5,
)

Main entry point. All other methods are called on an instance of this class.

Parameter	Type	Description
`date`	`str`	Model run date in `YYYYMMDD` format. Required.
`cycle`	`str`	Model run cycle: `"00"`, `"06"`, `"12"` or `"18"`. Default `"00"`.
`output_dir`	`str`	Directory where GRIB2 files are cached. Created automatically. Default `"./gfs_output"`.
`region`	`dict | None`	Bounding box for spatial subsetting (see Region Subsetting). `None` downloads the global grid.
`pause`	`float`	Seconds to sleep between consecutive HTTP requests. Helps avoid rate-limiting on NOMADS. Default `1.5`.

Raises: ValueError if cycle is not one of the four valid values.
Raises: ValueError if date does not match YYYYMMDD.

from noawclg import GFSDatasetManager

mgr = GFSDatasetManager(
    date="20260403",
    cycle="06",
    output_dir="./cache",
    region={"toplat": 5, "bottomlat": -35, "leftlon": -75, "rightlon": -34},
    pause=2.0,
)
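The two `ValueError` conditions can be checked before constructing a manager, for example when the date and cycle come from user input. `validate_run` is a hypothetical helper, not part of noawclg, and the calendar check is an extra assumption beyond the documented `YYYYMMDD` format check.

```python
from datetime import datetime

VALID_CYCLES = ("00", "06", "12", "18")


def validate_run(date, cycle):
    """Raise ValueError unless date is YYYYMMDD and cycle is a valid GFS cycle."""
    if cycle not in VALID_CYCLES:
        raise ValueError(f"cycle must be one of {VALID_CYCLES}, got {cycle!r}")
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"date must be 8 digits (YYYYMMDD), got {date!r}")
    try:
        datetime.strptime(date, "%Y%m%d")  # rejects days such as 20260230
    except ValueError:
        raise ValueError(f"date is not a real calendar day: {date!r}") from None
    return date, cycle


print(validate_run("20260403", "06"))  # ('20260403', '06')
```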

`build_dataset`

mgr.build_dataset(
    var_key: str,
    hours: list[int],
    force_download: bool = False,
) -> xr.Dataset

Download and assemble a Dataset for a single variable.

Parameter	Type	Description
`var_key`	`str`	Variable key from the Variable Catalogue.
`hours`	`list[int]`	Forecast hours to include (e.g. `[0, 6, 12, 24]`).
`force_download`	`bool`	If `True`, re-download even if cached files exist. Default `False`.

Returns: xr.Dataset with dimensions:

Surface/single-level variables → (time, latitude, longitude)
Multi-level variables → (time, level, latitude, longitude)

Both layouts include a forecast_hour coordinate aligned to the time dimension.

Raises: RuntimeError if no files could be downloaded or read.

ds = mgr.build_dataset("t2m", hours=[0, 6, 12, 24, 48])
print(ds["t2m"].dims)   # ('time', 'latitude', 'longitude')
print(ds["t2m"].attrs)  # {'long_name': '2 metre temperature', 'units': 'C', ...}
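The relation between forecast hour and timestamp is simple arithmetic: the run's initialisation time plus the forecast hour. The sketch below assumes the `time` dimension holds that valid time in UTC, which this README does not state explicitly; `valid_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def valid_times(date, cycle, hours):
    """Return the UTC valid time of each forecast hour for a given model run."""
    init = datetime.strptime(date + cycle, "%Y%m%d%H").replace(tzinfo=timezone.utc)
    return [init + timedelta(hours=h) for h in hours]


hours = [0, 6, 24]
for hour, when in zip(hours, valid_times("20260403", "06", hours)):
    print(f"f{hour:03d} -> {when:%Y-%m-%d %H:%M} UTC")
```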

`build_multi_dataset`

mgr.build_multi_dataset(
    var_keys: list[str],
    hours: list[int],
    force_download: bool = False,
) -> xr.Dataset

Download one file per hour containing all requested variables, then extract and merge them into a single Dataset.

This is the recommended method when you need more than one variable — it uses N_hours requests instead of N_vars × N_hours.

Parameter	Type	Description
`var_keys`	`list[str]`	List of variable keys from the Variable Catalogue.
`hours`	`list[int]`	Forecast hours to include.
`force_download`	`bool`	Re-download even if cached. Default `False`.

Returns: xr.Dataset with all requested variables merged via xr.merge(..., join="inner").

Variables that fail to extract are logged and skipped; a RuntimeError is raised only if all variables fail.

ds = mgr.build_multi_dataset(
    var_keys=["t2m", "prmsl", "prate", "u10", "v10"],
    hours=list(range(0, 25, 6)),
)
# ds contains t2m, prmsl, prate, u10, v10 all on the same time axis
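An inner join keeps only the coordinate values that every input shares. The plain-Python sketch below mimics that behaviour on the time axis; the gap in `prate` is a made-up example and `inner_join_times` is a hypothetical helper, not a noawclg or xarray function.

```python
def inner_join_times(times_by_var):
    """Return the sorted forecast hours that are present for every variable."""
    common = None
    for hours in times_by_var.values():
        common = set(hours) if common is None else common & set(hours)
    return sorted(common or [])


available = {
    "t2m": [0, 6, 12, 18, 24],
    "prate": [6, 12, 18, 24],  # hypothetical: hour 0 failed to extract
    "prmsl": [0, 6, 12, 18, 24],
}
print(inner_join_times(available))  # [6, 12, 18, 24]
```

The practical consequence is that one variable with a missing hour shortens the time axis of the whole merged Dataset, so it is worth checking the length of `time` after a merge.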

`download_hours`

mgr.download_hours(
    var_keys: list[str],
    hours: list[int],
    force: bool = False,
) -> dict[int, Path]

Low-level method that performs the actual HTTP downloads.
Called internally by build_dataset and build_multi_dataset, but exposed for advanced use cases (e.g. downloading files without immediately building a Dataset).

Parameter	Type	Description
`var_keys`	`list[str]`	Variables to bundle into each download URL.
`hours`	`list[int]`	Forecast hours to download.
`force`	`bool`	Re-download cached files. Default `False`.

Returns: dict[int, Path] — mapping of {hour: path_to_grib2_file} for every successfully downloaded hour.

Files already on disk are returned immediately without any network I/O (cache hit is logged at INFO level).

files = mgr.download_hours(["t2m", "prate"], hours=[0, 6, 12])
# {0: PosixPath('.../gfs_..._f000.grib2'),
#  6: PosixPath('.../gfs_..._f006.grib2'),
#  12: PosixPath('.../gfs_..._f012.grib2')}
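The behaviour described above (cache hits return immediately, requests are spaced by `pause`, failed hours are left out of the result) can be sketched as a small loop. Everything here is illustrative: `download_hours_sketch`, the `fetch` callable and the `demo_f...` file name are hypothetical and do not reflect the library's internals.

```python
import time
from pathlib import Path


def download_hours_sketch(hours, output_dir, fetch, pause=1.5, force=False):
    """Fetch one file per hour, skipping hours that are already on disk."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = {}
    requests_made = 0
    for hour in hours:
        path = out / f"demo_f{hour:03d}.grib2"  # hypothetical file name
        if path.exists() and not force:
            files[hour] = path  # cache hit: no network I/O and no pause
            continue
        if requests_made:
            time.sleep(pause)  # space out consecutive requests
        requests_made += 1
        try:
            data = fetch(hour)  # fetch(hour) must return the file content as bytes
        except OSError as exc:
            print(f"f{hour:03d} failed: {exc}")  # failed hours are left out
            continue
        path.write_bytes(data)
        files[hour] = path
    return files
```

Passing the download function in as `fetch` keeps the loop testable without network access; a real caller would pass a function that requests the grib-filter URL for that hour.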

`save_netcdf`

mgr.save_netcdf(
    ds: xr.Dataset,
    filename: str,
    complevel: int = 4,
) -> Path

Save a Dataset to a zlib-compressed NetCDF4 file.

Parameter	Type	Description
`ds`	`xr.Dataset`	Dataset to save.
`filename`	`str`	Output file path. Absolute paths are used as-is; relative paths are resolved against `output_dir`.
`complevel`	`int`	zlib compression level 1–9 (higher = smaller file, slower write). Default `4`.

Returns: Path — absolute path of the saved file.

path = mgr.save_netcdf(ds, "/data/gfs_t2m_48h.nc")
# or relative (saved inside output_dir):
path = mgr.save_netcdf(ds, "gfs_t2m_48h.nc")
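The path rule for `filename` can be expressed with `pathlib`. This is a sketch of the documented behaviour, not the library's code; `resolve_output_path` is a hypothetical helper and the example paths assume a POSIX system.

```python
from pathlib import Path


def resolve_output_path(filename, output_dir="./gfs_output"):
    """Absolute paths are kept; relative paths are placed inside output_dir."""
    path = Path(filename)
    if not path.is_absolute():
        path = Path(output_dir) / path
    return path.resolve()


print(resolve_output_path("/data/gfs_t2m_48h.nc"))
print(resolve_output_path("gfs_t2m_48h.nc", output_dir="./cache"))
```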

`save_zarr`

mgr.save_zarr(
    ds: xr.Dataset,
    store: str,
) -> Path

Save a Dataset as a chunked Zarr store (directory).

Zarr is preferred over NetCDF for large time-series because it supports:

Lazy chunked reads without loading the whole file into memory.
Appending new timesteps without rewriting existing data.

Parameter	Type	Description
`ds`	`xr.Dataset`	Dataset to save.
`store`	`str`	Output directory path. Relative paths are resolved against `output_dir`.

Returns: Path — absolute path of the Zarr store directory.

path = mgr.save_zarr(ds, "gfs_surface_16days.zarr")

`load_netcdf`

GFSDatasetManager.load_netcdf(path: str | Path) -> xr.Dataset

Static method. Lazily open a previously saved NetCDF file using Dask-backed chunking.

ds = GFSDatasetManager.load_netcdf("/data/gfs_t2m_48h.nc")
print(dict(ds.sizes))  # {'time': 9, 'latitude': 721, 'longitude': 1440}

`load_zarr`

GFSDatasetManager.load_zarr(store: str | Path) -> xr.Dataset

Static method. Lazily open a previously saved Zarr store.

ds = GFSDatasetManager.load_zarr("gfs_surface_16days.zarr")

Variable Catalogue

Access the full catalogue at runtime:

from noawclg import VARIABLES, SURFACE_VARS, MULTILEVEL_VARS

print(SURFACE_VARS)    # all 2-D (no level dimension) variable keys
print(MULTILEVEL_VARS) # all variables with a vertical level dimension

Surface / single-level variables

Key	Long name	Units
`t2m`	2 metre temperature	°C
`d2m`	2 metre dewpoint temperature	°C
`r2`	2 metre relative humidity	%
`sh2`	2 metre specific humidity	kg kg⁻¹
`aptmp`	Apparent temperature	°C
`u10`	10 metre U wind component	m s⁻¹
`v10`	10 metre V wind component	m s⁻¹
`gust`	Wind speed (gust)	m s⁻¹
`prmsl`	Pressure reduced to MSL	hPa
`mslet`	MSLP (Eta model reduction)	hPa
`sp`	Surface pressure	hPa
`orog`	Orography	m
`lsm`	Land-sea mask	0–1
`vis`	Visibility	m
`prate`	Precipitation rate	kg m⁻² s⁻¹
`cpofp`	Percent frozen precipitation	%
`crain`	Categorical rain	—
`csnow`	Categorical snow	—
`cfrzr`	Categorical freezing rain	—
`cicep`	Categorical ice pellets	—
`sde`	Snow depth	m
`sdwe`	Water equivalent of snow depth	kg m⁻²
`pwat`	Precipitable water	kg m⁻²
`cwat`	Cloud water	kg m⁻²
`tcc`	Total cloud cover	%
`lcc`	Low cloud cover	%
`mcc`	Medium cloud cover	%
`hcc`	High cloud cover	%
`lftx`	Surface lifted index	K
`lftx4`	Best (4-layer) lifted index	K
`hlcy`	Storm relative helicity	m² s⁻²
`refc`	Composite radar reflectivity	dB
`siconc`	Sea ice area fraction	0–1
`veg`	Vegetation	%
`tozne`	Total ozone	DU
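Two common derived quantities follow directly from the units in the table: wind speed from `u10` and `v10`, and rainfall in mm per hour from `prate` (1 kg of water over 1 m² is a 1 mm layer). The helpers below are hypothetical, work on plain floats, and are not part of noawclg; the same arithmetic applies element-wise to arrays.

```python
import math

SECONDS_PER_HOUR = 3600


def wind_speed(u, v):
    """Wind speed in m/s from the U and V components in m/s."""
    return math.hypot(u, v)


def prate_to_mm_per_hour(prate):
    """Convert precipitation rate from kg m^-2 s^-1 to mm per hour."""
    # 1 kg of water spread over 1 m^2 forms a layer 1 mm deep
    return prate * SECONDS_PER_HOUR


print(wind_speed(3.0, 4.0))  # 5.0
print(prate_to_mm_per_hour(0.001))
```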

Multi-level variables

These variables include a level dimension in the output Dataset.

Key	Long name	Units	Levels
`t`	Temperature	°C	80–1000 hPa (13 levels)
`r`	Relative humidity	%	80–1000 hPa (13 levels)
`q`	Specific humidity	kg kg⁻¹	80, 1000 hPa
`gh`	Geopotential height	gpm	500–1000 hPa (5 levels)
`u`	U component of wind	m s⁻¹	200–1000 hPa (9 levels)
`v`	V component of wind	m s⁻¹	200–1000 hPa (9 levels)
`w`	Vertical velocity	Pa s⁻¹	100–850 hPa (8 levels)
`absv`	Absolute vorticity	s⁻¹	100–1000 hPa (8 levels)
`cape`	CAPE	J kg⁻¹	surface layers
`cin`	Convective inhibition	J kg⁻¹	surface layers
`st`	Soil temperature	°C	0–100 cm (4 layers)
`soilw`	Volumetric soil moisture	Proportion	0–100 cm (4 layers)

Pre-defined Hour Sequences

from noawclg import (
    HOURS_16DAYS,     # 0–120 h (6-hourly) + 123–384 h (3-hourly) — full 16-day run
    HOURS_5DAYS_1H,   # 0–120 h (1-hourly)
    HOURS_10DAYS_3H,  # 0–240 h (3-hourly)
    HOURS_16DAYS_3H,  # 0–120 h (3-hourly) + 123–384 h (3-hourly)
)

Use them directly with build_dataset or build_multi_dataset:

ds = mgr.build_dataset("t2m", hours=HOURS_16DAYS)
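If a different cadence is needed, an hour list is just a list of integers. The sketch below rebuilds the four sequences from the comments above using `range`; the lowercase names are hypothetical and the library's own constants may be defined differently.

```python
# Rebuilt from the comments on the pre-defined sequences (an assumption,
# not the library's definitions).
hours_16days = list(range(0, 121, 6)) + list(range(123, 385, 3))
hours_5days_1h = list(range(0, 121))
hours_10days_3h = list(range(0, 241, 3))
hours_16days_3h = list(range(0, 385, 3))

for label, seq in [
    ("16 days, mixed step", hours_16days),
    ("5 days, 1-hourly", hours_5days_1h),
    ("10 days, 3-hourly", hours_10days_3h),
    ("16 days, 3-hourly", hours_16days_3h),
]:
    print(f"{label}: {len(seq)} forecast hours, last = {seq[-1]}")
```

Since each forecast hour costs one request, the length of the list is also the number of HTTP requests for an uncached run.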

Region Subsetting

Pass a region dict to download only the data inside a bounding box.
This dramatically reduces file size and download time for regional studies.

# South America
REGION_SA = {
    "toplat":    12,
    "bottomlat": -56,
    "leftlon":   -82,
    "rightlon":  -34,
}

# Brazil
REGION_BR = {
    "toplat":    5,
    "bottomlat": -35,
    "leftlon":   -75,
    "rightlon":  -34,
}

mgr = GFSDatasetManager(
    date="20260403",
    cycle="06",
    region=REGION_BR,
)

Pass region=None (the default) for a global download.

Note: The region tag is embedded in the cache filename, so global and regional downloads never collide even when sharing the same output_dir.
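The size reduction can be estimated from the bounding box alone. The sketch below assumes a regular 0.25° grid with both edges of the box included; the server may snap the box to grid points differently, so treat the numbers as approximate. `grid_shape` is a hypothetical helper.

```python
GRID_STEP = 0.25  # degrees, the GFS 0.25° grid


def grid_shape(region=None, step=GRID_STEP):
    """Return (n_lat, n_lon) for a bounding box, or for the globe if None."""
    if region is None:
        return round(180 / step) + 1, round(360 / step)
    if region["toplat"] <= region["bottomlat"]:
        raise ValueError("toplat must be greater than bottomlat")
    n_lat = round((region["toplat"] - region["bottomlat"]) / step) + 1
    n_lon = round((region["rightlon"] - region["leftlon"]) / step) + 1
    return n_lat, n_lon


REGION_BR = {"toplat": 5, "bottomlat": -35, "leftlon": -75, "rightlon": -34}
rows, cols = grid_shape(REGION_BR)
g_rows, g_cols = grid_shape()
print(rows, cols)  # 161 165
print(f"{rows * cols / (g_rows * g_cols):.1%} of the global grid")
```

Under these assumptions the Brazil box covers only a few percent of the 721 × 1440 global grid, which is where the savings in file size and download time come from.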

Logging

noawclg uses Python's standard logging module under the logger name gfs_dataset.
Enable it in your application to see download progress, cache hits and extraction warnings:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s  %(levelname)-8s  %(message)s",
    datefmt="%H:%M:%S",
)

Sample output:

10:02:15  INFO      Download: 9 hour(s) × 1 file each = 9 request(s)  (vars: ['t2m', 'prate'])
10:02:17  INFO      [multi] → f000  https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25_1hr.pl?...
10:02:19  INFO        [ok] f000  284 KB  |  11.1%  (1/9)  elapsed=2.1s  remaining≈15.2s
10:02:21  INFO      [cache] f006  gfs_20260403_06z_prate_t2m_global_f006.grib2
10:02:21  INFO      Extracting 't2m' …
10:02:21  INFO      Extracting 'prate' …
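To see only noawclg's messages without raising the verbosity of every other library, the named logger can be configured on its own. The sketch below uses only the standard `logging` module and the logger name given above; whether the library emits anything below INFO level is not documented here.

```python
import logging

# Configure the library's logger directly instead of the root logger
logger = logging.getLogger("gfs_dataset")
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s  %(levelname)-8s  %(message)s", "%H:%M:%S")
)
logger.addHandler(handler)

# Avoid duplicate lines if the root logger is also configured elsewhere
logger.propagate = False
```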

Examples

1 — Surface forecast for Brazil, 48 h

from noawclg import GFSDatasetManager

mgr = GFSDatasetManager(
    date="20260403",
    cycle="06",
    region={"toplat": 5, "bottomlat": -35, "leftlon": -75, "rightlon": -34},
)

ds = mgr.build_multi_dataset(
    var_keys=["t2m", "prate", "prmsl", "u10", "v10"],
    hours=list(range(0, 49, 6)),
)
mgr.save_netcdf(ds, "gfs_brazil_48h.nc")

2 — Upper-air wind profile, global, 24 h

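from noawclg import GFSDatasetManager

# No region argument, so the global grid is downloaded
mgr = GFSDatasetManager(date="20260403", cycle="06")

ds = mgr.build_multi_dataset(
    var_keys=["u", "v", "gh"],   # multi-level isobaric
    hours=list(range(0, 25, 6)),
)
# ds["u"] has dims (time, level, latitude, longitude)
u_500 = ds["u"].sel(level=500)   # wind at 500 hPa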

3 — 16-day t2m time-series, saved as Zarr

from noawclg import GFSDatasetManager, HOURS_16DAYS

mgr = GFSDatasetManager(date="20260403", cycle="00")
ds  = mgr.build_dataset("t2m", hours=HOURS_16DAYS)
mgr.save_zarr(ds, "gfs_t2m_16days.zarr")

4 — Reload and compute a daily mean

from noawclg import GFSDatasetManager

ds   = GFSDatasetManager.load_netcdf("gfs_brazil_48h.nc")
t2m  = ds["t2m"]
daily_mean = t2m.resample(time="1D").mean()
print(daily_mean)

5 — Download only (no Dataset construction)

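from noawclg import GFSDatasetManager

mgr = GFSDatasetManager(date="20260403", cycle="06")  # global grid, default output_dir
files = mgr.download_hours(
    var_keys=["t2m", "prate"],
    hours=[0, 6, 12, 24],
)
# {0: PosixPath('./gfs_output/gfs_20260403_06z_prate_t2m_global_f000.grib2'), ...}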

Contributing

Pull requests are welcome. For major changes please open an issue first to discuss what you would like to change.

git clone https://github.com/reinanbr/noawclg
cd noawclg
pip install -e ".[dev]"

License

MIT © Reinan BR

Release history

Version	Release date
2.3.0	Jun 6, 2026
2.2.7	May 3, 2026
2.2.6	Apr 19, 2026
2.2.3	Apr 17, 2026
2.1.13 (this version)	Apr 4, 2026
0.0.8	Nov 24, 2024
0.0.7.1	Jan 7, 2024
0.0.7	Jan 7, 2024
0.0.5	May 24, 2023
0.0.4.9	Jan 13, 2023
0.0.4.7	Jan 11, 2023
0.0.4.6	Jan 9, 2023
0.0.4.5.1	Jan 6, 2023
0.0.4.5	Jan 6, 2023
0.0.4.2	Mar 2, 2022
0.0.4.1	Jan 9, 2022
0.0.4	Jan 9, 2022
0.0.3	Jan 9, 2022
0.0.2b6 (pre-release)	Jan 6, 2022
0.0.2b5 (pre-release)	Jan 5, 2022
0.0.2b4 (pre-release)	Jan 4, 2022
0.0.2b3 (pre-release)	Jan 4, 2022
0.0.2b2 (pre-release)	Jan 4, 2022
0.0.2b1 (pre-release)	Jan 4, 2022
0.0.2b0 (pre-release)	Jan 4, 2022
0.0.1b101 (pre-release)	Jan 4, 2022
0.0.1b91 (pre-release)	Jan 4, 2022
0.0.1b10 (pre-release)	Jan 4, 2022
0.0.1b0 (pre-release)	Jan 1, 2022

Download files

Source Distribution

noawclg-2.1.13.tar.gz (61.3 kB view details)

Uploaded Apr 4, 2026 Source

Built Distribution

noawclg-2.1.13-py3-none-any.whl (53.7 kB view details)

Uploaded Apr 4, 2026 Python 3

File details

Details for the file noawclg-2.1.13.tar.gz.

File metadata

Download URL: noawclg-2.1.13.tar.gz
Upload date: Apr 4, 2026
Size: 61.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for noawclg-2.1.13.tar.gz
Algorithm	Hash digest
SHA256	`f2835edf319eb7884b2a8248f6c4a75d4ad8bc64408f597bda11bcd2e6e7b509`
MD5	`44cf478ebed9715781e4b967c35018c1`
BLAKE2b-256	`0cc43120cf187aba3f389db5244a8e668bb3c93622f5d24f4ae78e4c72484743`

Provenance

The following attestation bundles were made for noawclg-2.1.13.tar.gz:

Publisher: ci.yml on reinanbr/noawclg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: noawclg-2.1.13.tar.gz
- Subject digest: f2835edf319eb7884b2a8248f6c4a75d4ad8bc64408f597bda11bcd2e6e7b509
- Sigstore transparency entry: 1236554318
- Sigstore integration time: Apr 4, 2026
Source repository:
- Permalink: reinanbr/noawclg@317517f7fd71542bb6f6506e0beabf1080691821
- Branch / Tag: refs/tags/v2.1.13
- Owner: https://github.com/reinanbr
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@317517f7fd71542bb6f6506e0beabf1080691821
- Trigger Event: push

File details

Details for the file noawclg-2.1.13-py3-none-any.whl.

File metadata

Download URL: noawclg-2.1.13-py3-none-any.whl
Upload date: Apr 4, 2026
Size: 53.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for noawclg-2.1.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9441c13f2692c44d750aa3808c78f0e5f76c82cc553721351fd8a479ebaec300`
MD5	`6e20d6fddca12ee130ccc8a631c0b97b`
BLAKE2b-256	`d70cf1089b37bcc35462b7abd2eb1dbff1199131048ddebaf79bd70fae36555b`

Provenance

The following attestation bundles were made for noawclg-2.1.13-py3-none-any.whl:

Publisher: ci.yml on reinanbr/noawclg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: noawclg-2.1.13-py3-none-any.whl
- Subject digest: 9441c13f2692c44d750aa3808c78f0e5f76c82cc553721351fd8a479ebaec300
- Sigstore transparency entry: 1236554323
- Sigstore integration time: Apr 4, 2026
Source repository:
- Permalink: reinanbr/noawclg@317517f7fd71542bb6f6506e0beabf1080691821
- Branch / Tag: refs/tags/v2.1.13
- Owner: https://github.com/reinanbr
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@317517f7fd71542bb6f6506e0beabf1080691821
- Trigger Event: push
