Skip to main content

Download and process NWP forecast data from cloud archives

Project description

NWPIO

A Python library for downloading and processing Numerical Weather Prediction (NWP) forecast data from cloud archives.

Features

Data Download

  • Multiple NWP models - GFS, ECMWF HRES/ENS support
  • Flexible resolutions - 0.1°, 0.25°, 0.5°, 1.0° depending on product
  • Configurable cycles - 00z, 06z, 12z, 18z with variable lead times (up to 384h)
  • Parallel downloads - Configurable workers for fast transfers
  • GCS-to-GCS copying - No local storage needed for large files
  • File validation - Ensures all files are complete before downloading
  • Smart skipping - Avoid re-downloading existing files

GRIB Processing

  • Variable extraction - Select specific variables from GRIB files
  • Time concatenation - Combine multiple files along time dimension
  • Zarr conversion - Efficient chunked storage format
  • Configurable chunking - Optimize for your access patterns
  • Compression support - Multiple algorithms (zstd, lz4)
  • GRIB key filtering - Filter by level, type, etc.
  • Parallel GRIB loading - Fast processing with multiple workers

Production Ready

  • Type-safe configuration - Pydantic models with validation
  • Flexible cycle configuration - CLI (--cycle), environment ($CYCLE), or config file
  • Multi-process workflow - Download once, process multiple variable sets
  • Cycle-based formatting - Dynamic paths with {cycle:%Y%m%d} placeholders
  • Comprehensive logging - Track progress and debug issues
  • Error handling - Robust recovery and retry logic
  • Automatic cleanup - Optional GRIB file deletion after processing
  • Docker support - Container-ready for cloud deployment

Installation

pip install -e .

For development:

pip install -e ".[dev]"

Quick Start

Download GRIB files

from nwpio import GribDownloader, DownloadConfig
from datetime import datetime

config = DownloadConfig(
    product="gfs",
    resolution="0p25",
    forecast_time=datetime(2024, 1, 1, 0),
    cycle="00z",
    max_lead_time=120,  # hours
    source_bucket="gcp-public-data-arco-era5",
    destination_bucket="your-bucket-name",
)

downloader = GribDownloader(config)
downloaded_files = downloader.download()

Process GRIB to Zarr

from nwpio import GribProcessor, ProcessConfig

config = ProcessConfig(
    grib_files=downloaded_files,
    variables=["t2m", "u10", "v10", "tp"],
    output_path="gs://your-bucket/output.zarr",
)

processor = GribProcessor(config)
processor.process()

Using the CLI

# Download GRIB files
nwpio download \
    --product gfs \
    --resolution 0p25 \
    --time 2024-01-01T00:00:00 \
    --cycle 00z \
    --max-lead-time 120 \
    --source-bucket gcp-public-data-arco-era5 \
    --dest-bucket your-bucket-name

# Process GRIB to Zarr
nwpio process \
    --grib-path gs://your-bucket/grib/ \
    --variables t2m,u10,v10,tp \
    --output gs://your-bucket/output.zarr

# Combined workflow
nwpio run \
    --config config.yaml

Configuration File Example

Single Process Configuration

# config.yaml
download:
  product: gfs
  resolution: 0p25
  cycle: "2024-01-01T00:00:00"
  max_lead_time: 6
  source_bucket: global-forecast-system
  destination_bucket: your-bucket-name
  destination_prefix: nwp-data/

process:
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 10
    zarr_path: gs://your-bucket/wind_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [u10, v10]
    write_local_first: true
    max_upload_workers: 16

Multi-Process Configuration (Recommended)

Download once, create multiple Zarr archives with different variable sets:

# config-multi.yaml
cleanup_grib: true  # Delete GRIB files after all processing

download:
  product: gfs
  resolution: 0p25
  cycle: "2024-01-01T00:00:00"
  max_lead_time: 6
  source_bucket: global-forecast-system
  destination_bucket: your-bucket-name

process:
  # Process 1: 10m winds
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 10
    zarr_path: gs://your-bucket/wind10m_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [u10, v10]
    max_upload_workers: 16
    
  # Process 2: 2m temperature and humidity
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 2
    zarr_path: gs://your-bucket/surface_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [t2m, d2m]
    max_upload_workers: 16

Run with:

nwpio run --config config-multi.yaml --max-workers 8

ECMWF Source Selection

ECMWF data is available from two sources. Simply specify source_type to choose:

# Use GCS (Google Cloud Storage) - Official ECMWF bucket (default)
download:
  product: ecmwf-hres
  resolution: 0p25
  source_type: gcs  # Uses ecmwf-open-data bucket (default)
  max_lead_time: 120

# Use AWS S3 - Alternative source
download:
  product: ecmwf-hres
  resolution: 0p25
  source_type: aws  # Uses ecmwf-forecasts bucket
  max_lead_time: 120

The source_type defaults to gcs. The appropriate bucket is automatically selected based on the product and source type. You can override with a custom source_bucket if needed.

Supported Products

GFS (Global Forecast System)

  • Resolutions: 0p25 (0.25°), 0p50 (0.5°), 1p00 (1.0°)
  • Cycles: 00z, 06z, 12z, 18z
  • Lead times: Up to 384 hours

ECMWF

  • Products: HRES (High Resolution), ENS (Ensemble)
  • Resolutions: 0p1 (0.1°), 0p25 (0.25°)
  • Cycles: 00z, 12z
  • Lead times: Up to 240 hours (HRES), 360 hours (ENS)
  • Sources:
    • GCS: gs://ecmwf-open-data (official ECMWF bucket)
    • AWS: s3://ecmwf-forecasts (alternative source)

Architecture

nwpio/
├── __init__.py
├── config.py          # Configuration models using Pydantic
├── sources.py         # Data source definitions for GFS/ECMWF
├── downloader.py      # GRIB file download logic
├── processor.py       # GRIB to Zarr conversion
├── utils.py           # Utility functions
└── cli.py             # Command-line interface

Requirements

  • Python 3.9+
  • Google Cloud Storage access (with appropriate credentials)
  • GRIB file support (eccodes library)

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nwpio-0.1.0.tar.gz (31.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nwpio-0.1.0-py3-none-any.whl (30.2 kB view details)

Uploaded Python 3

File details

Details for the file nwpio-0.1.0.tar.gz.

File metadata

  • Download URL: nwpio-0.1.0.tar.gz
  • Upload date:
  • Size: 31.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nwpio-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a447217acf61e7f943168c285a4522400ddc0d114366c0e5ea83738b0bb19967
MD5 17aceeb7cb4194f6a1c2af56cd6324b5
BLAKE2b-256 c5b98707a08fe5b11bc31f723774c95b187ae3184fda822e7de1c4e1448d4c86

See more details on using hashes here.

Provenance

The following attestation bundles were made for nwpio-0.1.0.tar.gz:

Publisher: publish.yml on oceanum/nwpio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nwpio-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nwpio-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 30.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nwpio-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7ed9f434c885be3cf745065d9b2e9afc79584587af0c79a4a76f47a60f16b926
MD5 757cb8be2b36274284f90559c88489ad
BLAKE2b-256 a23d7dbe55af700e81976e19139b45e3c6434c631eccee4976574cd91b8001b5

See more details on using hashes here.

Provenance

The following attestation bundles were made for nwpio-0.1.0-py3-none-any.whl:

Publisher: publish.yml on oceanum/nwpio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page