Download and process NWP forecast data from cloud archives
NWPIO
A Python library for downloading and processing Numerical Weather Prediction (NWP) forecast data from cloud archives.
Features
Data Download
- Multiple NWP models - GFS, ECMWF HRES/ENS support
- Flexible resolutions - 0.1°, 0.25°, 0.5°, 1.0° depending on product
- Configurable cycles - 00z, 06z, 12z, 18z with variable lead times (up to 384h)
- Parallel downloads - Configurable workers for fast transfers
- GCS-to-GCS copying - No local storage needed for large files
- File validation - Ensures all files are complete before downloading
- Smart skipping - Avoid re-downloading existing files
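The list above combines parallel transfers with skipping files that already exist. Below is a minimal standard-library sketch of that pattern. It is not NWPIO's code. `copy_missing` and its `exists`/`copy` callables are hypothetical names. In real use the callables would wrap cloud-storage calls, such as a bucket-to-bucket copy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def copy_missing(
    names: Iterable[str],
    exists: Callable[[str], bool],
    copy: Callable[[str], None],
    max_workers: int = 8,
) -> List[str]:
    """Transfer every object in `names` that is not already at the destination.

    Hypothetical helper: `exists` checks the destination and `copy` performs
    one transfer. Returns the names that were copied, in input order.
    """
    # Smart skipping: drop objects that already exist at the destination.
    todo = [name for name in names if not exists(name)]
    # Parallel transfers: run one copy per task on a pool of worker threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for the results and re-raises a worker's exception, if any.
        list(pool.map(copy, todo))
    return todo


if __name__ == "__main__":
    present = {"gfs.t00z.pgrb2.0p25.f000"}
    done: List[str] = []
    copied = copy_missing(
        ["gfs.t00z.pgrb2.0p25.f000", "gfs.t00z.pgrb2.0p25.f001"],
        exists=lambda name: name in present,
        copy=done.append,
    )
    print(copied)  # ['gfs.t00z.pgrb2.0p25.f001']
```

Threads fit this workload because each transfer is I/O bound. In this sketch, `max_workers` plays the role of the configurable worker count.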
GRIB Processing
- Variable extraction - Select specific variables from GRIB files
- Time concatenation - Combine multiple files along time dimension
- Zarr conversion - Efficient chunked storage format
- Configurable chunking - Optimize for your access patterns
- Compression support - Multiple algorithms (zstd, lz4)
- GRIB key filtering - Filter by level, type, etc.
- Parallel GRIB loading - Fast processing with multiple workers
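Chunking is a trade-off between read patterns. This sketch compares two chunk layouts of similar size. It rests on three assumptions, none of which are NWPIO's documented defaults:
- float32 data
- the 721 × 1440 grid of a global 0.25° product such as GFS
- cfgrib-style dimension names `step`, `latitude` and `longitude`

```python
from math import prod
from typing import Dict


def chunk_nbytes(chunks: Dict[str, int], itemsize: int = 4) -> int:
    """Uncompressed size of one chunk in bytes (itemsize 4 means float32)."""
    return prod(chunks.values()) * itemsize


# One whole map per chunk: cheap to read a single lead time everywhere.
map_chunks = {"step": 1, "latitude": 721, "longitude": 1440}
# Every hourly lead time 0-120 h for a 90 x 90 tile: cheap point time series.
series_chunks = {"step": 121, "latitude": 90, "longitude": 90}

print(chunk_nbytes(map_chunks))     # 4152960 bytes, about 4.2 MB
print(chunk_nbytes(series_chunks))  # 3920400 bytes, about 3.9 MB
```

Both chunks are about 4 MB uncompressed. Reading the full 0-120 h series at one grid point touches 121 chunks with the first layout but only 1 with the second.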
Production Ready
- Type-safe configuration - Pydantic models with validation
- Flexible cycle configuration - CLI (--cycle), environment ($CYCLE), or config file
- Multi-process workflow - Download once, process multiple variable sets
- Cycle-based formatting - Dynamic paths with {cycle:%Y%m%d} placeholders
- Comprehensive logging - Track progress and debug issues
- Error handling - Robust recovery and retry logic
- Automatic cleanup - Optional GRIB file deletion after processing
- Docker support - Container-ready for cloud deployment
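The features list names three places a cycle can come from, but not which one wins when several are set. The sketch below assumes the precedence runs CLI flag first, then `$CYCLE`, then the config file. That order is an assumption, and `resolve_cycle` is a hypothetical name.

```python
import os
from datetime import datetime
from typing import Optional


def resolve_cycle(cli_value: Optional[str], config_value: Optional[str]) -> datetime:
    """Pick the forecast cycle, assuming precedence CLI > $CYCLE > config file."""
    for value in (cli_value, os.environ.get("CYCLE"), config_value):
        if value:
            # ISO 8601 strings such as "2024-01-01T00:00:00" parse directly.
            return datetime.fromisoformat(value)
    raise ValueError("no cycle given via --cycle, $CYCLE or the config file")
```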
Installation
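Released versions are also published on PyPI as `nwpio` (`pip install nwpio`). To install from a local clone of the repository:
```bash
pip install -e .
```
For development (adds the `dev` extras):
```bash
pip install -e ".[dev]"
```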
Quick Start
Download GRIB files
```python
from datetime import datetime

from nwpio import DownloadConfig, GribDownloader

config = DownloadConfig(
    product="gfs",
    resolution="0p25",
    forecast_time=datetime(2024, 1, 1, 0),
    cycle="00z",
    max_lead_time=120,  # hours
    source_bucket="global-forecast-system",  # public GFS bucket on GCS
    destination_bucket="your-bucket-name",
)
downloader = GribDownloader(config)
downloaded_files = downloader.download()
```
Process GRIB to Zarr
```python
from nwpio import GribProcessor, ProcessConfig

# downloaded_files is the list returned by the download step above
config = ProcessConfig(
    grib_files=downloaded_files,
    variables=["t2m", "u10", "v10", "tp"],
    output_path="gs://your-bucket/output.zarr",
)
processor = GribProcessor(config)
processor.process()
```
Using the CLI
```bash
# Download GRIB files
nwpio download \
  --product gfs \
  --resolution 0p25 \
  --time 2024-01-01T00:00:00 \
  --cycle 00z \
  --max-lead-time 120 \
  --source-bucket global-forecast-system \
  --dest-bucket your-bucket-name

# Process GRIB to Zarr
nwpio process \
  --grib-path gs://your-bucket/grib/ \
  --variables t2m,u10,v10,tp \
  --output gs://your-bucket/output.zarr

# Combined workflow
nwpio run \
  --config config.yaml
```
Configuration File Example
Single Process Configuration
```yaml
# config.yaml
download:
  product: gfs
  resolution: 0p25
  cycle: "2024-01-01T00:00:00"
  max_lead_time: 6
  source_bucket: global-forecast-system
  destination_bucket: your-bucket-name
  destination_prefix: nwp-data/
process:
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 10
    zarr_path: gs://your-bucket/wind_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [u10, v10]
    write_local_first: true
    max_upload_workers: 16
```
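The `{cycle:...}` placeholders in `zarr_path` look like Python format fields with strftime codes. This README does not confirm how NWPIO fills them. Assuming it calls `str.format` with a `datetime`, the path above would expand like this:

```python
from datetime import datetime

template = "gs://your-bucket/wind_{cycle:%Y%m%d}_{cycle:%Hz}.zarr"
cycle = datetime(2024, 1, 1, 0)
# str.format hands the text after ":" to datetime.__format__, i.e. strftime;
# "%H" gives the two-digit hour and the trailing "z" is a literal character.
path = template.format(cycle=cycle)
print(path)  # gs://your-bucket/wind_20240101_00z.zarr
```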
Multi-Process Configuration (Recommended)
Download once, create multiple Zarr archives with different variable sets:
```yaml
# config-multi.yaml
cleanup_grib: true  # Delete GRIB files after all processing
download:
  product: gfs
  resolution: 0p25
  cycle: "2024-01-01T00:00:00"
  max_lead_time: 6
  source_bucket: global-forecast-system
  destination_bucket: your-bucket-name
process:
  # Process 1: 10m winds
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 10
    zarr_path: gs://your-bucket/wind10m_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [u10, v10]
    max_upload_workers: 16
  # Process 2: 2m temperature and humidity
  - filter_by_keys:
      typeOfLevel: heightAboveGround
      level: 2
    zarr_path: gs://your-bucket/surface_{cycle:%Y%m%d}_{cycle:%Hz}.zarr
    variables: [t2m, d2m]
    max_upload_workers: 16
```
Run with:
```bash
nwpio run --config config-multi.yaml --max-workers 8
```
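Each `filter_by_keys` entry narrows the GRIB messages that one process reads, which lets a single download feed several Zarr archives. cfgrib treats these filters as equality matches on GRIB keys. The pure-Python sketch below mimics that behaviour on hypothetical message headers; `select` is an illustrative name, not an NWPIO or cfgrib function. cfgrib exposes the GRIB short names `10u`, `10v`, `2t` and `2d` as the variables `u10`, `v10`, `t2m` and `d2m`.

```python
from typing import Any, Dict, List

# Hypothetical headers of GRIB messages in one downloaded file.
MESSAGES: List[Dict[str, Any]] = [
    {"shortName": "10u", "typeOfLevel": "heightAboveGround", "level": 10},
    {"shortName": "10v", "typeOfLevel": "heightAboveGround", "level": 10},
    {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "2d", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "prmsl", "typeOfLevel": "meanSea", "level": 0},
]


def select(messages: List[Dict[str, Any]], filter_by_keys: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep the messages whose keys equal every value in filter_by_keys."""
    return [
        msg for msg in messages
        if all(msg.get(key) == value for key, value in filter_by_keys.items())
    ]


# Process 1 and Process 2 from config-multi.yaml read disjoint subsets.
wind = select(MESSAGES, {"typeOfLevel": "heightAboveGround", "level": 10})
surface = select(MESSAGES, {"typeOfLevel": "heightAboveGround", "level": 2})
print([m["shortName"] for m in wind])     # ['10u', '10v']
print([m["shortName"] for m in surface])  # ['2t', '2d']
```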
ECMWF Source Selection
ECMWF data is available from two sources; select one with `source_type`.

Google Cloud Storage, the official ECMWF bucket (default):
```yaml
download:
  product: ecmwf-hres
  resolution: 0p25
  source_type: gcs  # Uses ecmwf-open-data bucket (default)
  max_lead_time: 120
```
AWS S3, an alternative source:
```yaml
download:
  product: ecmwf-hres
  resolution: 0p25
  source_type: aws  # Uses ecmwf-forecasts bucket
  max_lead_time: 120
```
`source_type` defaults to `gcs`, and the bucket is selected automatically from the product and source type. Set `source_bucket` to override it with a custom bucket.
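The bucket-selection rule described above fits in a few lines. The bucket names below come from this README. The helper `ecmwf_source_url` and its error handling are hypothetical, for illustration only.

```python
from typing import Optional

# Default ECMWF buckets per source_type, as listed in this README.
ECMWF_BUCKETS = {"gcs": "ecmwf-open-data", "aws": "ecmwf-forecasts"}


def ecmwf_source_url(source_type: str = "gcs", source_bucket: Optional[str] = None) -> str:
    """Return the bucket URL for an ECMWF download (hypothetical helper)."""
    if source_type not in ECMWF_BUCKETS:
        raise ValueError(f"unknown source_type {source_type!r}; expected 'gcs' or 'aws'")
    # An explicit source_bucket overrides the default for the chosen source.
    bucket = source_bucket or ECMWF_BUCKETS[source_type]
    scheme = "gs" if source_type == "gcs" else "s3"
    return f"{scheme}://{bucket}"
```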
Supported Products
GFS (Global Forecast System)
- Resolutions: 0p25 (0.25°), 0p50 (0.5°), 1p00 (1.0°)
- Cycles: 00z, 06z, 12z, 18z
- Lead times: Up to 384 hours
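To show what `max_lead_time` means in terms of files, this sketch lists the object names for one GFS cycle. It assumes NOAA's public layout `gfs.YYYYMMDD/HH/atmos/gfs.tHHz.pgrb2.<res>.fFFF` and a single fixed step. `gfs_object_names` is a hypothetical helper, and NWPIO may build its paths differently.

```python
from datetime import datetime
from typing import List


def gfs_object_names(
    cycle: datetime, max_lead_time: int, resolution: str = "0p25", step: int = 1
) -> List[str]:
    """List GFS GRIB2 object names for one cycle, from f000 to max_lead_time.

    Assumes a fixed step between lead times; operational GFS output is not
    uniformly spaced over all lead times, which this sketch ignores.
    """
    day, hour = cycle.strftime("%Y%m%d"), cycle.strftime("%H")
    return [
        f"gfs.{day}/{hour}/atmos/gfs.t{hour}z.pgrb2.{resolution}.f{lead:03d}"
        for lead in range(0, max_lead_time + 1, step)
    ]


names = gfs_object_names(datetime(2024, 1, 1, 0), max_lead_time=6)
print(names[0])   # gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000
print(names[-1])  # gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f006
```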
ECMWF
- Products: HRES (High Resolution), ENS (Ensemble)
- Resolutions: 0p1 (0.1°), 0p25 (0.25°)
- Cycles: 00z, 12z
- Lead times: Up to 240 hours (HRES), 360 hours (ENS)
- Sources:
  - GCS: gs://ecmwf-open-data (official ECMWF bucket)
  - AWS: s3://ecmwf-forecasts (alternative source)
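NWPIO validates its configuration with Pydantic models. As a dependency-free illustration, this sketch checks a request against the limits listed under Supported Products. It uses a standard-library dataclass instead of Pydantic. The `ProductRequest` class and the `ecmwf-ens` product identifier are assumptions; only `gfs` and `ecmwf-hres` appear in this README.

```python
from dataclasses import dataclass

# Limits copied from "Supported Products"; "ecmwf-ens" is an assumed identifier.
LIMITS = {
    "gfs": {"cycles": {0, 6, 12, 18}, "max_lead": 384, "resolutions": {"0p25", "0p50", "1p00"}},
    "ecmwf-hres": {"cycles": {0, 12}, "max_lead": 240, "resolutions": {"0p1", "0p25"}},
    "ecmwf-ens": {"cycles": {0, 12}, "max_lead": 360, "resolutions": {"0p1", "0p25"}},
}


@dataclass(frozen=True)
class ProductRequest:
    """Hypothetical download request, validated when it is created."""

    product: str
    resolution: str
    cycle_hour: int
    max_lead_time: int

    def __post_init__(self) -> None:
        limits = LIMITS.get(self.product)
        if limits is None:
            raise ValueError(f"unsupported product: {self.product}")
        if self.resolution not in limits["resolutions"]:
            raise ValueError(f"{self.product} has no {self.resolution} resolution")
        if self.cycle_hour not in limits["cycles"]:
            raise ValueError(f"{self.product} has no {self.cycle_hour:02d}z cycle")
        if not 0 <= self.max_lead_time <= limits["max_lead"]:
            raise ValueError(f"max_lead_time must be 0-{limits['max_lead']} h for {self.product}")
```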
Architecture
```
nwpio/
├── __init__.py
├── config.py      # Configuration models using Pydantic
├── sources.py     # Data source definitions for GFS/ECMWF
├── downloader.py  # GRIB file download logic
├── processor.py   # GRIB to Zarr conversion
├── utils.py       # Utility functions
└── cli.py         # Command-line interface
```
Requirements
- Python 3.9+
- Google Cloud Storage access (with appropriate credentials)
- GRIB file support (eccodes library)
License
MIT
Download files
File details
Details for the file nwpio-0.1.0.tar.gz.
File metadata
- Download URL: nwpio-0.1.0.tar.gz
- Upload date:
- Size: 31.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | a447217acf61e7f943168c285a4522400ddc0d114366c0e5ea83738b0bb19967 |
| MD5 | 17aceeb7cb4194f6a1c2af56cd6324b5 |
| BLAKE2b-256 | c5b98707a08fe5b11bc31f723774c95b187ae3184fda822e7de1c4e1448d4c86 |
Provenance
The following attestation bundles were made for nwpio-0.1.0.tar.gz:
- Publisher: publish.yml on oceanum/nwpio
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: nwpio-0.1.0.tar.gz
  - Subject digest: a447217acf61e7f943168c285a4522400ddc0d114366c0e5ea83738b0bb19967
  - Sigstore transparency entry: 936524895
  - Sigstore integration time:
- Permalink: oceanum/nwpio@4ab0ccb2c690abfdbc72c92765dc45208dc208c8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/oceanum
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4ab0ccb2c690abfdbc72c92765dc45208dc208c8
- Trigger Event: release
File details
Details for the file nwpio-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nwpio-0.1.0-py3-none-any.whl
- Upload date:
- Size: 30.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ed9f434c885be3cf745065d9b2e9afc79584587af0c79a4a76f47a60f16b926 |
| MD5 | 757cb8be2b36274284f90559c88489ad |
| BLAKE2b-256 | a23d7dbe55af700e81976e19139b45e3c6434c631eccee4976574cd91b8001b5 |
Provenance
The following attestation bundles were made for nwpio-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on oceanum/nwpio
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: nwpio-0.1.0-py3-none-any.whl
  - Subject digest: 7ed9f434c885be3cf745065d9b2e9afc79584587af0c79a4a76f47a60f16b926
  - Sigstore transparency entry: 936524901
  - Sigstore integration time:
- Permalink: oceanum/nwpio@4ab0ccb2c690abfdbc72c92765dc45208dc208c8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/oceanum
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4ab0ccb2c690abfdbc72c92765dc45208dc208c8
- Trigger Event: release