File format conversions to cfdb

cfdb-ingest

Convert meteorological model output to cfdb with standardized CF conventions


Documentation: https://mullenkamp.github.io/cfdb-ingest/

Source Code: https://github.com/mullenkamp/cfdb-ingest


Overview

cfdb-ingest converts meteorological file formats (netCDF4/HDF5) from various model outputs into cfdb. It standardizes variable names and attributes to be consistent with CF conventions, making it straightforward to work with datasets from different sources through a single interface.
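As a rough illustration of what such standardization means, the sketch below maps source-specific variable names onto one CF standard name plus a named height coordinate. The table and the `standardize` helper are hypothetical and written only for this example; the real mappings live in cfdb-vars and may use different names and metadata.

```python
# Hypothetical mapping table for illustration only; not taken from cfdb-vars.
# Both entries describe 2 m air temperature, as named by WRF and by NCAR ERA5.
VARIABLE_MAP = {
    ("wrf", "T2"): {"standard_name": "air_temperature", "units": "K", "height_m": 2},
    ("era5", "VAR_2T"): {"standard_name": "air_temperature", "units": "K", "height_m": 2},
}


def standardize(source, name):
    """Return (CF standard name, named height coordinate) for a source variable."""
    entry = VARIABLE_MAP[(source, name)]
    coord = f"height_{entry['height_m']}m"
    return entry["standard_name"], coord


wrf_result = standardize("wrf", "T2")
era5_result = standardize("era5", "VAR_2T")
print(wrf_result, era5_result)
```

Under these assumptions, both sources resolve to the same name and coordinate, which is what lets downstream code treat them through one interface.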

Supported sources:

  • WRF -- wrfout NetCDF files (all variables in one file per time range)
  • ERA5 -- NCAR ERA5 NetCDF files (one variable per file, surface + pressure level + invariant products)

Key features:

  • Automatic variable mapping -- source variable names are translated to CF-standard names with proper metadata via cfdb-vars
  • Named height coordinates -- surface variables at specific heights (0m, 2m, 10m, 100m) get their own named coordinates (e.g. height_2m), allowing them to coexist with pressure-level variables without ambiguity
  • Wind rotation (WRF) -- grid-relative wind components are rotated to earth-relative
  • VIMF computation (ERA5) -- native calculation of vertically integrated moisture flux from Q, U, and V
  • 3D level interpolation (WRF) -- eta-level variables are interpolated to user-specified height or pressure levels
  • Auto pressure level detection (ERA5) -- pressure levels are read directly from source files
  • Split or combined output (ERA5) -- create one cfdb per variable or combine into a single dataset
  • WPS intermediate file export -- convert cfdb datasets to WPS intermediate format for metgrid.exe
  • Spatial and temporal filtering -- subset by bounding box and/or date range
  • Multi-file support -- seamlessly spans multiple input files
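The wind rotation feature listed above can be sketched with the rotation commonly applied to wrfout data, which uses the cosine and sine of the local map rotation angle (stored in wrfout files as COSALPHA and SINALPHA). This is a minimal sketch on scalar values, assuming that convention; it is not taken from the cfdb-ingest source, and the function name is hypothetical.

```python
import math


def rotate_to_earth(u_grid, v_grid, cosalpha, sinalpha):
    """Rotate one grid-relative wind vector to earth-relative components."""
    u_earth = u_grid * cosalpha - v_grid * sinalpha
    v_earth = v_grid * cosalpha + u_grid * sinalpha
    return u_earth, v_earth


# Hypothetical grid cell whose map rotation angle is 30 degrees.
alpha = math.radians(30.0)
u_e, v_e = rotate_to_earth(10.0, 0.0, math.cos(alpha), math.sin(alpha))

# A pure rotation changes direction only, so wind speed is preserved.
speed_before = math.hypot(10.0, 0.0)
speed_after = math.hypot(u_e, v_e)
print(u_e, v_e, speed_before, speed_after)
```

On real model output the same arithmetic would normally be applied to whole arrays (for example with NumPy) rather than one cell at a time.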

Performance

cfdb-ingest is designed for high-performance processing of large meteorological datasets:

  • Vectorized rechunking -- uses rechunkit for optimized HDF5 reads, even when extracting small spatial subsets across many timesteps.
  • Parallel initialization -- multi-threaded file scanning and metadata extraction for fast startup.
  • HDF5 chunk caching -- manages the HDF5 chunk cache so that chunks are not re-read during per-timestep transformations.
  • Synchronized multi-variable rechunking -- the source variables of a derived variable (like VIMF) are iterated together, which eliminates redundant reads of shared source variables.
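To illustrate why shared source variables matter for VIMF, the sketch below computes both flux components for a single column in one pass, so each Q value is fetched once and reused. It assumes the usual definition, (1/g) times the pressure integral of q*u and q*v, approximated with the trapezoidal rule, and pressure given in Pa. The function name, the sample values, and the integration scheme are assumptions for this example, not the cfdb-ingest implementation.

```python
G = 9.80665  # standard gravity in m s-2


def vimf(pressure, q, u, v):
    """Trapezoidal estimate of (1/g) * integral of q*u and q*v over pressure.

    pressure is in Pa, ordered from the top of the column to the surface;
    q is in kg/kg; u and v are in m/s.
    Returns (eastward, northward) flux in kg m-1 s-1.
    """
    east = 0.0
    north = 0.0
    for k in range(len(pressure) - 1):
        dp = pressure[k + 1] - pressure[k]
        # q at both layer edges is fetched once and reused for both components.
        q0, q1 = q[k], q[k + 1]
        east += 0.5 * (q0 * u[k] + q1 * u[k + 1]) * dp
        north += 0.5 * (q0 * v[k] + q1 * v[k + 1]) * dp
    return east / G, north / G


# Hypothetical four-level column (values chosen only for illustration).
pressure = [50000.0, 70000.0, 85000.0, 100000.0]
q = [0.001, 0.004, 0.008, 0.012]
u = [20.0, 12.0, 8.0, 5.0]
v = [-4.0, -2.0, 1.0, 3.0]
east, north = vimf(pressure, q, u, v)
print(east, north)

# Sanity check with uniform profiles, where the integral has a closed form.
uniform_east, uniform_north = vimf([50000.0, 100000.0], [0.01, 0.01], [10.0, 10.0], [-5.0, -5.0])
```

Computing the two components in separate passes would read Q twice; iterating them together is the kind of saving the last bullet describes.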

Installation

Requires Python >= 3.10.

pip install cfdb-ingest

Quick Start

WRF

from cfdb_ingest import WrfIngest

wrf = WrfIngest('wrfout_d01_2023-02-12_00:00:00.nc')
wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2', 'WIND10'],
    start_date='2023-02-12T06:00',
    end_date='2023-02-12T18:00',
)

From the command line:

cfdb-ingest wrf wrfout_d01_*.nc output.cfdb -v T2,WIND10 -s 2023-02-12T06:00 -e 2023-02-12T18:00

For WPS export, use the --preset wps flag:

cfdb-ingest wrf /path/to/wrfout/ output.cfdb --preset wps -s 2023-02-10 -e 2023-02-10_06
cfdb-to-int output.cfdb -s 2023-02-10 -e 2023-02-10_06

ERA5

from cfdb_ingest import Era5Ingest

era5 = Era5Ingest('/path/to/era5/*.nc')
era5.convert(
    cfdb_path='era5.cfdb',
    variables=['SP', 'VAR_2T', 'T', 'U', 'V'],
    start_date='2020-01-01',
    end_date='2020-01-31',
)

From the command line:

# Combined: multiple variables in one cfdb
cfdb-ingest era5 /path/to/era5/*.nc output.cfdb -v SP,VAR_2T,T,U,V -s 2020-01-01 -e 2020-01-31

# Split: one cfdb file per variable
cfdb-ingest era5 /path/to/era5/*.nc /output/dir/ --split -v SP,T

See the full documentation at https://mullenkamp.github.io/cfdb-ingest/ for details.

License

This project is licensed under the terms of the Apache Software License 2.0.
