cfdb-ingest

Convert meteorological model output to cfdb with standardized CF conventions

Documentation: https://mullenkamp.github.io/cfdb-ingest/

Source Code: https://github.com/mullenkamp/cfdb-ingest


Overview

cfdb-ingest converts meteorological model output stored as netCDF4/HDF5 files into cfdb. It standardizes variable names and attributes to follow CF conventions, so datasets from different sources can be used through a single interface.

Supported sources:

  • WRF -- wrfout NetCDF files (all variables in one file per time range)
  • ERA5 -- NCAR ERA5 NetCDF files (one variable per file, surface + pressure level + invariant products)

Key features:

  • Automatic variable mapping -- source variable names are translated to CF-standard names with proper metadata via cfdb-vars
  • Named height coordinates -- surface variables at specific heights (0m, 2m, 10m, 100m) get their own named coordinates (e.g. height_2m), allowing them to coexist with pressure-level variables without ambiguity
  • Wind rotation (WRF) -- grid-relative wind components are rotated to earth-relative
  • VIMF computation (ERA5) -- native calculation of vertically integrated moisture flux from Q, U, and V
  • 3D level interpolation (WRF) -- eta-level variables are interpolated to user-specified height or pressure levels
  • Auto pressure level detection (ERA5) -- pressure levels are read directly from source files
  • Split or combined output (ERA5) -- create one cfdb per variable or combine into a single dataset
  • WPS intermediate file export -- convert cfdb datasets to WPS intermediate format for metgrid.exe
  • Spatial and temporal filtering -- subset by bounding box and/or date range
  • Multi-file support -- seamlessly spans multiple input files
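
As an illustration of the wind rotation feature listed above, here is a minimal sketch of the commonly used formula for turning WRF grid-relative winds into earth-relative winds. It assumes the rotation is expressed through the cosine and sine of the local grid rotation angle (WRF output carries these as COSALPHA and SINALPHA). The function name `rotate_to_earth` is hypothetical, and the source does not state that cfdb-ingest implements the rotation exactly this way.

```python
import math


def rotate_to_earth(u_grid, v_grid, cosalpha, sinalpha):
    """Rotate grid-relative wind components to earth-relative components.

    Hypothetical helper, not part of the cfdb-ingest API. The arguments may be
    plain floats or any array type that supports elementwise arithmetic.
    """
    u_earth = u_grid * cosalpha - v_grid * sinalpha
    v_earth = v_grid * cosalpha + u_grid * sinalpha
    return u_earth, v_earth


# Example: a 10 m/s wind along the grid x-axis where the grid is rotated by 30 degrees.
alpha = math.radians(30.0)
u_e, v_e = rotate_to_earth(10.0, 0.0, math.cos(alpha), math.sin(alpha))

# A pure rotation must leave the wind speed unchanged.
speed = math.hypot(u_e, v_e)
```

Because the transform is a rotation, checking that the wind speed is preserved is a cheap sanity test for any implementation.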

Performance

cfdb-ingest is designed for high-performance processing of large meteorological datasets:

  • Vectorized rechunking -- uses rechunkit for optimized HDF5 reads, even when extracting small spatial subsets across many timesteps
  • Parallel initialization -- multi-threaded file scanning and metadata extraction for fast startup
  • HDF5 chunk caching -- the HDF5 chunk cache is managed to avoid redundant I/O during per-timestep transformations
  • Synchronized multi-variable rechunking -- the inputs of derived variables (like VIMF) are iterated together, so shared source variables are read only once
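
To make the derived-variable case concrete, the sketch below computes a vertically integrated moisture flux for a single column from Q, U, and V on pressure levels, using the standard definition (1/g) ∫ q·u dp with trapezoidal integration. The function name `vimf_column`, the level ordering, and the integration bounds are assumptions for illustration; the source does not describe the exact formulation cfdb-ingest uses.

```python
G = 9.80665  # standard gravity in m s^-2


def vimf_column(q, u, v, p_pa):
    """Trapezoidal estimate of vertically integrated moisture flux for one column.

    Hypothetical helper, not part of the cfdb-ingest API.
    q: specific humidity (kg/kg); u, v: wind components (m/s);
    p_pa: pressure levels in Pa, ordered from lowest to highest pressure.
    Returns (eastward, northward) flux in kg m^-1 s^-1.
    """
    if not (len(q) == len(u) == len(v) == len(p_pa)):
        raise ValueError("q, u, v and p_pa must have the same length")

    # Moisture flux integrand at each level.
    qu = [qi * ui for qi, ui in zip(q, u)]
    qv = [qi * vi for qi, vi in zip(q, v)]

    flux_u = 0.0
    flux_v = 0.0
    for k in range(len(p_pa) - 1):
        dp = p_pa[k + 1] - p_pa[k]
        flux_u += 0.5 * (qu[k] + qu[k + 1]) * dp
        flux_v += 0.5 * (qv[k] + qv[k + 1]) * dp
    return flux_u / G, flux_v / G


# Example column: uniform humidity and a uniform westerly wind between 500 and 1000 hPa.
p = [50000.0, 70000.0, 85000.0, 100000.0]
vimf_u, vimf_v = vimf_column([0.005] * 4, [10.0] * 4, [0.0] * 4, p)
```

Both flux components need Q at every level, which is why reading the shared inputs once and iterating them together avoids duplicate I/O.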

Installation

Requires Python >= 3.10.

pip install cfdb-ingest

Quick Start

WRF

from cfdb_ingest import WrfIngest

wrf = WrfIngest('wrfout_d01_2023-02-12_00:00:00.nc')
wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2', 'WIND10'],
    start_date='2023-02-12T06:00',
    end_date='2023-02-12T18:00',
)

From the command line:

cfdb-ingest wrf wrfout_d01_*.nc output.cfdb -v T2,WIND10 -s 2023-02-12T06:00 -e 2023-02-12T18:00

For WPS export, use the --preset wps flag:

cfdb-ingest wrf /path/to/wrfout/ output.cfdb --preset wps -s 2023-02-10 -e 2023-02-10_06
cfdb-to-int output.cfdb -s 2023-02-10 -e 2023-02-10_06
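
The two WPS commands above share the same cfdb path and date range, so when scripting the workflow it can help to build both from one set of arguments. The following is a minimal sketch that only assembles the argument lists shown above; the helper name `wps_commands` is hypothetical, and no flags beyond those in the commands above are assumed.

```python
import shlex


def wps_commands(wrfout_dir, cfdb_path, start, end):
    """Build the ingest and export command lines for a WPS run.

    Hypothetical helper: it only assembles the two commands shown above
    and does not execute anything.
    """
    ingest = ["cfdb-ingest", "wrf", wrfout_dir, cfdb_path,
              "--preset", "wps", "-s", start, "-e", end]
    export = ["cfdb-to-int", cfdb_path, "-s", start, "-e", end]
    return ingest, export


ingest_cmd, export_cmd = wps_commands(
    "/path/to/wrfout/", "output.cfdb", "2023-02-10", "2023-02-10_06"
)

# Print the commands in shell-quoted form for inspection.
print(shlex.join(ingest_cmd))
print(shlex.join(export_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to run the steps in order and stop if the ingest step fails.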

ERA5

from cfdb_ingest import Era5Ingest

era5 = Era5Ingest('/path/to/era5/*.nc')
era5.convert(
    cfdb_path='era5.cfdb',
    variables=['SP', 'VAR_2T', 'T', 'U', 'V'],
    start_date='2020-01-01',
    end_date='2020-01-31',
)

From the command line:

# Combined: multiple variables in one cfdb
cfdb-ingest era5 /path/to/era5/*.nc output.cfdb -v SP,VAR_2T,T,U,V -s 2020-01-01 -e 2020-01-31

# Split: one cfdb file per variable
cfdb-ingest era5 /path/to/era5/*.nc /output/dir/ --split -v SP,T
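
For long ERA5 periods, one option is to call convert once per month using the start_date and end_date arguments shown above. The sketch below only generates the month boundaries; the helper name `month_ranges` and the per-month output file naming in the comment are hypothetical, and it assumes end_date is treated as inclusive, which the source does not confirm.

```python
import calendar


def month_ranges(year):
    """Return (start_date, end_date) ISO date strings for each month of a year.

    Hypothetical helper, not part of the cfdb-ingest API.
    """
    ranges = []
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        start = f"{year:04d}-{month:02d}-01"
        end = f"{year:04d}-{month:02d}-{last_day:02d}"
        ranges.append((start, end))
    return ranges


ranges_2020 = month_ranges(2020)

# Possible usage with the Era5Ingest object from the example above
# (the output file naming is an assumption):
# for start, end in ranges_2020:
#     era5.convert(
#         cfdb_path=f"era5_{start[:7]}.cfdb",
#         variables=['SP', 'VAR_2T'],
#         start_date=start,
#         end_date=end,
#     )
```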

See the full documentation at https://mullenkamp.github.io/cfdb-ingest/ for details.

License

This project is licensed under the terms of the Apache Software License 2.0.
