File format conversions to cfdb

cfdb-ingest

Convert meteorological model output to cfdb with standardized CF conventions

Documentation: https://mullenkamp.github.io/cfdb-ingest/

Source Code: https://github.com/mullenkamp/cfdb-ingest


Overview

cfdb-ingest converts meteorological file formats (netCDF4/HDF5, GRIB2, etc.) from various model outputs into cfdb. It standardizes variable names and attributes to be consistent with CF conventions, making it straightforward to work with datasets from different sources through a single interface.

Key features:

  • Automatic variable mapping -- source variable names are translated to CF-standard names with proper metadata (standard_name, units, encoding)
  • Wind rotation -- grid-relative wind components are rotated to earth-relative using COSALPHA/SINALPHA
  • 3D level interpolation -- eta-level variables are interpolated to user-specified height levels above ground
  • Variable merging -- surface and level-interpolated variants of the same quantity (e.g. T2 at 2 m and T at arbitrary heights) are merged into a single output variable
  • Spatial and temporal filtering -- subset by bounding box (WGS84) and/or date range before writing
  • Multi-file support -- seamlessly spans multiple input files, including cross-file precipitation accumulation
  • Configurable chunking -- tune output chunk shapes for different access patterns
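
The wind rotation feature can be illustrated with the rotation commonly applied to WRF output. The following is a minimal sketch under that assumption, not the package's own code; the function names are hypothetical, and the direction follows the meteorological convention (degrees the wind blows from).

```python
import math


def rotate_to_earth(u_grid, v_grid, cosalpha, sinalpha):
    """Rotate grid-relative wind components to earth-relative ones.

    cosalpha and sinalpha are the local values of the WRF fields
    COSALPHA and SINALPHA at the same grid cell.
    """
    u_earth = u_grid * cosalpha - v_grid * sinalpha
    v_earth = v_grid * cosalpha + u_grid * sinalpha
    return u_earth, v_earth


def speed_and_direction(u, v):
    """Return wind speed and the direction the wind blows from, in degrees."""
    speed = math.hypot(u, v)
    # atan2(-u, -v) measures the angle clockwise from north of the
    # vector pointing back towards where the wind comes from.
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


# Hypothetical single-cell values: a 5 m/s grid-relative westerly
# on a grid rotated by 30 degrees at this location.
alpha = math.radians(30.0)
u_e, v_e = rotate_to_earth(5.0, 0.0, math.cos(alpha), math.sin(alpha))
print(speed_and_direction(u_e, v_e))
```

Rotation changes the components but not the speed, so a quick sanity check is that the speed before and after rotation is identical.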

Supported Formats

| Source | Class | CRS Projections |
| --- | --- | --- |
| WRF (wrfout) | WrfIngest | Lambert Conformal Conic, Polar Stereographic, Mercator, Lat-Lon |

WRF Variables

Surface variables (fixed height above ground):

| Key | cfdb Name | Height | Source Vars | Transform |
| --- | --- | --- | --- | --- |
| T2 | air_temp | 2 m | T2 | direct |
| PSFC | surface_pressure | 0 m | PSFC | direct |
| Q2 | specific_humidity | 2 m | Q2 | direct |
| RAIN | precip | 0 m | RAINNC, RAINC | accumulation increment |
| WIND10 | wind_speed | 10 m | U10, V10 | wind rotation |
| WIND_DIR10 | wind_direction | 10 m | U10, V10 | wind rotation |
| U10 | u_wind | 10 m | U10, V10 | wind rotation |
| V10 | v_wind | 10 m | U10, V10 | wind rotation |
| TSK | soil_temp | 0 m | TSK | direct |
| SWDOWN | shortwave_radiation | 0 m | SWDOWN | direct |
| GLW | longwave_radiation | 0 m | GLW | direct |
| SNOWH | snow_depth | 0 m | SNOWH | direct |
| SLP | mslp | 0 m | PSFC, T2, HGT | hypsometric reduction |
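
The hypsometric reduction listed for SLP can be sketched as follows. This is one common textbook form and only an assumption about the approach: the constants, the standard-atmosphere lapse rate used for the mean layer temperature, and the function name are illustrative and may differ from what the package does.

```python
import math

G = 9.80665          # gravitational acceleration, m s^-2
RD = 287.05          # gas constant for dry air, J kg^-1 K^-1
LAPSE_RATE = 0.0065  # assumed standard-atmosphere lapse rate, K m^-1


def sea_level_pressure(psfc, t2, terrain_height):
    """Reduce surface pressure (Pa) to mean sea level.

    psfc: surface pressure in Pa (WRF PSFC)
    t2: 2 m air temperature in K (WRF T2)
    terrain_height: terrain height in m (WRF HGT)
    """
    # Mean temperature of the fictitious air column between the
    # surface and sea level, assuming a constant lapse rate.
    t_mean = t2 + 0.5 * LAPSE_RATE * terrain_height
    return psfc * math.exp(G * terrain_height / (RD * t_mean))


# Hypothetical grid cell at 1000 m elevation.
print(sea_level_pressure(90000.0, 280.0, 1000.0))
```

At zero terrain height the exponent vanishes and the function returns the surface pressure unchanged.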

3D level-interpolated variables (interpolated to user-specified target_levels):

| Key | cfdb Name | Source Vars | Transform |
| --- | --- | --- | --- |
| T | air_temp | T, P, PB, PH, PHB | potential to actual temperature |
| WIND | wind_speed | U, V, PH, PHB | unstagger + rotation |
| WIND_DIR | wind_direction | U, V, PH, PHB | unstagger + rotation |
| U | u_wind | U, V, PH, PHB | unstagger + rotation |
| V | v_wind | U, V, PH, PHB | unstagger + rotation |
| Q | specific_humidity | QVAPOR, PH, PHB | mixing ratio to specific humidity |
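
Two of the transforms in this table are short formulas. The sketch below assumes the usual WRF conventions: T is stored as perturbation potential temperature with a 300 K offset, and total pressure is P + PB. The constants and function names are illustrative, not taken from the package.

```python
P0 = 100000.0           # reference pressure, Pa
KAPPA = 287.0 / 1004.0  # Rd / cp for dry air
THETA_BASE = 300.0      # K, assumed offset removed from WRF's T field


def actual_temperature(t_pert, p_pert, p_base):
    """Convert WRF perturbation potential temperature to temperature in K."""
    theta = t_pert + THETA_BASE   # full potential temperature
    pressure = p_pert + p_base    # full pressure, Pa
    return theta * (pressure / P0) ** KAPPA


def specific_humidity(mixing_ratio):
    """Convert water vapour mixing ratio (kg/kg) to specific humidity (kg/kg)."""
    return mixing_ratio / (1.0 + mixing_ratio)


# Hypothetical values for one grid cell.
print(actual_temperature(-10.0, 500.0, 85000.0))
print(specific_humidity(0.008))
```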

Installation

Requires Python >= 3.11.

pip install cfdb-ingest

Python API

Basic conversion

from cfdb_ingest import WrfIngest

wrf = WrfIngest('wrfout_d01_2023-02-12_00:00:00.nc')

# Convert selected variables for a time window
wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2', 'WIND10', 'precip'],
    start_date='2023-02-12T06:00',
    end_date='2023-02-12T18:00',
)

Multi-file input

wrf = WrfIngest([
    'wrfout_d01_2023-02-12_00:00:00.nc',
    'wrfout_d01_2023-02-13_00:00:00.nc',
])

# All timesteps across both files are merged automatically
wrf.convert(cfdb_path='output.cfdb', variables=['T2'])
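
Precipitation is the case where spanning files matters most: RAINNC and RAINC are running totals, so the increment for the first timestep of a file depends on the last total of the previous file. The sketch below shows the idea only. The function name is hypothetical, and treating the very first timestep as a zero increment is an assumption, not documented behavior.

```python
def accumulation_increments(totals, previous_total=None):
    """Turn running precipitation totals into per-timestep increments.

    totals: accumulated precipitation (RAINNC + RAINC) for one file
    previous_total: last total from the preceding file, if any
    Returns the increments and the total to carry into the next file.
    """
    increments = []
    for total in totals:
        if previous_total is None:
            # Assumption: no predecessor exists, so no increment is known.
            increments.append(0.0)
        else:
            increments.append(total - previous_total)
        previous_total = total
    return increments, previous_total


# Hypothetical totals (mm) from two consecutive files.
file_1 = [0.0, 1.5, 4.0]
file_2 = [4.5, 7.0]

inc_1, carry = accumulation_increments(file_1)
inc_2, carry = accumulation_increments(file_2, previous_total=carry)
print(inc_1 + inc_2)
```

Carrying the last total forward is what keeps the first increment of the second file from being computed against nothing.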

Spatial subsetting with a bounding box

wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2'],
    bbox=(165.0, -47.0, 175.0, -40.0),  # (min_lon, min_lat, max_lon, max_lat)
)
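
Because the bounding box is given in WGS84 while the model grid is projected, a subset generally has to be translated into a window of grid indices. The following sketch shows one plausible way to do that from 2D longitude and latitude arrays. It is an assumption about the approach, and the function name is hypothetical.

```python
def bbox_index_window(lon2d, lat2d, bbox):
    """Return (y_start, y_stop, x_start, x_stop) covering all cells in bbox.

    lon2d, lat2d: nested lists of cell-centre coordinates, indexed [y][x]
    bbox: (min_lon, min_lat, max_lon, max_lat) in degrees
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    rows, cols = [], []
    for j, (lon_row, lat_row) in enumerate(zip(lon2d, lat2d)):
        for i, (lon, lat) in enumerate(zip(lon_row, lat_row)):
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                rows.append(j)
                cols.append(i)
    if not rows:
        raise ValueError("bounding box does not overlap the grid")
    # Stop indices are exclusive so they can be used directly in slices.
    return min(rows), max(rows) + 1, min(cols), max(cols) + 1


# Hypothetical 3 x 4 grid.
lon2d = [[164.0, 166.0, 168.0, 176.0]] * 3
lat2d = [[-48.0] * 4, [-45.0] * 4, [-39.0] * 4]
print(bbox_index_window(lon2d, lat2d, (165.0, -47.0, 175.0, -40.0)))
```

On a curved projection the resulting rectangular window can include a few cells that lie just outside the requested box.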

3D level interpolation

# Interpolate 3D temperature and wind to specific heights above ground
wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T', 'WIND'],
    target_levels=[100.0, 500.0, 1000.0, 2000.0],
    bbox=(165.0, -47.0, 175.0, -40.0),
)
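
For a single grid column, interpolation to heights above ground can be sketched as below. The height calculation from PH and PHB follows the usual WRF convention (geopotential on vertically staggered levels). The linear interpolation, the NaN returned for targets outside the model column, and the function names are assumptions made for illustration.

```python
G = 9.81  # gravitational acceleration, m s^-2


def heights_above_ground(ph, phb, terrain_height):
    """Heights (m above ground) of mass levels in one grid column.

    ph, phb: perturbation and base-state geopotential on staggered levels
    terrain_height: terrain height in m (WRF HGT)
    """
    z_stag = [(p + pb) / G for p, pb in zip(ph, phb)]
    # Unstagger: mass levels sit midway between adjacent staggered levels.
    z_mass = [0.5 * (lo + hi) for lo, hi in zip(z_stag[:-1], z_stag[1:])]
    return [z - terrain_height for z in z_mass]


def interp_to_levels(values, heights, target_levels):
    """Linearly interpolate a column of values to the target heights."""
    out = []
    for target in target_levels:
        result = float("nan")  # assumption: targets outside the column are missing
        for k in range(len(heights) - 1):
            z0, z1 = heights[k], heights[k + 1]
            if z0 <= target <= z1:
                weight = (target - z0) / (z1 - z0)
                result = values[k] + weight * (values[k + 1] - values[k])
                break
        out.append(result)
    return out


# Hypothetical column: three mass levels and their temperatures (K).
heights = [50.0, 150.0, 450.0]
temps = [280.0, 279.0, 276.0]
print(interp_to_levels(temps, heights, [100.0, 300.0, 1000.0]))
```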

Merging surface and 3D variables

Variables sharing a cfdb name are automatically merged. For example, T2 (2 m) and T (levels) both map to air_temp and produce a single output variable spanning all heights:

wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2', 'T'],
    target_levels=[100.0, 500.0],
)
# Output height coordinate: [2.0, 100.0, 500.0]

Custom chunk shape

The output chunk shape defaults to (1, 1, ny, nx) (one full spatial slab per timestep per height level). Override it for different access patterns:

wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2'],
    chunk_shape=(1, 1, 50, 50),  # (time, z, y, x)
)
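
The trade-off between chunk shapes can be estimated by counting how many chunks a given read touches. The sketch below does this for a hypothetical output of 48 timesteps on a 200 x 300 grid; the helper names are illustrative and not part of cfdb-ingest.

```python
import math


def chunk_grid(shape, chunk_shape):
    """Number of chunks along each axis (time, z, y, x)."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunk_shape))


def chunks_for_point_series(shape, chunk_shape):
    """Chunks read to extract every timestep at one grid cell and one height."""
    return chunk_grid(shape, chunk_shape)[0]


def chunks_for_single_map(shape, chunk_shape):
    """Chunks read to extract one full horizontal map at one timestep and height."""
    _, _, ny_chunks, nx_chunks = chunk_grid(shape, chunk_shape)
    return ny_chunks * nx_chunks


shape = (48, 1, 200, 300)  # hypothetical (time, z, y, x)
for chunk_shape in [(1, 1, 200, 300), (24, 1, 50, 50)]:
    print(
        chunk_shape,
        chunks_for_point_series(shape, chunk_shape),
        chunks_for_single_map(shape, chunk_shape),
    )
```

For this grid, the default shape needs 48 chunks for a point time series but only 1 for a full map, while (24, 1, 50, 50) needs 2 chunks for the time series and 24 for the map.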

Inspecting metadata before conversion

wrf = WrfIngest('wrfout_d01_2023-02-12_00:00:00.nc')

wrf.crs                # pyproj.CRS
wrf.times              # numpy datetime64 array
wrf.x, wrf.y           # 1D projected coordinate arrays
wrf.variables          # dict of available variable mappings
wrf.bbox_geographic    # (min_lon, min_lat, max_lon, max_lat)

Variable name resolution

The variables argument accepts mapping keys (e.g. T2), source variable names (e.g. RAINNC), or cfdb names (e.g. air_temp). When a cfdb name maps to multiple keys, all of them are included:

wrf.resolve_variables(['air_temp'])  # ['T2', 'T']
wrf.resolve_variables(['RAINNC'])    # ['RAIN']
wrf.resolve_variables(None)          # all available keys
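
The lookup rules can be pictured with a small stand-alone sketch. The mapping table below is a three-entry excerpt of the WRF variable tables, and both resolve_names and the precedence it uses (mapping keys first, then cfdb names and source variables) are assumptions made for illustration, not the package's implementation.

```python
# Excerpt of the WRF variable tables, keyed by mapping key.
MAPPINGS = {
    "T2": {"cfdb_name": "air_temp", "source_vars": ["T2"]},
    "T": {"cfdb_name": "air_temp", "source_vars": ["T", "P", "PB", "PH", "PHB"]},
    "RAIN": {"cfdb_name": "precip", "source_vars": ["RAINNC", "RAINC"]},
}


def resolve_names(names, mappings=MAPPINGS):
    """Resolve user-supplied names to mapping keys, preserving order."""
    if names is None:
        return list(mappings)
    resolved = []
    for name in names:
        if name in mappings:
            # An exact mapping key wins over any other interpretation.
            matches = [name]
        else:
            matches = [
                key
                for key, entry in mappings.items()
                if entry["cfdb_name"] == name or name in entry["source_vars"]
            ]
        if not matches:
            raise KeyError(f"unknown variable name: {name!r}")
        for key in matches:
            if key not in resolved:
                resolved.append(key)
    return resolved


print(resolve_names(["air_temp"]))
print(resolve_names(["RAINNC"]))
print(resolve_names(None))
```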

CLI

The package installs a cfdb-ingest command with a wrf subcommand.

Basic usage

cfdb-ingest wrf wrfout_d01_2023-02-12_00:00:00.nc output.cfdb \
    -v T2,WIND10 \
    -s 2023-02-12T06:00 \
    -e 2023-02-12T18:00

All options

cfdb-ingest wrf [OPTIONS] INPUT_PATHS... CFDB_PATH

| Option | Short | Description |
| --- | --- | --- |
| --variables | -v | Comma-separated variable names |
| --start-date | -s | Start date (ISO format) |
| --end-date | -e | End date (ISO format) |
| --bbox | -b | Bounding box: min_lon,min_lat,max_lon,max_lat |
| --target-levels | -l | Comma-separated height levels in meters |
| --chunk-shape | -c | Output chunk shape: time,z,y,x (e.g. 1,1,50,50) |
| --max-mem | | Read buffer size in bytes (default: 128 MiB) |
| --compression | | Compression algorithm: zstd or lz4 |

Examples

# Convert with spatial subset
cfdb-ingest wrf wrfout_d01_*.nc output.cfdb \
    -v T2 -b 165.0,-47.0,175.0,-40.0

# 3D temperature at specific height levels
cfdb-ingest wrf wrfout_d01_*.nc output.cfdb \
    -v T -l 100,500,1000,2000 -b 165.0,-47.0,175.0,-40.0

# Custom chunk shape for time-series access patterns
cfdb-ingest wrf wrfout_d01_*.nc output.cfdb \
    -v T2,WIND10 -c 24,1,50,50

Development

Setup environment

We use uv to manage the development environment and the production build.

uv sync

Run tests

uv run pytest

Lint and format

uv run lint

License

This project is licensed under the terms of the Apache Software License 2.0.
