cfdb-ingest

Convert meteorological model output to cfdb with standardized CF conventions


Documentation: https://mullenkamp.github.io/cfdb-ingest/

Source Code: https://github.com/mullenkamp/cfdb-ingest


Overview

cfdb-ingest converts meteorological model output in formats such as netCDF4/HDF5 and GRIB2 into cfdb. It standardizes variable names and attributes according to CF conventions, so datasets from different sources can be accessed through a single interface.

Key features:

  • Automatic variable mapping -- source variable names are translated to CF-standard names with proper metadata (standard_name, units, encoding)
  • Wind rotation -- grid-relative wind components are rotated to earth-relative using COSALPHA/SINALPHA
  • 3D level interpolation -- eta-level variables are interpolated to user-specified height levels above ground
  • Variable merging -- surface and level-interpolated variants of the same quantity (e.g. T2 at 2 m and T at arbitrary heights) are merged into a single output variable
  • Spatial and temporal filtering -- subset by bounding box (WGS84) and/or date range before writing
  • Multi-file support -- seamlessly spans multiple input files, including cross-file precipitation accumulation
  • Configurable chunking -- tune output chunk shapes for different access patterns
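
The wind rotation feature can be illustrated with the conventional formula for WRF output, where COSALPHA and SINALPHA are the cosine and sine of the local grid rotation angle. This is a minimal standard-library sketch, not the package's own code; the function names `rotate_to_earth` and `speed_and_direction` are hypothetical.

```python
import math


def rotate_to_earth(u_grid, v_grid, cosalpha, sinalpha):
    """Rotate grid-relative wind components to earth-relative ones.

    Uses the conventional WRF relation; whether cfdb-ingest applies it
    in exactly this form is an assumption.
    """
    u_earth = u_grid * cosalpha - v_grid * sinalpha
    v_earth = v_grid * cosalpha + u_grid * sinalpha
    return u_earth, v_earth


def speed_and_direction(u, v):
    """Return wind speed and meteorological direction.

    Direction is in degrees clockwise from north and describes where
    the wind blows FROM (0 = northerly, 90 = easterly).
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


# Example: a grid rotated by 30 degrees relative to true north
alpha = math.radians(30.0)
u_e, v_e = rotate_to_earth(5.0, 0.0, math.cos(alpha), math.sin(alpha))
print(speed_and_direction(u_e, v_e))
```

The rotation changes only the direction of the vector, so wind speed can be computed before or after it; wind direction must be computed from the rotated components.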

Supported Formats

| Source | Class | CRS Projections |
|---|---|---|
| WRF (wrfout) | WrfIngest | Lambert Conformal Conic, Polar Stereographic, Mercator, Lat-Lon |

WRF Variables

Surface variables (fixed height above ground):

| Key | cfdb Name | Height | Source Vars | Transform |
|---|---|---|---|---|
| T2 | air_temp | 2 m | T2 | direct |
| PSFC | surface_pressure | 0 m | PSFC | direct |
| Q2 | specific_humidity | 2 m | Q2 | direct |
| RAIN | precip | 0 m | RAINNC, RAINC | accumulation increment |
| WIND10 | wind_speed | 10 m | U10, V10 | wind rotation |
| WIND_DIR10 | wind_direction | 10 m | U10, V10 | wind rotation |
| U10 | u_wind | 10 m | U10, V10 | wind rotation |
| V10 | v_wind | 10 m | U10, V10 | wind rotation |
| TSK | soil_temp | 0 m | TSK | direct |
| SWDOWN | shortwave_radiation | 0 m | SWDOWN | direct |
| GLW | longwave_radiation | 0 m | GLW | direct |
| SNOWH | snow_depth | 0 m | SNOWH | direct |
| SLP | mslp | 0 m | PSFC, T2, HGT | hypsometric reduction |
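
The "accumulation increment" transform for RAIN can be pictured as follows. WRF stores RAINNC and RAINC as totals accumulated since the start of the run, so per-step precipitation is the difference between consecutive totals. This is a rough sketch under assumptions: the function name `precip_increments` is hypothetical, and the handling of the first timestep and of accumulator resets shown here may differ from what cfdb-ingest does.

```python
def precip_increments(rainnc, rainc):
    """Convert accumulated RAINNC + RAINC series into per-step amounts.

    Both inputs are sequences of accumulated precipitation (mm) for one
    grid cell, ordered in time. For multi-file input, the series from
    all files would be concatenated before differencing.
    """
    total = [nc + c for nc, c in zip(rainnc, rainc)]
    if not total:
        return []

    # Assumption: nothing is known before the first step, so report 0.
    increments = [0.0]
    for previous, current in zip(total, total[1:]):
        # Assumption: clamp at zero to guard against accumulator resets.
        increments.append(max(current - previous, 0.0))
    return increments


print(precip_increments([0.0, 1.0, 3.0, 3.0], [0.0, 0.5, 0.5, 1.0]))
```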

3D level-interpolated variables (interpolated to user-specified target_levels):

| Key | cfdb Name | Source Vars | Transform |
|---|---|---|---|
| T | air_temp | T, P, PB, PH, PHB | potential to actual temperature |
| WIND | wind_speed | U, V, PH, PHB | unstagger + rotation |
| WIND_DIR | wind_direction | U, V, PH, PHB | unstagger + rotation |
| U | u_wind | U, V, PH, PHB | unstagger + rotation |
| V | v_wind | U, V, PH, PHB | unstagger + rotation |
| Q | specific_humidity | QVAPOR, PH, PHB | mixing ratio to specific humidity |
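
The transforms named in the table follow standard WRF conventions, which can be sketched with scalar arithmetic. The example assumes the usual WRF definitions (T is perturbation potential temperature relative to 300 K, P + PB is full pressure, PH + PHB is full geopotential). The function names and exact constants are illustrative and not taken from cfdb-ingest.

```python
P0 = 100000.0       # reference pressure in Pa
KAPPA = 0.2854      # R_d / c_p for dry air, roughly 287 / 1004
THETA_BASE = 300.0  # K, offset subtracted from theta in WRF output
G = 9.81            # gravitational acceleration in m s-2


def actual_temperature(t_pert, p_pert, p_base):
    """Potential to actual temperature (K) from WRF T, P and PB."""
    theta = t_pert + THETA_BASE
    pressure = p_pert + p_base
    return theta * (pressure / P0) ** KAPPA


def specific_humidity(mixing_ratio):
    """Water vapour mixing ratio (kg/kg, QVAPOR) to specific humidity."""
    return mixing_ratio / (1.0 + mixing_ratio)


def geopotential_height(ph, phb):
    """Height above sea level (m) from WRF PH and PHB."""
    return (ph + phb) / G


def unstagger(values):
    """Average neighbouring staggered values onto mass points."""
    return [(a + b) / 2.0 for a, b in zip(values, values[1:])]


print(actual_temperature(-10.0, 0.0, 85000.0))
print(unstagger([0.0, 100.0, 300.0]))
```

PH and PHB sit on vertically staggered levels, so heights are typically unstaggered before they are paired with mass-level fields, and subtracting the terrain height HGT gives height above ground.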

Installation

Requires Python >= 3.11.

pip install cfdb-ingest

Python API

Basic conversion

from cfdb_ingest import WrfIngest

wrf = WrfIngest('wrfout_d01_2023-02-12_00:00:00.nc')

# Convert selected variables for a time window
wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2', 'WIND10', 'precip'],
    start_date='2023-02-12T06:00',
    end_date='2023-02-12T18:00',
)

Multi-file input

wrf = WrfIngest([
    'wrfout_d01_2023-02-12_00:00:00.nc',
    'wrfout_d01_2023-02-13_00:00:00.nc',
])

# All timesteps across both files are merged automatically
wrf.convert(cfdb_path='output.cfdb', variables=['T2'])

Spatial subsetting with a bounding box

wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2'],
    bbox=(165.0, -47.0, 175.0, -40.0),  # (min_lon, min_lat, max_lon, max_lat)
)
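
On a projected grid, a WGS84 bounding box does not map to a rectangle of rows and columns, so the subset has to be found from the coordinates of the cells. One possible approach, shown here as a sketch only, is to take the smallest index window that contains every cell whose longitude and latitude fall inside the box. The helper `bbox_index_window` is hypothetical; cfdb-ingest may instead transform the box into projected coordinates using the file's CRS.

```python
def bbox_index_window(lons, lats, bbox):
    """Find the index window covering all cells inside a lon/lat box.

    lons and lats are 2D nested lists (rows of columns) holding the
    coordinates of each grid cell. Returns (row_start, row_stop,
    col_start, col_stop) with exclusive stops, or None if no cell
    falls inside the box.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    rows, cols = [], []
    for i, (lon_row, lat_row) in enumerate(zip(lons, lats)):
        for j, (lon, lat) in enumerate(zip(lon_row, lat_row)):
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                rows.append(i)
                cols.append(j)
    if not rows:
        return None
    return min(rows), max(rows) + 1, min(cols), max(cols) + 1


# Small illustrative grid: 3 rows (south to north) by 4 columns
lons = [[164.0, 166.0, 168.0, 176.0]] * 3
lats = [[-48.0] * 4, [-45.0] * 4, [-41.0] * 4]
print(bbox_index_window(lons, lats, (165.0, -47.0, 175.0, -40.0)))
```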

3D level interpolation

# Interpolate 3D temperature and wind to specific heights above ground
wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T', 'WIND'],
    target_levels=[100.0, 500.0, 1000.0, 2000.0],
    bbox=(165.0, -47.0, 175.0, -40.0),
)
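
To make the idea of level interpolation concrete, the sketch below interpolates a single model column to target heights above ground. It assumes simple linear interpolation in height and returns NaN for targets outside the model column; the actual interpolation scheme and out-of-range behaviour in cfdb-ingest may differ, and `interpolate_column` is a hypothetical name.

```python
import math


def interpolate_column(heights, values, target_levels):
    """Linearly interpolate one column to target heights above ground.

    heights must be strictly increasing and contain at least two
    levels; values holds the variable at those heights.
    """
    result = []
    for target in target_levels:
        if target < heights[0] or target > heights[-1]:
            # Assumption: no extrapolation outside the model column
            result.append(math.nan)
            continue
        for k in range(len(heights) - 1):
            z0, z1 = heights[k], heights[k + 1]
            if z0 <= target <= z1:
                weight = (target - z0) / (z1 - z0)
                result.append(values[k] + weight * (values[k + 1] - values[k]))
                break
    return result


# Model levels at 50, 150 and 400 m above ground
print(interpolate_column([50.0, 150.0, 400.0], [280.0, 279.0, 277.0], [100.0, 400.0, 1000.0]))
```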

Merging surface and 3D variables

Variables sharing a cfdb name are automatically merged. For example, T2 (2 m) and T (levels) both map to air_temp and produce a single output variable spanning all heights:

wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2', 'T'],
    target_levels=[100.0, 500.0],
)
# Output height coordinate: [2.0, 100.0, 500.0]
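
Conceptually, the merge combines the fixed-height surface field with the interpolated levels along a shared, sorted height coordinate. The snippet below illustrates that idea with plain dictionaries keyed by height; `merge_by_height` is a hypothetical helper, and the rule that later sources win on duplicate heights is an assumption.

```python
def merge_by_height(*sources):
    """Merge {height: field} mappings into one height-sorted result.

    Returns the sorted height coordinate and the fields in the same
    order. Later sources override earlier ones on duplicate heights.
    """
    merged = {}
    for source in sources:
        merged.update(source)
    heights = sorted(merged)
    return heights, [merged[h] for h in heights]


surface = {2.0: 'T2 field'}
levels = {100.0: 'T at 100 m', 500.0: 'T at 500 m'}
heights, fields = merge_by_height(surface, levels)
print(heights)
```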

Custom chunk shape

The output chunk shape defaults to (1, 1, ny, nx) (one full spatial slab per timestep per height level). Override it for different access patterns:

wrf.convert(
    cfdb_path='output.cfdb',
    variables=['T2'],
    chunk_shape=(1, 1, 50, 50),  # (time, z, y, x)
)
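
The effect of the chunk shape on read cost can be estimated by counting how many chunks a given read touches. The following sketch does that for an arbitrary rectangular selection; the helper `chunks_touched` and the array dimensions are illustrative assumptions and not part of cfdb-ingest.

```python
def chunks_touched(shape, chunk_shape, selection):
    """Count the chunks overlapped by a rectangular read.

    selection holds one (start, stop) pair per dimension, with
    exclusive stops, in the same (time, z, y, x) order as shape.
    """
    count = 1
    for (start, stop), size, chunk in zip(selection, shape, chunk_shape):
        stop = min(stop, size)
        first = start // chunk
        last = (stop - 1) // chunk
        count *= last - first + 1
    return count


shape = (240, 1, 200, 200)  # hypothetical (time, z, y, x) array

# Time series at a single grid point
point_series = ((0, 240), (0, 1), (100, 101), (100, 101))
print(chunks_touched(shape, (1, 1, 200, 200), point_series))  # 240
print(chunks_touched(shape, (24, 1, 50, 50), point_series))   # 10

# Full spatial map at a single timestep
full_map = ((0, 1), (0, 1), (0, 200), (0, 200))
print(chunks_touched(shape, (1, 1, 200, 200), full_map))      # 1
print(chunks_touched(shape, (24, 1, 50, 50), full_map))       # 16
```

With these numbers, the default shape suits map-style reads, while a shape that is longer in time and smaller in space reduces the number of chunks read for point time series.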

Inspecting metadata before conversion

wrf = WrfIngest('wrfout_d01_2023-02-12_00:00:00.nc')

wrf.crs                # pyproj.CRS
wrf.times              # numpy datetime64 array
wrf.x, wrf.y           # 1D projected coordinate arrays
wrf.variables          # dict of available variable mappings
wrf.bbox_geographic    # (min_lon, min_lat, max_lon, max_lat)

Variable name resolution

The variables argument accepts mapping keys (T2), source variable names (RAINNC), or cfdb names (air_temp). When a cfdb name maps to multiple keys, all of them are included:

wrf.resolve_variables(['air_temp'])  # ['T2', 'T']
wrf.resolve_variables(['RAINNC'])    # ['RAIN']
wrf.resolve_variables(None)          # all available keys
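
The resolution rules can be mimicked with a small lookup table. This is an illustrative re-implementation under assumptions, not the library's code: the `MAPPINGS` table contains only three entries from the WRF variable tables, the `resolve` function is hypothetical, and giving mapping keys priority over source and cfdb names is a guess.

```python
# Reduced, hypothetical mapping table: key -> cfdb name and source vars
MAPPINGS = {
    'T2': {'cfdb_name': 'air_temp', 'source_vars': ['T2']},
    'T': {'cfdb_name': 'air_temp', 'source_vars': ['T', 'P', 'PB', 'PH', 'PHB']},
    'RAIN': {'cfdb_name': 'precip', 'source_vars': ['RAINNC', 'RAINC']},
}


def resolve(names):
    """Resolve keys, cfdb names or source names to mapping keys."""
    if names is None:
        return list(MAPPINGS)

    resolved = []
    for name in names:
        if name in MAPPINGS:
            # Assumption: an exact mapping key takes priority
            matches = [name]
        else:
            matches = [
                key for key, entry in MAPPINGS.items()
                if entry['cfdb_name'] == name or name in entry['source_vars']
            ]
        if not matches:
            raise KeyError(f'unknown variable: {name}')
        for key in matches:
            if key not in resolved:  # keep order, drop duplicates
                resolved.append(key)
    return resolved


print(resolve(['air_temp']))
print(resolve(['RAINNC']))
```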

CLI

cfdb-ingest provides a cfdb-ingest command with a wrf subcommand.

Basic usage

cfdb-ingest wrf wrfout_d01_2023-02-12_00:00:00.nc output.cfdb \
    -v T2,WIND10 \
    -s 2023-02-12T06:00 \
    -e 2023-02-12T18:00

All options

cfdb-ingest wrf [OPTIONS] INPUT_PATHS... CFDB_PATH

| Option | Short | Description |
|---|---|---|
| --variables | -v | Comma-separated variable names |
| --start-date | -s | Start date (ISO format) |
| --end-date | -e | End date (ISO format) |
| --bbox | -b | Bounding box: min_lon,min_lat,max_lon,max_lat |
| --target-levels | -l | Comma-separated height levels in meters |
| --chunk-shape | -c | Output chunk shape: time,z,y,x (e.g. 1,1,50,50) |
| --max-mem | | Read buffer size in bytes (default: 128 MiB) |
| --compression | | Compression algorithm: zstd or lz4 |

Examples

# Convert with spatial subset
cfdb-ingest wrf wrfout_d01_*.nc output.cfdb \
    -v T2 -b 165.0,-47.0,175.0,-40.0

# 3D temperature at specific height levels
cfdb-ingest wrf wrfout_d01_*.nc output.cfdb \
    -v T -l 100,500,1000,2000 -b 165.0,-47.0,175.0,-40.0

# Custom chunk shape for time-series access patterns
cfdb-ingest wrf wrfout_d01_*.nc output.cfdb \
    -v T2,WIND10 -c 24,1,50,50
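
When the CLI is driven from a Python script, the argument list can be assembled programmatically. The helper below is a hypothetical convenience wrapper that uses only the options documented in the table above. It builds the command and leaves running it to the caller.

```python
import shlex


def build_wrf_command(input_paths, cfdb_path, variables=None, bbox=None,
                      target_levels=None, chunk_shape=None):
    """Assemble a `cfdb-ingest wrf` argument list (hypothetical helper)."""
    cmd = ['cfdb-ingest', 'wrf']
    if variables:
        cmd += ['-v', ','.join(variables)]
    if bbox:
        cmd += ['-b', ','.join(str(value) for value in bbox)]
    if target_levels:
        cmd += ['-l', ','.join(str(level) for level in target_levels)]
    if chunk_shape:
        cmd += ['-c', ','.join(str(size) for size in chunk_shape)]
    # Positional arguments: all input files first, then the output path
    cmd += list(input_paths) + [cfdb_path]
    return cmd


cmd = build_wrf_command(
    ['wrfout_d01_2023-02-12_00:00:00.nc'],
    'output.cfdb',
    variables=['T2', 'WIND10'],
    chunk_shape=(24, 1, 50, 50),
)
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Shell wildcards such as wrfout_d01_*.nc are not expanded when a list is passed to subprocess.run, so expand them first with glob.glob.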

Development

Setup environment

We use uv to manage the development environment and the production build.

uv sync

Run tests

uv run pytest

Lint and format

uv run lint

License

This project is licensed under the terms of the Apache Software License 2.0.
