Community Forcing Service — acquire-and-subset access to meteorological forcing for hydrological models

CFS — Community Forcing Service

Acquire-and-subset access to meteorological forcing products for hydrological modelling.

Acquiring forcing for a modelling study traditionally means bespoke scripting per product. Every product has its own API, native variable names, units, accumulation conventions, and grid, and the accumulation-to-rate conversion is re-implemented (and mis-implemented) in every group's scripts. CFS replaces that with one async interface over 33 products that stops at a canonical, CF-aligned xarray.Dataset, deliberately leaving catchment/HRU remapping and model-specific file formats to modelling frameworks (e.g. SYMFLUENCE).

Documentation: https://darriey.github.io/CFS/

CFS is the third member of the community-data triad alongside CAS (Community Attribute Service) and CSFS (Community Streamflow Service):

Service  Data                                           Returns
CAS      geospatial attributes (DEM, soil, land cover)  harmonized zonal statistics
CSFS     streamflow observations                        harmonized station time series
CFS      meteorological forcing                         canonical, subset xarray.Dataset

The boundary (why CFS stops where it does)

CFS does exactly one job: acquire a forcing product, subset it to a bounding box + time range, harmonize it to a canonical schema, and hand back a lazy xarray.Dataset. That's it.

It deliberately does not:

  • remap to HRUs / sub-basins,
  • write model-specific forcing schemas (SUMMA, FUSE, mizuRoute, …),
  • serialize monthly NetCDF chunks or handle HPC filesystem locking.

Those steps are model- and deployment-specific, so they stay in the consumer (e.g. SYMFLUENCE). Keeping the boundary here is what makes CFS reusable across frameworks rather than a SYMFLUENCE library in disguise.

 upstream store ──▶  subset to bbox+time  ──▶  harmonize to canonical  ──▶  xr.Dataset
   (Zarr/S3/…)        cfs.subset.bbox            cfs.subset.canonical          │
                                                                               ▼
                                              [ consumer: HRU remap + model schema ]
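
The subsetting stage in the diagram above can be pictured with plain coordinate lists. This is only a sketch, not the code in cfs.subset: the helper names are hypothetical, the bbox is assumed to be ordered (west, south, east, north) as the example coordinates elsewhere in this README suggest, and both ends of the time range are assumed to be inclusive.

```python
from datetime import datetime


def bbox_indices(lons, lats, bbox):
    """Return the grid indices whose coordinates fall inside the bbox.

    bbox is assumed to be (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    lon_idx = [i for i, lon in enumerate(lons) if west <= lon <= east]
    lat_idx = [j for j, lat in enumerate(lats) if south <= lat <= north]
    return lon_idx, lat_idx


def time_indices(times, start, end):
    """Return the indices of time steps within [start, end] (inclusive)."""
    return [k for k, t in enumerate(times) if start <= t <= end]


# A toy 0.25-degree grid and twelve hourly steps
lons = [-115.0, -114.75, -114.5, -114.25, -114.0, -113.75]
lats = [50.5, 50.75, 51.0, 51.25]
times = [datetime(2015, 6, 1, hour) for hour in range(12)]

lon_idx, lat_idx = bbox_indices(lons, lats, (-114.5, 50.7, -114.0, 51.1))
time_idx = time_indices(times, datetime(2015, 6, 1, 0), datetime(2015, 6, 1, 6))
```

Selecting by index before loading any values is what lets a result stay lazy: only the cells and steps inside the request ever need to be read.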

Canonical schema (canonical-v1)

Every connector renames native variables to CF-aligned canonical names and converts to canonical SI units (see cfs/core/vocabulary.py). Precipitation and radiation are always returned as rates (kg m-2 s-1, W m-2), never accumulations — the conversion that most often goes wrong is done once, here. The output contract (names, units, attrs, grid layouts, time conventions) is specified normatively in the canonical-v1 spec.
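
As a rough illustration of the accumulation-to-rate step, here is a minimal sketch. It is not the CFS implementation: the function names are hypothetical, and the inputs are assumed to be a precipitation depth in metres of water and an energy total in J m-2, each accumulated over a known step length.

```python
WATER_DENSITY = 1000.0  # kg m-3: 1 m of liquid water over 1 m2 weighs 1000 kg


def precip_accum_to_flux(depth_m, step_seconds):
    """Convert an accumulated depth (m of water) to a mass flux (kg m-2 s-1)."""
    return depth_m * WATER_DENSITY / step_seconds


def energy_accum_to_flux(energy_j_m2, step_seconds):
    """Convert accumulated energy (J m-2) to a mean power flux (W m-2)."""
    return energy_j_m2 / step_seconds


# 3.6 mm of rain accumulated over one hour
precip_flux = precip_accum_to_flux(0.0036, 3600.0)
# 1.8 MJ m-2 of radiation accumulated over one hour
radiation_flux = energy_accum_to_flux(1.8e6, 3600.0)
```

Here precip_flux comes out at about 0.001 kg m-2 s-1 and radiation_flux at 500 W m-2. Forgetting the step length, or the factor of 1000 between metres of water and kg m-2, is the class of error that doing the conversion in one place is meant to prevent.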

Install

pip install 'community-forcing-service[climate]'   # xarray, zarr, gcsfs, dask, netcdf4

The distribution is named community-forcing-service (the name cfs is taken on PyPI), but the import package and CLI are still cfs (import cfs). From a checkout:

pip install -e '.[climate]'

Use

cfs providers                    # list registered providers
cfs products                     # list products + canonical variables
cfs fetch \
  -P era5_arco:single_levels \
  -b -114.5,50.7,-114.0,51.1 \
  --start 2015-06-01T00:00 --end 2015-06-01T06:00 \
  -v air_temperature,precipitation_flux

Python:

import cfs

ds, result = cfs.fetch_sync(
    "era5_arco:single_levels",
    bbox=(-114.5, 50.7, -114.0, 51.1),
    time_range=("2015-06-01T00:00", "2015-06-01T06:00"),
    variables=["air_temperature", "precipitation_flux"],
)
# ds: lazy canonical cube;  result: FetchResult provenance/shape metadata

From async code, await cfs.fetch(...) directly. Runtime settings (cache dir, timeouts, guardrails) can be overridden after import with cfs.configure(...). The lower-level discover() / get_connector(slug) / async-context-manager seam stays public — see the Python API guide.

Adding a connector

Subclass BaseForcingConnector (optionally mix in ZarrStoreMixin), implement list_products() and fetch(), declare a VariableMapping table mapping native names → canonical vars + linear unit conversions, and decorate with @register("slug"). discover() finds it automatically.

Providers

33 connectors — 31 live-verified against their upstream stores (19 anonymous + 12 auth-gated, confirmed with real CDS and Earthdata credentials); mswep and em_earth are offline-verified pending access/credentials. Highlights:

Category                          Products
Global / regional reanalyses      ERA5 (ARCO + CDS), ERA5-Land, MERRA-2, CARRA, CERRA, RDRS/CaSR, BARRA-R2, CONUS404, NARR, WFDE5
Analysis / observation grids      AORC (+ NWM grid), NLDAS-2, HRRR, NWM operational, Daymet, gridMET, nClimGrid-Daily, GLDAS, FLDAS, E-OBS
Satellite / merged precipitation  CHIRPS, CHIRTS, GPM IMERG, PERSIANN-CDR, CMORPH, MSWEP, EM-Earth
Forecasts                         GFS (deterministic), GEFS (ensemble, member dimension)
Climate projections               NEX-GDDP-CMIP6, NA-CORDEX

The full per-provider table — grid type, access protocol, auth, verification status, and the per-provider caveats (rolling archive windows, unverified units, slow OPeNDAP paths, derivation notes) — lives in the provider catalog, with the machine-readable version in inventory/providers.yaml.

CDS connectors need ~/.cdsapirc; Earthdata connectors need EARTHDATA_TOKEN (or ~/.netrc / EARTHDATA_USERNAME+PASSWORD) with the "NASA GESDISC DATA ARCHIVE" app authorized. GFS/GEFS need the forecast extra. To install the cds, earthdata, and forecast extras alongside climate:

pip install 'community-forcing-service[climate,cds,earthdata,forecast]'

Note that CFS is a passthrough service — every fetch hits the provider's live store, so transient upstream outages (THREDDS restarts, S3 hiccups, CDS queue congestion) can surface as fetch errors independent of CFS itself.
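
Because upstream outages are transient, a consumer may want to retry a failed fetch with a growing delay. The wrapper below is a generic sketch and not part of CFS: fetch_with_retry and its parameters are hypothetical, and in real use the except clause should be narrowed to whichever errors the consumer treats as transient.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in that fails twice and then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream store unavailable")
    return "dataset"


delays = []  # record the delays instead of actually sleeping
result = fetch_with_retry(flaky, sleep=delays.append)
```

In the demonstration the sleep function is replaced by a list append so the backoff schedule can be inspected without waiting.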

Hardening / robustness

  • Range QC (cfs/qc.py): every fetch samples the harmonized cube against each canonical variable's physical valid_range and reports out-of-range values in FetchResult.warnings. This catches unit-conversion bugs (e.g. a precipitation flux of 8.6 kg m-2 s-1 where about 1e-4 is expected, the kind of value a per-day total mistaken for a per-second rate would produce) before they reach a model. The check is advisory and never fails a fetch. Toggle with CFS_QC_ENABLED.
  • Fetch guardrails: shared _guard_area (CFS_MAX_AREA_DEG2) and cell-count (CFS_MAX_CELLS_PER_FETCH) checks on the base class refuse accidental continental/decadal pulls; enforced uniformly via _finalize.
  • Reset-aware de-accumulation (cfs/subset/deaccumulate.py): running-total fields (ERA5-Land tp/ssrd/strd) are converted to per-step increments before unit conversion, handling daily resets.
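
Two of the checks above, reset-aware de-accumulation and range QC, can be sketched on a plain list of values. This is an illustration under stated assumptions, not the code in cfs/subset/deaccumulate.py or cfs/qc.py: a reset is detected here simply as a drop in the running total, the series is assumed to start right after a reset, and the valid range used is a made-up example.

```python
def deaccumulate(totals):
    """Turn running totals into per-step increments, tolerating resets."""
    increments = []
    previous = 0.0  # assumes the series starts immediately after a reset
    for total in totals:
        if total < previous:
            # The running total dropped, so accumulation restarted:
            # the new value is itself the increment since the reset.
            increments.append(total)
        else:
            increments.append(total - previous)
        previous = total
    return increments


def out_of_range(values, valid_min, valid_max):
    """Return the values that fall outside [valid_min, valid_max]."""
    return [v for v in values if not (valid_min <= v <= valid_max)]


# Running total that resets between the third and fourth steps
increments = deaccumulate([1.0, 3.0, 6.0, 0.5, 2.0])

# Illustrative valid range for a precipitation flux in kg m-2 s-1
suspicious = out_of_range([1e-4, 8.6], 0.0, 0.1)
```

A naive difference would report a large negative increment at the reset; treating the post-reset value as the increment avoids that. The range check then flags 8.6 while leaving 1e-4 alone.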

Derived variables

When a provider lacks a canonical field, CFS derives it once, in a tested place (cfs/derive/). Currently: specific humidity from relative humidity (cfs/derive/humidity.py, Bolton 1980 saturation vapour pressure) — used by CARRA/CERRA, which ship 2 m RH rather than specific humidity. Derivation inputs (RH) are consumed, not emitted: they do not appear in the canonical output.
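
For orientation, the derivation can be sketched as follows. This is not the code in cfs/derive/humidity.py: the function names are hypothetical, relative humidity is assumed to be given in percent, and surface pressure is assumed to be available as an input. The saturation vapour pressure follows Bolton (1980), and the conversion to specific humidity is the standard relation with epsilon = 0.622.

```python
import math

EPSILON = 0.622  # ratio of the molar masses of water vapour and dry air


def saturation_vapour_pressure_pa(temp_k):
    """Bolton (1980) saturation vapour pressure over water, in Pa."""
    temp_c = temp_k - 273.15
    return 611.2 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def specific_humidity(rh_percent, temp_k, pressure_pa):
    """Specific humidity (kg kg-1) from RH (%), temperature (K), pressure (Pa)."""
    vapour_pressure = rh_percent / 100.0 * saturation_vapour_pressure_pa(temp_k)
    return EPSILON * vapour_pressure / (pressure_pa - (1.0 - EPSILON) * vapour_pressure)


# Saturated air at 0 degC and standard sea-level pressure
e_sat = saturation_vapour_pressure_pa(273.15)
q = specific_humidity(100.0, 273.15, 101325.0)
```

At 0 degC the exponential term is 1, so e_sat is 611.2 Pa, and q comes out at roughly 3.8 g of water vapour per kg of moist air.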

Tests

pytest -m 'not network'    # offline: harmonization + subsetting logic
pytest -m network          # integration: real ERA5 fetch from GCS

Naming note

"CFS" also denotes NOAA's Climate Forecast System (CFSR/CFSv2), itself a forcing product. If a CFSR connector is ever added it must use a disambiguated slug (e.g. cfsr) to avoid collision with the service name.
