Open Source Land Surface Model Benchmarking System

These details have not been verified by PyPI

Project links

Project description

OpenBench

Open Source Land Surface Model Benchmarking System

OpenBench is a fully automated, cross-platform framework for benchmarking land surface models (LSMs) against curated reference datasets. From a single YAML config it preprocesses model output and references onto a common grid/time window, computes a consistent metric and score suite, renders publication-ready comparison figures, and assembles an HTML/PDF report — for any number of variables, models, and references at once.

pip install colm-openbench
openbench smoke-test   # verify the installed package with bundled sample data
openbench init           # generate openbench.yaml interactively
openbench check openbench.yaml   # validate config + data availability
openbench run   openbench.yaml   # evaluate, plot, report

Why OpenBench
Key Features
Installation
Quick Start
How It Works
Configuration
CLI Reference
Metrics & Scores
Comparison & Visualization
Classification Group-By (IGBP / PFT / Köppen)
Reference Data Management
Model Profiles
Output Structure
Performance Tuning
Migrating from Earlier Versions
Requirements
Citation
Links
License

Why OpenBench

Evaluating an LSM normally means hand-wiring regridding, time alignment, masking, metric code, and plotting for every model/reference pair — and redoing it when a file changes. OpenBench turns that into one declarative config:

One config, many comparisons. Variables × models × references are evaluated as a full cartesian product, each pair kept in its own output so nothing clobbers anything else.
Consistent science. The same regridding, masking, weighting, and metric definitions apply to every pair, so cross-model comparisons are fair by construction (cumulative unified masking enforces identical spatial coverage across models).
Grid and station references are both first-class.
Incremental & resumable. Results are cached by a content fingerprint of the config, code, and input files; an interrupted run safely resumes without recomputing finished work or producing partial summaries.
Reproducible. Resolved configs, content fingerprints, and resumable caches make runs auditable and repeatable.

Key Features

Unified configuration — a single openbench.yaml replaces the 4–6 file setup of earlier versions, with _defaults merging and !include splitting.
Reference data registry — 101 built-in reference datasets (grid and station) you can list, inspect, scan, and register.
Model profiles — 22 built-in profiles (CoLM2024, CLM5, NoahMP5, JULES7, VIC5, ERA5-Land, GLDAS2, MATSIRO, CaMa-Flood, WRF, …) supplying variable name mappings and unit conversions automatically.
Rich science suite — 25+ point-wise metrics, 8 normalized skill scores, and 17 statistical analysis modules.
Comparison & visualization — Taylor, Target, Portrait, heat-map, parallel coordinates, KDE, ridgeline, and more — generated automatically.
Classification group-by — per-class aggregation by IGBP land cover, PFT, or Köppen climate zone; the masks ship with the package, so it works out of the box.
CLI-first — run, check, init, ref, sim, model, migrate, cache, smoke-test, gui.
GUI wizard — interactive configuration and run management (optional [gui]).
Remote execution — SSH-based job submission to HPC clusters (optional [remote]).
Migration tool — convert existing JSON / Fortran NML configs to YAML.

Installation

Recommended clean environment:

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
python -m pip install --upgrade pip

Install from PyPI:

# Default install — fully featured (CLI, statistics, plotting, HTML reports,
# legacy migration). Best for HPC/servers.
pip install colm-openbench

# Optional: graphical configuration wizard (PySide6/Qt)
pip install "colm-openbench[gui]"

# Optional: remote SSH execution (paramiko)
pip install "colm-openbench[remote]"

# Optional: PDF reports (xhtml2pdf; builds pycairo, needs system cairo)
pip install "colm-openbench[report]"

# Optional: all of the above
pip install "colm-openbench[all]"

Using uv:

uv pip install "colm-openbench[all]"

conda / conda-forge

OpenBench ships a conda-forge recipe (conda/meta.yaml, noarch: sourced from PyPI, with Cartopy/NetCDF/PROJ resolved by conda). Once published to conda-forge, a single command installs everything (no local compilation):

conda install -c conda-forge colm-openbench
mamba install -c conda-forge colm-openbench   # or the faster mamba

Note: the current release is a Beta pre-release (3.0.0b1) and is not on conda-forge yet. Until it is, use one of the two paths below.

If your platform does not have wheels for geospatial dependencies such as Cartopy, install the scientific stack from conda-forge first, then install OpenBench with pip:

mamba create -n openbench -c conda-forge python=3.12 cartopy netcdf4
mamba activate openbench
pip install colm-openbench

Or build a local conda package straight from the bundled recipe:

conda install -n base conda-build
conda build conda/
conda install -c local colm-openbench

Install from a GitHub checkout for development or local testing:

git clone https://github.com/CoLM-SYSU/OpenBench.git
cd OpenBench
python -m pip install -e ".[dev]"

The IGBP / PFT / Köppen classification masks needed for group-by analysis are bundled in the package (compressed, ~1.1 MB total) — no extra download.

Verify an installation before preparing your own data:

openbench smoke-test

This unpacks a bundled Initial_test fixture and runs openbench check against the original grid and station cases: three variables, two simulation cases, and their paired reference sources. Use openbench smoke-test --run for a full evaluation smoke run; the full run may let Cartopy download Natural Earth map assets on a clean machine.

Quick Start

# 1. Generate a config interactively (picks variables, references, models)
openbench init

# 2. Validate the config and check reference data availability
openbench check openbench.yaml

# 3. Run the evaluation (metrics + scores + comparison figures + report)
openbench run openbench.yaml

# Re-run after editing the config — only what changed is recomputed
openbench run openbench.yaml

# Force a full re-run, ignoring the incremental cache
openbench run openbench.yaml --force

Results land in output/<project_name>/; start with reports/report.html.

How It Works

config.yaml
   │
   ▼
┌─────────────┐   ┌──────────────┐   ┌───────────────┐   ┌────────────┐
│ Preprocess  │ → │  Evaluate    │ → │  Compare      │ → │  Report    │
│ regrid /    │   │  metrics &   │   │  cross-model  │   │  HTML/PDF  │
│ time-align /│   │  scores per  │   │  figures &    │   │            │
│ unified mask│   │  (var,sim,ref)│   │  statistics   │   │            │
└─────────────┘   └──────────────┘   └───────────────┘   └────────────┘
        every stage is incrementally cached and resumable

Preprocess — each model and reference is regridded to a common resolution, clipped to the shared time window, and (optionally) masked so all models share identical valid coverage.
Evaluate — for every (variable, simulation, reference) triple, OpenBench computes the configured metrics and scores (grid → per-cell NetCDF, station → per-station CSV).
Compare — when comparison.enabled, results across all pairs feed shared figures (Taylor, portrait, heat-map, …) and optional statistics.
Report — everything is assembled into an HTML report (PDF with the [report] extra).

Configuration

A minimal config:

# openbench.yaml
project:
  name: my_run
  output_dir: ./output
  years: [2010, 2014]
  lat_range: [-90, 90]
  lon_range: [-180, 180]

evaluation:
  variables:
    - Evapotranspiration
    - Latent_Heat

reference:
  Evapotranspiration: ET_Xu_etal_2025_LowRes
  Latent_Heat: FLUXCOM

simulation:
  CoLM2024_test:
    model: CoLM2024            # matches a built-in profile → auto var mapping
    root_dir: /data/simulations/colm2024

A fuller config showing per-simulation overrides, metric/score subsets, and comparison/statistics toggles:

project:
  name: multi-model
  output_dir: ./output
  years: [2000, 2020]
  grid_res: 0.5
  num_cores: 16
  time_alignment: per_pair      # intersection | per_pair | strict
  unified_mask: true            # identical valid coverage across models
  weight: area                  # area (cos-lat) | mass | none

evaluation:
  variables: [Evapotranspiration, Gross_Primary_Productivity, Latent_Heat]

reference:
  Evapotranspiration: ET_Xu_etal_2025_LowRes
  Gross_Primary_Productivity: FLUXCOM-X-BASE_LowRes
  Latent_Heat: FLUXCOM

simulation:
  CoLM2024:
    model: CoLM2024
    root_dir: /data/CoLM2024
    variables:
      Gross_Primary_Productivity:
        varname: GPP            # override the profile's file/variable name
  CLM5:
    model: CLM5
    root_dir: /data/CLM5
    variables:
      Gross_Primary_Productivity:
        varname: FPSN

metrics: [bias, RMSE, correlation, KGE]
scores:  [nBiasScore, nRMSEScore, Overall_Score]

comparison:
  enabled: true

statistics:
  enabled: true
  items: [Z_Score, ANOVA]

Key `project` options

Option	Values / default	Meaning
`years`	`[start, end]`	Evaluation time window
`lat_range` / `lon_range`	`[min, max]`	Spatial domain
`grid_res`	float (degrees)	Common grid the data is regridded to
`num_cores`	int / `null`	Parallel workers (`null`/`0` → all CPUs)
`time_alignment`	`intersection` (default) / `per_pair` / `strict`	How model & reference time windows are matched
`regrid_backend`	`openbench_conservative` (default) / `cdo_remapcon` / `xesmf_conservative` / `basic_interpolation`	Regridding engine
`unified_mask`	`true` (default)	Cumulative NaN mask across sims for cross-model fairness
`weight`	`area` (default when omitted) / `mass` / `none`	Spatial aggregation weighting
`IGBP_groupby` / `PFT_groupby` / `climate_zone_groupby`	`false`	Per-class aggregation (see below)
`generate_report`	`true`	Emit HTML/PDF report

Advanced config helpers

_defaults block — merge shared settings into every simulation entry.
!include path/to/partial.yaml — split large configs across files.
Environment variables override YAML at runtime (handy for quick experiments) — see Performance Tuning.

CLI Reference

Run openbench <command> --help for full options on any command.

Top-level commands

Command	Description
`openbench run CONFIG`	Run evaluation, comparison, and report from a YAML config
`openbench check CONFIG`	Validate config and check reference/data availability
`openbench smoke-test`	Validate the installed package with bundled `Initial_test` sample data
`openbench init`	Interactively generate an `openbench.yaml`
`openbench ref ...`	Manage the reference dataset registry (see below)
`openbench model ...`	Manage model profiles (see below)
`openbench sim scan DIR`	Scan directories for simulation cases → config fragments
`openbench cache ...`	Inspect/clear runtime caches (see below)
`openbench migrate OLD`	Convert a JSON or Fortran NML config to YAML
`openbench gui`	Launch the GUI wizard (requires `[gui]` extra)
`openbench version` / `--version`	Print version

Useful run flags: --force (ignore cache), --comparison-only (re-run only comparisons on existing outputs), --dry-run, --cores N, --variable VAR (repeatable), --output-dir DIR, --dump-config.

`openbench ref` — reference registry

Subcommand	Description
`ref list`	List all available reference datasets
`ref show NAME`	Show details of a dataset
`ref scan DIR`	Scan a directory and auto-register datasets you already have
`ref register NAME`	Register or update a reference dataset
`ref register-profile`	Register a reference profile (variable mappings)
`ref path NAME`	Print the local path for a dataset
`ref optimize`	Convert a dataset to Zarr for faster reads
`ref generate-station-list`	Auto-generate a station-list CSV from NetCDF files
`ref delete NAME`	Delete a user reference entry/overlay
`ref convert-old`	Convert an old-format reference YAML to v3

`openbench model` — model profiles

Subcommand	Description
`model list`	List all available model profiles
`model show NAME`	Show variable mappings for a profile
`model status`	Show model registry status
`model register` / `rename` / `delete`	Manage user profiles
`model export` / `import`	Move a profile in/out as standalone YAML
`model alias`	List or create user model aliases
`model remove-var`	Remove a variable from a profile
`model validate`	Check a profile for completeness issues
`model path NAME`	Show whether a profile is bundled or user-defined

`openbench cache` — runtime caches

Subcommand	Description
`cache status`	Show cache status (e.g. `--regrid` for the regrid-weight cache)
`cache clear`	Clear selected cache files

Metrics & Scores

OpenBench computes 25+ point-wise metrics per (variable, simulation, reference) pair. By default all applicable metrics are produced; restrict with the top-level metrics: list.

Metric	Range	Meaning
`bias`, `percent_bias`	(−∞, ∞)	Mean (relative) difference; 0 = unbiased
`RMSE`	[0, ∞)	Root-mean-square error
`ubRMSE` / `CRMSD`	[0, ∞)	Bias-removed (centered) RMSE
`mean_absolute_error`	[0, ∞)	Mean absolute difference
`correlation`	[−1, 1]	Pearson correlation
`correlation_R2`	[0, 1]	Explained-variance fraction
`NSE`	(−∞, 1]	Nash–Sutcliffe efficiency
`KGE`, `KGESS`	(−∞, 1]	Kling–Gupta efficiency / skill score
`index_agreement`	[0, 1]	Willmott index of agreement
`rv`	[0, ∞)	Variance ratio σ_sim / σ_obs
`kappa_coeff`	[−1, 1]	Categorical agreement

8 normalized skill scores rescale metrics onto a comparable 0–1 axis (1 = best): nBiasScore, nRMSEScore, nPhaseScore, nIavScore, nSpatialScore, nSeasonalityScore, index_agreement, and the combined Overall_Score that drives portrait plots and rankings.

Beyond metrics/scores, the statistics block exposes 17 analysis modules (correlation, ANOVA, Mann–Kendall trend, three-cornered hat, Hellinger distance, partial least squares, false discovery rate, …).

Comparison & Visualization

When comparison.enabled: true, OpenBench renders cross-model figures automatically (PNG, with the underlying data saved alongside):

Taylor diagram & Target diagram — correlation/variance/CRMSD at a glance
Portrait plot (seasonal) — model × variable score grid
Heat map & land-cover-based heat map
Parallel coordinates, kernel density estimate, ridgeline, whisker plots
Difference plots, relative score, radar map
Single Model Performance Index (SMPI), Mann–Kendall trend, three-cornered hat, functional response, PLSR

Classification Group-By (IGBP / PFT / Köppen)

OpenBench can aggregate scores per land-cover or climate class. Enable any of:

project:
  IGBP_groupby: true          # MODIS/IGBP land cover (17 classes)
  PFT_groupby: true           # Plant Functional Type (16 classes)
  climate_zone_groupby: true  # Köppen climate zone (30 classes)

The classification masks (IGBP.nc, PFT.nc, Climate_zone.nc) are bundled with the package (compressed integer masks, ~1.1 MB total), so group-by works out of the box with no extra download. To use your own / higher-resolution masks, override the bundled versions via either:

export OPENBENCH_DATASET_DIR=/path/to/your/dataset   # highest priority
# or place files in ./dataset/<name>.nc relative to where you run openbench

Per-class results appear under output/<run>/comparisons/{IGBP,PFT,Climate_zone}_groupby/.

Reference Data Management

# Browse all 101 reference datasets
openbench ref list

# Inspect a single dataset (a base name resolves to its resolution family)
openbench ref show GLEAM_v4.2a

# Scan a directory and auto-register reference datasets you already have
openbench ref scan /Volumes/work/Reference

Example openbench ref list output:

Name                           Category     Type   Res      Years          Variables
────────────────────────────────────────────────────────────────────────────────────
AH4GUC_LowRes                  Urban        grid   0.5°     2010-2010      1
CERES_EBAF_Ed4.2_LowRes        Energy       grid   0.5°     ...            7
CLARA_3_LowRes                 Energy       grid   0.5°     1979-2024      8
CN05.1_MidRes                  Meteorology  grid   0.25°    1961-2021      2
Caravan_Daily                  Water        stn    stn      1950-2023      1
...                            ...          ...    ...      ...            ...
Total: 101 datasets

Many datasets come in multiple resolutions (_LowRes, _MidRes); a base name like GLEAM_v4.2a resolves to its family, or you can pin a specific one such as ET_Xu_etal_2025_LowRes.

Model Profiles

openbench model list            # list built-in profiles
openbench model show CoLM2024   # variable mappings for a profile

Example openbench model list output:

Name                 Type   Res      Variables Complete
────────────────────────────────────────────────────────────
CoLM2024             grid   0.5°     52        ✓
CLM5                 grid   1.25°    69        ⚠ 16
ERA5-Land            grid   0.1°     12        ✓
GLDAS                grid   0.25°    11        ✓
NoahMP5              grid   ...      70        ⚠ 34
...
Total: 22 model profiles

When simulation.<name>.model matches a known profile, variable file-name patterns and unit conversions are applied automatically. Fields in the simulation entry override the profile.

Output Structure

output/<project_name>/
├── config.yaml          # the fully-resolved config actually used
├── run.log              # complete run log
├── data/                # preprocessed (regridded, clipped) sim/ref NetCDF
├── metrics/             # per-(var,sim,ref) metrics — grid: NetCDF, station: CSV
├── scores/              # per-(var,sim,ref) normalized scores
├── figures/             # per-pair visualizations (PNG)
├── comparisons/         # cross-model figures + group-by results (if enabled)
├── statistics/          # statistical analyses (if enabled)
├── reports/
│   ├── report.html
│   └── report.pdf       # only with the [report] extra
└── .openbench_cache.json # incremental cache index

Grid evaluations write one NetCDF per metric (<var>_ref_<R>_sim_<S>_<metric>.nc, dims (lat, lon) or (time, lat, lon)); station evaluations write an aggregated CSV (<var>_stn_<R>_<S>_evaluations.csv, one row per station). Read either directly with xarray / pandas for downstream analysis.

Performance Tuning

For high-resolution grids or many simulation files, tune IO and Dask in openbench.yaml:

project:
  num_cores: 16

  io:
    # true enables zlib for numeric NetCDF outputs; level 1 is the fast default.
    netcdf_compression: true
    netcdf_compression_level: 1

    # auto/null: use the resource planner; 0: disable; N: force N files per batch.
    mfdataset_batch_size: auto
    mfdataset_auto_batch_min_files: 200
    mfdataset_auto_batch_min_size_mb: 1024
    mfdataset_auto_batch_max_size: 100
    mfdataset_auto_batch_memory_fraction: 0.25

  dask:
    enabled: true
    n_workers: 4
    threads_per_worker: 1
    processes: true
    memory_limit: auto

Environment variables override YAML and take precedence, which is useful for temporary benchmarking:

OPENBENCH_NETCDF_COMPRESSION=1 OPENBENCH_NETCDF_COMP_LEVEL=1 \
OPENBENCH_MFDATASET_BATCH_SIZE=50 \
OPENBENCH_DASK=1 OPENBENCH_DASK_N_WORKERS=4 \
openbench run openbench.yaml --force

For repeatable measurements, run the same config from a clean output directory and keep the resolved config, run log, and environment metadata with the output:

openbench run openbench.yaml --force --dump-config

Migrating from Earlier Versions

# Convert an existing JSON config
openbench migrate old_config.json -o openbench.yaml

# Convert a Fortran NML config (requires the [migration] extra)
openbench migrate main.nml -o openbench.yaml

The v3.0 migration path is designed to preserve the intended evaluation setup while making defaults and generated YAML explicit. For publication comparisons, archive the resolved config.yaml and rerun a small overlap case before mixing old and new result sets.

Requirements

Python 3.10 or newer
Core dependencies: xarray, numpy, scipy, pandas, netCDF4, matplotlib, cartopy, dask, joblib, flox, PyYAML, Jinja2, click, tqdm, packaging
Optional extras: gui (PySide6), remote (paramiko), report (xhtml2pdf), migration (f90nml)
See pyproject.toml for the full pinned dependency list

Citation

If you use OpenBench in scientific work, please cite:

Wei, Z., Xu, Q., Bai, F., Xu, X., Wei, Z., Dong, W., Liang, H., Wei, N., Lu, X., Li, L., Zhang, S., Yuan, H., Liu, L., and Dai, Y.: OpenBench: a land model evaluation system, Geosci. Model Dev., 18, 6517–6540, https://doi.org/10.5194/gmd-18-6517-2025, 2025.

@article{wei2025openbench,
  author  = {Wei, Z. and Xu, Q. and Bai, F. and Xu, X. and Wei, Z. and Dong, W.
             and Liang, H. and Wei, N. and Lu, X. and Li, L. and Zhang, S.
             and Yuan, H. and Liu, L. and Dai, Y.},
  title   = {{OpenBench}: a land model evaluation system},
  journal = {Geoscientific Model Development},
  volume  = {18},
  pages   = {6517--6540},
  year    = {2025},
  doi     = {10.5194/gmd-18-6517-2025}
}

License

MIT License. See LICENSE for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

3.0.0b6 pre-release

Jun 11, 2026

3.0.0b5 pre-release

Jun 11, 2026

3.0.0b4 pre-release

Jun 11, 2026

This version

3.0.0b3 pre-release

Jun 9, 2026

3.0.0b2 pre-release

Jun 8, 2026

3.0.0b1 pre-release

Jun 8, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

colm_openbench-3.0.0b3.tar.gz (11.4 MB view details)

Uploaded Jun 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

colm_openbench-3.0.0b3-py3-none-any.whl (11.4 MB view details)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file colm_openbench-3.0.0b3.tar.gz.

File metadata

Download URL: colm_openbench-3.0.0b3.tar.gz
Upload date: Jun 9, 2026
Size: 11.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for colm_openbench-3.0.0b3.tar.gz
Algorithm	Hash digest
SHA256	`e02b4d91cda0ffaefc5eb72be9ccb80ce4982911672c15249bca02e1eaac86e7`
MD5	`223208c93cce8fb3f6c6d4b89b88c69e`
BLAKE2b-256	`1b51c1a95b9cda0a84082e284232937e9c8f4100a538e7e088d73eccab197296`

See more details on using hashes here.

File details

Details for the file colm_openbench-3.0.0b3-py3-none-any.whl.

File metadata

Download URL: colm_openbench-3.0.0b3-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 11.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for colm_openbench-3.0.0b3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c67c08306b0a02be0e78ca12d763b737a7e87715d18c2533c7e50b54d5a6d796`
MD5	`01bff52e4f12fdf63d4793c70cb7ceec`
BLAKE2b-256	`b3f1323afe9d6ccd8259416b875b7a459deaf46f5e3b5f5d4ed4cef5933e1ee1`

See more details on using hashes here.

colm-openbench 3.0.0b3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

OpenBench

Table of Contents

Why OpenBench

Key Features

Installation

conda / conda-forge

Quick Start

How It Works

Configuration

Key project options

Advanced config helpers

CLI Reference

Top-level commands

openbench ref — reference registry

openbench model — model profiles

openbench cache — runtime caches

Metrics & Scores

Comparison & Visualization

Classification Group-By (IGBP / PFT / Köppen)

Reference Data Management

Model Profiles

Output Structure

Performance Tuning

Migrating from Earlier Versions

Requirements

Citation

Links

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Key `project` options

`openbench ref` — reference registry

`openbench model` — model profiles

`openbench cache` — runtime caches