Climate Model vs. Observation Comparison Tool
Project description
EvalMetrics: Precision in Prediction
OVERVIEW
This tool, "Model vs. Observation Verification (Per-Pixel Metrics)", is a Python-based script that:
- Loads two NetCDF datasets (model vs. observation).
- Handles large data with Dask for out-of-core computation.
- Allows optional GPU usage (if CuPy and a compatible CUDA environment are available).
- Checks for dimension mismatches and can regrid one dataset to the other's grid.
- Computes a variety of domain-wide (aggregate) metrics:
- Continuous metrics (MSE, MAE, Bias, Correlation)
- Event-based metrics (POD, FAR, CSI, ETS, FSS)
- Probabilistic metrics (Brier Score, BSS, etc.)
- Distribution-level metrics (Wasserstein distance, Jensen-Shannon divergence)
- Computes and visualizes "per-pixel" (gridwise) metrics across time for both continuous (Bias, MAE, MSE, Correlation maps) and event-based (POD, FAR, CSI, ETS maps).
- Produces side-by-side plots for model vs. observation fields and also metric maps using cartopy for geographic context.
- Optionally saves the results to a NetCDF file.
You can run this script interactively in a terminal and receive textual output plus pop-up map plots for each chosen metric.
REQUIREMENTS
-
Python 3.7+ recommended
-
An environment with:
- numpy
- xarray
- dask
- distributed
- matplotlib
- cartopy
- netCDF4
- scipy
-
Optional for GPU usage:
- cupy (plus a CUDA-capable NVIDIA GPU with compatible drivers)
Depending on your system, you may install via:
pip install xarray dask distributed cupy netCDF4 scipy cartopy
Or use a conda environment:
conda install xarray dask distributed netcdf4 scipy cartopy
For GPU usage, you'll need to install CuPy and a compatible CUDA environment.
SCRIPT USAGE
- Launch your Python environment (conda, virtualenv, etc.).
- Run: python final_with_vis.py
- Follow the prompts:
- a) "Use GPU acceleration? (y/n)": If you have cupy + CUDA installed, type 'y' to use GPU. Otherwise, default to 'n'.
- b) "Number of Dask workers": For local parallelism. If you have multiple CPU cores or GPUs, you can specify more workers.
- c) "Path to MODEL dataset": Enter the path to the model NetCDF file.
- d) "Path to OBS dataset": Enter the path to the observation NetCDF file.
- e) "Chunk sizes 'time,lat,lon' [default=10,200,200]": If you have very large data or a specific preference for chunking, specify them. Otherwise hit enter to use default.
- f) The script checks if the dimensions match. If not, it will prompt you to regrid one dataset to the other.
- g) It lists the available variables in the model dataset. Select by index or by exact name (the same variable must exist in the obs dataset).
- h) (Optional) You can visualize any time index side-by-side for model vs obs.
- i) Select which domain-wide metrics to compute:
- for continuous metrics
- for event-based metrics
- for probabilistic metrics
- for distribution-level metrics
- j) For event-based, specify a threshold. For example, '1.0' or '0.1'. If the data do not exceed that threshold, you'll see NaNs for hits/misses.
- k) For probabilistic metrics, confirm if the model data are indeed probabilities (0..1). If yes, also specify the threshold to convert obs to 0/1. Otherwise skip.
- l) Finally, decide whether you want per-pixel (gridwise) metrics. If 'y', the script computes and plots bias_map, corr_map, event-based POD/FAR maps, etc.
- m) The script will show a series of cartopy-based pop-up windows or inline plots (depending on your environment) for each chosen metric map.
INTERPRETING THE OUTPUT
-
Domain-wide metrics: Printed in the console. If any computed value is NaN, it indicates some dimension of zero events or no overlapping data.
-
Per-pixel metrics:
- "Bias map" shows how much the model over- or underestimates at each lat-lon, averaged over time.
- "Corr map" reveals correlation over time for each lat-lon cell.
- "POD map" shows Probability of Detection for threshold-based events per pixel.
- "FAR map" is the False Alarm Ratio, etc. If the entire domain or particular grid cells have no events above the threshold, metrics can be NaN.
-
Warnings like "RuntimeWarning: invalid value encountered in divide" typically occur when a denominator is zero (leading to NaN). It's normal if the data never exceed your threshold or if there's no temporal variation for some grid cells.
-
If using Dask, you can monitor the Dask dashboard link (e.g., http://127.0.0.1:8787) to see tasks, memory usage, and parallel processing.
COMMON ISSUES
- "NaN" event-based metrics: Usually means the threshold is not exceeded, or there's no overlap in events.
- "invalid value encountered in divide" warnings: Happens when the script calculates metrics that lead to dividing by zero. The final result for those cells is NaN.
- GPU not recognized: Ensure cupy is installed and your CUDA environment is set up. Otherwise, select CPU usage.
EXTENSIONS & CUSTOMIZATIONS
- Add or remove metrics in the relevant compute functions.
- Implement advanced regridding (like xesmf) if linear interpolation is not sufficient.
- Adjust chunk sizes for performance or memory constraints.
- Save per-pixel metrics as NetCDF for offline or advanced analysis, e.g.: grid_cont.to_netcdf("gridwise_continuous_metrics.nc") grid_evt.to_netcdf("gridwise_event_metrics.nc")
- For real-time usage or a GUI, consider wrapping in a web framework (e.g., Panel, Streamlit).
CREDITS & LICENSE
- Built with xarray, dask, cartopy, netCDF4, cupy, and other open-source Python libraries.
- You may distribute or modify this script as needed. No specific license text is provided here; adapt to your organizational requirements.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file climeval-0.1.0.tar.gz.
File metadata
- Download URL: climeval-0.1.0.tar.gz
- Upload date:
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e4d407257aff312f8a01829113be6d1c700cf40c1127dedeba746347691a5d43
|
|
| MD5 |
c8e9582f54dbcea5f399ba252f59b8b3
|
|
| BLAKE2b-256 |
573bafe4ff2e24d56bec7e10dad327a661da660cba32da3acff722ab30d78143
|
File details
Details for the file climeval-0.1.0-py3-none-any.whl.
File metadata
- Download URL: climeval-0.1.0-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8545432582d55a275c56c0e4ef2085f43c9089a71d2946d2bf5f276fe2d15622
|
|
| MD5 |
32910e334cd5e0dc2ff3cb1901306758
|
|
| BLAKE2b-256 |
0a8b95d9f7c5569102c5b4134cf5991f98f451b8934a01e755da74dc8af03cac
|