cavapy

CAVA Python package. Retrieve climate data.

Python 3.11+

Retrieve, subset, and process CORDEX-CORE and ERA5 climate data directly from THREDDS/OPeNDAP.

What is cavapy?

Working with CORDEX-CORE climate projections normally means downloading terabytes of raw NetCDF files, reprojecting from rotated polar coordinates to regular lat/lon, writing boilerplate to handle non-Gregorian calendars, converting units, subsetting grids, wrangling multi-model ensembles, and layering bias correction on top. All before you can run a single analysis.

cavapy collapses all of that into one function call.

It streams only the spatial slice you need over OPeNDAP (no local archive required) and returns analysis-ready xarray.DataArray objects with consistent units, a standard Gregorian calendar, and optional bias correction already applied.

It is part of the CAVA (Climate and Agriculture Risk Visualization and Assessment) ecosystem, a joint initiative of FAO, the University of Cantabria, the University of Cape Town, and Predictia.

What gets handled automatically

A single get_climate_data() call orchestrates a full pipeline:

| Step | What happens |
| --- | --- |
| Inventory lookup | Resolves the correct OPeNDAP URL(s) for your GCM/RCM/RCP/domain combination from a live THREDDS inventory |
| Spatial subsetting | Streams only the grid cells inside your country or bounding box, with no full-file downloads |
| Country → bbox | Converts a country name to a precise bounding box using Natural Earth shapefiles |
| Unit conversion | K → °C for temperature; kg m⁻² s⁻¹ → mm/day for precipitation; J/m² → W/m² for solar radiation; 10 m → 2 m for wind speed |
| Regridding | CORDEX outputs are natively in rotated polar coordinates; the data served here has already been regridded to a regular lat/lon grid, so standard spatial operations work out of the box |
| Calendar harmonization | Converts 360-day and other non-Gregorian CORDEX calendars to Gregorian, filling gaps with NaN |
| Parallelization | Variables are fetched in parallel processes; within each process, threaded downloads handle multi-file retrieval |
| Fault tolerance | OPeNDAP connections retry up to 3 times with backoff; C-level noise is suppressed on intermediate attempts |
| Bias correction | ERA5 is automatically fetched as the reference; EQM is trained and applied, with no external tools needed |
| Domain validation | If your bounding box falls outside the chosen CORDEX domain, a corrected domain is suggested |
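
To make the fault-tolerance step concrete, here is a minimal, generic sketch of retrying a flaky call up to 3 times with backoff. It is not cavapy's actual implementation: `retry_with_backoff`, `base_delay`, and the `flaky_open` stand-in are hypothetical names, and the exponential delay schedule is an assumption (the table only says "backoff").

```python
import time


def retry_with_backoff(func, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on OSError wait base_delay * 2**attempt and try again.

    Hypothetical helper for illustration only.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Stand-in for a remote open that fails twice, then succeeds.
calls = {"n": 0}


def flaky_open():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("simulated OPeNDAP timeout")
    return "dataset handle"


# Record the delays instead of sleeping so the sketch runs instantly.
delays = []
result = retry_with_backoff(flaky_open, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper testable; a real script would leave it at the default `time.sleep`.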

Data Coverage

Sources

CORDEX-CORE regional climate simulations (25 km)
ERA5 reanalysis (used directly and as the reference for bias correction)

Data is hosted on the University of Cantabria THREDDS infrastructure.

Available datasets

CORDEX-CORE — original model outputs. Use this when you want raw projections or when you will apply your own post-processing.
CORDEX-CORE-BC — pre-bias-corrected outputs. The full CORDEX-CORE archive was corrected against ERA5 reanalysis using the ISIMIP3 methodology (trend-preserving quantile mapping). Use this dataset when you need a consistent, ready-to-use ensemble with no additional processing.

Available variables

| Variable | Description | Units |
| --- | --- | --- |
| `tas` | Daily mean temperature | °C |
| `tasmax` | Daily maximum temperature | °C |
| `tasmin` | Daily minimum temperature | °C |
| `pr` | Daily precipitation | mm/day |
| `hurs` | Daily relative humidity | % |
| `sfcWind` | Daily wind speed at 2 m | m/s |
| `rsds` | Daily solar radiation | W/m² |
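
The units above come from the conversions cavapy applies for you. As a rough sketch of the arithmetic involved, the functions below reproduce each conversion in plain Python. The function names are hypothetical, the one-day accumulation period for radiation is an assumption, and the 10 m → 2 m wind adjustment uses the FAO-56 logarithmic wind profile, which may not be the formula cavapy uses internally.

```python
import math

SECONDS_PER_DAY = 86_400


def kelvin_to_celsius(t_k):
    return t_k - 273.15


def flux_to_mm_per_day(pr_kg_m2_s):
    # 1 kg of water spread over 1 m² is a 1 mm layer, so only the time base changes.
    return pr_kg_m2_s * SECONDS_PER_DAY


def accumulated_j_to_w(energy_j_m2, accumulation_seconds=SECONDS_PER_DAY):
    # Assumption: the energy was accumulated over one day.
    return energy_j_m2 / accumulation_seconds


def wind_to_2m(speed, height_m=10.0):
    # FAO-56 logarithmic profile (assumed); the factor is about 0.748 for 10 m input.
    return speed * 4.87 / math.log(67.8 * height_m - 5.42)


tasmax_c = kelvin_to_celsius(300.0)
pr_mm_day = flux_to_mm_per_day(1e-4)
rsds_w_m2 = accumulated_j_to_w(8_640_000.0)
wind_2m = wind_to_2m(5.0)
```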

Supported domains and scenario/model options

Domains: NAM-22, EUR-22, AFR-22, EAS-22, SEA-22, WAS-22, AUS-22, SAM-22, CAM-22
RCPs: rcp26, rcp85
GCMs: MOHC, MPI, NCC
RCMs: REMO, Reg

Installation

conda create -n cavapy "python>=3.11"
conda activate cavapy
pip install cavapy

Quick Start

1) Pre-bias-corrected projections (recommended)

Uses CORDEX-CORE-BC: the full CORDEX archive already corrected against ERA5 using the ISIMIP3 methodology. No further correction is applied at download time.

import cavapy

togo = cavapy.get_climate_data(
    country="Togo",
    variables=["tasmax", "pr"],
    cordex_domain="AFR-22",
    rcp="rcp26",
    gcm="MPI",
    rcm="REMO",
    years_up_to=2030,
    dataset="CORDEX-CORE-BC",
)
# Returns: {"tasmax": xr.DataArray, "pr": xr.DataArray}

2) Original CORDEX-CORE with on-the-fly bias correction

When bias_correction=True, cavapy automatically fetches ERA5 for the historical period and applies Empirical Quantile Mapping (EQM) via xsdba. Historical bias correction uses leave-one-out cross-validation to avoid overfitting. Multiplicative scaling is applied for precipitation, wind, and radiation; additive for temperature and humidity. This is useful when you need custom period or region coverage beyond the pre-corrected archive.

import cavapy

togo = cavapy.get_climate_data(
    country="Togo",
    variables=["tasmax", "pr"],
    cordex_domain="AFR-22",
    rcp="rcp26",
    gcm="MPI",
    rcm="REMO",
    years_up_to=2030,
    bias_correction=True,
    dataset="CORDEX-CORE",
)
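
For intuition about what EQM does, the following is a simplified, pure-Python sketch of quantile-wise adjustment with additive and multiplicative variants. It is not the xsdba implementation and omits the leave-one-out cross-validation mentioned above; `train_eqm`, `apply_eqm`, the number of quantiles, and the synthetic data are all illustrative assumptions.

```python
from bisect import bisect_left
from statistics import quantiles


def train_eqm(ref, hist, n_quantiles=10, kind="+"):
    """Learn one adjustment factor per quantile of the historical simulation."""
    ref_q = quantiles(ref, n=n_quantiles, method="inclusive")
    hist_q = quantiles(hist, n=n_quantiles, method="inclusive")
    if kind == "+":  # additive, e.g. temperature
        factors = [r - h for r, h in zip(ref_q, hist_q)]
    else:  # multiplicative, e.g. precipitation; guard against zero quantiles
        factors = [r / h if h > 0 else 1.0 for r, h in zip(ref_q, hist_q)]
    return hist_q, factors


def apply_eqm(values, hist_q, factors, kind="+"):
    """Adjust each value with the factor of the first quantile at or above it."""
    corrected = []
    for v in values:
        i = min(bisect_left(hist_q, v), len(hist_q) - 1)
        corrected.append(v + factors[i] if kind == "+" else v * factors[i])
    return corrected


# Synthetic data: the "model" runs 2 °C warmer than the "observations".
ref = [20.0 + i for i in range(11)]
hist = [v + 2.0 for v in ref]

hist_q, factors = train_eqm(ref, hist, kind="+")
corrected = apply_eqm([24.0, 28.5, 33.0], hist_q, factors, kind="+")
```

Because the synthetic bias is a constant +2 °C, every factor is -2.0 and the corrected values are simply shifted down. Real biases vary across the distribution, which is why the factors are learned per quantile.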

3) ERA5 observations only

import cavapy

era5 = cavapy.get_climate_data(
    country="Togo",
    variables=["tasmax", "pr"],
    obs=True,
    years_obs=range(1980, 2019),
)
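
Note that Python's `range()` excludes its end value, so the call above covers 1980 through 2018. A quick check, assuming `years_obs` is interpreted as the literal sequence of years it yields (an assumption here, not something the text above confirms):

```python
years_obs = range(1980, 2019)

first_year, last_year = years_obs[0], years_obs[-1]
n_years = len(years_obs)
print(first_year, last_year, n_years)  # 1980 2018 39
```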

Core Workflows

Projections + historical baseline

Setting historical=True fetches the 1980–2005 historical simulation run and concatenates it with the projection period, giving a continuous time series.

import cavapy

data = cavapy.get_climate_data(
    country="Afghanistan",
    variables=["tasmax", "pr"],
    cordex_domain="WAS-22",
    rcp="rcp85",
    gcm="NCC",
    rcm="REMO",
    years_up_to=2030,
    historical=True,
    dataset="CORDEX-CORE-BC",
)
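
A continuous historical-plus-projection series is typically used to express future change relative to a baseline. The sketch below shows that calculation on synthetic annual values; the dictionary stands in for data you would derive from the returned DataArray, and the 2021-2030 future window is an arbitrary choice for illustration.

```python
from statistics import mean

# Synthetic annual mean tasmax (°C) keyed by year; a stand-in for real output.
annual_tasmax = {year: 30.0 + 0.02 * (year - 1980) for year in range(1980, 2031)}

# Baseline: the 1980-2005 historical period described above.
baseline = mean(v for y, v in annual_tasmax.items() if 1980 <= y <= 2005)

# Future window: chosen here purely for illustration.
future = mean(v for y, v in annual_tasmax.items() if 2021 <= y <= 2030)

anomaly = future - baseline  # warming relative to the baseline, in °C
```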

Multi-model ensemble

Pass lists (or None for all) to rcp, gcm, and rcm. Invalid combinations for the domain are skipped automatically with a warning, rather than raising an error.

import cavapy

multi = cavapy.get_climate_data(
    country="Togo",
    cordex_domain="AFR-22",
    rcp=["rcp26", "rcp85"],
    gcm=["MPI", "MOHC"],
    rcm=["Reg", "REMO"],
    years_up_to=2030,
    historical=True,
    dataset="CORDEX-CORE-BC",
)

The return structure for multi-combination requests is a nested dict:

multi[rcp][f"{gcm}-{rcm}"][variable]  # -> xarray.DataArray
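
Since invalid combinations are skipped, the nested dict may be uneven across scenarios. One way to walk it safely is to flatten it into rows, as in this sketch. The `multi` literal below is a hand-written stand-in with placeholder strings in place of `xarray.DataArray` objects, and `flatten` is a hypothetical helper, not part of cavapy.

```python
# Stand-in for the nested result; strings replace the xarray.DataArray leaves.
multi = {
    "rcp26": {
        "MPI-REMO": {"tasmax": "DataArray(...)", "pr": "DataArray(...)"},
    },
    "rcp85": {
        "MPI-REMO": {"tasmax": "DataArray(...)", "pr": "DataArray(...)"},
        "MOHC-Reg": {"tasmax": "DataArray(...)", "pr": "DataArray(...)"},
    },
}


def flatten(results):
    """Yield (rcp, model, variable, data) for every leaf of the nested dict."""
    for rcp, models in results.items():
        for model, variables in models.items():
            for variable, data in variables.items():
                yield rcp, model, variable, data


rows = list(flatten(multi))
for rcp, model, variable, _ in rows:
    print(rcp, model, variable)
```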

Custom bounding box

import cavapy

data = cavapy.get_climate_data(
    country=None,
    xlim=(30.0, 42.0),
    ylim=(3.0, 15.0),
    cordex_domain="AFR-22",
    rcp="rcp85",
    gcm="MPI",
    rcm="REMO",
    years_up_to=2050,
    buffer=1,  # expand bbox by 1 degree on each side
)
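
The effect of `buffer` on the requested box can be pictured with a small sketch. `expand_bbox` is a hypothetical helper, and clamping to valid longitude/latitude ranges is an assumption made here, not documented cavapy behavior.

```python
def expand_bbox(xlim, ylim, buffer=0.0):
    """Grow a (min, max) lon/lat box by `buffer` degrees on each side."""
    xmin, xmax = xlim
    ymin, ymax = ylim
    return (
        (max(xmin - buffer, -180.0), min(xmax + buffer, 180.0)),
        (max(ymin - buffer, -90.0), min(ymax + buffer, 90.0)),
    )


xlim, ylim = expand_bbox((30.0, 42.0), (3.0, 15.0), buffer=1)
print(xlim, ylim)  # (29.0, 43.0) (2.0, 16.0)
```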

Parallelization

get_climate_data() uses two levels of concurrency:

Single model/scenario: variables are processed in parallel across processes (default: one per variable), with threaded downloads inside each process
Multiple models/scenarios: combo × variable tasks are distributed across a global process pool (default cap: 6 processes); a live progress bar tracks completion
Sequential mode is used when num_processes <= 1 or only one variable is requested
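
To get a feel for how many tasks a request generates, the sketch below enumerates combo × variable tasks and picks a worker count under the default cap of 6. `plan_tasks` is a hypothetical helper written for this illustration; cavapy's actual scheduling logic may differ, for example because invalid combinations are skipped.

```python
from itertools import product


def plan_tasks(rcps, gcms, rcms, variables, num_processes=None, cap=6):
    """Enumerate combo x variable tasks and pick a worker count (hypothetical)."""
    tasks = [
        (rcp, f"{gcm}-{rcm}", variable)
        for rcp, gcm, rcm, variable in product(rcps, gcms, rcms, variables)
    ]
    requested = cap if num_processes is None else num_processes
    workers = max(1, min(requested, len(tasks)))
    return tasks, workers


tasks, workers = plan_tasks(
    ["rcp26", "rcp85"], ["MPI", "MOHC"], ["Reg", "REMO"], ["tasmax", "pr"]
)
sequential = workers <= 1  # mirrors the num_processes <= 1 rule above
```

For this request there are 16 tasks and the pool is capped at 6 workers, so tasks queue up rather than all running at once.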

macOS and Windows scripts

On macOS and Windows, Python starts multiprocessing workers with the spawn method. This means each worker imports the script again before running its task. If get_climate_data() is called at the top level of a .py script, that import re-runs the same call while Python is still starting the worker process, which can raise a multiprocessing bootstrapping RuntimeError.

When using multiple variables or multi-model requests in a script on macOS or Windows, put the call behind Python's standard multiprocessing entry-point guard:

import cavapy


def main():
    togo = cavapy.get_climate_data(
        country="Togo",
        variables=["tasmax", "pr"],
        cordex_domain="AFR-22",
        rcp="rcp26",
        gcm="MPI",
        rcm="REMO",
        years_up_to=2030,
        dataset="CORDEX-CORE-BC",
    )
    return togo


if __name__ == "__main__":
    main()

For a quick unguarded script, use num_processes=1 or request a single variable to run sequentially.
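
If a script has to run on several platforms, you can check the start method at runtime and fall back to sequential mode when the call is not guarded. This is a sketch under assumptions: `needs_main_guard` and `choose_num_processes` are hypothetical helpers, and `forkserver` is treated like `spawn` because both start workers that import the main module.

```python
import multiprocessing as mp


def needs_main_guard():
    """True when worker processes re-import the main script."""
    return mp.get_start_method() != "fork"


def choose_num_processes(requested, guarded):
    """Fall back to sequential mode for unguarded scripts on spawn-like platforms."""
    if needs_main_guard() and not guarded:
        return 1
    return requested


num_processes = choose_num_processes(4, guarded=False)
```

The result can then be passed as `num_processes` to `get_climate_data()`.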

Plotting

cavapy includes built-in plotting helpers that work directly on the returned DataArrays.

Spatial map

import cavapy

data = cavapy.get_climate_data(country="Togo", obs=True, years_obs=range(1990, 2011))

fig = cavapy.plot_spatial_map(
    data["tasmax"],
    time_period=(2000, 2010),
    title="Mean Max Temperature 2000-2010",
    cmap="Reds",
)

[Figure: spatial temperature map]

Time series

fig = cavapy.plot_time_series(
    data["pr"],
    title="Precipitation Time Series - Togo (1990-2010)",
    trend_line=True,
    ylabel="Annual Precipitation (mm)",
    aggregation="sum",
    figsize=(12, 6),
)

[Figure: precipitation time series]

For advanced visualization and reporting, see CAVAanalytics.

Operational Notes

Check GitHub issues for data server outages or announcements. cavapy fetches these automatically at startup.
Set CAVAPY_NO_ANNOUNCEMENTS=1 to disable startup announcements in scripts or production runs.
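
The variable can also be set from inside a script. This sketch assumes the check happens when cavapy is imported ("at startup"), so the variable is set before the import; if the check runs later, the ordering is harmless.

```python
import os

# Set before importing cavapy, assuming announcements are fetched at import time.
os.environ["CAVAPY_NO_ANNOUNCEMENTS"] = "1"

# import cavapy  # uncomment in a real script; omitted so this snippet runs standalone
```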

Citation and License

License: MIT
Package metadata and build details: pyproject.toml
