Skip to main content

This project automates the fetching and extraction of weather data from multiple sources — such as MSWX, DWD HYRAS, ERA5-Land, NASA-NEX-GDDP, and more — for a given location and time range.

Project description

climdata

image image

climdata is a Python package designed to automate fetching, extraction, and processing of climate data from various sources, including MSWX, DWD HYRAS, ERA5-Land, and NASA-NEX-GDDP. It provides tools to retrieve data for specific locations and time ranges, facilitating climate analysis and research.


Key features

  • Fetch and load datasets: MSWX, CMIP (cloud via intake), DWD, HYRAS
  • Spatial extraction: point, box (region via config bounds), or shapefile (GeoJSON/Feature)
  • Temporal subsetting via config or programmatic call
  • Multi-format export: NetCDF, Zarr, CSV (standardized long format: variable, value, units)
  • Hydra configuration + easy CLI overrides
  • Helper to normalize AOI (GeoJSON → point / bbox / polygon)
  • Provenance-friendly workflow (designed to be used with CI/CD workflows)

Install (development)

  1. Clone repository
git clone <repo-url>
cd climdata
  1. Create virtualenv and install deps
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[dev]"   # or pip install -r requirements.txt

Quick CLI (Hydra) usage

Hydra reads configs from conf/. Override any config value on the CLI.

Examples:

# Region extraction (saves NetCDF by default when region is used)
python examples/climdata_cli.py dataset=CMIP region=europe time_range.start_date=2010-01-01 time_range.end_date=2010-12-31

# Point extraction (saves CSV)
python examples/climdata_cli.py dataset=MSWX lat=52.5 lon=13.4 variables=['tas','pr'] time_range.start_date=2000-01-01

# HYRAS / DWD (point only)
python examples/climdata_cli.py dataset=HYRAS lat=52 lon=10

Notes:

  • Use dataset=<MSWX|CMIP|DWD|HYRAS> in CLI.
  • Override any config key: e.g. time_range.start_date=2000-01-01.
  • DWD/HYRAS: region (box) extraction is not supported — script will raise an error if attempted.

Programmatic usage

Use the wrapper to compose configs, preprocess AOI, extract, and save.

from climdata.utils.wrapper import extract_data

# returns (cfg, filename, ds, index) when save_to_file=True
cfg, filename, ds, index = extract_data(cfg_name="config", overrides=["dataset=MSWX","lat=52.5","lon=13.4"])

Or use the dataset classes directly:

import climdata, xarray as xr
cmip = climdata.CMIP(cfg)
cmip.fetch()
cmip.load()
cmip.extract(box=cfg.bounds[cfg.region])
cmip.save_netcdf("output.nc")

Configs

  • Config files live in climdata/conf/. There are dataset-specific config entry points e.g. config_cmip, config_mswx, etc.
  • Filename templates are configurable in cfg.output:
    • cfg.output.filename_nc
    • cfg.output.filename_csv
    • cfg.output.filename_zarr

The wrapper generates filenames via get_output_filename(cfg, output_type, ...) using cfg.bounds, cfg.time_range, etc.

Output CSV format

CSV produced by save_csv is standardized to the long form with columns (where available):

  • source_id, experiment_id, table_id, time, lat, lon, variable, value, units

This ensures a single value column and a variable column for stacked variables.

Common issues & tips

  • NetCDF write ValueError (datetime encoding): call ds["time"].encoding.clear() before to_netcdf() (wrapper handles this).
  • PermissionError writing files: ensure output directory is writable or write to /tmp/ (or adjust permissions).
  • CMIP cloud access requires network access — use the Pangeo intake catalog URL already referenced in code.

AOI handling

preprocess_aoi(cfg) accepts:

  • GeoJSON strings / Feature / FeatureCollection
  • Point → sets cfg.lat, cfg.lon
  • Polygon or bbox → sets cfg.bounds['custom'] and cfg.region='custom'

HYRAS support

HYRAS class mirrors MSWX design:

  • fetch() / load() / extract(point=...) / save_csv() / save_netcdf()
  • HYRAS extraction currently supports point extraction; attempt to use a region will raise an error.

Development & provenance

  • CI: add GitHub Actions workflows to run tests and build/publish to PyPI.
  • Keep config and runtime overrides in Hydra to enable reproducible runs.
  • Include CITATION.cff, license, and a changelog for FAIR discoverability.

Contributing

  • Run tests: pytest
  • Style: follow repository linting config
  • Open PRs against main with tests and a short changelog entry

License

Specify the license (e.g. MIT or Apache 2.0) in LICENSE.


For further examples, see examples/ and the docs/ folder (usage, installation, faq).// filepath: /beegfs/muduchuru/pkgs_fnl/climdata/README.md

climdata

Lightweight toolkit to fetch, subset and export climate data (MSWX, CMIP, DWD, HYRAS).
Provides a Hydra-driven CLI, programmatic wrapper, cloud-native CMIP access, local dataset handling, and standardized CSV/NetCDF/Zarr exports.

Key features

  • Fetch and load datasets: MSWX, CMIP (cloud via intake), DWD, HYRAS
  • Spatial extraction: point, box (region via config bounds), or shapefile (GeoJSON/Feature)
  • Temporal subsetting via config or programmatic call
  • Multi-format export: NetCDF, Zarr, CSV (standardized long format: variable, value, units)
  • Hydra configuration + easy CLI overrides
  • Helper to normalize AOI (GeoJSON → point / bbox / polygon)
  • Provenance-friendly workflow (designed to be used with CI/CD workflows)

Install (development)

  1. Clone repository
git clone <repo-url>
cd climdata
  1. Create virtualenv and install deps
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[dev]"   # or pip install -r requirements.txt

Quick CLI (Hydra) usage

Hydra reads configs from conf/. Override any config value on the CLI.

Examples:

# Region extraction (saves NetCDF by default when region is used)
python examples/climdata_cli.py dataset=CMIP region=europe time_range.start_date=2010-01-01 time_range.end_date=2010-12-31

# Point extraction (saves CSV)
python examples/climdata_cli.py dataset=MSWX lat=52.5 lon=13.4 variables=['tas','pr'] time_range.start_date=2000-01-01

# HYRAS / DWD (point only)
python examples/climdata_cli.py dataset=HYRAS lat=52 lon=10

Notes:

  • Use dataset=<MSWX|CMIP|DWD|HYRAS> in CLI.
  • Override any config key: e.g. time_range.start_date=2000-01-01.
  • DWD/HYRAS: region (box) extraction is not supported — script will raise an error if attempted.

Programmatic usage

Use the wrapper to compose configs, preprocess AOI, extract, and save.

from climdata.utils.wrapper import extract_data

# returns (cfg, filename, ds, index) when save_to_file=True
cfg, filename, ds, index = extract_data(cfg_name="config", overrides=["dataset=MSWX","lat=52.5","lon=13.4"])

Or use the dataset classes directly:

import climdata, xarray as xr
cmip = climdata.CMIP(cfg)
cmip.fetch()
cmip.load()
cmip.extract(box=cfg.bounds[cfg.region])
cmip.save_netcdf("output.nc")

Configs

  • Config files live in climdata/conf/. There are dataset-specific config entry points e.g. config_cmip, config_mswx, etc.
  • Filename templates are configurable in cfg.output:
    • cfg.output.filename_nc
    • cfg.output.filename_csv
    • cfg.output.filename_zarr

The wrapper generates filenames via get_output_filename(cfg, output_type, ...) using cfg.bounds, cfg.time_range, etc.

Output CSV format

CSV produced by save_csv is standardized to the long form with columns (where available):

  • source_id, experiment_id, table_id, time, lat, lon, variable, value, units

This ensures a single value column and a variable column for stacked variables.

Common issues & tips

  • NetCDF write ValueError (datetime encoding): call ds["time"].encoding.clear() before to_netcdf() (wrapper handles this).
  • PermissionError writing files: ensure output directory is writable or write to /tmp/ (or adjust permissions).
  • CMIP cloud access requires network access — use the Pangeo intake catalog URL already referenced in code.

AOI handling

preprocess_aoi(cfg) accepts:

  • GeoJSON strings / Feature / FeatureCollection
  • Point → sets cfg.lat, cfg.lon
  • Polygon or bbox → sets cfg.bounds['custom'] and cfg.region='custom'

HYRAS support

HYRAS class mirrors MSWX design:

  • fetch() / load() / extract(point=...) / save_csv() / save_netcdf()
  • HYRAS extraction currently supports point extraction; attempt to use a region will raise an error.

Development & provenance

  • CI: add GitHub Actions workflows to run tests and build/publish to PyPI.
  • Keep config and runtime overrides in Hydra to enable reproducible runs.
  • Include CITATION.cff, license, and a changelog for FAIR discoverability.

Contributing

  • Run tests: pytest
  • Style: follow repository linting config
  • Open PRs against main with tests and a short changelog entry

License

Specify the license (e.g. MIT or Apache 2.0) in LICENSE.


For further examples, see examples/ and the docs/ folder (usage, installation, faq).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

climdata-0.3.2.tar.gz (263.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

climdata-0.3.2-py2.py3-none-any.whl (39.5 kB view details)

Uploaded Python 2Python 3

File details

Details for the file climdata-0.3.2.tar.gz.

File metadata

  • Download URL: climdata-0.3.2.tar.gz
  • Upload date:
  • Size: 263.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for climdata-0.3.2.tar.gz
Algorithm Hash digest
SHA256 1440e75a8e8f1a4ecd6004f12b00610d30f5300c9677cbad88a9a28b468902bf
MD5 69780635cf044bc6d686e2acea06c9e0
BLAKE2b-256 aebe7a00a898c8c20a7b264095f0892552d3844add3e81c2842569048dea0f95

See more details on using hashes here.

File details

Details for the file climdata-0.3.2-py2.py3-none-any.whl.

File metadata

  • Download URL: climdata-0.3.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 39.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for climdata-0.3.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 0d2f668476c9dcca9e6e4db7b958eac18886532be8cc28673f03d65e1c129779
MD5 d5fff4225a3be978cae1ee8f56ff5dce
BLAKE2b-256 09128b9644bc77f2b670ced9ebe39b2116ce918550c5f51dd1b967aedfcf65a2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page