
This project automates the fetching and extraction of weather data from multiple sources — such as MSWX, DWD HYRAS, ERA5-Land, NASA-NEX-GDDP, and more — for a given location and time range.


climdata


climdata is a Python package designed to automate fetching, extraction, and processing of climate data from various sources, including MSWX, DWD HYRAS, ERA5-Land, and NASA-NEX-GDDP. It provides tools to retrieve data for specific locations and time ranges, facilitating climate analysis and research.


Key features

  • Fetch and load datasets: MSWX, CMIP (cloud via intake), DWD, HYRAS
  • Spatial extraction: point, box (region via config bounds), or shapefile (GeoJSON/Feature)
  • Temporal subsetting via config or programmatic call
  • Multi-format export: NetCDF, Zarr, CSV (standardized long format: variable, value, units)
  • Hydra configuration + easy CLI overrides
  • Helper to normalize AOI (GeoJSON → point / bbox / polygon)
  • Provenance-friendly workflow (designed for use in CI/CD pipelines)
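The point and box extraction modes listed above can be illustrated with a small sketch on a toy grid in plain Python. It assumes nearest-grid-cell selection for points and inclusive bounds for boxes; the function names are hypothetical, and climdata itself operates on xarray datasets rather than nested lists.

```python
from bisect import bisect_left


def nearest_index(axis, value):
    """Index of the entry in an ascending axis that is closest to value."""
    i = bisect_left(axis, value)
    if i == 0:
        return 0
    if i == len(axis):
        return len(axis) - 1
    # Choose the closer neighbour; on an exact tie keep the lower index.
    return i if axis[i] - value < value - axis[i - 1] else i - 1


def extract_point(lats, lons, field, lat, lon):
    """Value of field[lat_index][lon_index] at the grid cell nearest to (lat, lon)."""
    return field[nearest_index(lats, lat)][nearest_index(lons, lon)]


def extract_box(lats, lons, field, lat_min, lat_max, lon_min, lon_max):
    """Sub-grid of cells whose centres lie inside the box (bounds inclusive)."""
    rows = [i for i, y in enumerate(lats) if lat_min <= y <= lat_max]
    cols = [j for j, x in enumerate(lons) if lon_min <= x <= lon_max]
    sub = [[field[i][j] for j in cols] for i in rows]
    return [lats[i] for i in rows], [lons[j] for j in cols], sub


# Usage on a toy 4 x 5 grid (made-up values).
lats = [50.0, 51.0, 52.0, 53.0]
lons = [10.0, 11.0, 12.0, 13.0, 14.0]
field = [[10 * i + j for j in range(len(lons))] for i in range(len(lats))]
print(extract_point(lats, lons, field, 52.5, 13.4))
print(extract_box(lats, lons, field, 51.0, 52.0, 11.0, 12.0))
```

A real gridded dataset would do the same selection on labelled coordinates, but the idea is identical: a point picks one cell, a box keeps every cell whose centre falls inside the bounds.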

Install (development)

  1. Clone the repository
git clone <repo-url>
cd climdata
  2. Create a virtual environment and install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[dev]"   # or pip install -r requirements.txt

Quick CLI (Hydra) usage

Hydra composes the run configuration from the config files in climdata/conf/. Override any config value on the CLI.

Examples:

# Region extraction (saves NetCDF by default when region is used)
python examples/climdata_cli.py dataset=CMIP region=europe time_range.start_date=2010-01-01 time_range.end_date=2010-12-31

# Point extraction (saves CSV); quote list overrides so the shell does not expand the brackets
python examples/climdata_cli.py dataset=MSWX lat=52.5 lon=13.4 'variables=[tas,pr]' time_range.start_date=2000-01-01

# HYRAS / DWD (point only)
python examples/climdata_cli.py dataset=HYRAS lat=52 lon=10
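The comments in these examples imply a simple rule for the default export format. A minimal sketch of that rule, using a plain dict in place of the Hydra config; the helper name and the precedence of region over lat/lon are assumptions, not the CLI's actual code:

```python
def default_output_type(cfg):
    """Pick the default export format implied by the CLI examples:
    a named region -> NetCDF, a lat/lon point -> CSV.
    Hypothetical helper; cfg is a plain dict standing in for the Hydra config."""
    if cfg.get("region"):
        return "netcdf"
    if cfg.get("lat") is not None and cfg.get("lon") is not None:
        return "csv"
    raise ValueError("specify region=<name> or both lat= and lon=")


print(default_output_type({"dataset": "CMIP", "region": "europe"}))
print(default_output_type({"dataset": "MSWX", "lat": 52.5, "lon": 13.4}))
```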

Notes:

  • Use dataset=<MSWX|CMIP|DWD|HYRAS> in CLI.
  • Override any config key: e.g. time_range.start_date=2000-01-01.
  • DWD/HYRAS: region (box) extraction is not supported; the script raises an error if a region is requested.
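Dotted overrides such as time_range.start_date=2000-01-01 address a key inside a nested config group. The sketch below mimics that behaviour on a plain dict. It is a simplified illustration, not Hydra's override parser, and the base config values are made up.

```python
def apply_override(cfg, override):
    """Apply one 'dotted.key=value' override to a nested dict, in place.

    Simplified stand-in for Hydra: values stay strings (no list, quote or
    type parsing) and missing keys are created, whereas Hydra by default
    rejects keys that are not already in the config (use +key=value to add).
    """
    key, sep, value = override.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key.split(".")
    node = cfg
    for name in parents:
        node = node.setdefault(name, {})  # descend, creating levels as needed
    node[leaf] = value


# Made-up base config containing only keys used in the README examples.
cfg = {"dataset": "MSWX", "time_range": {"start_date": "1990-01-01", "end_date": "2020-12-31"}}
for ov in ["dataset=HYRAS", "time_range.start_date=2000-01-01"]:
    apply_override(cfg, ov)
print(cfg)
```

Only the addressed leaf changes; sibling keys such as time_range.end_date keep their configured values, which is what makes per-run overrides safe to combine with a shared base config.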

Programmatic usage

Use the wrapper to compose configs, preprocess AOI, extract, and save.

from climdata.utils.wrapper import extract_data

# Returns (cfg, filename, ds, index) when save_to_file=True.
cfg, filename, ds, index = extract_data(
    cfg_name="config", overrides=["dataset=MSWX", "lat=52.5", "lon=13.4"], save_to_file=True
)

Or use the dataset classes directly:

import climdata

# cfg: a composed Hydra config, e.g. the cfg returned by extract_data
cmip = climdata.CMIP(cfg)
cmip.fetch()
cmip.load()
cmip.extract(box=cfg.bounds[cfg.region])  # subset to the configured region's bounds
cmip.save_netcdf("output.nc")

Configs

  • Config files live in climdata/conf/. There are dataset-specific config entry points e.g. config_cmip, config_mswx, etc.
  • Filename templates are configurable in cfg.output:
    • cfg.output.filename_nc
    • cfg.output.filename_csv
    • cfg.output.filename_zarr

The wrapper generates filenames via get_output_filename(cfg, output_type, ...) using cfg.bounds, cfg.time_range, etc.
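As an illustration of template-based naming, here is a hypothetical sketch that fills a filename template from the dataset, location and time range. The placeholder names and the bounds keys (lat_min, lat_max, lon_min, lon_max) are assumptions; the real templates live in cfg.output and are filled by get_output_filename.

```python
def output_filename(template, cfg):
    """Fill a filename template from the dataset, location and time range.

    Hypothetical stand-in for get_output_filename: the placeholder names
    ({dataset}, {where}, {start}, {end}) and the bounds keys are assumptions.
    """
    tr = cfg["time_range"]
    if cfg.get("region"):
        b = cfg["bounds"][cfg["region"]]
        where = f"lat{b['lat_min']}_{b['lat_max']}_lon{b['lon_min']}_{b['lon_max']}"
    else:
        where = f"lat{cfg['lat']}_lon{cfg['lon']}"
    return template.format(
        dataset=cfg["dataset"].lower(),
        where=where,
        start=tr["start_date"].replace("-", ""),
        end=tr["end_date"].replace("-", ""),
    )


point_cfg = {"dataset": "MSWX", "lat": 52.5, "lon": 13.4,
             "time_range": {"start_date": "2000-01-01", "end_date": "2000-12-31"}}
print(output_filename("{dataset}_{where}_{start}_{end}.csv", point_cfg))
```

Deriving names from the config rather than hard-coding them means two runs with different overrides cannot silently overwrite each other's output.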

Output CSV format

CSV produced by save_csv is standardized to the long form with columns (where available):

  • source_id, experiment_id, table_id, time, lat, lon, variable, value, units

Each row carries one value of one variable, so multiple variables are stacked into a single value column and distinguished by the variable column.
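A minimal sketch of the wide-to-long reshaping with the standard csv module, assuming the input is a list of per-time-step dicts. The helper names and sample values are hypothetical; save_csv presumably starts from an xarray Dataset instead.

```python
import csv
import io

# Column order documented above; only columns present in the rows are written.
COLUMNS = ["source_id", "experiment_id", "table_id", "time", "lat", "lon",
           "variable", "value", "units"]


def to_long_rows(meta, records, units):
    """Turn wide records (one key per variable) into one row per variable.

    meta: columns shared by every row (e.g. source_id for CMIP; may be empty).
    records: dicts with time, lat, lon and one key per variable.
    units: mapping variable -> unit string.
    """
    rows = []
    for rec in records:
        for var, unit in units.items():
            if var in rec:
                rows.append({**meta, "time": rec["time"], "lat": rec["lat"],
                             "lon": rec["lon"], "variable": var,
                             "value": rec[var], "units": unit})
    return rows


def write_long_csv(rows):
    """Serialise long-format rows to CSV text, keeping only available columns."""
    columns = [c for c in COLUMNS if any(c in r for r in rows)]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up point values for two variables at one time step.
records = [{"time": "2000-01-01", "lat": 52.5, "lon": 13.4, "tas": 271.3, "pr": 0.8}]
print(write_long_csv(to_long_rows({}, records, {"tas": "K", "pr": "mm/day"})))
```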

Common issues & tips

  • NetCDF write ValueError (datetime encoding): call ds["time"].encoding.clear() before to_netcdf() (wrapper handles this).
  • PermissionError writing files: make sure the output directory is writable (or adjust its permissions), or write to /tmp/ instead.
  • CMIP cloud access requires network access; the code already points to the Pangeo intake catalog URL.
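For the PermissionError case, a small helper can check the target directory up front and fall back to the system temp directory. This is a sketch with a hypothetical function name, not part of climdata.

```python
import os
import tempfile


def writable_output_dir(preferred):
    """Return preferred if it exists (or can be created) and is writable,
    otherwise fall back to the system temp directory (usually /tmp on Linux)."""
    try:
        os.makedirs(preferred, exist_ok=True)
    except OSError:  # PermissionError, NotADirectoryError, FileExistsError, ...
        return tempfile.gettempdir()
    if os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


# Usage inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as base:
    print(writable_output_dir(os.path.join(base, "outputs")))
```

Checking once before a long download or extraction avoids losing the work to an error raised only at write time.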

AOI handling

preprocess_aoi(cfg) accepts:

  • GeoJSON strings / Feature / FeatureCollection
  • Point → sets cfg.lat, cfg.lon
  • Polygon or bbox → sets cfg.bounds['custom'] and cfg.region='custom'
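A sketch of the normalisation described above, using only the json module. It assumes the first feature of a FeatureCollection is used, reduces a polygon to its bounding box, and does not handle polygons that cross the antimeridian; normalize_aoi and apply_aoi are hypothetical names, not the preprocess_aoi implementation.

```python
import json


def normalize_aoi(aoi):
    """Reduce a GeoJSON string or dict to ("point", (lat, lon)) or ("bbox", bounds)."""
    if isinstance(aoi, str):
        aoi = json.loads(aoi)
    if aoi.get("type") == "FeatureCollection":
        features = aoi.get("features") or []
        if not features:
            raise ValueError("FeatureCollection has no features")
        aoi = features[0]  # assumption: only the first feature is used
    if aoi.get("type") == "Feature":
        aoi = aoi["geometry"]
    gtype = aoi.get("type")
    if gtype == "Point":
        lon, lat = aoi["coordinates"][:2]  # GeoJSON positions are [lon, lat]
        return "point", (lat, lon)
    if gtype == "Polygon":
        ring = aoi["coordinates"][0]  # outer ring; holes cannot widen the extent
        lons = [pos[0] for pos in ring]
        lats = [pos[1] for pos in ring]
        return "bbox", {"lat_min": min(lats), "lat_max": max(lats),
                        "lon_min": min(lons), "lon_max": max(lons)}
    raise ValueError(f"unsupported geometry type: {gtype!r}")


def apply_aoi(cfg, aoi):
    """Update a dict-style config the way preprocess_aoi is described above."""
    kind, value = normalize_aoi(aoi)
    if kind == "point":
        cfg["lat"], cfg["lon"] = value
    else:
        cfg.setdefault("bounds", {})["custom"] = value
        cfg["region"] = "custom"
    return cfg


print(apply_aoi({}, '{"type": "Point", "coordinates": [13.4, 52.5]}'))
```

Note the coordinate swap: GeoJSON stores [lon, lat], while the config uses separate lat and lon keys, which is an easy place to introduce a silent bug.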

HYRAS support

The HYRAS class mirrors the MSWX design:

  • fetch() / load() / extract(point=...) / save_csv() / save_netcdf()
  • HYRAS currently supports point extraction only; requesting a region raises an error.
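A fail-fast check for the point-only restriction could look like the hypothetical sketch below. The helper name and the choice of NotImplementedError are assumptions; the README only states that an error is raised.

```python
POINT_ONLY_DATASETS = {"DWD", "HYRAS"}  # per the CLI notes


def check_extraction_request(dataset, point=None, box=None):
    """Fail fast on requests the dataset cannot serve; return the extraction mode.

    Hypothetical helper: the exception type is an assumption; the README only
    says that an error is raised for region requests on DWD/HYRAS.
    """
    if point is None and box is None:
        raise ValueError("nothing to extract: pass point=(lat, lon) or box=bounds")
    if box is not None:
        if dataset.upper() in POINT_ONLY_DATASETS:
            raise NotImplementedError(
                f"{dataset}: region (box) extraction is not supported; use a point")
        return "box"
    return "point"


print(check_extraction_request("HYRAS", point=(52.0, 10.0)))
```

Validating before fetch() means an unsupported request fails immediately instead of after the data has been downloaded.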

Development & provenance

  • CI: add GitHub Actions workflows to run tests and build/publish to PyPI.
  • Keep config and runtime overrides in Hydra to enable reproducible runs.
  • Include CITATION.cff, license, and a changelog for FAIR discoverability.

Contributing

  • Run tests: pytest
  • Style: follow repository linting config
  • Open PRs against main with tests and a short changelog entry

License

Specify the license (e.g. MIT or Apache 2.0) in LICENSE.


For further examples, see examples/ and the docs/ folder (usage, installation, faq).


Download files


Source Distribution

climdata-0.3.5.tar.gz (807.6 kB)

Uploaded Source

Built Distribution


climdata-0.3.5-py2.py3-none-any.whl (42.2 kB)

Uploaded Python 2, Python 3

File details

Details for the file climdata-0.3.5.tar.gz.

File metadata

  • Download URL: climdata-0.3.5.tar.gz
  • Upload date:
  • Size: 807.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for climdata-0.3.5.tar.gz
  • SHA256: e9b453ab13deff7a193a232c688140844e0df51980c9cd3f16f8af8528e6fd7b
  • MD5: c8cfc5097c38c5847db9cf07c9150079
  • BLAKE2b-256: dc5b2f9e5e83b1097ed7c0c8dbbf37c2a27e6c4f45afae941ac6f8ab873a9b02


File details

Details for the file climdata-0.3.5-py2.py3-none-any.whl.

File metadata

  • Download URL: climdata-0.3.5-py2.py3-none-any.whl
  • Upload date:
  • Size: 42.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for climdata-0.3.5-py2.py3-none-any.whl
  • SHA256: 2d769765ed96fcb13f4c215d72675ffb05707a7115a9f6756ea13ea65d5e2c5b
  • MD5: f5129107e6294edaaf5864bffec8d5bc
  • BLAKE2b-256: cd3476dad690c80375a920aef842c3dca251b0a2fd6bb5386585172aa7fe961d

