# climdata

This project automates the fetching and extraction of weather data from multiple sources (such as MSWX, DWD HYRAS, ERA5-Land, and NASA-NEX-GDDP) for a given location and time range.

## Quickstart & Overview

climdata provides a unified interface for extracting climate data from multiple providers (MSWX, CMIP, POWER, DWD, HYRAS), computing extreme indices, and converting the results to tabular form. The `ClimData` class (also exposed as `ClimateExtractor`) is central: it manages configuration, extraction, index computation, and common I/O.
## Key features

- Provider-agnostic extraction (point / region / shapefile)
- Unit normalization via xclim
- Extreme-index computation using the package's `indices`
- Conversion of xarray Datasets to long-form pandas DataFrames
- Simple workflow runner for chained actions
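The Dataset-to-long-form conversion can be sketched with plain xarray and pandas. This is an illustrative stand-alone example, not the package's internal implementation; all names here are made up for the demonstration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A tiny gridded Dataset: 2 time steps x 2 lats x 2 lons.
ds = xr.Dataset(
    {"tasmax": (("time", "lat", "lon"),
                np.arange(8, dtype=float).reshape(2, 2, 2))},
    coords={"time": pd.date_range("2014-01-01", periods=2),
            "lat": [52.0, 52.5],
            "lon": [13.0, 13.5]},
)

# One row per (time, lat, lon) combination ...
df = ds.to_dataframe().reset_index()

# ... then melt the data variables into variable/value columns (long form).
long_df = df.melt(id_vars=["time", "lat", "lon"],
                  var_name="variable", value_name="value")
print(long_df.shape)  # (8, 5)
```

Long form like this is convenient for CSV export and for plotting libraries that expect tidy data.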
## Installation

Create and activate a conda environment:

```bash
conda create -n climdata python=3.11 -y
conda activate climdata
```

Install via pip (PyPI, if available) or from source:

```bash
# from PyPI
pip install climdata

# or from local source (editable)
git clone <repo-url>
cd climdata
pip install -e .
```

Install optional extras as needed (e.g. xclim, shapely, hydra, dask):

```bash
pip install xarray xclim shapely hydra-core dask "pandas>=1.5"
```
### Optional: imputation dependencies

If you need the imputation functionality (gap filling with ML models), install PyTorch and related packages:

```bash
# PyTorch, CPU build (recommended for most users)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# additional ML packages for imputation
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.3.1+cpu.html
pip install pytorch-lightning torchmetrics lightning torchcde reformer-pytorch
pip install tensorflow darts sktime prophet tsfel tsfresh transformers timm
```

For GPU support, see the PyTorch installation guide.
## Quick example

```python
from climdata import ClimData  # or: from climdata.utils.wrapper_workflow import ClimateExtractor

overrides = [
    "dataset=mswx",
    "lat=52.5",
    "lon=13.4",
    "time_range.start_date=2014-01-01",
    "time_range.end_date=2014-12-31",
    "variables=[tasmin,tasmax,pr]",
    "data_dir=/path/to/data",
    "index=tn10p",
]

# initialize
extractor = ClimData(overrides=overrides)

# extract data (returns an xarray.Dataset and updates internal state)
ds = extractor.extract()

# compute the index configured via cfg.index
ds_index = extractor.calc_index(ds)

# convert to a long-form dataframe and save
df = extractor.to_dataframe(ds_index)
extractor.to_csv(df, filename="index.csv")
```
## Workflow runner

Use `run_workflow` for multi-step sequences:

```python
result = extractor.run_workflow(actions=["extract", "calc_index", "to_dataframe", "to_csv"])
```

The returned `WorkflowResult` contains the produced dataset(s), dataframe(s), and filenames.
## Documentation & API

- See the API docs under `docs/api/` for detailed descriptions of `ClimData`/`ClimateExtractor` methods.
- Examples and notebooks are under `examples/`.
## Contributing

- Run tests and lint locally.
- Follow project coding and documentation conventions; submit PRs with tests.
## Citation

If you use climdata in your research or projects, please cite it using one of the following formats:

### BibTeX

```bibtex
@software{muduchuru2024climdata,
  title={climdata: Automated Climate Data Extraction and Processing},
  author={Muduchuru, Kaushik},
  year={2024},
  version={0.5.0},
  url={https://github.com/Kaushikreddym/climdata},
  note={Available at https://Kaushikreddym.github.io/climdata}
}
```

### APA

Muduchuru, K. (2024). *climdata: Automated climate data extraction and processing* (v0.5.0). Retrieved from https://github.com/Kaushikreddym/climdata

### CITATION.cff

The repository includes a `CITATION.cff` file, so GitHub automatically shows a "Cite this repository" button with ready-to-use citation formats.

### DOI (Zenodo)

- DOI: https://doi.org/10.5281/zenodo.19554926
- Zenodo record: https://zenodo.org/record/19554926

For archival setup details, see the Zenodo & DOI Guide.
## License

Refer to the repository LICENSE file for terms.
## ⚡️ Tips

- Make sure `yq` is installed:

  ```bash
  brew install yq   # macOS
  # or
  pip install yq
  ```

- To see the available variables for a specific dataset (for example `mswx`), run:

  ```bash
  python download_location.py --cfg job | yq '.mappings.mswx.variables | keys'
  ```
## ⚙️ Key Features

- Supports multiple weather data providers
- Uses xarray for robust gridded data extraction
- Handles curvilinear and rectilinear grids
- Uses a Google Drive service account for secure downloads
- Easily reproducible runs using Hydra
## ⚖️ Data Licensing & Access

### MSWX Dataset — Non-Commercial Use Only

MSWX (Multi-Source Weather) is released under the CC BY-NC 4.0 license. This means:
✅ Allowed uses:
- Academic research
- Non-profit scientific studies
- Personal projects
- Government or NGO applications (non-commercial)
❌ Not allowed:
- Commercial use or products
- For-profit services
To access MSWX data:

1. Visit https://www.gloh2o.org/mswx/
2. Submit a data request for non-commercial use
3. Once approved, follow the Google Drive API setup below to configure climdata
⚠️ Important: By using MSWX via climdata, you agree to the CC BY-NC 4.0 license terms. Unauthorized commercial use is prohibited.
## 📡 Google Drive API Setup
This project uses the Google Drive API with a Service Account to securely download weather data files from a shared Google Drive folder.
Follow these steps to set it up correctly:
### ✅ 1. Create a Google Cloud Project

- Go to the Google Cloud Console.
- Click “Select Project” → “New Project”.
- Enter a project name (e.g. `WeatherDataDownloader`).
- Click “Create”.
### ✅ 2. Enable the Google Drive API
- In the left sidebar, go to APIs & Services → Library.
- Search for “Google Drive API”.
- Click it, then click “Enable”.
### ✅ 3. Create a Service Account

- Go to IAM & Admin → Service Accounts.
- Click “Create Service Account”.
- Enter a name (e.g. `weather-downloader-sa`).
- Click “Create and Continue”. You can skip assigning roles for read-only Drive access.
- Click “Done” to finish.
### ✅ 4. Create and Download a JSON Key

- After creating the service account, click its email address to open its details.
- Go to the “Keys” tab.
- Click “Add Key” → “Create new key” → choose `JSON` → click “Create”.
- A `.json` key file will download automatically. Store it securely!
### ✅ 5. Store the JSON Key Securely

- Place the downloaded `.json` key in the `conf` folder with the name `service.json`.
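Before pointing climdata at the key, it can be worth sanity-checking that the downloaded file is valid JSON with the fields a Drive client needs. The helper below is hypothetical (not part of climdata), and the required-field set reflects the usual service-account key layout:

```python
import json
from pathlib import Path

# Fields normally present in a Google service-account key file.
REQUIRED_FIELDS = {"type", "client_email", "private_key", "token_uri"}

def check_service_key(path):
    """Raise ValueError if the key file is missing expected fields;
    return the service-account email on success."""
    info = json.loads(Path(path).read_text())
    missing = REQUIRED_FIELDS - info.keys()
    if missing:
        raise ValueError(f"service.json missing fields: {sorted(missing)}")
    return info["client_email"]
```

Running `check_service_key("conf/service.json")` after step 5 catches a truncated or wrong-type key early, before any Drive request is attempted.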
## Setup Instructions for the ERA5 API

### 1. CDS API Key Setup

1. Create a free account on the Copernicus Climate Data Store.
2. Once logged in, go to your user profile.
3. Click the “Show API key” button.
4. Create the file `~/.cdsapirc` with the following content:

   ```
   url: https://cds.climate.copernicus.eu/api/v2
   key: <your-api-key-here>
   ```

5. Make sure the file has the correct permissions:

   ```bash
   chmod 600 ~/.cdsapirc
   ```
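Steps 4 and 5 can also be done programmatically. The sketch below writes the two-line file and restricts its permissions with the standard library; a temporary path is used so the snippet is safe to run as-is (point it at `~/.cdsapirc` and your real key for actual use):

```python
import os
import stat
import tempfile
from pathlib import Path

# Temporary stand-in for ~/.cdsapirc so this demo never touches a real config.
rc = Path(tempfile.mkdtemp()) / ".cdsapirc"

# Step 4: write the url/key pair expected by the CDS API client.
rc.write_text(
    "url: https://cds.climate.copernicus.eu/api/v2\n"
    "key: <your-api-key-here>\n"
)

# Step 5: owner read/write only, same effect as `chmod 600 ~/.cdsapirc`.
os.chmod(rc, 0o600)
mode = stat.S_IMODE(os.stat(rc).st_mode)
print(oct(mode))  # 0o600 on POSIX systems
```

The strict permissions matter because the file contains a personal API key.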