Downloader and data management tools for climate and ocean datasets.
H2MARE - Geospatial Processing for Climate and Ocean Data
A Python pipeline for downloading and preprocessing multi-source oceanographic and atmospheric data into analysis-ready formats. H2MARE streamlines the acquisition and harmonization of data from major climate and ocean observation services, optimized for large-scale spatiotemporal analysis.
Features
- Multi-source data integration: Download and process data from CMEMS, AVISO, and ERA5.
- Variable grouping: Organize related variables using configurable keys.
- Format conversion: Automated conversion from NetCDF/GRIB to optimized Zarr and Parquet formats.
- Data compilation: Regrid and interpolate multi-resolution datasets to a common grid.
- Point and geometry extraction: Extract time series for specific locations or spatial features.
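To make the "common grid" idea concrete, here is a minimal, standard-library sketch of linear interpolation from one coordinate axis onto another. It is an illustration only: the function name `interp_to_grid` is hypothetical, it works on a single 1-D axis, and it is not H2MARE's compiler, which operates on full gridded datasets.

```python
from bisect import bisect_right


def interp_to_grid(src_coords, src_values, target_coords):
    """Linearly interpolate values from a source axis onto target coordinates.

    Hypothetical helper for illustration. src_coords must be strictly
    increasing and hold at least two points. Targets outside the source
    range yield None instead of an extrapolated value.
    """
    if len(src_coords) < 2 or len(src_coords) != len(src_values):
        raise ValueError("need two or more matching coordinates and values")
    out = []
    for x in target_coords:
        if x < src_coords[0] or x > src_coords[-1]:
            out.append(None)
            continue
        # Index of the right-hand neighbour, clamped so the last point works.
        i = min(bisect_right(src_coords, x), len(src_coords) - 1)
        x0, x1 = src_coords[i - 1], src_coords[i]
        y0, y1 = src_values[i - 1], src_values[i]
        weight = (x - x0) / (x1 - x0)
        out.append(y0 + weight * (y1 - y0))
    return out


# A coarse 1-degree axis resampled onto a finer 0.5-degree axis.
coarse_lat = [0.0, 1.0, 2.0]
coarse_sst = [10.0, 20.0, 40.0]
fine_lat = [0.0, 0.5, 1.0, 1.5, 2.0]
print(interp_to_grid(coarse_lat, coarse_sst, fine_lat))
```

Applying the same step along latitude and then longitude gives bilinear interpolation, which is one common way to bring datasets of different resolution onto a shared grid.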
Data Sources
H2MARE supports the following data providers. Each requires its own API key or account credentials:
- CMEMS - Copernicus Marine Service: Satellite and in-situ ocean observations
- AVISO - Archiving, Validation and Interpretation of Satellite Oceanographic data
- CDS-ERA5 - ERA5 hourly atmospheric reanalysis (1940-present)
Hersbach, H., et al. (2023). DOI: 10.24381/cds.adbb2d47
Note: Refer to each provider's documentation for authentication setup before use.
Installation
Prerequisites
- Python >= 3.11
- uv — fast Python package and project manager
- Sufficient disk space for downloaded datasets (varies by region and time range)
Install from PyPI
pip install h2mare
# or
uv add h2mare
Install from source
git clone https://github.com/h2ugoparra/h2mare.git
cd h2mare
uv sync
Configuration
H2MARE requires two configuration files in your working directory before first use.
1. config.yaml
Defines variables, dataset IDs, bounding boxes, and processing parameters. Copy the template from the repository as a starting point and edit it to match your needs.
2. .env
# Path to external or large-capacity storage for processed Zarr files
STORE_DIR=/path/to/your/storage
# CMEMS credentials (required for SST, SSH, MLD, CHL, O2, SEAPODYM)
CMEMS_USERNAME=your_username
CMEMS_PASSWORD=your_password
# AVISO credentials (required for FSLE, Eddies)
AVISO_USERNAME=your_username
AVISO_PASSWORD=your_password
AVISO_FTP_SERVER=ftp-access.aviso.altimetry.fr
ERA5 / CDS credentials are configured separately via the cdsapi client — see the CDS documentation for setup.
Note: Both files must be present in the directory where you run h2mare. You can also set the H2MARE_ROOT environment variable to point to a different directory containing them.
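Before a first run it can help to check that the configuration directory and credentials are in place. The sketch below uses only the standard library. The helper names (`resolve_config_dir`, `parse_env_file`, `missing_keys`) are hypothetical and not part of H2MARE, and the rule that H2MARE_ROOT takes precedence over the working directory is an assumption based on the note above.

```python
import os
from pathlib import Path


def resolve_config_dir(env=None, cwd=None):
    """Return the directory expected to hold config.yaml and .env.

    Assumption: H2MARE_ROOT, when set, wins over the working directory.
    """
    env = os.environ if env is None else env
    root = env.get("H2MARE_ROOT")
    return Path(root) if root else Path(cwd or Path.cwd())


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required):
    """List required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


sample = """
# Path to external or large-capacity storage
STORE_DIR=/path/to/your/storage
CMEMS_USERNAME=your_username
CMEMS_PASSWORD=
"""
parsed = parse_env_file(sample)
print(missing_keys(parsed, ["STORE_DIR", "CMEMS_USERNAME", "CMEMS_PASSWORD"]))
```

In practice you would read the text with `(resolve_config_dir() / ".env").read_text()` and check only the keys needed for the variables you plan to download.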
Key variable groups
Edit config.yaml to define variable groups and processing parameters.
Data Flow
- Download - Raw NetCDF/GRIB files are fetched from the configured sources and saved as native-resolution Zarr files, split at the specified time resolution (monthly or yearly).
- Compilation (h2mare/processing/compiler.py) - Preprocessed data is regridded to a defined spatial/temporal resolution and geographic extent (configured via the 'h2ds' key in config.yaml).
- Extraction (h2mare/processing/extractor.py) - Point (CSV files) or geometry (SHP files) data extraction from xarray datasets.
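The download step stores data in monthly or yearly pieces. As a rough illustration of what that split means for a requested date range, the sketch below cuts an inclusive range into calendar-aligned chunks. The function name `split_by_period` and its `period` argument are hypothetical; H2MARE's own chunking logic may differ.

```python
from datetime import date, timedelta


def split_by_period(start, end, period="monthly"):
    """Split the inclusive range [start, end] into calendar-aligned chunks.

    Hypothetical helper: returns a list of (chunk_start, chunk_end) dates.
    """
    if period not in ("monthly", "yearly"):
        raise ValueError(f"unsupported period: {period}")
    chunks = []
    cursor = start
    while cursor <= end:
        if period == "monthly" and cursor.month < 12:
            # First day of the following month.
            next_start = date(cursor.year, cursor.month + 1, 1)
        else:
            # December, or yearly chunks: roll over to 1 January.
            next_start = date(cursor.year + 1, 1, 1)
        chunk_end = min(end, next_start - timedelta(days=1))
        chunks.append((cursor, chunk_end))
        cursor = next_start
    return chunks


for chunk_start, chunk_end in split_by_period(date(2021, 1, 15), date(2021, 3, 10)):
    print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Each chunk would then map to one stored file, so a partial first or last month covers only the requested days.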
Quick Start
# Download and process a single variable for a specific date range
uv run h2mare run sst --start-date 2021-01-01 --end-date 2021-12-31
# Multiple variables at once (space-separated)
uv run h2mare run seapodym mld o2 chl
# Infer missing dates from the existing store and download what's new
uv run h2mare run sst
# Download only (skip Zarr conversion)
uv run h2mare run sst --no-process
# Validate configuration without downloading
uv run h2mare run sst --dry-run
# Process all configured variables
uv run h2mare run
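When no dates are given, the commands above infer what is missing from the existing store. One simple way to do that, shown here purely as a sketch, is to resume on the day after the latest stored date. The function `missing_range` and its arguments are hypothetical, and the assumption that "missing" means "after the newest stored date" may not match how H2MARE handles gaps inside the store.

```python
from datetime import date, timedelta


def missing_range(stored_dates, first_available, today):
    """Return the (start, end) range still to download, or None if up to date.

    Hypothetical helper. Assumes the store has no internal gaps, so only
    dates after the newest stored one are considered missing.
    """
    if not stored_dates:
        # Empty store: fetch everything the provider offers.
        return (first_available, today)
    start = max(stored_dates) + timedelta(days=1)
    return (start, today) if start <= today else None


stored = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]
print(missing_range(stored, date(1993, 1, 1), date(2021, 1, 10)))
```

A store that is already current returns `None`, which a caller can treat as "nothing to download".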
Development
# Run the full test suite
uv run pytest tests/
# Run a single test file
uv run pytest tests/test_zarr_catalog.py -v
# Format code
uv run black h2mare/
uv run isort h2mare/
Built with
| Library | Role |
|---|---|
| xarray | N-dimensional labelled arrays and NetCDF/Zarr I/O |
| zarr | Chunked, compressed array storage |
| dask | Parallel and out-of-core computation |
| polars | Fast DataFrame engine for extracted time series |
| geopandas | Geometry-based spatial extraction |
| copernicusmarine | CMEMS dataset access |
| cdsapi | ERA5 / CDS dataset access |
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.
License
This project is licensed under the MIT License - see the LICENSE file for details.
AI Assistance
Parts of this codebase were developed with the help of Claude (Anthropic).
Acknowledgments
This project was developed within the framework of the COSTA project. It relies on data from Copernicus Marine Service, AVISO, Copernicus Climate Data Store, and NOAA NCEI. We gratefully acknowledge these organizations for providing open access to their datasets.