DataLad extension for downloading from the Copernicus Climate Data Store
DataLad extension for the Copernicus Climate Data Store
What?
A DataLad extension to integrate with the Copernicus Climate Data Store (CDS).
So far, it implements a single command, datalad download-cds, which fetches data from the CDS
and records the request so that datalad get (or just git annex get) can repeat the download in the future.
Why?
This extension enables automated provenance tracking for data fetched from the CDS. A dataset that retrieves data from the CDS through this extension records how the data was initially fetched and how it can be retrieved again in the future.
How?
You will first have to create an account with the CDS, if you don't have one already. You can do so here: https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome
Next, you will need to create the "~/.cdsapirc" file as described here: https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key. This file is required since the datalad-cds extension internally uses the cdsapi package and therefore uses its authentication mechanism.
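Before the first download it can help to check that the credentials file is in place. The following is a minimal sketch, not part of datalad-cds or cdsapi: it assumes the file consists of simple "name: value" lines with a url and a key entry, as the linked instructions describe. The helper names read_cdsapirc and missing_entries are hypothetical.

```python
from pathlib import Path


def read_cdsapirc(path=Path.home() / ".cdsapirc"):
    """Parse simple "name: value" lines into a dict (assumed file layout)."""
    config = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            # Split on the first colon only, since URLs and keys contain colons.
            name, value = line.split(":", 1)
            config[name.strip()] = value.strip()
    return config


def missing_entries(config, required=("url", "key")):
    """Return the required entry names that are absent or empty."""
    return [name for name in required if not config.get(name)]
```

Calling missing_entries(read_cdsapirc()) should return an empty list once the file has been set up. A non-empty list names the entries that still need to be added.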
You also need to install datalad and the datalad-cds extension. Both can be installed with pip.
Now you are ready to use the extension. When you look through the CDS, you will notice that for any given dataset you can select a subset of the data using the "Download data" tab. After making a selection, you can use the "Show API request" button at the bottom to get a short Python script that fetches the chosen subset using the cdsapi package. The following is an example of such a script:
#!/usr/bin/env python
import cdsapi

c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib"
    },
    "download.grib",
)
To fetch the same data to the same local file using datalad-cds we just need to adapt this a little:
$ datalad download-cds --path download.grib '
{
    "dataset": "reanalysis-era5-pressure-levels",
    "sub-selection": {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib"
    }
}
'
The local path to save to ("download.grib") becomes the --path argument.
The dataset name ("reanalysis-era5-pressure-levels" in this case) becomes the value of the dataset key in a JSON object that describes the data to be downloaded.
The sub-selection of the dataset becomes the value of the sub-selection key.
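The mapping described above can also be done programmatically. The following is a minimal sketch that takes the three arguments of a cdsapi retrieve call and assembles the equivalent datalad download-cds command line. The function name build_download_cds_command is hypothetical and not part of datalad-cds. The sketch only builds the command and does not run it.

```python
import json
import shlex


def build_download_cds_command(dataset, sub_selection, path):
    """Build a `datalad download-cds` command from cdsapi-style arguments."""
    # The request spec is a JSON object with the two keys described above.
    spec = {"dataset": dataset, "sub-selection": sub_selection}
    return ["datalad", "download-cds", "--path", path, json.dumps(spec)]


command = build_download_cds_command(
    "reanalysis-era5-pressure-levels",
    {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib",
    },
    "download.grib",
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run from within a DataLad dataset. Building the command as a list avoids shell quoting problems with the embedded JSON.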
After executing the above datalad download-cds command inside a DataLad dataset, a new file called "download.grib" should appear.
This file will have its origin tracked in git-annex (you can see that by running git annex whereis download.grib).
If you now datalad drop the file
and then datalad get it, you will see that git-annex automatically re-retrieves the file from the CDS,
as if it were just another location to get data from.
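The drop and get cycle can be scripted as well. The following is a minimal sketch that calls the datalad command line through subprocess. It assumes datalad is on the PATH and that the script runs inside the dataset. The function name redownload and its run parameter are hypothetical. The run parameter exists only so that the commands can be inspected without executing them.

```python
import subprocess


def redownload(path, run=subprocess.run):
    """Drop a file's local content, then fetch it again from its recorded origin."""
    for command in (["datalad", "drop", path], ["datalad", "get", path]):
        # check=True raises CalledProcessError if a command fails.
        run(command, check=True)
```

Calling redownload("download.grib") in the dataset from the example above would run both commands in order and stop at the first failure.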
For more usage options, see the help page of the command (datalad download-cds --help)
or the documentation at https://matrss.github.io/datalad-cds/.