DataLad extension for downloading from the Copernicus Climate Data Store

Project description

DataLad extension for the Copernicus Climate Data Store

What?

A DataLad extension to integrate with the Copernicus Climate Data Store (CDS). So far it implements a single datalad download-cds command, which fetches data from the CDS and records that action so that datalad get (or plain git annex get) can redo the download in the future.

Why?

This extension enables automated provenance tracking for data fetched from the CDS. A dataset that retrieves data from the CDS through this extension records how that data was initially fetched and how it can be retrieved again in the future.

How?

You will first have to create an account with the CDS, if you don't have one already. You can do so here: https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome

Next, you will need to create the "~/.cdsapirc" file as described here: https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key. This file is required since the datalad-cds extension internally uses the cdsapi package and therefore relies on its authentication mechanism.
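As a rough sketch, and assuming the current CDS API endpoint together with a personal access token obtained from your CDS user profile, the "~/.cdsapirc" file contains two lines along the following pattern (the exact URL and key format are defined by the CDS documentation linked above; the token here is a placeholder):

url: https://cds.climate.copernicus.eu/api
key: <your-personal-access-token>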

You also need to install datalad and the datalad-cds extension. Both can be installed with pip.
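For example, assuming you want to install into the currently active Python environment, this could look like:

$ pip install datalad datalad-cds

Depending on your setup you may prefer to install DataLad itself through another method (see its own documentation); the extension is the datalad_cds package shown on this page.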

Now you are ready to use the extension. When you browse the CDS you will notice that for any given dataset you can select a subset of the data using the "Download data" tab. After you do that, you can use the "Show API request" button at the bottom to get a short Python script that would fetch the chosen subset using the cdsapi. The following is an example of that:

#!/usr/bin/env python
import cdsapi
c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib"
    },
    "download.grib",
)

To fetch the same data to the same local file using datalad-cds, we just need to adapt this a little:

$ datalad download-cds --path download.grib '
    {
        "dataset": "reanalysis-era5-pressure-levels",
        "sub-selection": {
            "variable": "temperature",
            "pressure_level": "1000",
            "product_type": "reanalysis",
            "year": "2008",
            "month": "01",
            "day": "01",
            "time": "12:00",
            "format": "grib"
        }
    }
'

The local path to save to ("download.grib") becomes the --path argument. The dataset name ("reanalysis-era5-pressure-levels" in this case) becomes the value of the "dataset" key in a JSON object that describes the data to be downloaded. The sub-selection of the dataset becomes the value of the "sub-selection" key.

After executing the above datalad download-cds command in a DataLad dataset, a new file called "download.grib" should have been created. Its origin is tracked in git-annex (you can see that by running git annex whereis download.grib). If you now datalad drop the file and then datalad get it, you will see that git-annex automatically re-retrieves the file from the CDS, as if it were just another location to get data from.
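For illustration, such a round trip through drop and get could look like the following (output omitted; this assumes the file was created as shown above):

$ git annex whereis download.grib
$ datalad drop download.grib
$ datalad get download.grib

After the final command the file content is back in place, retrieved from the CDS rather than from another git-annex remote.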

To see more possible usage options take a look at the help page of the command (datalad download-cds --help) or the documentation at https://matrss.github.io/datalad-cds/.

Download files

Download the file for your platform.

Source Distribution

datalad_cds-0.0.14.tar.gz (45.2 kB)

Uploaded Source

Built Distribution


datalad_cds-0.0.14-py3-none-any.whl (12.9 kB)

Uploaded Python 3

File details

Details for the file datalad_cds-0.0.14.tar.gz.

File metadata

  • Download URL: datalad_cds-0.0.14.tar.gz
  • Upload date:
  • Size: 45.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datalad_cds-0.0.14.tar.gz
Algorithm Hash digest
SHA256 c8921edf1ef3b4177ca03c220187b0d56c1b4e926f77606e181a794773efb9b0
MD5 7a1de5a5edf6268dcdadea8b9978450d
BLAKE2b-256 139b8ab3448b4d6789382ad83cf4d0098922e68612b494f28bad9e78b98bc169


Provenance

The following attestation bundles were made for datalad_cds-0.0.14.tar.gz:

Publisher: release.yml on matrss/datalad-cds

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file datalad_cds-0.0.14-py3-none-any.whl.

File metadata

  • Download URL: datalad_cds-0.0.14-py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datalad_cds-0.0.14-py3-none-any.whl
Algorithm Hash digest
SHA256 496de51c559b1299d4ded34cd30fef778eb36185eee51991fd10e98179de900f
MD5 d28e616941ed922ae52a0fe1a294a17b
BLAKE2b-256 c7df351c754313f5179a8bf230e819398042b02baa8f1a9effb5f5131ed75ea2


Provenance

The following attestation bundles were made for datalad_cds-0.0.14-py3-none-any.whl:

Publisher: release.yml on matrss/datalad-cds

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
