An intake plugin for building and loading earth system data sets such as CMIP, CESM Large Ensemble

Project description

Intake-esm

Intake-esm provides an intake plugin for creating file-based Intake catalogs for climate data from project efforts such as the Coupled Model Intercomparison Project (CMIP) and the Community Earth System Model (CESM) Large Ensemble Project. These projects produce a huge amount of climate data, persisted on tape and disk storage across many (on the order of 300,000) netCDF files. Finding, investigating, and loading these files into data array containers such as xarray can be a daunting task because of the large number of files a user may be interested in. Intake-esm addresses this issue in three steps:

  • Dataset collection curation in the form of YAML files: these YAML files provide information about data locations, access patterns, directory structure, etc. Intake-esm uses these YAML files in conjunction with file name templates to construct a local database. Each row in this database consists of a set of metadata, such as experiment, modeling realm, and frequency, corresponding to the data contained in one netCDF file.

    >>> import intake
    >>> col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")
  • Search and discovery: once the database is built, intake-esm can be used to search for and discover climate datasets, eliminating the need for the user to know the specific locations (file paths) of the datasets of interest:

    >>> cat = col.search(variable=['hfls'], frequency='mon',
    ...          modeling_realm='atmos',
    ...          institute=['CCCma', 'CNRM-CERFACS'])
  • Access: when the user is satisfied with the results of their query, they can ask intake-esm to load the actual netCDF files into xarray datasets:

    >>> dsets = cat.to_xarray(decode_times=True, chunks={'time': 50})
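
The first two steps can be illustrated without intake-esm itself. The following is a minimal sketch using only the Python standard library, not the library's implementation. It assumes a CMIP5-style file name layout, and the names `FILENAME_TEMPLATE`, `build_catalog`, `search`, the `file_fullpath` key, and the example paths are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed CMIP5-style file name layout (an illustration, not intake-esm's template):
# <variable>_<mip_table>_<model>_<experiment>_<ensemble_member>_<time_range>.nc
FILENAME_TEMPLATE = re.compile(
    r"(?P<variable>[^_]+)_(?P<mip_table>[^_]+)_(?P<model>[^_]+)_"
    r"(?P<experiment>[^_]+)_(?P<ensemble_member>[^_]+)_(?P<time_range>[^_]+)\.nc"
)


def build_catalog(paths):
    """Return one metadata row (a dict) per file whose name matches the template."""
    rows = []
    for path in paths:
        match = FILENAME_TEMPLATE.fullmatch(PurePosixPath(path).name)
        if match is None:
            continue  # skip files that do not follow the assumed layout
        row = match.groupdict()
        row["file_fullpath"] = path
        rows.append(row)
    return rows


def search(rows, **query):
    """Keep rows whose fields equal the query value, or are in it when a list is given."""
    def matches(row):
        for key, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(key) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


# Hypothetical paths, made up for this example.
paths = [
    "/data/cmip5/hfls_Amon_CanESM2_historical_r1i1p1_185001-200512.nc",
    "/data/cmip5/tas_Amon_CanESM2_historical_r1i1p1_185001-200512.nc",
    "/data/cmip5/hfls_Amon_CNRM-CM5_rcp85_r1i1p1_200601-210012.nc",
    "/data/cmip5/README.txt",
]
catalog = build_catalog(paths)
hits = search(catalog, variable=["hfls"], model=["CanESM2", "CNRM-CM5"])
for row in hits:
    print(row["model"], row["experiment"], row["file_fullpath"])
```

Each row plays the role of one database entry, and the keyword query mirrors the style of the `col.search(...)` call shown in the list: a scalar must match exactly, while a list matches any of its members.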

Intake-esm supports data holdings from the following projects:

  • CMIP: Coupled Model Intercomparison Project (phase 5 and phase 6)

  • CESM: Community Earth System Model Large Ensemble (LENS), and Decadal Prediction Large Ensemble (DPLE)

  • MPI-GE: The Max Planck Institute for Meteorology (MPI-M) Grand Ensemble (MPI-GE)

  • GMET: The Gridded Meteorological Ensemble Tool data

  • ERA5: ECMWF ERA5 Reanalysis dataset stored on NCAR’s GLADE in /glade/collections/rda/data/ds630.0

  • NA-CORDEX: The North American CORDEX program dataset residing on NCAR’s GLADE in /glade/collections/cdg/data/cordex/data/

  • CESM-LENS-AWS: Community Earth System Model Large Ensemble (CESM LENS) data holdings publicly available on Amazon S3 (us-west-2 region)

See documentation for more information.

Installation

Intake-esm can be installed from PyPI with pip:

pip install intake-esm

It is also available from conda-forge for conda installations:

conda install -c conda-forge intake-esm
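
After either install, a quick way to confirm that the packages are visible to the current Python environment is to look them up on the import path. This is a small sketch using only the standard library. The import name `intake_esm` is an assumption based on the package's distribution name, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the current import path."""
    return importlib.util.find_spec(module_name) is not None


# "intake_esm" is the assumed import name for the intake-esm distribution.
for name in ("intake", "intake_esm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```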

Download files

Source Distribution

intake-esm-2019.8.23.tar.gz (280.0 kB)

Uploaded Source

Built Distribution

intake_esm-2019.8.23-py3-none-any.whl (48.8 kB)

Uploaded Python 3
