An intake plugin for parsing an ESM (Earth System Model) Collection/catalog and loading assets (netCDF files and/or Zarr stores) into xarray datasets.
Intake-esm
Motivation
Computer simulations of the Earth’s climate and weather generate huge amounts of data. These data are often persisted on HPC systems or in the cloud across multiple data assets in a variety of formats (netCDF, Zarr, etc.). Finding, investigating, and loading these data assets into compute-ready data containers costs time and effort. Before loading and analyzing a specific data set, the data user needs to know which data sets are available and which attributes describe each of them.
Loading these assets into data array containers such as xarray datasets can be a daunting task because of the large number of files a user may be interested in. Intake-esm aims to address these issues by providing the functionality needed for search, discovery, and data access/loading.
Overview
intake-esm is a data cataloging utility built on top of intake, pandas, and xarray, and it’s pretty awesome!
Opening an ESM collection definition file: An ESM (Earth System Model) collection file is a JSON file that conforms to the ESM Collection Specification. When provided a link/path to an ESM collection file, intake-esm establishes a link to a database (CSV file) that contains data asset locations and associated metadata (e.g., which experiment and model they come from). The collection JSON file can be stored on a local filesystem or hosted on a remote server.
>>> import intake
>>> col_url = "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json"
>>> col = intake.open_esm_datastore(col_url)
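To make the relationship between the collection JSON file and the CSV database more concrete, here is a minimal, standard-library-only sketch. It does not use intake-esm itself. The field names (`catalog_file`, `attributes`, `assets`, `column_name`) reflect my reading of the ESM Collection Specification and should be treated as assumptions; the collection id, column values, and paths are invented for illustration.

```python
import csv
import io
import json

# Hypothetical, trimmed-down collection definition. Field names are assumed
# from the ESM Collection Specification; every value here is made up.
collection_json = """
{
  "esmcat_version": "0.1.0",
  "id": "toy-cmip6",
  "description": "Illustrative collection, not a real datastore",
  "catalog_file": "toy-cmip6.csv",
  "attributes": [
    {"column_name": "experiment_id"},
    {"column_name": "variable_id"}
  ],
  "assets": {"column_name": "path", "format": "zarr"}
}
"""

# Hypothetical catalog table: one row per data asset, one column per attribute,
# plus the column holding the asset location.
catalog_csv = """experiment_id,variable_id,path
historical,o2,/data/toy/historical/o2
ssp585,o2,/data/toy/ssp585/o2
"""

collection = json.loads(collection_json)
rows = list(csv.DictReader(io.StringIO(catalog_csv)))

# The JSON tells us which CSV column locates the asset and which describe it.
asset_column = collection["assets"]["column_name"]
attribute_columns = [a["column_name"] for a in collection["attributes"]]

for row in rows:
    metadata = {name: row[name] for name in attribute_columns}
    print(row[asset_column], metadata)
```

In this sketch the JSON file carries no data locations of its own; it only describes how to interpret the table, which is why the two files can live in different places.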
Search and Discovery: intake-esm provides functionality to execute queries against the database:
>>> cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr',
...                  variable_id='o2', grid_label='gn')
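The query above can be read as a filter over the rows of the catalog table. The following sketch, which uses plain Python rather than intake-esm, assumes that keyword arguments are combined with AND and that a list of values matches any of its items. The `search` helper and the toy rows are hypothetical, and the library's own matching may support more than this (for example, pattern matching).

```python
def search(rows, **query):
    """Return the rows that satisfy every keyword in `query`.

    A scalar value must equal the column value; a list, tuple, or set
    matches if the column value is any one of its items.
    """
    def matches(row):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


# Hypothetical catalog rows, invented for illustration.
rows = [
    {"experiment_id": "historical", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "ssp585", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "ssp126", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "historical", "table_id": "Omon", "variable_id": "o2", "grid_label": "gn"},
]

hits = search(
    rows,
    experiment_id=["historical", "ssp585"],
    table_id="Oyr",
    variable_id="o2",
    grid_label="gn",
)
print(len(hits))
```

Only the first two toy rows survive: the third fails the `experiment_id` condition and the fourth fails `table_id`.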
Access: when the user is satisfied with the results of their query, they can ask intake-esm to load data assets (netCDF/HDF files and/or Zarr stores) into xarray datasets:
>>> dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True, 'decode_times': False},
...                                 cdf_kwargs={'chunks': {}, 'decode_times': False})
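The result of the call above is a dictionary of datasets. As a rough mental model, and as an assumption on my part rather than something this page states, each key can be thought of as the values of a few grouping attributes joined together, with all assets sharing those values combined into one dataset. The sketch below shows only that grouping step in plain Python; the attribute list, the separator, the `group_assets` helper, and the rows are all hypothetical, and no data is opened.

```python
from collections import defaultdict


def group_assets(rows, groupby_attrs, sep="."):
    """Group asset paths by a key built from the given attribute columns."""
    groups = defaultdict(list)
    for row in rows:
        key = sep.join(row[attr] for attr in groupby_attrs)
        groups[key].append(row["path"])
    return dict(groups)


# Hypothetical search results; paths are placeholders, not real stores.
rows = [
    {"source_id": "ModelA", "experiment_id": "historical", "member_id": "r1", "path": "/toy/a/hist/r1"},
    {"source_id": "ModelA", "experiment_id": "historical", "member_id": "r2", "path": "/toy/a/hist/r2"},
    {"source_id": "ModelA", "experiment_id": "ssp585", "member_id": "r1", "path": "/toy/a/ssp585/r1"},
]

# Assumed grouping attributes: assets that differ only in member_id share a key.
grouped = group_assets(rows, groupby_attrs=["source_id", "experiment_id"])

for key, paths in grouped.items():
    print(key, paths)
```

Under this model, the two `historical` members of `ModelA` land under a single key, which corresponds to one entry in the returned dictionary.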
See documentation for more information.
Installation
Intake-esm can be installed from PyPI with pip:
pip install intake-esm
It is also available from conda-forge for conda installations:
conda install -c conda-forge intake-esm
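After either installation route, one quick way to confirm that the package is visible to the current Python environment is to ask the standard library for its installed version. This is a small sketch, assuming Python 3.8 or later; the `installed_version` helper is hypothetical and not part of intake-esm.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if intake-esm is installed here, otherwise None.
print(installed_version("intake-esm"))
```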
Hashes for intake_esm-2020.8.15-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7c83d7fbfcd864c08d43f32664cafb1c2255fc3ec50f65dc1bfce97a3787bd7a
MD5 | 1784b495b5f42edafec741b013ee9bfa
BLAKE2b-256 | a34ee257a0126bd4734ade1c2735279285fa2da07617a0f1023a8b455495363f