Extension of the original ESGF data discovery, download, replication package, esgpull

esgpull-plus - an API and processing extension to the ESGF data management utility

This repository, esgpull-plus, modifies and extends the functionality of esgpull by adding an API that allows files to be downloaded via a YAML configuration file. The aim is to streamline the download process and improve reproducibility.
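As a rough illustration of the configuration-driven idea, the sketch below copies facets and options from a config mapping onto a query object, using the same attribute style as the esgpull example further down (`query.selection.project`, `query.options.distrib`). The key names in `CONFIG`, the helper `apply_config`, and the stand-in query object are all assumptions made for illustration; they are not the documented esgpull-plus schema or API.

```python
from types import SimpleNamespace

# Hypothetical configuration, as it might look after loading a YAML file
# (for example with yaml.safe_load). The key names are assumptions.
CONFIG = {
    "selection": {
        "project": "CMIP6",
        "variable_id": "tos",
        "experiment_id": "historical",
    },
    "options": {"distrib": True},
}


def apply_config(query, config):
    """Copy facets and options from a config mapping onto a query-like object."""
    for facet, value in config.get("selection", {}).items():
        setattr(query.selection, facet, value)
    for option, value in config.get("options", {}).items():
        setattr(query.options, option, value)
    return query


# Stand-in for esgpull.Query so the sketch runs without esgpull installed.
demo_query = SimpleNamespace(selection=SimpleNamespace(), options=SimpleNamespace())
apply_config(demo_query, CONFIG)
print(demo_query.selection.project, demo_query.options.distrib)
```

Keeping the selection in a file rather than in ad hoc commands means the same download can be repeated later from the file alone.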

In addition - and a work in progress - esgpull-plus uses xesmf and cdo to allow immediate regridding of downloaded CMIP files onto the desired projection. This is useful given that many CMIP models - especially those dealing with ocean variables - output data on unstructured grids.

Finally - also a work in progress - esgpull-plus allows file subsetting, both for specified levels and custom subsetting to extract variable conditions at the sea floor.
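One way to think about sea-floor extraction is to take, for each water column, the deepest level that still holds valid data. The sketch below does this in plain Python under the assumption that levels below the sea floor are stored as NaN (or None) and that each profile is ordered from surface to depth. The function name `seafloor_value` is hypothetical, and this is not necessarily how esgpull-plus implements it.

```python
import math


def seafloor_value(profile):
    """Return the deepest valid value in a surface-to-bottom profile.

    Assumes levels below the sea floor are stored as NaN (or None).
    Returns None when the whole column is missing (e.g. a land point).
    """
    for value in reversed(profile):
        if value is not None and not math.isnan(value):
            return value
    return None


# Three hypothetical temperature columns, ordered from surface to depth.
columns = [
    [18.2, 12.5, 4.1, float("nan")],
    [17.9, 11.0, float("nan"), float("nan")],
    [float("nan")] * 4,
]
bottom = [seafloor_value(col) for col in columns]
print(bottom)
```

On real model output the same idea would normally be vectorised over the depth dimension rather than looped over columns.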

Installation and set-up

This repository is a fork of the original ESGF esgf-download with additional esgpullplus functionality. The setup is designed to:

  1. Track upstream changes from the original repository
  2. Maintain additional dependencies for esgpullplus features
  3. Provide easy installation and update procedures using conda

1. Initial Installation (Conda - Recommended)

In your virtual environment of choice, install the package using pip. N.B. a conda environment is required for advanced regridding functionality (via python-cdo).

pip install esgpull-plus

2. Installation of packages necessary for additional regridding functionality

cdo is a powerful geospatial data tool. Its Python interface, python-cdo, is best installed via conda:

conda install -c conda-forge python-cdo
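After installing, it can be useful to confirm that the optional regridding pieces are visible from the active environment. The snippet below is a small sketch using only the standard library. It assumes that the python-cdo bindings are importable as `cdo`, that xesmf is importable as `xesmf`, and that the CDO executable is named `cdo` on the PATH. The helper name `regridding_support` is made up for illustration.

```python
import importlib.util
import shutil


def regridding_support():
    """Report which optional regridding components can be found."""
    return {
        # The CDO command-line binary, which the Python bindings call.
        "cdo_binary": shutil.which("cdo") is not None,
        # The python-cdo bindings (imported as `cdo`).
        "python_cdo": importlib.util.find_spec("cdo") is not None,
        # xesmf, used for regridding from Python.
        "xesmf": importlib.util.find_spec("xesmf") is not None,
    }


status = regridding_support()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```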

3. Setting up base esgpull functionality

Run

esgpull self install

as described in the original esgpull documentation.

File Structure

esgf-download/
├── esgpull/                    # Original esgpull code
│   ├── esgpullplus/           # Your additional functionality
│   └── [original esgpull files and directories]
└── update-from-upstream.sh    # YAML-based update script

Dependencies

Base Dependencies

The base esgpull dependencies are managed through pyproject.toml and include:

  • Core Python packages (httpx, click, rich, etc.)
  • Database tools (sqlalchemy, alembic)
  • Configuration management (pydantic, tomlkit)

Additional Dependencies (esgpullplus)

In addition to the original dependencies, the following are installed via the pyproject.toml file to process the downloaded NetCDF files:

  • General data handling (pandas, numpy)
  • Streamlining downloads (requests, watchdog, rich)
  • Geospatial manipulation (xesmf, python-cdo (through conda))

Keeping Up with Upstream (original esgpull package)

Automatic Update (Recommended)

# Update from upstream and reinstall dependencies
./update-from-upstream.sh

This script will:

  1. Fetch latest changes from upstream
  2. Merge them into your current branch
  3. Reinstall all dependencies
  4. Verify esgpullplus functionality

Manual Update

# Fetch upstream changes
git fetch upstream

# Merge into your branch
git merge upstream/main

# Reinstall dependencies (conda-aware)
if command -v conda &> /dev/null; then
    conda install -c conda-forge -y pandas xarray numpy requests
    pip install xesmf cdo-python watchdog orjson
else
    pip install -r requirements-plus.txt
fi

Git Configuration

Your repository should have these remotes configured:

# Check current remotes
git remote -v

# Should show:
# origin    https://github.com/orlando-code/esgpull-plus/ (fetch)
# origin    https://github.com/orlando-code/esgpull-plus/ (push)
# upstream  https://github.com/ESGF/esgf-download.git (fetch)
# upstream  https://github.com/ESGF/esgf-download.git (push)

If upstream is not configured:

git remote add upstream https://github.com/ESGF/esgf-download.git
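The remote check can also be scripted. The sketch below parses text in the format printed by `git remote -v` and reports whether an `upstream` remote is present. The function name `parse_remotes` is hypothetical, and the sample text simply mirrors the expected output shown above. In practice the text could come from `subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout`.

```python
def parse_remotes(remote_output):
    """Parse `git remote -v` style text into {name: {kind: url}}."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) == 3:  # name, url, "(fetch)" or "(push)"
            name, url, kind = parts
            remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


# Sample text mirroring the expected remotes listed above.
sample = """\
origin    https://github.com/orlando-code/esgpull-plus/ (fetch)
origin    https://github.com/orlando-code/esgpull-plus/ (push)
upstream  https://github.com/ESGF/esgf-download.git (fetch)
upstream  https://github.com/ESGF/esgf-download.git (push)
"""
remotes = parse_remotes(sample)
has_upstream = "upstream" in remotes
print(has_upstream, remotes.get("upstream", {}).get("fetch"))
```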

Everything below this is copied directly from the original esgpull repository.

from esgpull import Esgpull, Query

query = Query()
query.selection.project = "CMIP6"
query.options.distrib = True  # default=False
esg = Esgpull()
nb_datasets = esg.context.hits(query, file=False)[0]
nb_files = esg.context.hits(query, file=True)[0]
datasets = esg.context.datasets(query, max_hits=5)
print(f"Number of CMIP6 datasets: {nb_datasets}")
print(f"Number of CMIP6 files: {nb_files}")
for dataset in datasets:
    print(dataset)

Features

  • Command-line interface
  • HTTP download (async multi-file)

Installation

esgpull is distributed via PyPI:

pip install esgpull
esgpull --help

For isolated installation, uv or pipx are recommended:

# with uv
uv tool install esgpull
esgpull --help

# alternatively, uvx enables running without explicit installation (comes with uv)
uvx esgpull --help

# with pipx
pipx install esgpull
esgpull --help

Usage

Usage: esgpull [OPTIONS] COMMAND [ARGS]...

  esgpull is a management utility for files and datasets from ESGF.

Options:
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  add       Add queries to the database
  config    View/modify config
  convert   Convert synda selection files to esgpull queries
  download  Asynchronously download files linked to queries
  login     OpenID authentication and certificates renewal
  remove    Remove queries from the database
  retry     Re-queue failed and cancelled downloads
  search    Search datasets and files on ESGF
  self      Manage esgpull installations / import synda database
  show      View query tree
  status    View file queue status
  track     Track queries
  untrack   Untrack queries
  update    Fetch files, link files <-> queries, send files to download...

Contributions

You can contribute through the usual GitHub workflow of issues and pull requests.
