Extension of the original ESGF data discovery, download, replication package, esgpull
esgpull-plus - an API and processing extension to the ESGF data management utility
This repository, esgpull-plus, modifies and extends esgpull by adding an API that lets files be downloaded via a YAML configuration file. This aims to streamline the download process and improve reproducibility.
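As a rough illustration of the configuration-driven idea, the sketch below turns a parsed configuration into `facet:value` search terms. The key names in `config` and the helper `to_facet_terms` are hypothetical and are not esgpull-plus's actual schema or API; the comma-joined `facet:value` form is assumed to match the term style accepted by `esgpull search`.

```python
# Hypothetical config contents, as they might look after parsing a YAML file.
# The key names are illustrative only, not the real esgpull-plus schema.
config = {
    "project": "CMIP6",
    "variable_id": ["tos", "so"],
    "experiment_id": "historical",
}


def to_facet_terms(cfg):
    """Turn a mapping of facets into 'facet:value' terms (lists are comma-joined)."""
    terms = []
    for facet, value in cfg.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        terms.append(f"{facet}:{value}")
    return terms


terms = to_facet_terms(config)
# Print the search command these terms would correspond to (assumed syntax).
print("esgpull search " + " ".join(terms))
```

Keeping the selection in a single file like this means the same download can be re-run later, or shared, without retyping the facets.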
In addition - and a work in progress - esgpull-plus uses xesmf and cdo to allow immediate regridding of downloaded CMIP files onto the desired projection. This is useful given that many CMIP models - especially those dealing with ocean variables - output data on unstructured grids.
Finally - also a work in progress - esgpull-plus allows file subsetting, both for specified levels and custom subsetting to extract variable conditions at the sea floor.
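To make the sea-floor subsetting idea concrete, here is a minimal pure-Python sketch of one way it could work: for each horizontal point, take the deepest depth level that holds a valid (non-NaN) value. This is an illustration under that assumption, not the esgpull-plus implementation; the function name `seafloor_values` and the toy data are hypothetical.

```python
import math


def seafloor_values(profile_grid):
    """Return the deepest non-NaN value in each column, or NaN if a column is empty.

    profile_grid[k][i] is the value at depth level k (0 = surface)
    and horizontal point i. Missing data below the sea floor is NaN.
    """
    n_levels = len(profile_grid)
    n_points = len(profile_grid[0])
    result = []
    for i in range(n_points):
        value = math.nan
        # Walk upwards from the deepest level until a valid value is found.
        for k in range(n_levels - 1, -1, -1):
            if not math.isnan(profile_grid[k][i]):
                value = profile_grid[k][i]
                break
        result.append(value)
    return result


nan = math.nan
temperature = [
    [20.0, 19.5, 18.0],  # surface
    [12.0, 11.0, nan],   # mid-depth
    [4.0, nan, nan],     # deepest level
]
bottom = seafloor_values(temperature)
print(bottom)  # [4.0, 11.0, 18.0]
```

On real model output the same logic would normally be vectorised over the depth dimension of an array rather than looped in Python.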
Installation and set-up
This repository is a fork of the original ESGF esgf-download with additional esgpullplus functionality. The setup is designed to:
- Track upstream changes from the original repository
- Maintain additional dependencies for esgpullplus features
- Provide easy installation and update procedures using conda
1. Initial Installation (Conda - Recommended)
In your virtual environment of choice, install the package using pip. N.B. a conda environment is required for advanced regridding functionality (via python-cdo).
pip install esgpull-plus
2. Installation of packages necessary for additional regridding functionality
cdo is a powerful geospatial data tool. Its Python interface, python-cdo, is best installed via conda:
conda install -c conda-forge python-cdo
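After installing, a quick check like the sketch below can confirm whether the optional regridding pieces are visible from the current environment. It uses only the standard library; the helper name `check_regrid_environment` is hypothetical, and it assumes the Python packages are importable as `cdo` and `xesmf` and that the CDO executable is named `cdo`.

```python
import importlib.util
import shutil


def check_regrid_environment(modules=("cdo", "xesmf"), binary="cdo"):
    """Report which optional modules are importable and whether the binary is on PATH."""
    # find_spec returns None when a top-level module cannot be located.
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # shutil.which returns None when the executable is not found on PATH.
    report[f"{binary} (binary)"] = shutil.which(binary) is not None
    return report


for item, available in check_regrid_environment().items():
    print(f"{item}: {'found' if available else 'missing'}")
```

If the `cdo` binary is reported missing inside a plain virtual environment, that is consistent with the note above that a conda environment is needed for the regridding features.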
3. Setting up base esgpull functionality
Run
esgpull self install
as described in the original documentation here.
File Structure
esgf-download/
├── esgpull/ # Original esgpull code
│ └── esgpullplus/ # Your additional functionality
│ └── [original esgpull files and directories]
├── update-from-upstream.sh # YAML-based update script
Dependencies
Base Dependencies
The base esgpull dependencies are managed through pyproject.toml and include:
- Core Python packages (httpx, click, rich, etc.)
- Database tools (sqlalchemy, alembic)
- Configuration management (pydantic, tomlkit)
Additional Dependencies (esgpullplus)
As well as the original dependencies, the following are installed via the pyproject.toml file to process the downloaded NetCDF (.nc) files:
- General data handling (pandas, numpy)
- Streamlining downloads (requests, watchdog, rich)
- Geospatial manipulation (xesmf, python-cdo (through conda))
Keeping Up with Upstream (original esgpull package)
Automatic Update (Recommended)
# Update from upstream and reinstall dependencies
./update-from-upstream.sh
This script will:
- Fetch latest changes from upstream
- Merge them into your current branch
- Reinstall all dependencies
- Verify esgpullplus functionality
Manual Update
# Fetch upstream changes
git fetch upstream
# Merge into your branch
git merge upstream/main
# Reinstall dependencies (conda-aware)
if command -v conda &> /dev/null; then
    conda install -c conda-forge -y pandas xarray numpy requests
    pip install xesmf cdo-python watchdog orjson
else
    pip install -r requirements-plus.txt
fi
Git Configuration
Your repository should have these remotes configured:
# Check current remotes
git remote -v
# Should show:
# origin https://github.com/orlando-code/esgpull-plus/ (fetch)
# origin https://github.com/orlando-code/esgpull-plus/ (push)
# upstream https://github.com/ESGF/esgf-download.git (fetch)
# upstream https://github.com/ESGF/esgf-download.git (push)
If upstream is not configured:
git remote add upstream https://github.com/ESGF/esgf-download.git
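The remote check can also be scripted. The sketch below parses text in the format printed by `git remote -v` and reports whether `upstream` points at the ESGF repository; the helper name `parse_remotes` is hypothetical, and the sample text simply mirrors the expected output listed above.

```python
EXPECTED_UPSTREAM = "https://github.com/ESGF/esgf-download.git"


def parse_remotes(remote_output):
    """Parse `git remote -v` style output into {name: {"fetch": url, "push": url}}."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


sample = """\
origin\thttps://github.com/orlando-code/esgpull-plus/ (fetch)
origin\thttps://github.com/orlando-code/esgpull-plus/ (push)
upstream\thttps://github.com/ESGF/esgf-download.git (fetch)
upstream\thttps://github.com/ESGF/esgf-download.git (push)
"""

remotes = parse_remotes(sample)
has_upstream = remotes.get("upstream", {}).get("fetch") == EXPECTED_UPSTREAM
print("upstream configured:", has_upstream)
```

In a real checkout, the sample string could be replaced with the output of `subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout`.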
Everything below this is copied directly from the original esgpull repository.
from esgpull import Esgpull, Query
query = Query()
query.selection.project = "CMIP6"
query.options.distrib = True # default=False
esg = Esgpull()
nb_datasets = esg.context.hits(query, file=False)[0]
nb_files = esg.context.hits(query, file=True)[0]
datasets = esg.context.datasets(query, max_hits=5)
print(f"Number of CMIP6 datasets: {nb_datasets}")
print(f"Number of CMIP6 files: {nb_files}")
for dataset in datasets:
    print(dataset)
Features
- Command-line interface
- HTTP download (async multi-file)
Installation
esgpull is distributed via PyPI:
pip install esgpull
esgpull --help
For isolated installation, uv or pipx are recommended:
# with uv
uv tool install esgpull
esgpull --help
# alternatively, uvx enables running without explicit installation (comes with uv)
uvx esgpull --help
# with pipx
pipx install esgpull
esgpull --help
Usage
Usage: esgpull [OPTIONS] COMMAND [ARGS]...
esgpull is a management utility for files and datasets from ESGF.
Options:
-V, --version Show the version and exit.
-h, --help Show this message and exit.
Commands:
add Add queries to the database
config View/modify config
convert Convert synda selection files to esgpull queries
download Asynchronously download files linked to queries
login OpenID authentication and certificates renewal
remove Remove queries from the database
retry Re-queue failed and cancelled downloads
search Search datasets and files on ESGF
self Manage esgpull installations / import synda database
show View query tree
status View file queue status
track Track queries
untrack Untrack queries
update Fetch files, link files <-> queries, send files to download...
Contributions
You can use the common github workflow (through pull requests and issues) to contribute.