A utility for downloading and subsetting ERA5 data from NCAR S3

era5-download

CLI tool for downloading ERA5 reanalysis data from NCAR's AWS S3 archive, clipping files to geographic bounds, and uploading the results to a remote destination. As of this writing, the total size of the ERA5 global files is 236 TB -- you will definitely want to clip to your region.

For each file, the tool downloads it from the NCAR source via rclone, clips it to the lat/lon bounds via ncks (converting to NetCDF-4 with compression), and uploads the clipped file to the configured remote destination.
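The three per-file steps can be sketched as command lists in Python. This is an illustration under stated assumptions, not the tool's actual implementation: the helper `build_commands`, the rclone remote names `src:` and `dst:`, the dimension names `latitude` and `longitude`, and the deflate level `1` are all hypothetical. The ncks options used (`-4` for NetCDF-4 output, `-L` for deflate level, `-d` for hyperslabs) are standard NCO options.

```python
from pathlib import Path


def build_commands(remote_file, work_dir, bounds, dest_dir):
    """Return the three command lines for one file (hypothetical helper)."""
    name = Path(remote_file).name
    downloaded = Path(work_dir) / "download" / name
    clipped = Path(work_dir) / "clipped" / name

    # ncks treats hyperslab limits with a decimal point as coordinate values
    # and bare integers as indices, so the bounds are formatted as floats.
    lat = f"latitude,{float(bounds['min_lat'])},{float(bounds['max_lat'])}"
    lon = f"longitude,{float(bounds['min_lon'])},{float(bounds['max_lon'])}"

    return [
        # 1. download from the source remote (remote name "src" is assumed)
        ["rclone", "copyto", f"src:{remote_file}", str(downloaded)],
        # 2. clip to the bounds, writing compressed NetCDF-4
        ["ncks", "-4", "-L", "1", "-d", lat, "-d", lon,
         str(downloaded), str(clipped)],
        # 3. upload the clipped file (remote name "dst" is assumed)
        ["rclone", "copyto", str(clipped), f"dst:{dest_dir}{name}"],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order, so that a failed step stops processing of that file.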

Prerequisites

  • Python >= 3.10
  • ncks (part of NCO) must be installed separately for NetCDF clipping

Installation

pip install era5_s3_dl

With optional Sentry error tracking (used in Docker deployment):

pip install "era5_s3_dl[sentry]"

Configuration

Copy parameters_example.toml to parameters.toml and edit it. The TOML file is the primary configuration mechanism -- all settings live here, and select values can be overridden via CLI options.

TOML sections

Root-level settings:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| n_tasks | int | 8 | Number of parallel download workers |
| check_target | bool | true | Check remote for existing files before downloading (skips duplicates) |
| work_dir | string | (temp dir) | Working directory for intermediate files. Creates download/ and clipped/ subdirs within it. If not set, uses a temporary directory that is cleaned up on completion. |
| preset | string | | Variable preset (e.g. "wrf", "all"). Required unless variables is set. |
| variables | string | | Comma-separated NetCDF variable names to download (case insensitive). Combined with preset if both set. |
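The work_dir behaviour described in the table can be sketched with the standard library. This is a minimal sketch, not the tool's code: `working_dirs` and `_make_subdirs` are hypothetical names, and leaving a configured work_dir in place afterwards is an assumption (the description only says that the temporary directory is cleaned up).

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


def _make_subdirs(root):
    """Create download/ and clipped/ under root and return both paths."""
    download, clipped = root / "download", root / "clipped"
    download.mkdir(parents=True, exist_ok=True)
    clipped.mkdir(parents=True, exist_ok=True)
    return download, clipped


@contextmanager
def working_dirs(work_dir=None):
    """Yield (download_dir, clipped_dir); hypothetical helper."""
    if work_dir is None:
        # No work_dir configured: use a temp dir that is removed on exit.
        with tempfile.TemporaryDirectory() as tmp:
            yield _make_subdirs(Path(tmp))
    else:
        # Assumption: a configured work_dir is kept after the run.
        yield _make_subdirs(Path(work_dir))
```

Used as `with working_dirs() as (download_dir, clipped_dir): ...`, the temporary tree disappears when the block exits.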

[dates] -- Date range for files to download:

[dates]
start_date = "2020-01-01"
end_date = "2024-12-31"

[bounds] -- Geographic clipping bounds (longitude/latitude):

[bounds]
min_lon = 145
max_lon = 195
min_lat = -60
max_lat = -20
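The example's max_lon = 195 suggests that longitudes are given on a 0 to 360 degree grid rather than -180 to 180. That is an inference from the example, not something stated here, so check it against your files. Under that assumption, a region that crosses the antimeridian can be converted with a small helper (`to_0_360` is a hypothetical name):

```python
def to_0_360(lon):
    """Map a longitude in degrees to the [0, 360) range."""
    # Python's % returns a non-negative result for a positive modulus,
    # so western (negative) longitudes wrap to values above 180.
    return lon % 360


# A box from 145 E to 165 W, written in the -180..180 convention:
min_lon = to_0_360(145)    # 145
max_lon = to_0_360(-165)   # 195
```

Note that 360 itself maps to 0, so a bound that should sit at the eastern edge of the grid needs separate handling.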

[source] -- rclone config for the NCAR ERA5 source bucket. The default points to the public nsf-ncar-era5 bucket on AWS:

[source]
type = 's3'
provider = 'AWS'
env_auth = 'false'
region = 'us-west-2'
path = 'nsf-ncar-era5/'
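To test the source settings by hand, one option is to render the section as an rclone connection string (the `:backend,key=value:path` syntax), which needs no rclone config file. This is a sketch under assumptions: `to_connection_string` is a hypothetical helper, and it is not claimed to be how the tool itself passes these settings to rclone.

```python
def to_connection_string(section):
    """Render a [source]/[remote]-style dict as an rclone connection string."""
    opts = dict(section)
    backend = opts.pop("type")
    path = opts.pop("path", "")
    parts = [f":{backend}"]
    for key, value in opts.items():
        value = str(value)
        if "," in value or ":" in value:
            # rclone requires quoting values that contain ',' or ':';
            # embedded double quotes are escaped by doubling them.
            value = '"' + value.replace('"', '""') + '"'
        parts.append(f"{key}={value}")
    return ",".join(parts) + ":" + path


source = {
    "type": "s3",
    "provider": "AWS",
    "env_auth": "false",
    "region": "us-west-2",
    "path": "nsf-ncar-era5/",
}
print(to_connection_string(source))
# :s3,provider=AWS,env_auth=false,region=us-west-2:nsf-ncar-era5/
```

The printed string can be passed to rclone directly, for example as the argument of `rclone lsd`, to confirm that the bucket is reachable.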

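[remote] -- rclone config for the upload destination. Can be S3 or a local path. Not required in --list-only mode. The two examples below are alternatives: define only one [remote] table, since TOML does not allow the same table to be defined twice.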

# S3 example:
[remote]
type = 's3'
provider = 'Mega'
endpoint = 'https://s3.example.com'
access_key_id = ''
secret_access_key = ''
path = '/data/ncar/era5/'

# Local example (use with Docker volume mount):
[remote]
type = 'local'
path = '/data/output'

[sentry] (optional) -- Sentry error tracking. Requires the sentry extra to be installed:

[sentry]
dsn = ''
tags = {}

Variable Selection

You must specify which ERA5 variables to download using --preset and/or --variables (or the corresponding preset and variables keys in the TOML file).

Presets select a predefined group of variables:

  • wrf -- the 26 variables required for WRF model initialization
  • all -- all 92 available ERA5 variables

Variables are specified by their NetCDF variable name (case insensitive). The full catalog is in era5_dl/era5_variables.json. Examples: SP, VAR_2T, Z, CAPE, U, V.

When both --preset and --variables are given, the variables are combined (union).
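The union rule can be sketched as follows. The function `resolve_variables`, the `presets` mapping, and the tiny `demo` preset are hypothetical and stand in for the real catalog. Normalising names to upper case is also an assumption, based on the upper-case names in the examples above.

```python
def resolve_variables(preset, variables, presets):
    """Combine a preset and a comma-separated variable list (union)."""
    selected = set()
    if preset:
        selected |= {name.upper() for name in presets[preset]}
    if variables:
        # Names are case insensitive; ignore stray spaces and empty items.
        selected |= {v.strip().upper() for v in variables.split(",") if v.strip()}
    if not selected:
        raise ValueError("set preset and/or variables")
    return sorted(selected)


# Placeholder preset, not the real contents of "wrf":
demo_presets = {"demo": ["SP", "VAR_2T"]}
print(resolve_variables("demo", "cape, sp", demo_presets))
# ['CAPE', 'SP', 'VAR_2T']
```

Because a set is used, a variable named both in the preset and in the list (SP here) is selected only once.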

# Download WRF variables only
era5_dl parameters.toml --preset wrf

# Download just surface pressure and 2m temperature
era5_dl parameters.toml --variables sp,var_2t

# Download WRF variables plus CAPE
era5_dl parameters.toml --preset wrf --variables cape

# List all available files without downloading
era5_dl parameters.toml --preset all --list-only

Usage

Basic invocation with a parameters file:

era5_dl parameters.toml --preset wrf

Override specific TOML values via CLI options:

era5_dl parameters.toml --preset wrf -s 2024-01-01 -e 2024-12-31 -n 4
era5_dl parameters.toml --preset wrf --min-lon 160 --max-lon 180 --no-check-target

List matching files without downloading (does not require [remote] config):

era5_dl parameters.toml --preset wrf --list-only

See all options:

era5_dl --help

CLI Options

All options are optional and override the corresponding value in the parameters TOML file.

| Option | Short | Type | Overrides |
| --- | --- | --- | --- |
| --n-tasks | -n | int | n_tasks |
| --work-dir | -w | path | work_dir |
| --check-target / --no-check-target | | bool | check_target |
| --start-date | -s | text | dates.start_date |
| --end-date | -e | text | dates.end_date |
| --min-lon | | float | bounds.min_lon |
| --max-lon | | float | bounds.max_lon |
| --min-lat | | float | bounds.min_lat |
| --max-lat | | float | bounds.max_lat |
| --preset | -p | text | preset |
| --variables | -v | text | variables |
| --list-only | -l | flag | Query and list files only, skip download |
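The override rule (a CLI value wins, otherwise the TOML value stands) can be sketched as a merge over dotted keys such as dates.start_date. `apply_overrides` is a hypothetical helper and the tool's own merge logic may differ. Options that were not passed are represented here as None.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with non-None overrides applied.

    Keys use the dotted form shown in the Overrides column,
    e.g. "dates.start_date".
    """
    merged = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        if value is None:  # option not given on the command line
            continue
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


config = {
    "n_tasks": 8,
    "dates": {"start_date": "2020-01-01", "end_date": "2024-12-31"},
}
cli = {"n_tasks": 4, "dates.start_date": "2024-01-01", "bounds.min_lon": None}
merged = apply_overrides(config, cli)
# merged["n_tasks"] is 4 and merged["dates"]["start_date"] is "2024-01-01";
# end_date keeps its TOML value and config itself is left unchanged.
```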

Docker

The Docker image includes ncks and installs the tool with the Sentry extra. Modify parameters.toml and docker-compose.yml, then run:

docker-compose up -d
docker-compose logs -f

The docker-compose.yml mounts your parameters.toml into the container and maps an output volume:

volumes:
    - "./parameters.toml:/parameters.toml"
    - "./output:/data/output"
