Skip to main content

offline zarr downloader

Project description

Zarr offline downloader

The idea is to have an option to download a zarr dataset for offline reuse or conversion to netcdf. The package is a command line tool and is not covering all kind of different zarr stores. It has been tested with Google Cloud Storage of ECMWF ERA5 reanalysis.

usage: zarr-downloader.py [-h] [--variable VARIABLE] [--offset OFFSET] [--verbose] 
                          [--to-netcdf] [--overwrite] url output

Download a zarr Array / Group from a URL

positional arguments:
  url                  URL to download from
  output               Output directory to save the zarr file

options:
  -h, --help           show this help message and exit
  --variable VARIABLE  Subset of variables to download. e.g. t2m,lon,lat,...
  --offset OFFSET      Offset for the chunk access pattern
  --verbose, -v        Verbose output
  --to-netcdf          Convert to netcdf
  --overwrite, -o      Overwrite existing files

Examples

How to download a zarr dataset for offline reuse?

# Download ERA 5 surface 2m temperature from Google zarr store.
# There is an offset (chunks: 1000000.0)
# note t2m has two chunk dimensions: time,values
# this will download all chunks
zarr-downloader.py --offset 1000000,0 --variable t2m \
https://storage.googleapis.com/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr-v2 \
/tmp/t2m
# this will download one chunk
zarr-downloader.py --offset 1000000,0 --size 1,1 --variable t2m \
https://storage.googleapis.com/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr-v2 \
/tmp/t2m
# coordinates need to be downloaded completely
zarr-downloader.py --variable time,longitude,latitude \
https://storage.googleapis.com/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr-v2 \
/tmp/t2m

after downloading one can open the full zarr dataset again, but access the downloaded parts:

import zarr
import xarray as xr

ds = xr.open_zarr('/tmp/t2m')
ds
<xarray.Dataset>
Dimensions:              (time: 1323648, values: 542080)
Coordinates:
    depthBelowLandLayer  float64 ...
    entireAtmosphere     float64 ...
    latitude             (values) float64 ...
    longitude            (values) float64 ...
    number               int64 ...
    step                 timedelta64[ns] ...
    surface              float64 ...
  * time                 (time) datetime64[ns] 1900-01-01 ... 2050-12-31T23:0...
    valid_time           (time) datetime64[ns] ...
Dimensions without coordinates: values
Data variables: (12/38)
    cape                 (time, values) float32 ...
    d2m                  (time, values) float32 ...
    hcc                  (time, values) float32 ...
    istl1                (time, values) float32 ...
    istl2                (time, values) float32 ...
    istl3                (time, values) float32 ...
    ...                   ...
    tsn                  (time, values) float32 ...
    u10                  (time, values) float32 ...
    u100                 (time, values) float32 ...
    v10                  (time, values) float32 ...
    v100                 (time, values) float32 ...
    z                    (time, values) float32 ...
Attributes: (12/14)
    Conventions:               CF-1.7
    GRIB_centre:               ecmf
    GRIB_centreDescription:    European Centre for Medium-Range Weather Forec...
    GRIB_edition:              1
    GRIB_subCentre:            0
    history:                   2023-08-27T15:04 GRIB to CDM+CF via cfgrib-0.9...
    ...                        ...
    pangeo-forge:recipe_hash:  09c22f5fffc2fbe3742fbb38c8c2c761e1b41b50570f09...
    pangeo-forge:version:      0.9.5.dev2+gd43015b
    valid_time_start:          1940-01-01
    last_updated:              2025-05-12 03:32:06.266520+00:00
    valid_time_stop:           2025-02-28
    valid_time_stop_era5t:     2025-04-30

Contribution

Follow the general rules of user contribution and politeness. Please open a ticket under issues.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zarrofflinedownloader-2025.5.tar.gz (6.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

zarrofflinedownloader-2025.5-py3-none-any.whl (6.6 kB view details)

Uploaded Python 3

File details

Details for the file zarrofflinedownloader-2025.5.tar.gz.

File metadata

  • Download URL: zarrofflinedownloader-2025.5.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for zarrofflinedownloader-2025.5.tar.gz
Algorithm Hash digest
SHA256 8575d1f264312da2fa96b66064d05a4e70d4424316abf885e9d0788c7fdf0467
MD5 752607a6dcde57abd3cc25e78692edaf
BLAKE2b-256 14545e760c703f1973d5bca2592a522abc73364dd930b03db54f5020fdb4d24d

See more details on using hashes here.

File details

Details for the file zarrofflinedownloader-2025.5-py3-none-any.whl.

File metadata

File hashes

Hashes for zarrofflinedownloader-2025.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9683af69c3d67c95591d184a9924bc3e533e5796d596a0eb70e29c0574414e66
MD5 a24c7948ace1460385a7a6d27ef75024
BLAKE2b-256 f33c6a6e77088844f4da2c13d290880dddfe794996f2daa6c0707b76b963400d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page