create indices for GRIB files and provide an xarray interface

Project description

gribscan

Tools to scan gribfiles and create zarr-compatible indices.

warning

This repository is still experimental. The code is not yet tested for many kinds of files. It will likely not destroy your files, as it only accesses gribfiles in read-mode, but it may skip some information or may crash. Please file an issue if you discover something is missing.

howto

scanning gribs

gribscan.py scans a single gribfile for the GRIB messages it contains. Metadata about these messages, as well as their byte locations within the file, are written to an fsspec ReferenceFileSystem-compatible JSON file, which internally describes a zarr group structure.

python gribscan.py a.grib a.json

Alternatively, gribscan can also be called from Python to achieve the same result:

import gribscan
gribscan.grib2kerchunk("a.grib", "a.json")

Note: While gribscan uses cfgrib partially to read GRIB metadata, it does so in a rather hacky way. That way, gribscan does not have to create temporary files and is much faster than cfgrib or kerchunk.grib2, but it may not be as universal as cfgrib is. This is also the main reason for the warning above.
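To make the idea of a byte-location index more concrete, here is a minimal sketch of how a reference entry can be resolved. It is not gribscan's actual output: the variable name t2m, the chunk keys, and the fake file contents are made up for illustration, and only the general layout (a "refs" mapping whose values are either inline strings or [path, offset, length] triples) follows the kerchunk reference format. Inline entries with a base64: prefix are ignored here for brevity.

```python
import json
import tempfile
from pathlib import Path


def read_reference(refs, key):
    """Return the bytes a reference entry points to.

    An entry is either inline content (a string) or a
    [path, offset, length] triple pointing into another file.
    """
    entry = refs[key]
    if isinstance(entry, str):
        return entry.encode()
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical stand-in for a GRIB file: two fake "messages" back to back.
workdir = Path(tempfile.mkdtemp())
data_file = workdir / "a.grib"
data_file.write_bytes(b"GRIBaaaa7777GRIBbbbbbb7777")

# Hand-written index; the variable name and chunk keys are assumptions.
index = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "t2m/0": [str(data_file), 0, 12],   # first 12 bytes
        "t2m/1": [str(data_file), 12, 14],  # next 14 bytes
    },
}

first = read_reference(index["refs"], "t2m/0")
second = read_reference(index["refs"], "t2m/1")
```

Each chunk key resolves to a slice of the original file, so the data itself is never copied into the index.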

reading indexed grib via zarr

The resulting JSON-file can be interpreted by ReferenceFileSystem and zarr as follows:

import rawgribcodec
import xarray as xr
ds = xr.open_zarr("reference::a.json", consolidated=False)
ds

Note that rawgribcodec must be imported in order to register rawgrib as a numcodecs codec, which enables the use of GRIB messages as zarr-chunks. As opposed to gribscan, rawgribcodec only depends on eccodes and doesn't use cfgrib at all.

fsspec supports URL chaining. The prefix reference:: before the path tells fsspec that, after loading the given path, a ReferenceFileSystem should be initialized with whatever is found at that path. In principle, ReferenceFileSystem can also be used across HTTP, within ZIP files, or a combination thereof.
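As a rough illustration of how a chained URL decomposes into layers, the following sketch splits a URL on the :: separator. It is a toy model, not fsspec's parser: the function split_chain, the small protocol set, and the example.org URL are assumptions made for this example, and fsspec itself handles many more cases.

```python
# Illustrative subset of protocol names; fsspec knows many more.
KNOWN_PROTOCOLS = {"reference", "zip", "file", "http", "https"}


def split_chain(url):
    """Split a chained URL into (protocol, path) layers, outermost first."""
    layers = []
    for part in url.split("::"):
        if "://" in part:
            # Explicit "protocol://path" form.
            protocol, path = part.split("://", 1)
        elif part in KNOWN_PROTOCOLS:
            # Bare protocol name without a path, e.g. "reference".
            protocol, path = part, ""
        else:
            # Anything else is treated as a local file path.
            protocol, path = "file", part
        layers.append((protocol, path))
    return layers


simple = split_chain("reference::a.json")
# Hypothetical URL, only used to show the splitting of two explicit layers.
nested = split_chain("zip://a.json::https://example.org/archive.zip")
```

In the first case, the outer layer is the reference filesystem and the inner layer is the local file a.json that holds the index.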

combining multiple gribs into a larger dataset

As the generated JSON files are already in a ReferenceFileSystem / kerchunk compatible format, we can use the tools provided by kerchunk to aggregate multiple index files into one larger file:

import rawgribcodec
from kerchunk.combine import MultiZarrToZarr

mzz = MultiZarrToZarr(
    "some_folder/*.json",  # <- pattern which can be used to glob for all the index-JSON-files
    remote_protocol="file",
    xarray_open_kwargs={
        #"preprocess": drop_coords,
        "decode_cf": False,
        "mask_and_scale": False,
        "decode_times": False,
        "decode_timedelta": False,
        "use_cftime": False,
        "decode_coords": False
    },
    xarray_concat_args={
        "dim": "time",
    }
)

mzz.translate("mzz.json")  # <- write output

The generated multi-zarr-file can be used in the same way as the individual index files.
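Before opening a combined index, it can be handy to check how many chunk references each variable ended up with. The sketch below does this with the standard library only. The helper names summarize_refs and summarize_file are made up for this example, and the code assumes the index is either a flat mapping of keys or a mapping with a top-level "refs" entry, as in the kerchunk reference format.

```python
import json
from collections import Counter


def summarize_refs(index):
    """Count chunk references per variable in a reference index."""
    # Accept both the flat layout and the layout with a "refs" entry.
    refs = index.get("refs", index)
    counts = Counter()
    for key in refs:
        name, _, leaf = key.rpartition("/")
        # Skip group-level keys and zarr metadata such as .zarray/.zattrs.
        if name and not leaf.startswith("."):
            counts[name] += 1
    return dict(counts)


def summarize_file(path):
    """Load an index JSON file and summarize it."""
    with open(path) as f:
        return summarize_refs(json.load(f))


# Hand-written example index; variable names and sizes are assumptions.
example = {
    "version": 1,
    "refs": {
        ".zgroup": "{}",
        "t2m/.zarray": "{}",
        "t2m/.zattrs": "{}",
        "t2m/0.0": ["a.grib", 0, 100],
        "t2m/1.0": ["b.grib", 0, 100],
        "time/0": "base64:AAAA",
    },
}
summary = summarize_refs(example)
```

For the file written above, the call would be summarize_file("mzz.json").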

notebooks

There are a few notebooks which experiment with these tools:

gribscan_test.ipynb looks at the output of a single gribscan run
build_index.ipynb generates many JSON index files
build_multizarr.ipynb combines the generated JSON files into one and looks at the result
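Generating many JSON index files amounts to calling the scanner once per gribfile. The following is a minimal sketch of such a loop, not the code from the notebooks. The function build_indices and its parameters are made up for this example; the scanner is passed in as a callable taking (grib_path, json_path), which matches the call gribscan.grib2kerchunk("a.grib", "a.json") shown earlier. A stub scanner is used below so the sketch runs without any GRIB data.

```python
import tempfile
from pathlib import Path


def build_indices(grib_dir, index_dir, scan, pattern="*.grib"):
    """Call scan(grib_path, json_path) for every matching file in grib_dir."""
    grib_dir = Path(grib_dir)
    index_dir = Path(index_dir)
    index_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a reproducible processing order.
    for grib_path in sorted(grib_dir.glob(pattern)):
        json_path = index_dir / (grib_path.stem + ".json")
        scan(str(grib_path), str(json_path))
        written.append(json_path)
    return written


# Stub scanner standing in for a real one; it only records its arguments
# and writes an empty JSON object.
calls = []


def fake_scan(grib_path, json_path):
    calls.append((Path(grib_path).name, Path(json_path).name))
    Path(json_path).write_text("{}")


root = Path(tempfile.mkdtemp())
(root / "gribs").mkdir()
for name in ("b.grib", "a.grib"):
    (root / "gribs" / name).touch()

written = build_indices(root / "gribs", root / "index", fake_scan)
```

With real data, the stub would be replaced by the actual scanner, for example build_indices("gribs", "some_folder", gribscan.grib2kerchunk), where the folder names are placeholders.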

Project details

Release history

0.0.10

Feb 6, 2024

0.0.9

Feb 6, 2024

0.0.8

Jan 29, 2024

0.0.7

May 31, 2023

0.0.6

Apr 12, 2023

0.0.5

Sep 2, 2022

0.0.4

May 11, 2022

0.0.3

May 10, 2022

0.0.2

Mar 18, 2022

This version

0.0.1

Mar 9, 2022

Download files


Source Distribution

gribscan-0.0.1.tar.gz (9.8 kB)

Uploaded Mar 9, 2022 Source

Built Distribution

gribscan-0.0.1-py3-none-any.whl (9.1 kB)

Uploaded Mar 9, 2022 Python 3

Hashes for gribscan-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`ccde4a2b869e5acb7f30e684bf230701f1045b411d05e4ee9a3d32d21f404c1a`
MD5	`26fe945486d8bb8d1fd2fde28e784bbd`
BLAKE2b-256	`1f607d8fa111ea09407e4845a4bc90d74a3cefdd5f9660773c18241c6332125c`

Hashes for gribscan-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`40c381b310a8f25d2a6a6be6180baac2f2d0a887fe50e2e92d343b68d0770d19`
MD5	`d02b287492b2fea15a238d735db826de`
BLAKE2b-256	`38cadd1bc94d050d74f9209e8b44234e23f4f2ac8c5d2f3622a7ba004a3bce39`