create indices for GRIB files and provide an xarray interface

gribscan

Tools to scan GRIB files and create zarr-compatible indices.

warning

This repository is still experimental. The code is not yet tested for many kinds of files. It will likely not destroy your files, as it only accesses GRIB files in read-mode, but it may skip some information or may crash. Please file an issue if you discover something is missing.

installing

gribscan is available on PyPI. You can install the latest released version using:

python -m pip install gribscan

If you are interested in the latest development version, please clone the repository and install the package in development mode:

python -m pip install -e <path to your clone>

command line usage

gribscan comes with two executables:

  • gribscan-index for building indices of GRIB files
  • gribscan-build for building a dataset from indices

building indices

gribscan will create jsonlines-based .index-files next to the input GRIB files. The format is based on the ECMWF OpenData index format but contains a lot more entries.

You can pass in multiple GRIB files at once and specify the number of parallel processes (-n).

gribscan-index *.grb2 -n 16

Note: While gribscan uses cfgrib partially to read GRIB metadata, it does so in a rather hacky way. That way, gribscan does not have to create temporary files and is much faster than cfgrib or kerchunk.grib2, but it may not be as universal as cfgrib is. This is also the main reason for the warning above.
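
Because the index files are jsonlines, each line is a standalone JSON object and can be inspected with the standard library alone. The following is a minimal sketch of such a reader; it is not part of gribscan, and the helper name `read_index` is hypothetical. It makes no assumption about which keys a record contains.

```python
import json


def read_index(path):
    """Return one dict per non-empty line of a jsonlines file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records
```

Calling `read_index("some_file.index")` and printing the keys of the first record is a quick way to see which metadata entries were captured for a given GRIB file.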

building a dataset

After all the index files have been created, a common dataset can be assembled based on the information in the index files. The assembled dataset will be written out as an fsspec ReferenceFileSystem-compatible JSON file, which internally builds a zarr group structure.

gribscan-build *.index -o dataset.json --prefix <path prefix to referenced grib files>

The prefix will be prepended to the paths within the dataset.json and should point to the location of the original GRIB files.
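
To illustrate what the prefix does, here is a small sketch that operates on a reference dictionary in the fsspec version 1 layout, where each entry under `refs` is either inline content (a string) or a list of the form `[path, offset, length]`. The key `t2m/0.0`, the file name and the byte range are made up for illustration, and whether gribscan joins prefix and path by plain string concatenation, as done here, is an assumption.

```python
import json


def with_prefix(refs, prefix):
    """Return a copy of `refs` with `prefix` prepended to every referenced path."""
    out = {}
    for key, value in refs.items():
        if isinstance(value, list) and value:
            # [path, offset, length] (or [path]) points into another file
            out[key] = [prefix + value[0], *value[1:]]
        else:
            # inline content, e.g. zarr metadata stored as a string
            out[key] = value
    return out


# Hypothetical example content, not taken from a real dataset.json
example = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "t2m/0.0": ["file.grb2", 0, 1024],
    },
}

prefixed = {**example, "refs": with_prefix(example["refs"], "/data/grib/")}
print(json.dumps(prefixed["refs"]["t2m/0.0"]))
# prints ["/data/grib/file.grb2", 0, 1024]
```

Inline entries such as `.zgroup` are left untouched, since they do not refer to any GRIB file.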

reading indexed grib via zarr

The resulting JSON-file can be interpreted by ReferenceFileSystem and zarr as follows:

import gribscan
import xarray as xr
ds = xr.open_zarr("reference::dataset.json", consolidated=False)
ds

Note that gribscan must be imported in order to register gribscan.rawgrib as a numcodecs codec, which enables the use of GRIB messages as zarr-chunks. As opposed to gribscan-index, the codec only depends on eccodes and doesn't use cfgrib at all.

fsspec supports URL chaining. The prefix reference:: before the path signals to fsspec that, after loading the given path, a ReferenceFileSystem should be initialized with whatever is found in that path. In principle, it is also possible to use ReferenceFileSystem across HTTP, within ZIP files, or a combination thereof.
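
The chained URL can also be spelled out step by step. The sketch below builds the ReferenceFileSystem explicitly and hands its key-value mapping to xarray; the function name `open_reference_dataset` is hypothetical. It assumes that fsspec, xarray, zarr and gribscan are installed and that the installed zarr version accepts fsspec mappers as stores.

```python
def open_reference_dataset(reference_json):
    """Open a ReferenceFileSystem JSON file as an xarray dataset without URL chaining."""
    import fsspec
    import gribscan  # noqa: F401  (imported for its side effect: codec registration)
    import xarray as xr

    # Build the reference filesystem explicitly; `fo` is the path of the JSON file.
    fs = fsspec.filesystem("reference", fo=reference_json)
    # An empty root exposes the whole reference filesystem as a key-value store.
    store = fs.get_mapper("")
    return xr.open_zarr(store, consolidated=False)
```

With the file from the previous section, `open_reference_dataset("dataset.json")` should be roughly equivalent to the `reference::dataset.json` form shown above. The explicit form is mainly useful when additional options have to be passed to the filesystem.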

library usage

You might be interested in using gribscan as a Python library, which enables further use cases.

building indices

You can build an index from a single GRIB file (as explained above) using:

import gribscan
# gribfile: path of the GRIB file to scan, indexfile: path of the index file to write
gribscan.write_index(gribfile, indexfile)
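
To index several files from Python, similar to what the -n option does on the command line, the call can be distributed over a process pool. This is a sketch, not gribscan's own implementation: the helpers `index_path_for`, `index_one` and `index_many` are hypothetical, and the naming scheme (same directory and stem, `.index` suffix) is an assumption that may differ from what gribscan-index produces.

```python
from multiprocessing import Pool
from pathlib import Path


def index_path_for(gribfile):
    """Assumed naming: same directory and stem as the GRIB file, '.index' suffix."""
    return Path(gribfile).with_suffix(".index")


def index_one(gribfile):
    """Write the index for a single GRIB file and return the index path."""
    import gribscan  # imported lazily so the helpers above work without it

    indexfile = index_path_for(gribfile)
    gribscan.write_index(str(gribfile), str(indexfile))
    return indexfile


def index_many(gribfiles, processes=4):
    """Index several GRIB files in parallel, one file per task."""
    with Pool(processes) as pool:
        return pool.map(index_one, gribfiles)
```

When used in a script, `index_many(sorted(Path(".").glob("*.grb2")), processes=16)` should be called from within an `if __name__ == "__main__":` block, so that worker processes can be started safely on all platforms.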

building dataset references

You can also assemble a dataset from the indices using:

import gribscan
# indexfiles: the index files to combine, global_prefix: path prefix to the referenced GRIB files
magician = gribscan.Magician()
gribscan.grib_magic(indexfiles, magician, global_prefix)

The magician is a class that customizes how the dataset is assembled. You may want to define your own in order to design the resulting dataset according to your preferences. Please have a look at magician.py to see what a Magician looks like.
