create indices for GRIB files and provide an xarray interface
gribscan
Tools to scan GRIB files and create zarr-compatible indices.
warning
This repository is still experimental. The code is not yet tested for many kinds of files. It will likely not destroy your files, as it only accesses GRIB files in read-mode, but it may skip some information or may crash. Please file an issue if you discover something is missing.
installing
gribscan is on PyPI; you can install the most recently released version using
python -m pip install gribscan
If you are interested in the latest development version, please clone the repository and install the package in development mode:
python -m pip install -e <path to your clone>
command line usage
gribscan comes with two executables:
gribscan-index for building indices of GRIB files
gribscan-build for building a dataset from indices
building indices
gribscan will create jsonlines-based .index files next to the input GRIB files. The format is based on the ECMWF OpenData index format but contains a lot more entries. You can pass in multiple GRIB files at once and specify the number of parallel processes (-n).
gribscan-index *.grb2 -n 16
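Since the index files are plain JSON lines, they can be inspected with the standard library alone. The sketch below writes a tiny stand-in file to show the read pattern; the keys shown are illustrative, not the actual gribscan index schema:

```python
import json
import os
import tempfile

# Write a tiny JSON-lines file to stand in for a real .index file
# (the keys here are illustrative, not the actual gribscan schema).
path = os.path.join(tempfile.mkdtemp(), "example.grb2.index")
with open(path, "w") as f:
    f.write(json.dumps({"param": "t2m", "offset": 0, "length": 1234}) + "\n")

# Reading back: each line is one self-contained JSON record.
with open(path) as f:
    entries = [json.loads(line) for line in f]

print(entries[0]["param"])
```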
Note: While gribscan uses cfgrib partially to read GRIB metadata, it does so in a rather hacky way. That way, gribscan does not have to create temporary files and is much faster than cfgrib or kerchunk.grib2, but it may not be as universal as cfgrib is. This is also the main reason for the warning above.
building a dataset
After all the index files have been created, a common dataset can be assembled based on the information in the index files. The assembled dataset will be written out as an fsspec ReferenceFileSystem-compatible JSON file, which internally builds a zarr group structure.
gribscan-build *.index -o dataset.json --prefix <path prefix to referenced grib files>
The prefix will be prepended to the paths within the dataset.json and should point to the location of the original GRIB files.
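For orientation, a ReferenceFileSystem JSON file has roughly the following shape. This is a hand-written illustration of the general reference format, not actual gribscan output; the variable and file names are made up:

```python
import json

# Hand-written sketch of the ReferenceFileSystem JSON layout:
# small zarr metadata entries are inlined as strings, while data chunks
# reference a byte range [url, offset, length] in the original files.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": "{\"zarr_format\": 2}",
        # chunk key -> byte range in a GRIB file (names are made up)
        "t2m/0.0.0": ["<prefix>/file1.grb2", 0, 1234],
    },
}

serialized = json.dumps(refs)
print(serialized[:30])
```

The prefix passed to gribscan-build ends up in the file paths of such chunk entries.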
reading indexed grib via zarr
The resulting JSON file can be interpreted by ReferenceFileSystem and zarr as follows:
import gribscan
import xarray as xr
ds = xr.open_zarr("reference::dataset.json", consolidated=False)
ds
Note that gribscan must be imported in order to register gribscan.rawgrib as a numcodecs codec, which enables the use of GRIB messages as zarr chunks. As opposed to gribscan-index, the codec only depends on eccodes and doesn't use cfgrib at all.
fsspec supports URL chaining. The prefix reference:: before the path signals to fsspec that after loading the given path, a ReferenceFileSystem should be initialized with whatever is found at that path. In principle, it is entirely possible to use ReferenceFileSystem across HTTP, within ZIP files, or a combination thereof.
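As a small self-contained illustration of URL chaining (using a ZIP file rather than a reference filesystem; the file names are made up):

```python
import os
import tempfile
import zipfile

import fsspec

# Build a tiny ZIP archive so the chained URL has something to point at.
d = tempfile.mkdtemp()
zpath = os.path.join(d, "archive.zip")
with zipfile.ZipFile(zpath, "w") as z:
    z.writestr("hello.txt", "hi")

# "zip://<inner path>::<outer path>" chains two filesystems, analogous to
# "reference::<path>": fsspec first resolves the outer path, then opens
# the inner path within it.
with fsspec.open(f"zip://hello.txt::{zpath}") as f:
    content = f.read()

print(content)
```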
library usage
You might be interested in using gribscan as a Python library, which enables further use cases.
building indices
You can build an index from a single GRIB file (as explained above) using:
import gribscan
gribscan.write_index(gribfile, indexfile)
building dataset references
You can also assemble a dataset from the indices using:
import gribscan
magician = gribscan.Magician()
gribscan.grib_magic(indexfiles, magician, global_prefix)
The magician is a class which customizes how the dataset is assembled. You may want to define your own in order to design the resulting dataset according to your preferences. Please have a look at magician.py to see what a Magician looks like.