xarray extension for GDAL
gdalxarray
An xarray backend powered directly by GDAL.
```python
import xarray as xr

# path_or_uri: any local path, URL, or connection string GDAL can open
ds = xr.open_dataset(path_or_uri, engine="gdalxarray")
```
gdalxarray is a thin bridge between GDAL's reading capabilities (classic
raster, multidimensional, and any of the virtualized stores GDAL knows about)
and xarray's labelled-array model. Reads are lazy by default and optionally
Dask-chunked, with native CRS and CF time handling.
Installation
GDAL has no usable PyPI wheels, so pip install gdalxarray alone is not
enough. You need a working osgeo.gdal Python binding first. The
recommended paths, in order of friction:
- conda-forge (cross-platform, just works): mamba install -c conda-forge gdal
- Docker image with GDAL preinstalled (e.g. ghcr.io/hypertidy/gdal-r-python:latest)
- System package manager (apt, brew) plus matching system Python bindings
Then pip install gdalxarray for the engine itself. See
INSTALL.md for the full guide, including troubleshooting
for NumPy ABI mismatches and Python version pinning.
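Before installing the engine, it can help to confirm that the binding actually loads. The sketch below is a hypothetical helper, gdal_version(), and is not part of gdalxarray. It uses the standard library, plus the osgeo modules when they are present, and returns GDAL's release string. When osgeo is missing or fails to import, it returns None.

```python
import importlib.util
from typing import Optional


def gdal_version() -> Optional[str]:
    """Return GDAL's release name, or None if osgeo.gdal is unusable.

    Hypothetical helper for illustration; not part of gdalxarray.
    """
    if importlib.util.find_spec("osgeo") is None:
        return None  # no osgeo package on this interpreter at all
    try:
        from osgeo import gdal
        # gdal_array is osgeo's NumPy bridge; an ImportError here is one
        # way a NumPy ABI mismatch can surface.
        from osgeo import gdal_array  # noqa: F401
    except ImportError:
        return None  # package present, but a compiled module failed to load
    return gdal.VersionInfo("RELEASE_NAME")


if __name__ == "__main__":
    version = gdal_version()
    print(f"GDAL {version}" if version else "osgeo.gdal is not importable")
```

Run it with the same interpreter you will later use for pip install gdalxarray. A None result points back to the installation routes above.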
Why this exists (vs rioxarray)
rioxarray is an xarray accessor and backend built on rasterio, which wraps
GDAL with its own Python conventions. For straightforward 2D/3D raster work
it is the mature, widely used choice, and the da.rio.reproject(...) accessor
pattern is well known and well tested.
gdalxarray goes directly to osgeo.gdal, with no rasterio layer
in between. That choice matters in a few specific cases:
- GDAL's multidimensional API is exposed natively: N-D arrays with named dimensions, not just (y, x) rasters with optional bands.
- Any GDAL virtualization composes: /vsicurl/, /vsis3/, vrt://, ZARR:, NETCDF:, classic VRT, multidim VRT.
- Codec and driver support tracks GDAL rather than whatever rasterio re-exposes: Zarr v3, Icechunk, kerchunk-Parquet stores, GRIB, and HDF4/5 multidim are all readable via the GDAL drivers.
For a single GeoTIFF or a STAC item, rioxarray is usually a better fit.
For multidim cloud-native datasets, virtualized Zarr/Icechunk stores, or
anything where you want GDAL itself to be the source of truth,
gdalxarray puts you closer to the metal.
Four core usage modes
The package has three ways to open a dataset, plus warp recipes for lazy reprojection. Almost everything else is a composition of these with GDAL virtual paths.
1. Classic raster, bands as a dimension (default when multidim=False)
Use this for multispectral imagery, image stacks, and anything where the bands
form a natural axis. It produces a single band_data DataArray with
dims (band, y, x), the rioxarray-compatible layout.
```python
import xarray as xr

# e.g. a 4-band (R, G, B, NIR) image
ds = xr.open_dataset("image.tif", engine="gdalxarray", multidim=False)
ds["band_data"]
# <xarray.DataArray 'band_data' (band: 4, y: 1024, x: 1024)>
# ...

# xarray idioms work as expected:
mean_image = ds["band_data"].mean(dim="band")
just_nir = ds["band_data"].sel(band=4)  # band labels are 1-based
```
2. Classic raster, bands as separate variables
Use this for multiband rasters where each band carries a semantically distinct quantity, e.g. a NetCDF translated to a multiband GeoTIFF whose bands are different physical variables. Each band becomes a separate data variable named after its description.
```python
import xarray as xr

ds = xr.open_dataset(
    "multivariable.tif",
    engine="gdalxarray",
    multidim=False,
    band_as_dim=False,
)
ds
# <xarray.Dataset>
# Data variables:
#     temperature  (y, x) float32
#     salinity     (y, x) float32
#     density      (y, x) float32
```
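The handling of empty or duplicate band descriptions is not covered here. Below is a hypothetical sketch of one reasonable naming policy; it is an assumption, not gdalxarray's documented behaviour. It starts from the per-band description GDAL reports (Band.GetDescription() in osgeo.gdal) and sanitises it into an identifier-like name. A band with no description falls back to band_<n> (1-based), and duplicates get _2, _3, and so on.

```python
import re
from typing import List, Optional, Sequence


def band_variable_names(descriptions: Sequence[Optional[str]]) -> List[str]:
    """Turn per-band descriptions into unique, identifier-like variable names.

    Hypothetical naming policy for illustration; gdalxarray may differ.
    """
    names: List[str] = []
    used = set()
    for index, desc in enumerate(descriptions, start=1):
        # Collapse runs of non-word characters (spaces, punctuation) into "_".
        base = re.sub(r"\W+", "_", (desc or "").strip()).strip("_")
        if not base:
            base = f"band_{index}"  # fallback for bands without a description
        name, n = base, 2
        while name in used:  # de-duplicate repeated descriptions
            name = f"{base}_{n}"
            n += 1
        used.add(name)
        names.append(name)
    return names


print(band_variable_names(["temperature", "salinity", "density"]))
print(band_variable_names(["sea surface temp", "", "sea surface temp"]))
```

The second call yields sea_surface_temp, band_2 and sea_surface_temp_2. The de-duplication loop checks against every name already issued, so a fallback name can never collide with a real description.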
3. Multidim: N-D arrays with named dimensions
Use this for datasets with their own dimension and coordinate structure: HDF5, NetCDF, multidim VRT, GRIB, Zarr (v2 and v3). It produces a Dataset whose dims and coords come from the source.
```python
import xarray as xr

ds = xr.open_dataset("dataset.nc", engine="gdalxarray", multidim=True)
ds["temperature"].sel(time="2024-06", level=500).isel(latitude=slice(100, 200))
```
multidim=True is the default for engine="gdalxarray".
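In xarray, selecting with a partial date string like time="2024-06" works when the time coordinate holds datetimes. In CF-convention files, time is stored as plain numbers plus a units string such as "days since 1981-01-01". The sketch below shows the core of that decoding using only the standard library. It illustrates the CF convention and is not gdalxarray's implementation; decode_cf_time is a hypothetical name. It assumes the proleptic Gregorian calendar, which matches CF's "standard" calendar for dates after 1582. It also deliberately rejects calendar-dependent units such as "months since".

```python
from datetime import datetime, timedelta
from typing import Iterable, List

# Seconds per CF time unit. Only fixed-length units are listed: "months" and
# "years" vary with the calendar, so this sketch refuses them.
_SECONDS_PER_UNIT = {
    "days": 86400, "day": 86400, "d": 86400,
    "hours": 3600, "hour": 3600, "h": 3600,
    "minutes": 60, "minute": 60,
    "seconds": 1, "second": 1, "s": 1,
}

# Reference-date layouts accepted after "since" (ISO "T" is normalised to " ").
_REFERENCE_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d",
)


def decode_cf_time(values: Iterable[float], units: str) -> List[datetime]:
    """Decode CF numeric times ("<unit> since <date>") on the Gregorian calendar."""
    unit, sep, reference = units.partition(" since ")
    if not sep:
        raise ValueError(f"not a CF time units string: {units!r}")
    unit = unit.strip().lower()
    if unit not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time unit: {unit!r}")
    reference = reference.strip().replace("T", " ")
    for fmt in _REFERENCE_FORMATS:
        try:
            origin = datetime.strptime(reference, fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unsupported reference date: {reference!r}")
    scale = _SECONDS_PER_UNIT[unit]
    # Each stored number is an offset from the origin in the given unit.
    return [origin + timedelta(seconds=float(v) * scale) for v in values]


print(decode_cf_time([0, 1.5], "days since 1981-01-01"))
```

A production decoder also needs the other CF calendars (noleap, 360_day, and so on), which Python's datetime cannot represent.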
4. Warp recipes: lazy reprojection
For warping any GDAL-readable source into a target CRS, grid, or
projection, gdalxarray.warp returns a VRT recipe string rather than
materialising pixels:
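```python
import gdalxarray
import xarray as xr

source = "image.tif"  # placeholder: any GDAL-readable path or URI
vrt = gdalxarray.warp(source, crs="+proj=laea")  # a VRT recipe string, not pixels
ds = xr.open_dataset(vrt, engine="gdalxarray", multidim=False)
```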
The full warp configuration (target CRS, GCPs/RPCs/geolocation arrays, cutlines, resampling) is encoded in ~2 KB of VRT XML. Only the bytes your code actually reads flow over the network or off disk.
Composing with GDAL virtual paths
The modes above combine with GDAL's virtualization layers to cover
nearly every cloud-native and remote-data scenario. None of these
require any code changes in gdalxarray; they're just different paths:

| Prefix or syntax | What it does |
|---|---|
| `/vsicurl/<url>` | HTTP/HTTPS-served files |
| `/vsis3/<bucket>/<key>` | S3 (anonymous via `AWS_NO_SIGN_REQUEST=YES`) |
| `/vsigs/...` | Google Cloud Storage |
| `vrt://<path>?<options>` | Inline classic-raster VRT: subdataset selection, resampling, ... |
| `NETCDF:<path>:<var>` | Pick a subdataset from a NetCDF |
| `ZARR:"<path>":/<array>` | Open one array of a Zarr store as a classic raster |
| Classic VRT (`.vrt`) | XML file referencing other sources |
| Multidim VRT (`.vrt`) | N-D version, layered over NetCDF/HDF/Zarr sources |

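Because these are plain strings, they can be assembled programmatically. The helpers below follow the syntax in the table. Their names (vsicurl, vsis3, netcdf_subdataset, zarr_array, allow_anonymous_s3) are hypothetical and not part of gdalxarray. For anonymous S3, the sketch sets AWS_NO_SIGN_REQUEST as an environment variable, which GDAL also consults for configuration options. osgeo.gdal.SetConfigOption("AWS_NO_SIGN_REQUEST", "YES") is the in-process alternative.

```python
import os
from typing import Optional


def vsicurl(url: str) -> str:
    """HTTP/HTTPS file read through GDAL's /vsicurl/ handler."""
    return f"/vsicurl/{url}"


def vsis3(bucket: str, key: str) -> str:
    """Object in an S3 bucket read through /vsis3/."""
    return f"/vsis3/{bucket}/{key.lstrip('/')}"


def netcdf_subdataset(path: str, var: Optional[str] = None) -> str:
    """NETCDF:<path>[:<var>] string; without var it addresses the whole file."""
    return f"NETCDF:{path}" if var is None else f"NETCDF:{path}:{var}"


def zarr_array(path: str, array: str) -> str:
    """ZARR:"<path>":/<array> string: one array of a store as a classic raster."""
    return f'ZARR:"{path}":/{array.lstrip("/")}'


def allow_anonymous_s3() -> None:
    """Request unsigned S3 access; call before the first /vsis3/ read."""
    os.environ["AWS_NO_SIGN_REQUEST"] = "YES"


allow_anonymous_s3()
print(netcdf_subdataset(vsis3("bucket", "path/file.nc")))
```

The printed string is the same NETCDF:/vsis3/... path used in the CMEMS example below. The resulting strings go straight into xr.open_dataset(..., engine="gdalxarray").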
A few illustrative compositions:
```python
import xarray as xr

# Public COG over HTTPS:
xr.open_dataset(
    "/vsicurl/https://example.com/data.tif",
    engine="gdalxarray", multidim=False,
)

# All variables of a CMEMS NetCDF on S3:
xr.open_dataset(
    "NETCDF:/vsis3/bucket/path/file.nc",
    engine="gdalxarray", multidim=True,
)

# A multidim VRT as a labelled, coordinate-aware view over a raw NetCDF:
xr.open_dataset("study_area.vrt", engine="gdalxarray", multidim=True)
```
Which mode for which format?
As a rough guide, multidim=True is the natural fit for formats whose
own data model is N-dimensional with named axes:
- NetCDF (3 and 4)
- HDF5 / HDF4
- Multidim VRT
- GRIB / GRIB2
- Zarr (v2 and v3)
- Icechunk (where supported by your GDAL build)
multidim=False is the natural fit for image-like formats:
- GeoTIFF (including COG)
- JPEG, PNG, JPEG2000
- ERDAS Imagine (.img)
- Classic VRT files
- Anything GDAL identifies as a 2D-with-bands raster
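The rule of thumb above can be written as a small lookup. The sketch below is a hypothetical helper: suggest_multidim is not part of gdalxarray, and its extension lists are assumptions based on common file naming. It guesses from the file extension only and returns None where the extension cannot decide. That covers .vrt files, which can be either classic or multidim, and stores such as Icechunk that have no telltale extension.

```python
from pathlib import PurePosixPath
from typing import Optional

# Common extensions for the N-D formats listed above (NetCDF, HDF5/HDF4, GRIB, Zarr).
MULTIDIM_SUFFIXES = {
    ".nc", ".nc4", ".h5", ".hdf5", ".hdf",
    ".grib", ".grib2", ".grb", ".grb2", ".zarr",
}
# Common extensions for the image-like formats (GeoTIFF/COG, JPEG, PNG, JPEG2000, ERDAS Imagine).
CLASSIC_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".jp2", ".img"}


def suggest_multidim(path: str) -> Optional[bool]:
    """Guess the multidim= flag from a path's extension; None means 'inspect it'."""
    # PurePosixPath drops a trailing "/", so "store.zarr/" still ends in ".zarr".
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in MULTIDIM_SUFFIXES:
        return True
    if suffix in CLASSIC_SUFFIXES:
        return False
    return None  # .vrt (classic or multidim) and anything unrecognised


for p in ["/vsis3/bucket/path/file.nc", "image.tif", "study_area.vrt", "store.zarr/"]:
    print(p, suggest_multidim(p))
```

Returning None rather than guessing keeps the ambiguous cases explicit. For a .vrt file, the XML contents, not the name, determine which mode fits.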
Status
Active development. The API has settled, but small changes are
possible before 1.0. See CHANGELOG.md for what's
landed and the issue tracker
for what's next.
For worked examples against real cloud-native data
(BRAN2023 ocean reanalysis, ECMWF AIFS forecasts, CMEMS sea level,
NOAA OISST), see docs/cookbook.md.
License
Apache-2.0.
Download files
File details
Details for the file gdalxarray-0.4.0.tar.gz.
File metadata
- Download URL: gdalxarray-0.4.0.tar.gz
- Size: 38.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 71db63c06e90efc4b66ce229f48c71b279bc8abab3a349bb4052bfe4e4021eed |
| MD5 | 65582e8cf25c6c5c413bfdb0f7644a1e |
| BLAKE2b-256 | c81f39b04d28ffcddd97f54b9067ad9f31cc058562bc1cd7ff5db71bd2611280 |

Provenance
The following attestation bundles were made for gdalxarray-0.4.0.tar.gz:
- Publisher: release.yml on hypertidy/gdalxarray
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gdalxarray-0.4.0.tar.gz
- Subject digest: 71db63c06e90efc4b66ce229f48c71b279bc8abab3a349bb4052bfe4e4021eed
- Sigstore transparency entry: 1833651957
- Permalink: hypertidy/gdalxarray@8eff841aa8e42028accc09ff15a2cf33def1e9f4
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/hypertidy
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8eff841aa8e42028accc09ff15a2cf33def1e9f4
- Trigger Event: push
File details
Details for the file gdalxarray-0.4.0-py3-none-any.whl.
File metadata
- Download URL: gdalxarray-0.4.0-py3-none-any.whl
- Size: 17.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | cf668142e4ea9bf95906e441f2d97b27f3874728fc0c923361caf07225d35a93 |
| MD5 | ca39c6a804bca438e6cc4df7414070d8 |
| BLAKE2b-256 | b8aeaae18e802fa1707d3af3cd7814face44cdb9dc1eb1625105ce8ba9f842b1 |

Provenance
The following attestation bundles were made for gdalxarray-0.4.0-py3-none-any.whl:
- Publisher: release.yml on hypertidy/gdalxarray
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gdalxarray-0.4.0-py3-none-any.whl
- Subject digest: cf668142e4ea9bf95906e441f2d97b27f3874728fc0c923361caf07225d35a93
- Sigstore transparency entry: 1833652180
- Permalink: hypertidy/gdalxarray@8eff841aa8e42028accc09ff15a2cf33def1e9f4
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/hypertidy
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8eff841aa8e42028accc09ff15a2cf33def1e9f4
- Trigger Event: push