Extend xarray.open_dataset to accept pystac objects

xpystac

xpystac provides the glue that allows xarray.open_dataset to accept pystac objects.

The goal is that as long as this library is installed in your environment, you should never need to think about it.

  • Open one asset: Reads data for an asset pointing to a COG, a zarr store, or a kerchunk reference file.
  • Open one item: Reads data for all the assets in a particular item (commonly each COG represents a band).
  • Open many items: Reads all the assets in all the items for a particular item collection, iterable of items, or the output of pystac_client.Client.search.

What works

file format        one asset (item or collection-level)   one item   many items
COG                x                                      x          x
Zarr               x
Kerchunk           x                                      x*         x*
virtual Icechunk   x

* if stored in item alongside the datacube extension properties

Install

pip install xpystac

Examples

Open a single asset

Read from a COG

import pystac
import xarray as xr

item = pystac.Item.from_file(
    "https://raw.githubusercontent.com/stac-utils/pystac/v1.12.2/tests/data-files/examples/1.0.0/simple-item.json"
)
asset = item.assets["visual"]

xr.open_dataset(asset)

Read from a virtual Icechunk store

import pystac
import xarray as xr

collection = pystac.Collection.from_file(
    "https://raw.githubusercontent.com/stac-utils/xpystac/refs/heads/main/tests/data/virtual-icechunk-collection.json"
)

# Get the latest version of the collection-level asset
assets = collection.get_assets(role="latest-version")
asset = next(iter(assets.values()))

xr.open_dataset(asset)

The following examples come from the Planetary Computer docs, which include good examples of collection-level assets used to catalog zarr stores and kerchunk reference files.

import planetary_computer
import pystac_client
import xarray as xr


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

Read from a kerchunk reference file:

collection = catalog.get_collection("nasa-nex-gddp-cmip6")
asset = collection.assets["ACCESS-CM2.historical"]

xr.open_dataset(asset, patch_url=planetary_computer.sign)

Read from a zarr store:

collection = catalog.get_collection("daymet-daily-hi")
asset = collection.assets["zarr-abfs"]

xr.open_dataset(asset, patch_url=planetary_computer.sign)

Note that this zarr asset uses the xarray-assets extension to store open_kwargs and storage_options, which xpystac then passes along to xr.open_dataset.
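To make the role of the xarray-assets extension concrete, here is a minimal sketch of how those two fields could be turned into keyword arguments for an open call. The field names xarray:open_kwargs and xarray:storage_options come from the xarray-assets extension; the helper name open_arguments, the merging strategy, and the example values are assumptions for illustration and are not taken from xpystac's source.

```python
from typing import Any


def open_arguments(extra_fields: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments from an asset's xarray-assets fields.

    Hypothetical helper: xpystac may merge these fields differently.
    """
    # Copy so the asset metadata itself is never mutated.
    kwargs = dict(extra_fields.get("xarray:open_kwargs", {}))
    storage_options = extra_fields.get("xarray:storage_options")
    if storage_options:
        # Assumption: storage options are forwarded as a single keyword.
        kwargs["storage_options"] = dict(storage_options)
    return kwargs


# Made-up metadata shaped like an asset's extra fields.
example_fields = {
    "xarray:open_kwargs": {"consolidated": True},
    "xarray:storage_options": {"account_name": "example"},
}
print(open_arguments(example_fields))
```

An asset that carries neither field yields an empty dict, so the open call falls back to its defaults.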

Open a single item

A single item containing many COGs:

import pystac
import xarray as xr


item = pystac.Item.from_file(
    "https://earth-search.aws.element84.com/v1/collections/landsat-c2-l2/items/LC09_L2SR_081108_20250311_02_T2"
)

xr.open_dataset(item)

This uses a stacking library, either odc-stac or stackstac, which you can choose via the stacking_library option.
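One way to picture the choice between the two stacking libraries is the sketch below, which honors an explicit preference and otherwise falls back to whichever library is installed. This is an assumption about reasonable behavior, not xpystac's actual logic: the function names, the fallback order, and the option strings "odc.stac" and "stackstac" (the importable module names) are all illustrative.

```python
import importlib.util
from typing import Callable, Optional

# Importable module names of the two stacking libraries.
SUPPORTED = ("odc.stac", "stackstac")


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when the parent package is missing.
        return False


def pick_stacking_library(
    preferred: Optional[str] = None,
    installed: Callable[[str], bool] = is_installed,
) -> str:
    """Hypothetical selection: explicit preference first, then fallback."""
    if preferred is not None:
        if preferred not in SUPPORTED:
            raise ValueError(f"unsupported stacking library: {preferred!r}")
        if not installed(preferred):
            raise ImportError(f"{preferred} is not installed")
        return preferred
    # Assumed fallback order: the first supported library that is available.
    for name in SUPPORTED:
        if installed(name):
            return name
    raise ImportError("install odc-stac or stackstac to open items")
```

The installed parameter exists only so the selection can be exercised without either library present, for example pick_stacking_library(installed=lambda name: name == "stackstac").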

Open many items

Read all the data from the search results for a collection of COGs:

import pystac_client
import xarray as xr


catalog = pystac_client.Client.open(
    "https://earth-search.aws.element84.com/v1",
)

search = catalog.search(
    intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
    collections=['sentinel-2-l2a'],
    datetime="2022-04-01/2022-05-01",
)

xr.open_dataset(search, engine="stac")

Read data from an item collection that uses the exploratory approach of storing kerchunked metadata within the datacube extension metadata:

import pystac
import xarray as xr

item_collection = pystac.ItemCollection.from_file(
    "https://raw.githubusercontent.com/stac-utils/xpystac/main/tests/data/data-cube-kerchunk-item-collection.json"
)

xr.open_dataset(item_collection)

How it works

When you call xarray.open_dataset(object, engine="stac"), this library routes that call to the appropriate reader. Depending on the type of object, that is either a stacking library (odc-stac or stackstac) or xarray.open_dataset itself, with the engine and other options pulled from the pystac object.
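The routing described above can be sketched as a small type-based dispatch. This is an illustration only: FakeAsset and FakeItem are hypothetical stand-ins for pystac objects so the sketch runs without third-party packages, and the media-type and role checks are assumptions rather than xpystac's actual rules.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeAsset:
    """Hypothetical stand-in for a pystac asset."""
    href: str
    media_type: str = ""
    roles: list[str] = field(default_factory=list)


@dataclass
class FakeItem:
    """Hypothetical stand-in for a pystac item."""
    assets: dict[str, FakeAsset] = field(default_factory=dict)


def choose_reader(obj: Any) -> str:
    """Return a label for the reader that would handle ``obj``."""
    if isinstance(obj, FakeAsset):
        # Single assets go back to xarray, with the engine inferred
        # from asset metadata (assumed heuristics).
        if "zarr" in obj.media_type:
            return "xarray engine=zarr"
        if "references" in obj.roles:
            return "xarray engine=kerchunk"
        return "xarray engine=rasterio"
    if isinstance(obj, FakeItem):
        # One item: its assets are stacked into a single dataset.
        return "stacking library"
    if isinstance(obj, (list, tuple)) and obj and all(
        isinstance(o, FakeItem) for o in obj
    ):
        # Many items: stacked the same way across all items.
        return "stacking library"
    raise TypeError(f"cannot open object of type {type(obj).__name__}")


cog = FakeAsset("https://example.com/visual.tif", media_type="image/tiff")
print(choose_reader(cog))
print(choose_reader(FakeItem({"visual": cog})))
```

The point of the sketch is that the caller always uses the same open call, and only the type and metadata of the object decide which reader does the work.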

Prior Art

This work is inspired by https://github.com/TomAugspurger/staccontainers and the discussion in https://github.com/stac-utils/pystac/issues/846
