
Project description

Xarray Prism Engine

A multi-format and multi-storage xarray engine with automatic engine detection and the ability to register new data formats and URI types for climate data.

[!IMPORTANT] If you encounter a data format that the engine is not able to open, please file an issue report here. This helps us improve the engine and enables users to work with different kinds of climate data.

Installation

Install via PyPI

pip install xarray-prism

Install via Conda

conda install xarray-prism

Quick Start

Using with xarray

import xarray as xr

# Auto-detect format
ds = xr.open_dataset("my_data.unknown_fmt", engine="prism")

# Remote Zarr on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.zarr",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF3 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF4 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc4",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote Zarr on S3 - non-anon
ds = xr.open_dataset(
    "s3://bucket/data.zarr",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)

# OPeNDAP from THREDDS
ds = xr.open_dataset(
    "https://icdc.cen.uni-hamburg.de/thredds/dodsC/ftpthredds/ar5_sea_level_rise/gia_mean.nc",
    engine="prism"
)

# Local GRIB file
ds = xr.open_dataset("forecast.grib2", engine="prism")

# GeoTIFF
ds = xr.open_dataset("satellite.tif", engine="prism")

# Tip: manage the cache location yourself via simplecache
xr.open_dataset(
    "simplecache::s3://bucket/file.nc3",
    engine="prism",
    storage_options={
        "s3": {"anon": True, "client_kwargs": {"endpoint_url": "..."}},
        "simplecache": {"cache_storage": "/path/to/cache"}
    }
)

# Credentials can be passed through storage_options even for TIFF files
# on S3, which plain rasterio does not support:
xr.open_dataset(
    "s3://bucket/file.tif",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)

Supported Formats

| Data format  | Remote backend     | Local FS  | Cache                                            |
|--------------|--------------------|-----------|--------------------------------------------------|
| GRIB         | cfgrib + fsspec    | cfgrib    | fsspec simplecache (full-file)                   |
| Zarr         | zarr + fsspec      | zarr      | chunked key/value store                          |
| NetCDF3      | scipy + fsspec     | scipy     | fsspec byte cache (5 MB blocks, but full download) |
| NetCDF4/HDF5 | h5netcdf + fsspec  | h5netcdf  | fsspec byte cache (5 MB blocks)                  |
| GeoTIFF      | rasterio + fsspec  | rasterio  | GDAL/rasterio block cache (5 MB blocks)          |
| OPeNDAP/DODS | netCDF4            | n/a       | n/a                                              |

[!WARNING] Remote GRIB & NetCDF3 require full file download

Unlike Zarr or HDF5, these formats don't support partial/chunk reads over the network.

By default, xarray-prism caches files in the system temp directory. This works well for most cases. If temp storage is a concern (e.g., limited space or cleared on reboot), you can specify a persistent cache:

| Option               | How                                                        |
|----------------------|------------------------------------------------------------|
| Environment variable | `export XARRAY_PRISM_CACHE=/path/to/cache`                 |
| Per-call             | `storage_options={"simplecache": {"cache_storage": "/path"}}` |
| Default              | System temp directory                                      |
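The environment-variable option can also be set from Python before the engine is first used. A minimal sketch (the cache path below is a placeholder, use any writable directory):

```python
import os
import tempfile

# Point the engine's file cache at a persistent directory instead of the
# default system temp dir. The path here is only an example.
cache_dir = os.path.join(tempfile.gettempdir(), "xarray_prism_cache")
os.makedirs(cache_dir, exist_ok=True)
os.environ["XARRAY_PRISM_CACHE"] = cache_dir

# Equivalent per-call form, passed to xr.open_dataset(..., engine="prism"):
storage_options = {"simplecache": {"cache_storage": cache_dir}}
```

The per-call form wins over the environment variable for that one call, which is handy when a single dataset should be cached somewhere specific.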

Customization

Custom Format Detectors and URI Types

You can extend xarray-prism with custom format detectors, URI types, and open handlers by providing a small plugin package. Registration happens at import time, so importing the plugin activates it.
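To see how priority-based registration of this kind can work, here is an illustrative stand-in, not xarray-prism's actual implementation; the names `_detectors` and `run_detectors` are invented for the example:

```python
from typing import Callable, List, Optional, Tuple

# Detectors are stored with a priority and tried in descending priority
# order; the first detector returning a non-None format name wins.
_detectors: List[Tuple[int, Callable[[str], Optional[str]]]] = []

def register_detector(priority: int = 0):
    def wrap(fn: Callable[[str], Optional[str]]):
        _detectors.append((priority, fn))
        _detectors.sort(key=lambda item: item[0], reverse=True)
        return fn
    return wrap

def run_detectors(uri: str) -> Optional[str]:
    for _priority, fn in _detectors:
        fmt = fn(uri)
        if fmt is not None:
            return fmt
    return None

@register_detector(priority=100)
def detect_foo(uri: str) -> Optional[str]:
    return "foo" if uri.lower().endswith(".foo") else None
```

With this sketch, `run_detectors("data.foo")` returns `"foo"`, while an unknown extension falls through to `None`, after which a real engine could fall back to content sniffing.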

Plugin structure

xarray_prism_myplugin/
  __init__.py   # imports the plugin module (triggers registration)
  plugin.py     # detectors, URI types, and open handlers
pyproject.toml

Plugin implementation

xarray_prism_myplugin/__init__.py

from .plugin import *  # noqa: F401,F403

xarray_prism_myplugin/plugin.py

import xarray as xr
from xarray_prism import register_detector, register_uri_type, registry


@register_uri_type(priority=100)
def detect_myfs_uri(uri: str):
    """Detect a custom filesystem URI."""
    if uri.lower().startswith("myfs://"):
        return "myfs"
    return None


@register_detector(priority=100)
def detect_foo_format(uri: str):
    """Detect a custom file format."""
    if uri.lower().endswith(".foo"):
        return "foo"
    return None


@registry.register("foo", uri_type="myfs")
def open_foo_from_myfs(uri: str, **kwargs):
    """Open .foo files from myfs:// URIs."""
    translated = uri.replace("myfs://", "https://my-gateway.example/")
    return xr.open_dataset(translated, engine="h5netcdf", **kwargs)

Plugin installation

pyproject.toml

[project]
name = "xarray-prism-myplugin"
version = "0.1.0"
dependencies = ["xarray-prism"]

[project.entry-points."xarray_prism.plugins"]
myplugin = "xarray_prism_myplugin"

Using the plugin

After installing the plugin package, import it once to activate the registrations:

import xarray_prism_myplugin  # activates detectors and handlers

import xarray as xr
ds = xr.open_dataset("myfs://bucket/path/data.foo", engine="prism")

Development

Setup Development Environment

# Start test services (MinIO, THREDDS)
docker-compose -f dev-env/docker-compose.yaml up -d --remove-orphans

# Create conda environment
conda create -n xarray-prism python=3.12 -y
conda activate xarray-prism

# Install package in editable mode with dev dependencies
pip install -e ".[dev]"

Running Tests

# Run tests
tox -e test

# Run with coverage
tox -e test-cov

# Lint
tox -e lint

# Type checking
tox -e types

# Auto-format code
tox -e format

Creating a Release

Releases are managed via GitHub Actions and tox:

# Tag a new release (creates git tag)
tox -e release

The release workflow is triggered automatically when:

  • A version tag (v*.*.*) is pushed -> Full release to PyPI
  • Manual workflow dispatch with RC number -> Pre-release to PyPI

Project details


Download files

Download the file for your platform.

Source Distribution

xarray_prism-2602.1.0.tar.gz (24.0 kB)

Uploaded Source

Built Distribution


xarray_prism-2602.1.0-py3-none-any.whl (18.4 kB)

Uploaded Python 3

File details

Details for the file xarray_prism-2602.1.0.tar.gz.

File metadata

  • Download URL: xarray_prism-2602.1.0.tar.gz
  • Upload date:
  • Size: 24.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for xarray_prism-2602.1.0.tar.gz

| Algorithm  | Hash digest                                                        |
|------------|--------------------------------------------------------------------|
| SHA256     | ef984c072041571e91c58e8746764adc2d4f221b98ad5aa3d174c0eea997e49b   |
| MD5        | 2bd442b75210f66028e3f4517f05920c                                   |
| BLAKE2b-256 | 00f61beb35e35b8ee0c7c1540cef8bd85a41c13e17a99343f5dd7509b64f83b7  |


Provenance

The following attestation bundles were made for xarray_prism-2602.1.0.tar.gz:

Publisher: release_ci.yml on freva-org/xarray-prism

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file xarray_prism-2602.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for xarray_prism-2602.1.0-py3-none-any.whl

| Algorithm  | Hash digest                                                        |
|------------|--------------------------------------------------------------------|
| SHA256     | 2b866bf794d1c3cd3362a42439bcb90863b84f0a1e7bfc66687be66788b50e57   |
| MD5        | b8ad72c0a58de2a0195e3113c3842acf                                   |
| BLAKE2b-256 | 8632384805f5a74827280ee7dc7e6974d865fed50f3c96654a9286727f40be9a  |


Provenance

The following attestation bundles were made for xarray_prism-2602.1.0-py3-none-any.whl:

Publisher: release_ci.yml on freva-org/xarray-prism

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
