Xarray Prism Engine

A multi-format, multi-storage xarray engine for climate data, with automatic engine detection and the ability to register new data formats and URI types.

[!Important] If you encounter a data format that the prism engine cannot open, please file an issue report here. This helps us improve the engine so that users can work with different kinds of climate data.

Installation

Install via PyPI

pip install xarray-prism

Install via Conda

conda install xarray-prism

Quick Start

Using with xarray

import xarray as xr

# Auto-detect format
ds = xr.open_dataset("my_data.unknown_fmt", engine="prism")

# Remote Zarr on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.zarr",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF3 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF4 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc4",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote Zarr on S3 - non-anon
ds = xr.open_dataset(
    "s3://bucket/data.zarr",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)

# OPeNDAP from THREDDS
ds = xr.open_dataset(
    "https://icdc.cen.uni-hamburg.de/thredds/dodsC/ftpthredds/ar5_sea_level_rise/gia_mean.nc",
    engine="prism"
)

# Local GRIB file
ds = xr.open_dataset("forecast.grib2", engine="prism")

# GeoTIFF
ds = xr.open_dataset("satellite.tif", engine="prism")

# Tip: manage the cache location yourself by chaining fsspec's simplecache
xr.open_dataset(
    "simplecache::s3://bucket/file.nc3",
    engine="prism",
    storage_options={
        "s3": {"anon": True, "client_kwargs": {"endpoint_url": "..."}},
        "simplecache": {"cache_storage": "/path/to/cache"}
    }
)

# For GeoTIFF files on S3 you can also pass credentials through
# storage_options, which rasterio does not support on its own:
xr.open_dataset(
    "s3://bucket/file.tif",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)
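The S3 examples above repeat the same storage_options structure. A small helper can build it once. This is only a sketch: `s3_storage_options` is a hypothetical function, not part of xarray-prism, and it only assembles the dictionary keys already shown in the examples (`anon`, `key`, `secret`, `client_kwargs`, `endpoint_url`).

```python
from typing import Any, Dict, Optional


def s3_storage_options(
    endpoint_url: str,
    key: Optional[str] = None,
    secret: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an S3 storage_options dict (hypothetical helper, not a library API)."""
    options: Dict[str, Any] = {"client_kwargs": {"endpoint_url": endpoint_url}}
    if key is None and secret is None:
        # No credentials supplied: request anonymous access.
        options["anon"] = True
    else:
        options["key"] = key
        options["secret"] = secret
    return options


anon_opts = s3_storage_options("https://s3.eu-dkrz-1.dkrz.cloud")
auth_opts = s3_storage_options("S3_ENDPOINT", key="YOUR_KEY", secret="YOUR_SECRET")
```

Either dictionary can then be passed as `storage_options=` to `xr.open_dataset(..., engine="prism")`.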

Supported Formats

| Data format  | Remote backend    | Local FS  | Cache                                              |
|--------------|-------------------|-----------|----------------------------------------------------|
| GRIB         | cfgrib + fsspec   | cfgrib    | fsspec simplecache (full-file)                     |
| Zarr         | zarr + fsspec     | zarr      | chunked key/value store                            |
| NetCDF3      | scipy + fsspec    | scipy     | fsspec byte cache (5 MB blocks, but full download) |
| NetCDF4/HDF5 | h5netcdf + fsspec | h5netcdf  | fsspec byte cache (5 MB block)                     |
| GeoTIFF      | rasterio + fsspec | rasterio  | GDAL/rasterio block cache (5 MB block)             |
| OPeNDAP/DODS | netCDF4           | n/a       | n/a                                                |
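This README does not describe how the automatic detection works internally. As an illustration only, several of the file formats above can be told apart by their leading "magic" bytes, independent of the file extension. The sketch below is an assumption about one plausible approach, not xarray-prism's actual logic; `sniff_format` and the returned labels are hypothetical. Zarr (a directory-like store) and OPeNDAP (a protocol) have no such file signature and would need other checks.

```python
from typing import Optional

# Well-known leading-byte signatures of the file formats.
_SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "netcdf4/hdf5"),  # HDF5 superblock signature
    (b"CDF\x01", "netcdf3"),                 # NetCDF classic
    (b"CDF\x02", "netcdf3"),                 # NetCDF 64-bit offset
    (b"GRIB", "grib"),                       # GRIB edition 1 and 2 indicator
    (b"II*\x00", "tiff"),                    # little-endian TIFF (incl. GeoTIFF)
    (b"MM\x00*", "tiff"),                    # big-endian TIFF (incl. GeoTIFF)
]


def sniff_format(header: bytes) -> Optional[str]:
    """Guess a format label from the first bytes of a file (illustrative only)."""
    for magic, label in _SIGNATURES:
        if header.startswith(magic):
            return label
    return None
```

Only the first eight bytes are needed, so a check like this stays cheap even for remote files.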

[!WARNING] Remote GRIB & NetCDF3 require full file download

Unlike Zarr or HDF5, these formats don't support partial/chunk reads over the network.
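To make the difference concrete, the following sketch computes how many bytes a block cache would fetch for a small read, compared with a full download. It is a back-of-the-envelope model, not the library's code: the function names are hypothetical, and "5 MB" is assumed to mean 5 * 1024 * 1024 bytes.

```python
BLOCK_SIZE = 5 * 1024 * 1024  # assumed meaning of the "5 MB" block size


def blocks_for_range(start: int, stop: int, block_size: int = BLOCK_SIZE) -> range:
    """Indices of the fixed-size blocks that cover bytes [start, stop)."""
    if stop <= start:
        return range(0)
    first = start // block_size
    last = (stop - 1) // block_size
    return range(first, last + 1)


def bytes_transferred(
    start: int, stop: int, file_size: int, block_size: int = BLOCK_SIZE
) -> int:
    """Bytes fetched when every touched block is downloaded whole."""
    total = 0
    for index in blocks_for_range(start, stop, block_size):
        block_start = index * block_size
        block_end = min(block_start + block_size, file_size)  # last block may be short
        total += block_end - block_start
    return total


MIB = 1024 * 1024
GIB = 1024 * MIB

# Reading 1 MiB at offset 12 MiB from a 1 GiB file touches a single block.
partial = bytes_transferred(12 * MIB, 13 * MIB, file_size=GIB)
full = GIB  # what a full-file download (remote GRIB, NetCDF3) would transfer
```

In this model the partial read moves 5 MiB, while the full-file formats move the entire 1 GiB regardless of how little data is requested.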

By default, xarray-prism caches files in the system temp directory. This works well for most cases. If temp storage is a concern (e.g., limited space or cleared on reboot), you can specify a persistent cache:

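| Option               | How                                                          |
|----------------------|--------------------------------------------------------------|
| Environment variable | export XARRAY_PRISM_CACHE=/path/to/cache                     |
| Per-call             | storage_options={"simplecache": {"cache_storage": "/path"}}  |
| Default              | System temp directory                                        |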
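The three options can be read as a lookup order. The sketch below assumes that a per-call setting wins over the environment variable, which in turn wins over the default; this README does not state the precedence, so treat it as an assumption. `resolve_cache_dir` is a hypothetical helper, and the default subdirectory name is borrowed from the `cache_info()` example output in this README.

```python
import os
import tempfile
from typing import Any, Mapping, Optional


def resolve_cache_dir(storage_options: Optional[Mapping[str, Any]] = None) -> str:
    """Pick a cache directory (illustrative; precedence is an assumption)."""
    options = storage_options or {}
    # 1. Per-call setting, as passed through storage_options.
    per_call = options.get("simplecache", {}).get("cache_storage")
    if per_call:
        return per_call
    # 2. Environment variable.
    from_env = os.environ.get("XARRAY_PRISM_CACHE")
    if from_env:
        return from_env
    # 3. Default: a directory under the system temp directory.
    return os.path.join(tempfile.gettempdir(), "xarray-prism-cache")
```

Setting `os.environ["XARRAY_PRISM_CACHE"]` from Python before opening a dataset is the in-process equivalent of the `export` line above.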

Cache management

You can inspect or evict the cache manually:

import xarray_prism as xp

xp.cache_info()
# {'files': 12, 'size_bytes': 2400000000, 'path': '/tmp/xarray-prism-cache'}

# Preview what would be removed
xp.clear_cache(dry_run=True)

# Evict with custom thresholds
xp.clear_cache(max_age_days=3, max_size_gb=2)

# Remove everything
xp.clear_cache(max_age_days=0, max_size_gb=0)

[!NOTE] max_age_days and max_size_gb can also be set via the following environment variables:

| Policy            | Default | Override                   |
|-------------------|---------|----------------------------|
| TTL (last-access) | 7 days  | XARRAY_PRISM_MAX_AGE_DAYS=N |
| Size cap (LRU)    | 10 GB   | XARRAY_PRISM_MAX_SIZE_GB=N  |
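The two policies can be combined as follows: first drop everything older than the TTL, then drop the least recently used files until the total fits under the size cap. This is a sketch of the general technique, not xarray-prism's implementation; `CacheEntry` and `plan_eviction` are hypothetical names, and 1 GB is assumed to be 10^9 bytes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheEntry:
    path: str
    size_bytes: int
    last_access: float  # seconds since the epoch


def plan_eviction(
    entries: List[CacheEntry],
    now: float,
    max_age_days: float = 7.0,
    max_size_gb: float = 10.0,
) -> List[str]:
    """Return the paths to evict under a TTL plus LRU size-cap policy."""
    max_age_seconds = max_age_days * 86400
    max_bytes = max_size_gb * 1e9  # assumption: decimal gigabytes

    # Step 1: TTL. Anything not accessed within the window goes.
    evict = [e for e in entries if now - e.last_access > max_age_seconds]
    expired = {e.path for e in evict}

    # Step 2: LRU. Walk survivors from oldest to newest access.
    survivors = sorted(
        (e for e in entries if e.path not in expired),
        key=lambda e: e.last_access,
    )
    total = sum(e.size_bytes for e in survivors)
    for entry in survivors:
        if total <= max_bytes:
            break
        evict.append(entry)
        total -= entry.size_bytes

    return [e.path for e in evict]
```

With both thresholds set to 0, every non-empty file is selected, which matches the "remove everything" call shown above.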

Customization

Custom Format Detectors and URI Types

You can extend xarray-prism with custom format detectors, URI types, and open handlers by providing a small plugin package. Registration happens at import time, so importing the plugin activates it.

Plugin structure

xarray_prism_myplugin/
  __init__.py   # imports the plugin module (triggers registration)
  plugin.py     # detectors, URI types, and open handlers
pyproject.toml

Plugin implementation

xarray_prism_myplugin/__init__.py

from .plugin import *  # noqa: F401,F403

xarray_prism_myplugin/plugin.py

import xarray as xr
from xarray_prism import register_detector, register_uri_type, registry


@register_uri_type(priority=100)
def detect_myfs_uri(uri: str):
    """Detect a custom filesystem URI."""
    if uri.lower().startswith("myfs://"):
        return "myfs"
    return None


@register_detector(priority=100)
def detect_foo_format(uri: str):
    """Detect a custom file format."""
    if uri.lower().endswith(".foo"):
        return "foo"
    return None


@registry.register("foo", uri_type="myfs")
def open_foo_from_myfs(uri: str, **kwargs):
    """Open .foo files from myfs:// URIs."""
    translated = uri.replace("myfs://", "https://my-gateway.example/")
    return xr.open_dataset(translated, engine="h5netcdf", **kwargs)

Plugin installation

pyproject.toml

[project]
name = "xarray-prism-myplugin"
version = "0.1.0"
dependencies = ["xarray-prism"]

[project.entry-points."xarray_prism.plugins"]
myplugin = "xarray_prism_myplugin"
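Once the package is installed, the entry point declared above becomes visible through the standard library. The sketch below lists the names registered under the `xarray_prism.plugins` group. It only inspects installed metadata; whether xarray-prism loads this group automatically is not stated in this README, and `installed_prism_plugins` is a hypothetical helper.

```python
from importlib.metadata import entry_points
from typing import List


def installed_prism_plugins(group: str = "xarray_prism.plugins") -> List[str]:
    """Names of installed entry points in the given group (illustrative)."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Older Pythons return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


plugin_names = installed_prism_plugins()
```

With the example plugin installed, the returned list would be expected to contain `myplugin`; without it, the list is simply empty.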

Using the plugin

After installing the plugin package, import it once to activate the registrations:

import xarray_prism_myplugin  # activates detectors and handlers

import xarray as xr
ds = xr.open_dataset("myfs://bucket/path/data.foo", engine="prism")

Development

Setup Development Environment

# Start test services (MinIO, THREDDS)
docker-compose -f dev-env/docker-compose.yaml up -d --remove-orphans

# Create conda environment
conda create -n xarray-prism python=3.12 -y
conda activate xarray-prism

# Install package in editable mode with dev dependencies
pip install -e ".[dev]"

Running Tests

# Run tests
tox -e test

# Run with coverage
tox -e test-cov

# Lint
tox -e lint

# Type checking
tox -e types

# Auto-format code
tox -e format

Creating a Release

Releases are managed via GitHub Actions and tox:

# Tag a new release (creates git tag)
tox -e release

The release workflow is triggered automatically when:

  • A version tag (v*.*.*) is pushed -> Full release to PyPI
  • Manual workflow dispatch with RC number -> Pre-release to PyPI
