Xarray Prism Engine
A multi-format, multi-storage xarray engine for climate data, with automatic engine detection and the ability to register new data formats and URI types.
[!IMPORTANT] If you encounter a data format that the prism engine cannot open, please file an issue report here. This helps us improve the engine so that users can work with different kinds of climate data.
Installation
Install via PyPI
pip install xarray-prism
Install via Conda
conda install xarray-prism
Quick Start
Using with xarray
import xarray as xr

# Auto-detect format
ds = xr.open_dataset("my_data.unknown_fmt", engine="prism")

# Remote Zarr on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.zarr",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF3 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote NetCDF4 on S3
ds = xr.open_dataset(
    "s3://freva/workshop/tas.nc4",
    engine="prism",
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "https://s3.eu-dkrz-1.dkrz.cloud"
        }
    }
)

# Remote Zarr on S3 with credentials (non-anonymous)
ds = xr.open_dataset(
    "s3://bucket/data.zarr",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)

# OPeNDAP from THREDDS
ds = xr.open_dataset(
    "https://icdc.cen.uni-hamburg.de/thredds/dodsC/ftpthredds/ar5_sea_level_rise/gia_mean.nc",
    engine="prism"
)

# Local GRIB file
ds = xr.open_dataset("forecast.grib2", engine="prism")

# GeoTIFF
ds = xr.open_dataset("satellite.tif", engine="prism")

# Tip: choose the cache location yourself with fsspec's simplecache
ds = xr.open_dataset(
    "simplecache::s3://bucket/file.nc3",
    engine="prism",
    storage_options={
        "s3": {"anon": True, "client_kwargs": {"endpoint_url": "..."}},
        "simplecache": {"cache_storage": "/path/to/cache"}
    }
)

# For GeoTIFF files on S3 you can also pass credentials through
# storage_options, which rasterio does not support on its own:
ds = xr.open_dataset(
    "s3://bucket/file.tif",
    engine="prism",
    storage_options={
        "key": "YOUR_KEY",
        "secret": "YOUR_SECRET",
        "client_kwargs": {
            "endpoint_url": "S3_ENDPOINT"
        }
    }
)
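The S3 examples above repeat the same `storage_options` structure. A small helper can build it once. This is only a convenience sketch: `s3_storage_options` is a hypothetical name and not part of xarray-prism; it just returns the dictionary shapes already shown in this section.

```python
def s3_storage_options(endpoint_url, key=None, secret=None):
    """Build the storage_options dict used in the S3 examples above.

    Hypothetical helper (not part of xarray-prism): anonymous access is
    used when no credentials are given, otherwise key/secret are set.
    """
    options = {"client_kwargs": {"endpoint_url": endpoint_url}}
    if key is None and secret is None:
        options["anon"] = True
    else:
        options["key"] = key
        options["secret"] = secret
    return options


# Anonymous access to the public workshop bucket
dkrz = s3_storage_options("https://s3.eu-dkrz-1.dkrz.cloud")

# Credentialed access to a private endpoint (placeholders as above)
private = s3_storage_options("S3_ENDPOINT", key="YOUR_KEY", secret="YOUR_SECRET")
```

The result can then be passed unchanged, for example `xr.open_dataset("s3://freva/workshop/tas.zarr", engine="prism", storage_options=dkrz)`.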
Supported Formats
| Data format | Remote backend | Local FS | Cache |
|---|---|---|---|
| GRIB | cfgrib + fsspec | cfgrib | fsspec simplecache (full-file) |
| Zarr | zarr + fsspec | zarr | chunked key/value store |
| NetCDF3 | scipy + fsspec | scipy | fsspec byte cache (5 MB blocks, but full download) |
| NetCDF4/HDF5 | h5netcdf + fsspec | h5netcdf | fsspec byte cache (5 MB blocks) |
| GeoTIFF | rasterio + fsspec | rasterio | GDAL/rasterio block cache (5 MB blocks) |
| OPeNDAP/DODS | netCDF4 | n/a | n/a |
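To give a feel for how several of the file formats in the table can be told apart without relying on the file extension, here is a minimal sketch that inspects the leading bytes of a file. It is an illustration of content-based detection in general, not a description of how xarray-prism detects formats internally; `sniff_format` and the returned labels are hypothetical. The byte signatures themselves are the standard ones for HDF5, classic NetCDF, GRIB, and TIFF. Zarr stores are directories and OPeNDAP is a protocol, so neither can be recognised this way.

```python
# Standard file signatures; the format labels are illustrative only.
MAGIC_BYTES = [
    (b"\x89HDF\r\n\x1a\n", "netcdf4/hdf5"),  # HDF5 superblock signature
    (b"CDF\x01", "netcdf3"),                 # NetCDF classic
    (b"CDF\x02", "netcdf3"),                 # NetCDF 64-bit offset
    (b"GRIB", "grib"),                       # GRIB edition 1 and 2
    (b"II*\x00", "tiff"),                    # TIFF, little-endian
    (b"MM\x00*", "tiff"),                    # TIFF, big-endian
]


def sniff_format(header: bytes):
    """Return a format label for the first bytes of a file, or None."""
    for magic, label in MAGIC_BYTES:
        if header.startswith(magic):
            return label
    return None


def sniff_file(path):
    """Read only the first 8 bytes of a local file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```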
[!WARNING] Remote GRIB & NetCDF3 require full file download
Unlike Zarr or HDF5, these formats don't support partial/chunk reads over the network.
By default, xarray-prism caches files in the system temp directory. This works well for most cases. If temp storage is a concern (e.g., limited space or cleared on reboot), you can specify a persistent cache:
| Option | How |
|---|---|
| Environment variable | export XARRAY_PRISM_CACHE=/path/to/cache |
| Per-call | storage_options={"simplecache": {"cache_storage": "/path"}} |
| Default | System temp directory |
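Because remote GRIB and NetCDF3 files are downloaded in full, it can help to check free space before opening a large file. The sketch below prefers the system temp directory and otherwise points `XARRAY_PRISM_CACHE` at a persistent location. `choose_cache_dir` is a hypothetical helper, and it assumes the environment variable is read when a file is opened; if in doubt, set it before importing xarray-prism.

```python
import os
import shutil
import tempfile
from pathlib import Path


def choose_cache_dir(required_bytes, fallback_dir):
    """Return a directory with at least `required_bytes` of free space.

    Hypothetical helper: keeps the default (system temp directory) when
    it has enough room, otherwise creates `fallback_dir` and exports it
    through XARRAY_PRISM_CACHE.
    """
    tmp = Path(tempfile.gettempdir())
    if shutil.disk_usage(tmp).free >= required_bytes:
        return tmp
    fallback = Path(fallback_dir).expanduser()
    fallback.mkdir(parents=True, exist_ok=True)
    # Assumption: xarray-prism picks this up on the next open call.
    os.environ["XARRAY_PRISM_CACHE"] = str(fallback)
    return fallback
```

For a single call, the same directory could instead be passed as `storage_options={"simplecache": {"cache_storage": str(cache_dir)}}`.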
Cache management
You can inspect or evict the cache manually:
import xarray_prism as xp
xp.cache_info()
# {'files': 12, 'size_bytes': 2400000000, 'path': '/tmp/xarray-prism-cache'}
# Preview what would be removed
xp.clear_cache(dry_run=True)
# Evict with custom thresholds
xp.clear_cache(max_age_days=3, max_size_gb=2)
# Remove everything
xp.clear_cache(max_age_days=0, max_size_gb=0)
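The dictionary returned by `cache_info()` can be turned into a short log line. This sketch assumes only the three keys shown in the example output above (`files`, `size_bytes`, `path`); `summarize_cache` is a hypothetical helper, and it reports sizes in decimal gigabytes.

```python
def summarize_cache(info):
    """Format a cache_info()-style dict as a one-line summary."""
    size_gb = info["size_bytes"] / 1e9  # decimal GB, an assumption
    return f"{info['files']} files, {size_gb:.2f} GB in {info['path']}"


# Using the example output shown above instead of a live call
example = {"files": 12, "size_bytes": 2400000000, "path": "/tmp/xarray-prism-cache"}
print(summarize_cache(example))
# 12 files, 2.40 GB in /tmp/xarray-prism-cache
```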
[!NOTE]
max_age_days and max_size_gb can also be set via the following environment variables:

| Policy | Default | Override |
|---|---|---|
| TTL (last-access) | 7 days | XARRAY_PRISM_MAX_AGE_DAYS=N |
| Size cap (LRU) | 10 GB | XARRAY_PRISM_MAX_SIZE_GB=N |
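To make the two policies concrete, the following sketch shows one way a last-access TTL and an LRU size cap can be combined. It is not xarray-prism's implementation: `select_evictions` is hypothetical, it works on plain tuples rather than real files, and it assumes 1 GB = 10^9 bytes.

```python
import time

SECONDS_PER_DAY = 86400


def select_evictions(files, max_age_days=7, max_size_gb=10, now=None):
    """Pick cache entries to remove.

    `files` is a list of (name, size_bytes, last_access_epoch) tuples.
    Entries not accessed within `max_age_days` are removed first (TTL);
    if the remainder still exceeds `max_size_gb`, the least recently
    accessed entries are removed until it fits (LRU).
    """
    now = time.time() if now is None else now
    max_age = max_age_days * SECONDS_PER_DAY
    cap = max_size_gb * 10**9

    expired = [f for f in files if now - f[2] > max_age]
    fresh = sorted((f for f in files if now - f[2] <= max_age), key=lambda f: f[2])

    evicted = list(expired)
    total = sum(f[1] for f in fresh)
    for entry in fresh:  # oldest access first
        if total <= cap:
            break
        evicted.append(entry)
        total -= entry[1]
    return [f[0] for f in evicted]


now = 1_700_000_000
cache = [
    ("a.grib", 6 * 10**9, now - 1 * SECONDS_PER_DAY),
    ("b.nc", 6 * 10**9, now - 2 * SECONDS_PER_DAY),
    ("c.nc", 1, now - 8 * SECONDS_PER_DAY),
]
print(select_evictions(cache, now=now))
# ['c.nc', 'b.nc']
```

With both thresholds set to 0 every entry is selected, which mirrors the "remove everything" call shown earlier.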
Customization
Custom Format Detectors and URI Types
You can extend xarray-prism with custom format detectors, URI types, and open handlers by providing a small plugin package. Registration happens at import time, so importing the plugin activates it.
Plugin structure
xarray_prism_myplugin/
    __init__.py    # imports the plugin module (triggers registration)
    plugin.py      # detectors, URI types, and open handlers
pyproject.toml
Plugin implementation
xarray_prism_myplugin/__init__.py
from .plugin import * # noqa: F401,F403
xarray_prism_myplugin/plugin.py
import xarray as xr
from xarray_prism import register_detector, register_uri_type, registry


@register_uri_type(priority=100)
def detect_myfs_uri(uri: str):
    """Detect a custom filesystem URI."""
    if uri.lower().startswith("myfs://"):
        return "myfs"
    return None


@register_detector(priority=100)
def detect_foo_format(uri: str):
    """Detect a custom file format."""
    if uri.lower().endswith(".foo"):
        return "foo"
    return None


@registry.register("foo", uri_type="myfs")
def open_foo_from_myfs(uri: str, **kwargs):
    """Open .foo files from myfs:// URIs."""
    translated = uri.replace("myfs://", "https://my-gateway.example/")
    return xr.open_dataset(translated, engine="h5netcdf", **kwargs)
Plugin installation
pyproject.toml
[project]
name = "xarray-prism-myplugin"
version = "0.1.0"
dependencies = ["xarray-prism"]
[project.entry-points."xarray_prism.plugins"]
myplugin = "xarray_prism_myplugin"
Using the plugin
After installing the plugin package, import it once to activate the registrations:
import xarray_prism_myplugin # activates detectors and handlers
import xarray as xr
ds = xr.open_dataset("myfs://bucket/path/data.foo", engine="prism")
Development
Setup Development Environment
# Start test services (MinIO, THREDDS)
docker-compose -f dev-env/docker-compose.yaml up -d --remove-orphans
# Create conda environment
conda create -n xarray-prism python=3.12 -y
conda activate xarray-prism
# Install package in editable mode with dev dependencies
pip install -e ".[dev]"
Running Tests
# Run tests
tox -e test
# Run with coverage
tox -e test-cov
# Lint
tox -e lint
# Type checking
tox -e types
# Auto-format code
tox -e format
Creating a Release
Releases are managed via GitHub Actions and tox:
# Tag a new release (creates git tag)
tox -e release
The release workflow is triggered when:
- A version tag (v*.*.*) is pushed -> Full release to PyPI
- A manual workflow dispatch with an RC number is started -> Pre-release to PyPI
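Before pushing, a tag name can be checked against the `v*.*.*` pattern locally. The sketch below translates the glob into a regular expression in which `*` matches any run of characters except `/`; this is an approximation of tag filter matching, and `is_release_tag` is a hypothetical helper.

```python
import re

# v*.*.* with "*" read as "any characters except a slash"
RELEASE_TAG = re.compile(r"v[^/]*\.[^/]*\.[^/]*")


def is_release_tag(tag: str) -> bool:
    """Return True if the tag name matches the v*.*.* pattern."""
    return RELEASE_TAG.fullmatch(tag) is not None


print(is_release_tag("v2603.0.0"))  # True
print(is_release_tag("2603.0.0"))   # False, missing the leading "v"
```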