
xarray extension for DataArray and Dataset classes


xarray-dataclasses


TL;DR

xarray-dataclasses is a Python package for creating DataArray and Dataset classes in the same manner as Python's native dataclass. Here is an example of what the package provides:

from xarray_dataclasses import Coord, Data, dataarrayclass


@dataarrayclass
class Image:
    """DataArray that represents an image."""

    data: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0


# create a DataArray instance
image = Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])


# create a DataArray instance filled with ones
ones = Image.ones((2, 2), x=[0, 1], y=[0, 1])

Features

  • DataArray or Dataset instances with fixed dimensions, data type, and coordinates can easily be created.
  • NumPy-like special functions such as ones() are provided as class methods.
  • 100% compatible with Python's native dataclass.
  • 100% compatible with static type checking by Pyright.

Installation

$ pip install xarray-dataclasses

Introduction

xarray is useful for handling labeled multi-dimensional data, but creating a DataArray or Dataset instance with fixed dimensions, data type, or coordinates takes some boilerplate. For example, consider the following specifications for DataArray instances:

  • Dimensions of data must be ('x', 'y').
  • Data type of data must be float.
  • Data type of the dimension coordinates (x and y) must be int.
  • Default value of the dimension coordinates must be 0.

A function that creates a spec-compliant DataArray instance then looks something like this:

import numpy as np
import xarray as xr


def spec_dataarray(data, x=None, y=None):
    """Create a spec-compliant DataArray instance."""
    data = np.array(data)

    if x is None:
        x = np.zeros(data.shape[0])
    else:
        x = np.array(x)

    if y is None:
        y = np.zeros(data.shape[1])
    else:
        y = np.array(y)

    return xr.DataArray(
        data=data.astype(float),
        dims=('x', 'y'),
        coords={
            'x': ('x', x.astype(int)),
            'y': ('y', y.astype(int)),
        },
    )


dataarray = spec_dataarray([[0, 1], [2, 3]])

There are two issues: (1) it is hard to figure out the specs from the code, and (2) the code is hard to reuse, for example when adding a new coordinate to the original specs.

xarray-dataclasses resolves both by defining the specs as a dataclass with dedicated type hints:

from xarray_dataclasses import Coord, Data, dataarrayclass


@dataarrayclass
class Specs:
    data: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0


dataarray = Specs.new([[0, 1], [2, 3]])

The specs are now much easier to read: the type hints, Data[<dims>, <dtype>] and Coord[<dims>, <dtype>], carry all the information needed to create a DataArray. The default values are given as class variables.

The class decorator, @dataarrayclass, converts a class to Python's native dataclass and adds class methods such as new() to it. The specs can then be extended easily by class inheritance.
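
Because the decorated class is a native dataclass, extending the specs follows ordinary dataclass inheritance. The following is a minimal sketch using only the standard library: plain type hints stand in for Data and Coord, and the extra field t and the class name SpecsWithTime are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Specs:
    """Stand-in for the specs above, with plain hints instead of Data/Coord."""

    data: Any
    x: int = 0
    y: int = 0


@dataclass
class SpecsWithTime(Specs):
    """Extended specs: inherits data, x, and y, and adds one more field."""

    t: float = 0.0  # hypothetical extra coordinate


# Inherited fields come first, followed by the fields of the subclass.
field_names = [f.name for f in fields(SpecsWithTime)]
print(field_names)  # ['data', 'x', 'y', 't']
```

With xarray-dataclasses itself, the same pattern would presumably apply with @dataarrayclass on the subclass and a Coord-typed field in place of the plain float.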

Basic usage

xarray-dataclasses uses Python's native dataclass (please learn how to use it before proceeding). The data (or data variables), coordinates, attribute members, and name of a DataArray or Dataset instance are defined as dataclass fields with the following dedicated type hints.

Data

Data[<dims>, <dtype>] specifies the field whose value will become the data of a DataArray instance or a member of the data variables of a Dataset instance. It accepts two type variables, <dims> and <dtype>, for fixing dimensions and data type, respectively. For example:

Type hint                       Inferred dims   Inferred dtype
Data['x', typing.Any]           ('x',)          None
Data['x', int]                  ('x',)          numpy.dtype('int64')
Data['x', float]                ('x',)          numpy.dtype('float64')
Data[tuple['x', 'y'], float]    ('x', 'y')      numpy.dtype('float64')

Note: for Python 3.7 and 3.8, use typing.Tuple[...] instead of tuple[...].

Coord

Coord[<dims>, <dtype>] specifies the field whose value will become a coordinate of a DataArray or Dataset instance. Similar to Data, it accepts two type variables, <dims> and <dtype>, for fixing dimensions and data type, respectively.
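
To illustrate how a type hint can carry dimensions and a data type that are readable at runtime, here is a standard-library sketch based on typing.Annotated. It is not the implementation of xarray-dataclasses: the FieldSpec marker and the read_specs() helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical marker holding dims and dtype (not part of xarray-dataclasses)."""

    kind: str  # 'data' or 'coord'
    dims: tuple
    dtype: Any


@dataclass
class Specs:
    data: Annotated[Any, FieldSpec("data", ("x", "y"), float)]
    x: Annotated[Any, FieldSpec("coord", ("x",), int)] = 0
    y: Annotated[Any, FieldSpec("coord", ("y",), int)] = 0


def read_specs(cls):
    """Collect (kind, dims, dtype) for every Annotated field of cls."""
    hints = get_type_hints(cls, include_extras=True)
    specs = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            marker = get_args(hint)[1]  # first metadata item after the base type
            specs[name] = (marker.kind, marker.dims, marker.dtype)
    return specs


specs = read_specs(Specs)
print(specs["data"])  # ('data', ('x', 'y'), <class 'float'>)
```

The point of the sketch is only that everything needed to build the array (role, dims, dtype, default) sits on the field definition itself, which is what makes the specs readable in one place.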

Attr

Attr[<type>] specifies the field whose value will become a member of the attributes (attrs) of a DataArray or Dataset instance. It accepts a type variable, <type>, for specifying the type of the value. For example:

from xarray_dataclasses import Attr, dataarrayclass


@dataarrayclass
class Specs:
    units: Attr[str] = 'm/s'  # equivalent to str

Name

Name[<type>] specifies the field whose value will become the name of a DataArray. It accepts a type variable, <type>, for specifying the type of the value. For example:

from xarray_dataclasses import Name, dataarrayclass


@dataarrayclass
class Specs:
    name: Name[str] = 'default'  # equivalent to str

DataArray class

A DataArray class is a dataclass that defines DataArray creation. For example:

from xarray_dataclasses import Attr, Coord, Data, Name, dataarrayclass


@dataarrayclass
class Image:
    """DataArray that represents an image."""

    data: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0
    dpi: Attr[int] = 300
    name: Name[str] = 'default'

Exactly one Data-typed field is allowed; ValueError is raised if two or more Data-typed fields exist. A spec-compliant DataArray instance is created by the shorthand method new():

Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])

<xarray.DataArray 'default' (x: 2, y: 2)>
array([[0., 1.],
       [2., 3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1
Attributes:
    dpi:      300

A DataArray class also has the NumPy-like methods empty(), zeros(), ones(), and full():

Image.ones((3, 3), dpi=200, name='flat')

<xarray.DataArray 'flat' (x: 3, y: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * x        (x) int64 0 0 0
  * y        (y) int64 0 0 0
Attributes:
    dpi:      200
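
The relationship between such NumPy-like class methods and the constructor can be pictured with a small standard-library sketch. This is an assumption about the general pattern, not the package's actual code: nested lists stand in for NumPy arrays, and the filled() helper and PlainImage class are hypothetical names.

```python
from dataclasses import dataclass


def filled(shape, value):
    """Build a nested list of the given shape with every element set to value."""
    if len(shape) == 0:
        return value
    return [filled(shape[1:], value) for _ in range(shape[0])]


@dataclass
class PlainImage:
    """Hypothetical stand-in for a DataArray class, holding nested lists."""

    data: list
    dpi: int = 300
    name: str = "default"

    @classmethod
    def full(cls, shape, fill_value, **kwargs):
        # Create the data from the shape, then pass the rest to the constructor.
        return cls(filled(tuple(shape), fill_value), **kwargs)

    @classmethod
    def ones(cls, shape, **kwargs):
        return cls.full(shape, 1.0, **kwargs)


flat = PlainImage.ones((3, 3), dpi=200, name="flat")
print(flat.data[0])  # [1.0, 1.0, 1.0]
```

The class methods only generate the data; every other field keeps its default unless overridden by keyword, which mirrors how dpi and name are passed in the ones() call above.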

Dataset class

A Dataset class is a dataclass that defines Dataset creation. For example:

from xarray_dataclasses import Attr, Coord, Data, datasetclass


@datasetclass
class RGBImage:
    """Dataset that represents a three-color image."""

    red: Data[tuple['x', 'y'], float]
    green: Data[tuple['x', 'y'], float]
    blue: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0
    dpi: Attr[int] = 300

Here, multiple Data-typed fields are allowed. A spec-compliant Dataset instance is created by the shorthand method new():

RGBImage.new(
    [[0, 0], [0, 0]],  # red
    [[1, 1], [1, 1]],  # green
    [[2, 2], [2, 2]],  # blue
)

<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * x        (x) int64 0 0
  * y        (y) int64 0 0
Data variables:
    red      (x, y) float64 0.0 0.0 0.0 0.0
    green    (x, y) float64 1.0 1.0 1.0 1.0
    blue     (x, y) float64 2.0 2.0 2.0 2.0
Attributes:
    dpi:      300

