xarray extension for DataArray and Dataset classes
xarray-dataclasses
TL;DR
xarray-dataclasses is a Python package for creating DataArray and Dataset classes in the same manner as Python's native dataclass. Here is an example of what the package provides:
from xarray_dataclasses import Coord, Data, dataarrayclass

@dataarrayclass
class Image:
    """DataArray that represents an image."""
    data: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0

# create a DataArray instance
image = Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])

# create a DataArray instance filled with ones
ones = Image.ones((2, 2), x=[0, 1], y=[0, 1])
Features
- DataArray or Dataset instances with fixed dimensions, data type, and coordinates can easily be created.
- NumPy-like special functions such as ones() are provided as class methods.
- 100% compatible with Python's native dataclass.
- 100% compatible with static type checking by Pyright.
Installation
$ pip install xarray-dataclasses
Introduction
xarray is useful for handling labeled multi-dimensional data, but creating a DataArray or Dataset instance with fixed dimensions, data type, or coordinates is somewhat cumbersome. For example, consider the following specifications for DataArray instances:
- Dimensions of data must be ('x', 'y').
- Data type of data must be float.
- Data type of dimensions must be int.
- Default value of dimensions must be 0.
Then a function to create a spec-compliant DataArray instance is something like this:
import numpy as np
import xarray as xr

def spec_dataarray(data, x=None, y=None):
    """Create a spec-compliant DataArray instance."""
    data = np.array(data)

    # fall back to zero-filled coordinates when none are given
    if x is None:
        x = np.zeros(data.shape[0])
    else:
        x = np.array(x)

    if y is None:
        y = np.zeros(data.shape[1])
    else:
        y = np.array(y)

    return xr.DataArray(
        data=data.astype(float),
        dims=('x', 'y'),
        coords={
            'x': ('x', x.astype(int)),
            'y': ('y', y.astype(int)),
        },
    )

dataarray = spec_dataarray([[0, 1], [2, 3]])
The issues are (1) it is hard to figure out the specs from the code and (2) it is hard to reuse the code, for example, to add a new coordinate to the original specs.
xarray-dataclasses resolves them by defining the specs as a dataclass with dedicated type hints:
from xarray_dataclasses import Coord, Data, dataarrayclass

@dataarrayclass
class Specs:
    data: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0

dataarray = Specs.new([[0, 1], [2, 3]])
The specs are now much easier to read:
The type hints, Data[<dims>, <dtype>] and Coord[<dims>, <dtype>], carry all the information needed to create a DataArray.
The default values are given as class variables.
The class decorator, @dataarrayclass, converts the class to a Python native dataclass and adds class methods such as new() to it.
Extending the specs is then easy through class inheritance.
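Because the decorated class is a native dataclass, the usual dataclass inheritance rules should apply when extending specs. The sketch below uses only the standard library to show how a subclass appends a field to the inherited ones. BaseSpecs, ExtendedSpecs, and the field t are hypothetical names, and plain types stand in for the Data and Coord hints, so this illustrates the mechanism rather than the package's own behavior.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseSpecs:
    """Stand-in for a specs class (plain types instead of Data/Coord hints)."""

    data: list
    x: int = 0
    y: int = 0


@dataclass
class ExtendedSpecs(BaseSpecs):
    """Adds one field; inherited fields keep their original order."""

    t: float = 0.0  # hypothetical extra coordinate


field_names = [f.name for f in fields(ExtendedSpecs)]
print(field_names)  # ['data', 'x', 'y', 't']
```

Note that a new field needs a default value here, because dataclasses do not allow a field without a default to follow fields that have one.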
Basic usage
xarray-dataclasses uses Python's native dataclass (please learn how to use it before proceeding). Data (or data variables), coordinates, attribute members, and the name of a DataArray or Dataset instance are defined as dataclass fields with the following dedicated type hints.
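As a quick refresher, the following standard-library sketch shows what a dataclass field carries: a name, a type hint, and an optional default. Point is a hypothetical class used only for illustration; how xarray-dataclasses reads these fields internally is not covered here.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class Point:
    """Plain dataclass: each annotated class variable becomes a field."""

    x: int      # required field (no default)
    y: int = 0  # optional field with a default value


point = Point(1)
print(point)  # Point(x=1, y=0)

# Each field records its name, its type hint, and whether a default exists.
summary = {f.name: (f.type, f.default is not MISSING) for f in fields(Point)}
print(summary)
```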
Data
Data[<dims>, <dtype>] specifies the field whose value will become the data of a DataArray instance or a member of the data variables of a Dataset instance.
It accepts two type variables, <dims> and <dtype>, for fixing dimensions and data type, respectively.
For example:
Type hint | Inferred dims | Inferred dtype |
---|---|---|
Data['x', typing.Any] | ('x',) | None |
Data['x', int] | ('x',) | numpy.dtype('int64') |
Data['x', float] | ('x',) | numpy.dtype('float64') |
Data[tuple['x', 'y'], float] | ('x', 'y') | numpy.dtype('float64') |
Note: for Python 3.7 and 3.8, use typing.Tuple[...] instead of tuple[...].
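To make the table above more concrete, here is a toy sketch of how dimensions and a data type could be read back from such a hint at runtime. It requires Python 3.9 or later. The Data class, infer_dims(), and infer_dtype() below are hypothetical stand-ins written for this example, not the package's implementation, and the sketch returns the Python type as-is rather than converting it to a NumPy dtype.

```python
from types import GenericAlias
from typing import Any, get_args


class Data:
    """Toy stand-in for the Data hint (hypothetical, not the package's code)."""

    # Makes Data[<dims>, <dtype>] return an alias that remembers both arguments.
    __class_getitem__ = classmethod(GenericAlias)


def infer_dims(hint):
    """Return the dimensions named in a Data[...] hint as a tuple of str."""
    dims, _ = get_args(hint)
    if isinstance(dims, str):     # Data['x', ...] -> ('x',)
        return (dims,)
    return tuple(get_args(dims))  # Data[tuple['x', 'y'], ...] -> ('x', 'y')


def infer_dtype(hint):
    """Return the dtype argument, or None when it is typing.Any."""
    _, dtype = get_args(hint)
    return None if dtype is Any else dtype


print(infer_dims(Data['x', Any]), infer_dtype(Data['x', Any]))
print(infer_dims(Data[tuple['x', 'y'], float]))
```

A real implementation would additionally map the Python type to a NumPy dtype, for example with numpy.dtype(float), which is numpy.dtype('float64').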
Coord
Coord[<dims>, <dtype>] specifies the field whose value will become a coordinate of a DataArray or Dataset instance.
Similar to Data, it accepts two type variables, <dims> and <dtype>, for fixing dimensions and data type, respectively.
Attr
Attr[<type>] specifies the field whose value will become a member of the attributes (attrs) of a DataArray or Dataset instance.
It accepts a type variable, <type>, for specifying the type of the value.
For example:
@dataarrayclass
class Specs:
    units: Attr[str] = 'm/s' # equivalent to str
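One way a hint such as Attr[str] can be equivalent to str for a type checker while still carrying extra meaning at runtime is typing.Annotated (Python 3.9 or later). The sketch below assumes such a design purely for illustration: the alias definition and the 'attr' marker are hypothetical and are not taken from the package's source.

```python
from typing import Annotated, TypeVar, get_type_hints

T = TypeVar("T")

# Hypothetical alias: the string "attr" is illustrative metadata only.
Attr = Annotated[T, "attr"]


class Specs:
    units: Attr[str] = "m/s"


# By default the metadata is stripped, leaving the plain type.
plain = get_type_hints(Specs)
# With include_extras=True the metadata stays available at runtime.
tagged = get_type_hints(Specs, include_extras=True)

print(plain["units"])                # <class 'str'>
print(tagged["units"].__metadata__)  # ('attr',)
```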
Name
Name[<type>] specifies the field whose value will become the name of a DataArray.
It accepts a type variable, <type>, for specifying the type of the value.
For example:
@dataarrayclass
class Specs:
    name: Name[str] = 'default' # equivalent to str
DataArray class
A DataArray class is a dataclass that defines DataArray creation. For example:
from xarray_dataclasses import Attr, Coord, Data, Name, dataarrayclass

@dataarrayclass
class Image:
    """DataArray that represents an image."""
    data: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0
    dpi: Attr[int] = 300
    name: Name[str] = 'default'
Exactly one Data-typed field is allowed in a DataArray class.
ValueError is raised if more than one Data-typed field exists.
A spec-compliant DataArray instance is created by a shorthand method, new():
Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])

<xarray.DataArray 'default' (x: 2, y: 2)>
array([[0., 1.],
       [2., 3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1
Attributes:
    dpi:      300
A DataArray class also has NumPy-like empty(), zeros(), ones(), and full() methods:
Image.ones((3, 3), dpi=200, name='flat')

<xarray.DataArray 'flat' (x: 3, y: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * x        (x) int64 0 0 0
  * y        (y) int64 0 0 0
Attributes:
    dpi:      200
Dataset class
A Dataset class is a dataclass that defines Dataset creation. For example:
from xarray_dataclasses import Attr, Coord, Data, datasetclass

@datasetclass
class RGBImage:
    """Dataset that represents a three-color image."""
    red: Data[tuple['x', 'y'], float]
    green: Data[tuple['x', 'y'], float]
    blue: Data[tuple['x', 'y'], float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0
    dpi: Attr[int] = 300
Unlike a DataArray class, a Dataset class allows multiple Data-typed fields.
A spec-compliant Dataset instance is created by a shorthand method, new():
RGBImage.new(
    [[0, 0], [0, 0]], # red
    [[1, 1], [1, 1]], # green
    [[2, 2], [2, 2]], # blue
)

<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * x        (x) int64 0 0
  * y        (y) int64 0 0
Data variables:
    red      (x, y) float64 0.0 0.0 0.0 0.0
    green    (x, y) float64 1.0 1.0 1.0 1.0
    blue     (x, y) float64 2.0 2.0 2.0 2.0
Attributes:
    dpi:      300