Interface for using cubed with xarray for parallel computation.
Note: this is a proof-of-concept, and many things are incomplete, untested, or don't work.
cubed-xarray
Interface for using cubed with xarray.
Requirements
- Cubed version >=0.14.2
- Xarray version >=2024.02.0
Installation
Install via pip
pip install cubed-xarray
or conda
conda install -c conda-forge cubed-xarray
Importing
You don't need to import this package in user code. Once it is properly installed, xarray should automatically become aware of it via the magic of entrypoints.
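To check which chunked-array backends are registered in the current environment, the entry points can be listed with the standard library. This is only a sketch: it assumes xarray looks up chunk managers under the entry point group xarray.chunkmanagers (the group name is not stated in this README), and list_chunkmanagers is a hypothetical helper name.

```python
from importlib.metadata import entry_points


def list_chunkmanagers(group="xarray.chunkmanagers"):
    """Return {entry point name: "module:object"} for the given group.

    The default group name is an assumption about where xarray looks
    for chunk managers; adjust it if your xarray version differs.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points support select().
        selected = eps.select(group=group)
    else:
        # Older Pythons: a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    # With cubed-xarray installed, an entry for cubed should be listed.
    print(list_chunkmanagers())
```

If the result is empty, xarray has no third-party chunk manager to discover, which usually points to an installation problem rather than a missing import.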
Usage
Xarray objects backed by cubed arrays can be created either by:
1. Passing existing cubed.Array objects to the data argument of xarray constructors,
2. Calling .chunk on xarray objects,
3. Passing a chunks argument to xarray.open_dataset.
In (2) and (3) the choice to use cubed.Array instead of dask.array.Array is made by passing the keyword argument chunked_array_type='cubed'.
To pass arguments to the constructor of cubed.Array you should pass them via the dictionary from_array_kwargs, e.g. from_array_kwargs={'spec': cubed.Spec(allowed_mem='2GB')}.
If cubed and cubed-xarray are installed but dask is not, then specifying chunked_array_type is not necessary, as the entrypoints system will then default to the only chunked parallel backend available (i.e. cubed).
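The creation routes described above can be sketched as follows. This is an illustration under stated assumptions, not a tested recipe: cubed_kwargs and demo are hypothetical helper names, the array shapes and chunk sizes are made up, and the demo only runs when cubed, cubed-xarray and xarray are all installed.

```python
import importlib.util


def cubed_kwargs(spec=None):
    """Keyword arguments asking xarray to chunk with cubed (hypothetical helper)."""
    kwargs = {"chunked_array_type": "cubed"}
    if spec is not None:
        # Forwarded by xarray to the cubed.Array constructor.
        kwargs["from_array_kwargs"] = {"spec": spec}
    return kwargs


def demo():
    """Build cubed-backed DataArrays via routes (1) and (2)."""
    import cubed
    import numpy as np
    import xarray as xr

    spec = cubed.Spec(allowed_mem="2GB")
    values = np.arange(12.0).reshape(3, 4)

    # (1) Pass an existing cubed.Array to an xarray constructor.
    arr = cubed.from_array(values, chunks=(1, 4), spec=spec)
    wrapped = xr.DataArray(arr, dims=("x", "y"))

    # (2) Call .chunk on an xarray object, selecting cubed explicitly.
    chunked = xr.DataArray(values, dims=("x", "y")).chunk({"x": 1}, **cubed_kwargs(spec))

    # (3) would look like this, with `path` pointing at a real dataset:
    # xr.open_dataset(path, chunks={}, **cubed_kwargs(spec))
    return wrapped, chunked


# Only run the demo when all three packages can be found.
if all(importlib.util.find_spec(m) for m in ("cubed", "cubed_xarray", "xarray")):
    wrapped, chunked = demo()
    print(type(wrapped.data), type(chunked.data))
```

Keeping the keyword arguments in one small helper makes it easy to drop chunked_array_type later if cubed becomes the only installed backend.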
Sharp Edges 🔪
Some things almost certainly won't work yet:
- Certain operations called in xarray but not implemented in cubed, for instance pad (see https://github.com/tomwhite/cubed/issues/193)
- Array operations involving NaNs - for now use skipna=True to avoid eager loading (see https://github.com/pydata/xarray/issues/7243)
- Using parallel=True with xr.open_mfdataset won't work because cubed doesn't implement a version of dask.Delayed (see https://github.com/pydata/xarray/issues/7810)
- Groupby (see https://github.com/tomwhite/cubed/issues/223 and https://github.com/xarray-contrib/flox/issues/224)
- xarray.map_blocks does not actually dispatch to cubed.map_blocks yet, and will always use Dask.
- Certain operations using cumreduction (e.g. ffill and bfill) are not hooked up to the ChunkManager yet, so will attempt to call dask.
and some other things might work but have not yet been tried:
- Saving to formats other than zarr
In general a bug could take the form of an error, or of a silent attempt to coerce the array type to numpy by immediately computing the underlying array.
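One way to catch the silent case is to check which library defines the array behind a result. The sketch below is an assumption-laden illustration: backing_library and check_still_cubed are hypothetical helpers, they expect an object with a .data attribute (such as a DataArray or Variable, not a Dataset), and they assume cubed's array class is defined inside the cubed package. Reading .data on a lazily opened, unchunked variable may itself load the data.

```python
from types import SimpleNamespace


def backing_library(obj):
    """Top-level module that defines the array held in obj.data."""
    return type(obj.data).__module__.split(".")[0]


def check_still_cubed(obj):
    """Raise if obj.data is no longer a cubed array (hypothetical helper)."""
    found = backing_library(obj)
    if found != "cubed":
        raise TypeError(f"expected a cubed-backed object, found {found!r}")
    return obj


# Stand-in object so the sketch runs without xarray installed;
# in practice obj would be an xarray DataArray or Variable.
fake = SimpleNamespace(data=[1.0, 2.0])
print(backing_library(fake))  # prints "builtins"
```

Calling check_still_cubed after each step of a pipeline narrows down which operation triggered the coercion to numpy.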
Tests
Integration tests for wrapping cubed with xarray also live in this repository.