Abstract NetCDF data objects, providing fast data transfer between analysis packages.
ncdata
Generic NetCDF data in Python.
Provides fast data exchange between analysis packages, and full control of storage formatting.
In particular, ncdata exchanges data between Xarray and Iris as efficiently as possible:
"lossless, copy-free and lazy-preserving".
This enables the user to freely mix+match operations from both projects, getting the "best of both worlds".
import xarray
import ncdata.iris_xarray as nci
import iris.quickplot as qplt

ds = xarray.open_dataset(filepath)
ds_resample = ds.rolling(time=3).mean()
cubes = nci.cubes_from_xarray(ds_resample)
temp_cube = cubes.extract_cube("air_temperature")
qplt.contourf(temp_cube[0])
Contents
- Motivation
- Principles
- Working Usage Examples
- API documentation
- Installation
- Project Status
- References
- Developer Notes
Motivation
Primary Use
Fast and efficient translation of data between Xarray and Iris objects.
This allows the user to mix+match features from either package in code.
For example:
import iris
import iris.analysis
import xarray

from ncdata.iris_xarray import cubes_to_xarray, cubes_from_xarray

# Apply an Iris regridder to xarray data
dataset = xarray.open_dataset("file1.nc", chunks="auto")
(cube,) = cubes_from_xarray(dataset)
cube2 = cube.regrid(grid_cube, iris.analysis.PointInCell())
dataset2 = cubes_to_xarray(cube2)

# Apply an Xarray statistic to Iris data
cubes = iris.load("file1.nc")
dataset = cubes_to_xarray(cubes)
dataset2 = dataset.groupby("time.dayofyear").argmin()
cubes2 = cubes_from_xarray(dataset2)
- data conversion is equivalent to writing to a file with one library, and reading it back with the other ...
- ... except that no actual files are written
- both real (numpy) and lazy (dask) variable data arrays are transferred directly, without copying or computing
- both real (numpy) and lazy (dask) variable data arrays are transferred directly, without copying or computing
Secondary Uses
Exact control of file formatting
Ncdata can also be used as a transfer layer between Iris or Xarray file i/o and the
exact format of data stored in files.
I.e. adjustments can be made to file data before loading it into Iris/Xarray, or
Iris/Xarray saved output can be adjusted before writing to a file.
This allows the user to work around package limitations in controlling storage aspects such as: data chunking, reserved attributes, missing-value processing, or dimension control.
For example:
import xarray as xr

from ncdata.xarray import from_xarray
from ncdata.iris import to_iris
from ncdata.netcdf4 import to_nc4, from_nc4

# Rename a dimension in xarray output
dataset = xr.open_dataset("file1.nc")
xr_ncdata = from_xarray(dataset)
dim = xr_ncdata.dimensions.pop("dim0")
dim.name = "newdim"
xr_ncdata.dimensions["newdim"] = dim
for var in xr_ncdata.variables.values():
    var.dimensions = ["newdim" if d == "dim0" else d for d in var.dimensions]
to_nc4(xr_ncdata, "file_2a.nc")

# Fix chunking in Iris input
ncdata = from_nc4("file1.nc")
for var in ncdata.variables.values():
    # custom chunking() mimics the file chunks we want
    # (bind var as a default argument to avoid lambda late-binding)
    var.chunking = lambda v=var: [
        100.0e6 if d == "dim0" else -1 for d in v.dimensions
    ]
cubes = to_iris(ncdata)
Manipulation of data
ncdata can also be used for data extraction and modification, similar in scope to
the CDO and NCO command-line operators but without any file operations.
However, this type of usage is as yet undeveloped: there is no inbuilt support
for data consistency checking, or for obviously useful operations such as indexing by
dimension.
This could be added in future, but many such operations (like indexing) may be
better done using Iris/Xarray.
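The kind of file-free, NCO-style manipulation described above can be pictured with plain Python structures standing in for ncdata objects. This is a conceptual sketch only: the dicts and key names below are illustrative, not ncdata's actual classes or API.

```python
# Illustrative stand-in for an in-memory NetCDF-like dataset:
# plain dicts, no files involved (NOT ncdata's real component classes).
dataset = {
    "dimensions": {"time": 4},
    "variables": {
        "air_temperature": {
            "dimensions": ("time",),
            "attributes": {"units": "K"},
            "data": [271.5, 272.0, 273.2, 274.1],
        },
        "scratch_var": {"dimensions": ("time",), "attributes": {}, "data": [0] * 4},
    },
}

# NCO-style edits, performed directly on the objects:
dataset["variables"].pop("scratch_var")              # drop a variable
temp = dataset["variables"]["air_temperature"]
temp["attributes"]["units"] = "degC"                 # rewrite an attribute
temp["data"] = [v - 273.15 for v in temp["data"]]    # convert the values
```

In ncdata itself, the same edits would be made on its own dataset, variable, and attribute components, with a file read before and a file write after if (and only if) files are involved at all.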
Principles
- ncdata represents NetCDF data as Python objects
- ncdata objects can be freely manipulated, independent of any data file
- ncdata variables can contain either real (numpy) or lazy (Dask) arrays
- ncdata can be losslessly converted to and from actual NetCDF files
- Iris or Xarray objects can be converted to and from ncdata, in the same way that they are read from and saved to NetCDF files
- translation between Xarray and Iris is based on conversion to ncdata, which
is in turn equivalent to file i/o
- thus, Iris/Xarray translation is equivalent to saving from one package into a file, then loading the file in the other package
- ncdata exchanges variable data directly with Iris/Xarray, with no copying of real data or computing of lazy data
- ncdata exchanges lazy arrays with files using Dask 'streaming', thus allowing transfer of arrays larger than memory
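The "copy-free and lazy-preserving" exchange in the last two points can be sketched in plain Python, with a zero-argument function standing in for a Dask lazy array. This is a conceptual sketch of the principle, not ncdata internals.

```python
compute_count = 0

def lazy_array():
    # Stands in for a Dask lazy array: computing it is expensive,
    # so it should happen only when values are actually needed.
    global compute_count
    compute_count += 1
    return [i * 2.0 for i in range(5)]

# "Converting" between packages just hands over the same reference:
# no copy of real data, and no computation of lazy data.
source_package_var = {"name": "air_temperature", "data": lazy_array}
target_package_var = {"name": source_package_var["name"],
                      "data": source_package_var["data"]}

assert compute_count == 0               # nothing computed during transfer
values = target_package_var["data"]()   # computed only on demand
```

Because only references change hands, the transfer cost is independent of array size; actual computation (or file streaming, via Dask) is deferred until the receiving package needs the values.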
Working Usage Examples
- mostly TBD
- proof-of-concept script for netCDF4 file i/o
- proof-of-concept script for iris-xarray conversions
API documentation
- see the ReadTheDocs build
Installation
Install from conda-forge with conda
conda install -c conda-forge ncdata
Or from PyPI with pip
pip install ncdata
Project Status
Code Stability
We intend to follow PEP 440 or (older) SemVer versioning principles.
The current minor release version is v0.1.
This is a first complete implementation, with functional operation of all public APIs.
The code is however still experimental, and APIs are not stable (hence no major version yet).
Change Notes
v0.1.1
Small tweaks + bug fixes.
Note: #62 and #59 are important fixes to achieve intended performance goals,
i.e. moving arbitrarily large data via Dask without running out of memory.
- Stop non-numpy attribute values from breaking attribute printout. #63
- Stop ncdata.iris.from_iris() consuming full data memory for each variable. #62
- Provide convenience APIs for ncdata component dictionaries and attribute values. #61
- Use dask chunks="auto" in ncdata.netcdf4.from_nc4(). #59
v0.1.0
First release
Iris and Xarray Compatibility
- CI tests run on GitHub PRs and merges, against the latest releases of Iris and Xarray
- compatible with iris >= v3.7.0 (see: support added in v3.7.0)
Known limitations
Unsupported features : not planned
- user-defined datatypes are not supported
- this includes compound and variable-length types
Unsupported features : planned for future release
- groups (not yet fully supported ?)
- file output chunking control
Known problems
As of v0.1.1:
- in conversion from Iris cubes with from_iris, use of an unlimited_dims key currently causes an exception
- in conversion to xarray with to_xarray, dataset encodings are not reproduced; most notably, the "unlimited_dims" control is missing
References
- Iris issue : https://github.com/SciTools/iris/issues/4994
- planning presentation : https://github.com/SciTools/iris/files/10499677/Xarray-Iris.bridge.proposal.--.NcData.pdf
- in-Iris code workings : https://github.com/pp-mo/iris/pull/75
Developer Notes
Documentation build
- For a full docs-build, a simple make html will do for now.
- The docs/Makefile wipes the API docs and invokes sphinx-apidoc for a full rebuild
- Results are then available at docs/_build/html/index.html
- The above is just for local testing if required: we have automatic builds for releases and PRs via ReadTheDocs
Release actions
1. Cut a release on GitHub: this triggers a new docs version on ReadTheDocs
2. Build the distribution
   - if needed, get build
   - run python -m build
3. Push to PyPI
   - if needed, get twine
   - run python -m twine upload --repository testpypi dist/*
   - this uploads to TestPyPI
4. Create a new env with test dependencies: conda create -n ncdtmp python=3.11 iris xarray filelock requests pytest pip (N.B. 'filelock' and 'requests' are test dependencies of iris)
   - install the new package with pip install --index-url https://test.pypi.org/simple/ ncdata and run the tests
5. If that checks OK, remove --repository testpypi and repeat step 3 --> uploads to "real" PyPI
6. Repeat step 4, removing the --index-url, to check that pip install ncdata now finds the new version
7. Update conda to source the new version from PyPI
   - create a PR on the ncdata feedstock
   - update: version number, SHA
   - Note: the PyPI reference will normally look after itself
   - Also: make any required changes to dependencies -- normally no change required
   - get the PR merged; wait a few hours; check that the new version appears in conda search ncdata