
Abstract NetCDF data objects, providing fast data transfer between analysis packages.

Project description

ncdata

Generic NetCDF data in Python.

Provides fast data exchange between analysis packages, and full control of storage formatting.

Especially: ncdata exchanges data between Xarray and Iris as efficiently as possible

"lossless, copy-free and lazy-preserving".

This enables the user to freely mix+match operations from both projects, getting the "best of both worlds".

import xarray
import ncdata.iris_xarray as nci
import iris.quickplot as qplt

ds = xarray.open_dataset(filepath)  # 'filepath' names any suitable NetCDF file
ds_resample = ds.rolling(time=3).mean()
cubes = nci.cubes_from_xarray(ds_resample)
temp_cube = cubes.extract_cube("air_temperature")
qplt.contourf(temp_cube[0])


Motivation

Primary Use

Fast and efficient translation of data between Xarray and Iris objects.

This allows the user to mix+match features from either package in code.

For example:

import iris
import iris.analysis
import xarray

from ncdata.iris_xarray import cubes_to_xarray, cubes_from_xarray

# Apply an Iris regridder to xarray data
dataset = xarray.open_dataset("file1.nc", chunks="auto")
(cube,) = cubes_from_xarray(dataset)
# 'grid_cube' is an existing cube defining the target grid
cube2 = cube.regrid(grid_cube, iris.analysis.PointInCell())
dataset2 = cubes_to_xarray(cube2)

# Apply an Xarray statistic to Iris data
cubes = iris.load("file1.nc")
dataset = cubes_to_xarray(cubes)
dataset2 = dataset.groupby("time.dayofyear").argmin()
cubes2 = cubes_from_xarray(dataset2)
  • data conversion is equivalent to writing to a file with one library, and reading it back with the other ..
    • .. except that no actual files are written
  • both real (numpy) and lazy (dask) variable data arrays are transferred directly, without copying or computing
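For instance, preservation of laziness can be checked directly. The following is a minimal sketch (assuming, as in the examples above, that cubes_to_xarray accepts a list of cubes; the cube built here is purely illustrative):

import dask.array as da
import iris.cube
from ncdata.iris_xarray import cubes_to_xarray

# Build a cube around a lazy Dask array, then convert it to an xarray dataset.
lazy = da.zeros((1000, 1000), chunks=(100, 100))
cube = iris.cube.Cube(lazy, var_name="x")
ds = cubes_to_xarray([cube])

# The conversion neither copied nor computed anything:
# the cube still holds its original, untouched lazy array.
assert cube.has_lazy_data()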

Secondary Uses

Exact control of file formatting

Ncdata can also be used as a transfer layer between Iris or Xarray file i/o and the exact format of data stored in files.
That is, adjustments can be made to file data before loading it into Iris/Xarray, or Iris/Xarray output can be adjusted before writing it to a file.

This allows the user to work around any package limitations in controlling storage aspects such as: data chunking, reserved attributes, missing-value processing, or dimension control.

For example:

import xarray as xr

from ncdata.xarray import from_xarray
from ncdata.iris import to_iris
from ncdata.netcdf4 import to_nc4, from_nc4

# Rename a dimension in xarray output
dataset = xr.open_dataset("file1.nc")
xr_ncdata = from_xarray(dataset)
dim = xr_ncdata.dimensions.pop("dim0")
dim.name = "newdim"
xr_ncdata.dimensions["newdim"] = dim
for var in xr_ncdata.variables.values():
    var.dimensions = ["newdim" if d == "dim0" else d for d in var.dimensions]
to_nc4(xr_ncdata, "file_2a.nc")

# Fix chunking in Iris input
ncdata = from_nc4("file1.nc")
for var in ncdata.variables.values():
    # custom chunking() mimics the file chunks we want
    # (bind 'var' as a lambda default, so each variable gets its own function)
    var.chunking = lambda var=var: [
        100.0e6 if dim == "dim0" else -1 for dim in var.dimensions
    ]
cubes = to_iris(ncdata)

Manipulation of data

ncdata can also be used for data extraction and modification, similar in scope to the CDO and NCO command-line operators, but without any file operations.
However, this type of usage is as yet undeveloped: there is no inbuilt support for data consistency checking, nor for obviously useful operations such as indexing by dimension. Such support could be added in future, but many operations of that kind (like indexing) may anyway be better done in Iris/Xarray.
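As a flavour of what such usage could look like, here is a minimal sketch (the attribute and variable names are purely illustrative; dict-style pop on the ncdata containers is assumed, as in the dimension-renaming example above):

from ncdata.netcdf4 import from_nc4, to_nc4

# Load the file structure; variable data remains lazy.
ncdata = from_nc4("file1.nc")

# NCO-style tweaks, done in memory rather than via file operations:
# drop a global attribute, and remove an unwanted variable entirely.
ncdata.attributes.pop("history", None)      # hypothetical attribute name
ncdata.variables.pop("unwanted_var", None)  # hypothetical variable name

to_nc4(ncdata, "file1_cleaned.nc")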

Principles

  • ncdata represents NetCDF data as Python objects
  • ncdata objects can be freely manipulated, independent of any data file
  • ncdata variables can contain either real (numpy) or lazy (Dask) arrays
  • ncdata can be losslessly converted to and from actual NetCDF files
  • Iris or Xarray objects can be converted to and from ncdata, in the same way that they are read from and saved to NetCDF files
  • translation between Xarray and Iris is based on conversion to ncdata, which is in turn equivalent to file i/o
    • thus, Iris/Xarray translation is equivalent to saving from one package into a file, then loading the file in the other package
  • ncdata exchanges variable data directly with Iris/Xarray, with no copying of real data or computing of lazy data
  • ncdata exchanges lazy arrays with files using Dask 'streaming', thus allowing transfer of arrays larger than memory
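For instance, the last two points together mean that a simple file-to-file copy through ncdata is streamed by Dask in chunks. A minimal sketch, using only the from_nc4/to_nc4 calls shown earlier (the filenames are illustrative):

from ncdata.netcdf4 import from_nc4, to_nc4

ncdata = from_nc4("very_large_input.nc")  # variables hold lazy Dask arrays
to_nc4(ncdata, "copy.nc")                 # data is streamed, never held whole in memory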

Code examples and full API documentation are available in the project documentation, which is built automatically on ReadTheDocs.

Installation

Install from conda-forge with conda

conda install -c conda-forge ncdata

Or from PyPI with pip

pip install ncdata

Project Status

Code Stability

We intend to follow PEP 440 or (older) SemVer versioning principles.

The current release is v0.1.
This is a first complete implementation, with functional operation of all public APIs.

The code is however still experimental, and APIs are not yet stable (hence no major version yet).

Iris and Xarray Compatibility

  • CI tests run on GitHub PRs and merges, against the latest releases of Iris and Xarray
  • compatible with iris >= v3.7.0

Known limitations

Unsupported features : not planned

  • user-defined datatypes are not supported
    • this includes compound and variable-length types

Unsupported features : planned for future release

  • groups (not yet fully supported?)
  • file output chunking control

Known problems

As of v0.1


Developer Notes

Documentation build

  • For a full docs build, a simple make html will do for now.
    • The docs/Makefile wipes the API docs and invokes sphinx-apidoc for a full rebuild
    • Results are then available at docs/_build/html/index.html
  • The above is only for local testing, if required: we have automatic builds for releases and PRs via ReadTheDocs

Release actions

  1. Cut a release on GitHub : this triggers a new docs version on ReadTheDocs
  2. Build the distribution
    1. if needed, install the build tool (pip install build)
    2. run python -m build
  3. Push to PyPI
    1. if needed, install twine (pip install twine)
    2. run python -m twine upload --repository testpypi dist/*
      • this uploads to TestPyPI
    3. if that checks out OK, remove --repository testpypi and repeat
      • --> uploads to "real" PyPI
    4. check that pip install ncdata can now find the new version
  4. Update conda to source the new version from PyPI
    1. create a PR on the conda-forge ncdata feedstock
    2. update :
    3. get the PR merged; wait a few hours; then check that the new version appears in conda search ncdata



Download files


Source Distribution

ncdata-0.1.0.tar.gz (430.3 kB)


Built Distribution


ncdata-0.1.0-py3-none-any.whl (25.5 kB)


File details

Details for the file ncdata-0.1.0.tar.gz.

File metadata

  • Download URL: ncdata-0.1.0.tar.gz
  • Upload date:
  • Size: 430.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for ncdata-0.1.0.tar.gz:

  • SHA256: 8d75986c1a463153f3f23df7e7f71a9e7a91a060f15c5009f42b95f2f0391575
  • MD5: fdaa5371800d5e08022ebf57ccad7a90
  • BLAKE2b-256: ee3bef289053fd18c983455df2d337a00624e9e83ac03583f49e98ef255b58dd


File details

Details for the file ncdata-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ncdata-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for ncdata-0.1.0-py3-none-any.whl:

  • SHA256: 20a2d367cddfe3990d8fee234c58d75f880391661bfd1086d0f66c1105d5d066
  • MD5: b551e80206e1c3be880d05eb2b312aca
  • BLAKE2b-256: 83234272d77946131a7f26b3f60e1e1e84b47b6623f1024a31bb5f39632d3cb9

