netCDF4 via h5py

A Python interface for the netCDF4 file format that reads and writes HDF5 files directly via the h5py API, without relying on the Unidata netCDF library.

Why h5netcdf?

  • We’ve seen occasional reports of better performance with h5py than netCDF4-python, though in many cases performance is identical. For one workflow, h5netcdf was reported to be almost 4x faster than netCDF4-python.

  • It has one less massive binary dependency (netCDF C). If you already have h5py installed, reading netCDF4 with h5netcdf may be much easier than installing netCDF4-python.

  • Anecdotally, HDF5 users seem to be unexcited about switching to netCDF; hopefully this will convince them that netCDF4 is actually quite sane!

  • Finally, side-stepping the netCDF C library (and Cython bindings to it) gives us an easier way to identify the source of performance issues and bugs.

Install

Ensure you have a recent version of h5py installed (I recommend using conda). At least version 2.1 is required (for dimension scales); versions 2.3 and newer have been verified to work, though some tests only pass on h5py 2.6. Then:

pip install h5netcdf
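To check whether an installed h5py meets the minimum version mentioned here, a small sketch like the following may help. The helper name `meets_minimum` is hypothetical (it is not part of h5netcdf or h5py), and it only compares the leading numeric components of a version string such as `h5py.__version__`.

```python
import re


def meets_minimum(version, minimum=(2, 1)):
    """Return True if a dotted version string is at least `minimum`.

    Hypothetical helper: only leading numeric components are compared, so
    suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # pad with zeros so that '2' compares like '2.0'
    while len(parts) < len(minimum):
        parts.append(0)
    return tuple(parts) >= tuple(minimum)


try:
    import h5py
except ImportError:
    print('h5py is not installed')
else:
    print('h5py', h5py.__version__, 'ok:', meets_minimum(h5py.__version__))
```

Passing `minimum=(2, 6)` instead would test for the version on which the full test suite is said to pass.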

Usage

h5netcdf has two APIs, a new API and a legacy API. Both interfaces currently reproduce most of the features of the netCDF interface, with the notable exceptions of:

  • support for operations that rename or delete existing objects.

  • support for creating unlimited dimensions.

We simply haven’t gotten around to implementing these features yet. Patches would be very welcome.

New API

The new API supports direct hierarchical access of variables and groups. Its design is an adaptation of h5py to the netCDF data model. For example:

import h5netcdf
import numpy as np

with h5netcdf.File('mydata.nc', 'w') as f:
    # set dimensions with a dictionary
    f.dimensions = {'x': 5}
    # and update them with a dict-like interface
    # f.dimensions['x'] = 5
    # f.dimensions.update({'x': 5})

    v = f.create_variable('hello', ('x',), float)
    v[:] = np.ones(5)

    # you don't need to create groups first
    # you also don't need to create dimensions first if you supply data
    # with the new variable
    v = f.create_variable('/grouped/data', ('y',), data=np.arange(10))

    # access and modify attributes with a dict-like interface
    v.attrs['foo'] = 'bar'

    # you can access variables and groups directly using hierarchical
    # keys, as in h5py
    print(f['/grouped/data'])
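
Reading data back follows the same pattern. The sketch below is illustrative rather than definitive: it writes a small file to a temporary directory (the file name `readback.nc` is made up for this example), reopens it in read-only mode, and inspects a variable's dimension names, shape and values.

```python
import os
import tempfile

import h5netcdf
import numpy as np

# write a small file into a temporary directory, then reopen it read-only
path = os.path.join(tempfile.mkdtemp(), 'readback.nc')

with h5netcdf.File(path, 'w') as f:
    f.dimensions = {'x': 5}
    v = f.create_variable('hello', ('x',), float)
    v[:] = np.arange(5)

with h5netcdf.File(path, 'r') as f:
    v = f['hello']
    dims = v.dimensions   # names of the variable's dimensions
    shape = v.shape
    values = v[:]         # slicing reads the data into a NumPy array
    print(dims, shape, values)
```

The values are copied out inside the `with` block because the variable is backed by the open file and should not be used after the file is closed.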

Legacy API

The legacy API is designed for compatibility with netCDF4-python. To use it, import h5netcdf.legacyapi:

import h5netcdf.legacyapi as netCDF4
# everything here would also work with this instead:
# import netCDF4
import numpy as np

with netCDF4.Dataset('mydata.nc', 'w') as ds:
    ds.createDimension('x', 5)
    v = ds.createVariable('hello', float, ('x',))
    v[:] = np.ones(5)

    g = ds.createGroup('grouped')
    g.createDimension('y', 10)
    g.createVariable('data', 'i8', ('y',))
    v = g['data']
    v[:] = np.arange(10)
    v.foo = 'bar'
    print(ds.groups['grouped'].variables['data'])

The legacy API is designed to be easy to try out for netCDF4-python users, but it is not an exact match. Here is an incomplete list of functionality we don’t include:

  • Utility functions chartostring, num2date, etc., that are not directly necessary for writing netCDF files.

  • We don’t support the endian argument to createVariable. The h5py API does not appear to offer this feature.

  • h5netcdf variables do not support automatic masking or scaling (e.g., of values matching the _FillValue attribute). We prefer to leave this functionality to client libraries (e.g., xarray), which can implement their exact desired scaling behavior.
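
Because masking and scaling are left to the caller, client code may need to decode raw values itself. The sketch below shows one way to do this with NumPy, assuming the common `_FillValue`, `scale_factor` and `add_offset` attribute conventions. The function name `decode_values` is hypothetical and not part of h5netcdf.

```python
import numpy as np


def decode_values(raw, attrs):
    """Mask fill values, then apply scale_factor and add_offset.

    `attrs` is any dict-like mapping of attribute names to values, for
    example a plain dict copied from a variable's attributes.
    """
    raw = np.asarray(raw)
    fill = attrs.get('_FillValue')
    if fill is None:
        mask = np.zeros(raw.shape, dtype=bool)
    else:
        mask = raw == fill
    scale = attrs.get('scale_factor', 1.0)
    offset = attrs.get('add_offset', 0.0)
    # convert to float before scaling so integer storage types do not overflow
    decoded = raw.astype(float) * scale + offset
    return np.ma.masked_array(decoded, mask=mask)


packed = np.array([0, 10, -999, 20], dtype='i2')
values = decode_values(
    packed, {'_FillValue': -999, 'scale_factor': 0.5, 'add_offset': 1.0}
)
print(values)
```

Libraries such as xarray implement their own, more complete decoding; this sketch only covers the three attributes named above.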

Change Log

Version 0.3.1:

  • Fix garbage collection issue.

  • Add missing .flush() method for groups.

  • Allow creating dimensions of size 0.

Version 0.3.0:

  • Datasets are now loaded lazily. This should increase performance when opening files with a large number of groups and/or variables.

  • Support for writing arrays of variable length unicode strings with dtype=str via the legacy API.

  • h5netcdf now writes the _NCProperties attribute for identifying netCDF4 files.

License

3-clause BSD
