Concurrent HDF5 and NetCDF4 reader (experimental)

HIDEFIX

This Rust and Python library provides an alternative reader for HDF5 files and NetCDF4 files (which are based on HDF5) that supports concurrent access to data. This is achieved by building an index of the chunks, which allows a thread to use many file handles to read the file. The original (native) HDF5 library is used to build the index, but once the index has been created the native library is no longer needed. The index can be serialized to disk so that the indexing does not have to be repeated.
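The indexing idea can be illustrated with a minimal sketch that uses only the Rust standard library. It is not hidefix's implementation: the `ChunkEntry` type, the `serialize`/`deserialize`/`read_chunk`/`read_all` helpers, the little-endian on-disk format and the uncompressed toy chunks are all assumptions made for illustration. It shows the core point: once the byte offset and size of every chunk are known, each thread can open its own file handle and read chunks without sharing a lock, and the index can be saved and reloaded instead of rebuilt.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::Path;
use std::thread;

/// Hypothetical index entry (not a hidefix type): where one chunk's bytes live.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkEntry {
    offset: u64,
    size: u64,
}

/// Serialize the index as pairs of little-endian u64 (an assumed format).
fn serialize(index: &[ChunkEntry]) -> Vec<u8> {
    let mut out = Vec::with_capacity(index.len() * 16);
    for c in index {
        out.extend_from_slice(&c.offset.to_le_bytes());
        out.extend_from_slice(&c.size.to_le_bytes());
    }
    out
}

/// Reload an index written by `serialize`, so no re-indexing is needed.
fn deserialize(bytes: &[u8]) -> Vec<ChunkEntry> {
    bytes
        .chunks_exact(16)
        .map(|b| ChunkEntry {
            offset: u64::from_le_bytes(b[0..8].try_into().unwrap()),
            size: u64::from_le_bytes(b[8..16].try_into().unwrap()),
        })
        .collect()
}

/// Read one chunk's raw bytes through a file handle owned by the calling thread.
fn read_chunk(path: &Path, c: ChunkEntry) -> std::io::Result<Vec<u8>> {
    let mut f = File::open(path)?; // independent handle: no shared lock
    f.seek(SeekFrom::Start(c.offset))?;
    let mut buf = vec![0u8; c.size as usize];
    f.read_exact(&mut buf)?;
    Ok(buf)
}

/// Read all chunks concurrently, one scoped thread per chunk.
fn read_all(path: &Path, index: &[ChunkEntry]) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        let handles: Vec<_> = index
            .iter()
            .map(|&c| s.spawn(move || read_chunk(path, c).unwrap()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() -> std::io::Result<()> {
    // Write a toy "file" containing three raw chunks of different sizes.
    let path = std::env::temp_dir().join("chunk_index_sketch.bin");
    File::create(&path)?.write_all(b"AAAABBBBBBCC")?;

    // Hard-coded index; in hidefix this would come from a one-off pass
    // with the native HDF5 library.
    let index = vec![
        ChunkEntry { offset: 0, size: 4 },
        ChunkEntry { offset: 4, size: 6 },
        ChunkEntry { offset: 10, size: 2 },
    ];

    // Round-trip the index through its serialized form.
    let restored = deserialize(&serialize(&index));
    assert_eq!(restored, index);

    for (i, c) in read_all(&path, &restored).iter().enumerate() {
        println!("chunk {i}: {}", String::from_utf8_lossy(c));
    }
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Spawning one thread per chunk keeps the sketch short; a real reader would more likely use a thread pool. Since HDF5 chunks are usually filtered (for example shuffled and gzip-compressed), it would also have to decode each chunk after reading its raw bytes.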

In Rust:

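use hidefix::prelude::*;

// Build the chunk index (this step uses the native HDF5 library).
let idx = Index::index("tests/data/coads_climatology.nc4").unwrap();
// Create a reader for the `SST` variable.
let mut r = idx.reader("SST").unwrap();

// Read the whole variable as f32 (no start or count given).
let values = r.values::<f32>(None, None).unwrap();

println!("SST: {:?}", values);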

or with Python using Xarray:

import xarray as xr
import hidefix

ds = xr.open_dataset('file.nc', engine='hidefix')
print(ds)

See the example for how to use hidefix for regular, parallel or concurrent reads.

Motivation

The HDF5 library needs internal locks to be thread-safe, since it relies on internal buffers that cannot safely be read or written from multiple threads. In practice this forces multi-threaded applications into sequential reads while the threads compete for the lock. The threads also appear to interfere with each other, perhaps by evicting cached chunks that other threads still need. The library can be used safely from separate processes, but that potentially carries much more overhead than multi-threaded or asynchronous code.

Some basic benchmarks

hidefix is intended to perform better when concurrent reads are made to the same dataset, to the same file, or to different files from a single process. In basic benchmarks of standard sequential reads, its performance is on par with or slightly better than that of the native HDF5 library (through its Rust bindings). Where hidefix shines is when multiple threads in the same process read from an HDF5 file at the same time.

This simple benchmark reads a small dataset, either sequentially or concurrently, using the cached reader from hidefix and the native reader from HDF5. The dataset is chunked, shuffled and compressed (using gzip):

$ cargo bench --bench concurrency -- --ignored

test shuffled_compressed::cache_concurrent_reads  ... bench:  15,903,406 ns/iter (+/- 220,824)
test shuffled_compressed::cache_sequential        ... bench:  59,778,761 ns/iter (+/- 602,316)
test shuffled_compressed::native_concurrent_reads ... bench: 411,605,868 ns/iter (+/- 35,346,233)
test shuffled_compressed::native_sequential       ... bench: 103,457,237 ns/iter (+/- 7,703,936)
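The relative speeds are easier to see as ratios of the mean ns/iter figures above (ignoring the ± spreads). The snippet only does that arithmetic; the constant names and the `ratio` helper are illustrative.

```rust
// Mean times in ns/iter, copied from the benchmark output above.
const CACHE_CONCURRENT: f64 = 15_903_406.0;
const CACHE_SEQUENTIAL: f64 = 59_778_761.0;
const NATIVE_CONCURRENT: f64 = 411_605_868.0;
const NATIVE_SEQUENTIAL: f64 = 103_457_237.0;

/// How many times longer run `a` takes than run `b`.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

fn main() {
    // Sequential reads: native time divided by hidefix (cached) time.
    println!("sequential, native / cache: {:.2}", ratio(NATIVE_SEQUENTIAL, CACHE_SEQUENTIAL));
    // hidefix gains from concurrency: sequential time / concurrent time.
    println!("cache, sequential / concurrent: {:.2}", ratio(CACHE_SEQUENTIAL, CACHE_CONCURRENT));
    // The native reader loses from concurrency: concurrent time / sequential time.
    println!("native, concurrent / sequential: {:.2}", ratio(NATIVE_CONCURRENT, NATIVE_SEQUENTIAL));
    // Concurrent reads: native time / hidefix time.
    println!("concurrent, native / cache: {:.2}", ratio(NATIVE_CONCURRENT, CACHE_CONCURRENT));
}
```

With these figures the ratios come out at roughly 1.73, 3.76, 3.98 and 25.88: concurrent access makes the cached reader about 3.8× faster but the native reader about 4× slower, so under concurrent load hidefix finishes the benchmark about 26× sooner.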

Inspiration and other projects

This work is based in part on the DMR++ module of the OPeNDAP Hyrax server. The zarr format does something similar, and the same approach has been tested out on HDF5 as well.

Download files

Source Distribution

hidefix-0.9.0.tar.gz (9.2 MB)

Uploaded Source

Built Distributions

hidefix-0.9.0-cp38-abi3-win_amd64.whl (1.9 MB)

Uploaded CPython 3.8+Windows x86-64

hidefix-0.9.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded CPython 3.8+manylinux: glibc 2.17+ x86-64

hidefix-0.9.0-cp38-abi3-macosx_10_12_x86_64.whl (1.8 MB)

Uploaded CPython 3.8+macOS 10.12+ x86-64

File details

Details for the file hidefix-0.9.0.tar.gz.

File metadata

  • Download URL: hidefix-0.9.0.tar.gz
  • Size: 9.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for hidefix-0.9.0.tar.gz
Algorithm Hash digest
SHA256 228cf0748279546047eda25acb90b4fdaa344ad4d37b36e34c74d5558498172a
MD5 d9ea55e7a57e3de68a369bd6671f8d13
BLAKE2b-256 c04d5aedd8e9d1e0165fd8d218348292ccf26ea9bced30a9ff03a2a8954e1432

File details

Details for the file hidefix-0.9.0-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: hidefix-0.9.0-cp38-abi3-win_amd64.whl
  • Size: 1.9 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for hidefix-0.9.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 6c269fcf8548db8c703aac74743eb38c866356f2cf265c529f582ad9c6ba608e
MD5 b131838a7acd4918950a6efba874b244
BLAKE2b-256 9e8a0b5d9ff3e6100d36a362e901293540e894d3e16bc1ab64e7a1d2f574b6fe

File details

Details for the file hidefix-0.9.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for hidefix-0.9.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c70db38c8474ead0c75cc21756bec9bd3376e8404997b6151377f871fb7a7bd1
MD5 fb50a2df4b27e9f34906c062a4cef86a
BLAKE2b-256 fa545c539db93f5f305e5e9dbfce29d72093d932b1bf742dba746f567a4e8581

File details

Details for the file hidefix-0.9.0-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for hidefix-0.9.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 3940e0f0f5ed5956b30d3791bc349344a9c70cdf689a74d147e1b2a234d68bb8
MD5 50ccb936552edc9006cecea368e7fb95
BLAKE2b-256 504bfea42c0928a95f6fae10234ab50a19a0189426cfa6f9cdbca1a68900379a
