Concurrent HDF5 and NetCDF4 reader (experimental)

HIDEFIX

This Rust and Python library provides an alternative reader for HDF5 files and NetCDF4 files (which use HDF5) that supports concurrent access to data. This is achieved by building an index of the chunks, which allows a thread to use many file handles to read the file. The original (native) HDF5 library is used to build the index, but once the index has been created the library is no longer needed. The index can be serialized to disk so that the file does not have to be indexed again.

In Rust:

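use hidefix::prelude::*;

// Build the chunk index for the file.
let idx = Index::index("tests/data/coads_climatology.nc4").unwrap();
// Create a reader for the "SST" variable.
let mut r = idx.reader("SST").unwrap();

// Read the values as f32; passing None for both arguments requests the whole variable.
let values = r.values::<f32>(None, None).unwrap();

println!("SST: {:?}", values);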

or with Python using Xarray:

import xarray as xr
import hidefix

ds = xr.open_dataset('file.nc', engine='hidefix')
print(ds)

See the example for how to use hidefix for regular, parallel or concurrent reads.
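
The chunk-index idea described at the top can be illustrated with the Python standard library alone. The sketch below is not hidefix's implementation or its on-disk format: the file layout, `CHUNK_LEN`, and the helpers `write_chunked`, `read_chunk`, `read_all` and `demo` are all hypothetical names made up for illustration. It only shows why an index of chunk offsets and sizes is enough to let many threads read the same file, each through its own file handle, without sharing any state.

```python
import os
import struct
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_LEN = 1000  # values per chunk (hypothetical, chosen for illustration)


def write_chunked(path, values):
    """Write float32 values as independently compressed chunks.

    Returns the index: one (offset, compressed_size, n_values) entry per chunk.
    """
    index = []
    with open(path, "wb") as f:
        for start in range(0, len(values), CHUNK_LEN):
            part = values[start:start + CHUNK_LEN]
            raw = struct.pack(f"<{len(part)}f", *part)
            blob = zlib.compress(raw)
            index.append((f.tell(), len(blob), len(part)))
            f.write(blob)
    return index


def read_chunk(path, entry):
    """Read and decompress one chunk through a private file handle."""
    offset, size, count = entry
    with open(path, "rb") as f:  # own handle: no shared seek position or buffer
        f.seek(offset)
        blob = f.read(size)
    return struct.unpack(f"<{count}f", zlib.decompress(blob))


def read_all(path, index, workers=4):
    """Read every chunk concurrently and stitch the results in index order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda entry: read_chunk(path, entry), index)
        return [value for part in parts for value in part]


def demo():
    """Round-trip a small dataset; returns (number of chunks, data matches)."""
    values = [float(i) for i in range(10_500)]  # exactly representable as float32
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "chunks.bin")
        index = write_chunked(path, values)
        return len(index), read_all(path, index) == values
```

Calling `demo()` writes the chunks to a temporary file, reads them back on four threads, and reports whether the reassembled values match the input. In this toy the index falls out of writing the file; for a real HDF5 file the chunk locations have to be extracted from the file's own metadata, which is the step hidefix delegates to the native library.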

Motivation

The HDF5 library requires internal locks to be thread-safe, since it relies on internal buffers that cannot be safely read or written from multiple threads. In effect, multi-threaded applications end up reading sequentially while competing for the locks. The threads also appear to interfere with each other, perhaps by dropping cached chunks that other threads still need. HDF5 can be used safely from separate processes, but that potentially carries much more overhead than multi-threaded or asynchronous code.

Some basic benchmarks

hidefix is intended to perform better when concurrent reads are made to the same dataset, the same file, or different files from a single process. In basic benchmarks of standard sequential reads, its performance is on par with or slightly better than the native HDF5 library (through its Rust bindings). Where hidefix shines is when multiple threads in the same process try to read from an HDF5 file simultaneously.

This simple benchmark reads a small dataset sequentially or concurrently, using the cached reader from hidefix and the native reader from HDF5. The dataset is chunked, shuffled and compressed (using gzip):

$ cargo bench --bench concurrency -- --ignored

test shuffled_compressed::cache_concurrent_reads  ... bench:  15,903,406 ns/iter (+/- 220,824)
test shuffled_compressed::cache_sequential        ... bench:  59,778,761 ns/iter (+/- 602,316)
test shuffled_compressed::native_concurrent_reads ... bench: 411,605,868 ns/iter (+/- 35,346,233)
test shuffled_compressed::native_sequential       ... bench: 103,457,237 ns/iter (+/- 7,703,936)
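
To put the figures above in relative terms, the ratios can be worked out directly from the reported ns/iter values. The short script below only does that arithmetic; the dictionary keys mirror the benchmark names, and the helper `speedup` is a hypothetical name introduced here.

```python
# ns/iter figures copied from the benchmark output above
results_ns = {
    "cache_concurrent_reads": 15_903_406,
    "cache_sequential": 59_778_761,
    "native_concurrent_reads": 411_605_868,
    "native_sequential": 103_457_237,
}


def speedup(slow, fast):
    """How many times longer `slow` takes per iteration than `fast`."""
    return results_ns[slow] / results_ns[fast]


# hidefix versus native HDF5, concurrent reads
concurrent = speedup("native_concurrent_reads", "cache_concurrent_reads")
# hidefix versus native HDF5, sequential reads
sequential = speedup("native_sequential", "cache_sequential")
# native HDF5: concurrent reads versus its own sequential reads
native_penalty = speedup("native_concurrent_reads", "native_sequential")

print(f"concurrent: {concurrent:.1f}x, sequential: {sequential:.2f}x, "
      f"native concurrent vs sequential: {native_penalty:.2f}x")
```

In this particular run the cached reader is roughly 26 times faster than the native reader for concurrent reads and about 1.7 times faster for sequential reads, while the native reader is about 4 times slower concurrently than sequentially. These numbers describe this one benchmark run only and will vary with hardware and dataset.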

Inspiration and other projects

This work is based in part on the DMR++ module of the OPeNDAP Hyrax server. The Zarr format does something similar, and the same approach has been tested out on HDF5 as well.
