
h5py-backed HDF5 file loader for the vcti-fileloader framework

Project description

FileLoader HDF5

HDF5 file loader using h5py — attaches an HDF5 file's content as a locked subtree under a caller-supplied parent in any LockableTree.

Overview

vcti-fileloader-hdf5 is the HDF5 plugin for the vcti-fileloader framework. It implements the Loader protocol with one operation that does the real work: populate(handle, tree, parent) walks the HDF5 hierarchy once via h5py.File.visit() and grafts the file's groups and datasets as a subtree under parent.

The loader's pass-through is strict: every node's file_attributes is populated verbatim from the corresponding HDF5 object's obj.attrs, with no synthesised keys, no derived storage metadata, and no file path. Application-domain attributes (file paths, derived storage info, category tags) belong in a before_lock hook or a downstream enricher.

Groups become DataNode payloads carrying their HDF5 attributes; datasets become LazyDataNode payloads with shape / dtype populated at construction so consumers can browse without materialising. The returned subtree is structure-locked and payload-locked. The loader is backing-agnostic: it writes against the LockableTree protocol from vcti-tree, so the caller picks the tree implementation (DictTree, vcti-nptree's ArrayTree, or their own).
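To make the single-pass walk concrete, here is a minimal, stdlib-only sketch of grafting visit()-style names under a parent. It is not the package's implementation: a plain dict (`FAKE_FILE`, hypothetical) stands in for the HDF5 file, nested dicts stand in for the LockableTree, and the `kind` / `attrs` / `shape` / `dtype` keys are illustrative. It uses one known property of `h5py.File.visit()`: names are reported relative to the group it was called on, such as `results/stress`.

```python
# Hypothetical stand-in for an open HDF5 file: relative name -> object description.
FAKE_FILE = {
    "results": {"kind": "group", "attrs": {"solver": "NASTRAN"}},
    "results/stress": {"kind": "dataset", "attrs": {"units": "MPa"},
                       "shape": (3,), "dtype": "float64"},
    "ids": {"kind": "dataset", "attrs": {}, "shape": (3,), "dtype": "int64"},
}


def graft(file_objects, root_attrs):
    """Build a nested-dict subtree from visit()-style relative names."""
    root = {"name": None, "file_attributes": dict(root_attrs), "children": {}}
    nodes = {"": root}  # relative name -> node, so each parent is found directly
    # sorted() puts every group before its members ("results" < "results/stress").
    for path in sorted(file_objects):
        obj = file_objects[path]
        parent_path, _, name = path.rpartition("/")
        node = {
            "name": name,
            # Verbatim copy of the object's attributes: no synthesised keys.
            "file_attributes": dict(obj["attrs"]),
            "children": {},
        }
        if obj["kind"] == "dataset":
            # Datasets carry shape/dtype without any data being read.
            node["shape"], node["dtype"] = obj["shape"], obj["dtype"]
        nodes[parent_path]["children"][name] = node
        nodes[path] = node
    return root


subtree = graft(FAKE_FILE, {"file_attr": "test_value"})
print(sorted(subtree["children"]))                                    # ['ids', 'results']
print(subtree["children"]["results"]["children"]["stress"]["shape"])  # (3,)
```

Looking parents up by name in a dict keeps the walk to a single pass over the file's objects, matching the "walks the hierarchy once" description above.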

Installation

pip install "vcti-fileloader-hdf5>=5.1.1"

Quick Start

from pathlib import Path

from vcti.fileloader.core import DataNode, LoaderRegistry, materialise_subtree
from vcti.fileloader.hdf5 import H5pyLoader, get_loader_descriptor
from vcti.tree import DictTree                # or any other LockableTree backing

# Context manager (recommended for the one-shot case)
loader = H5pyLoader()
tree: DictTree[DataNode] = DictTree(DataNode())
with loader.open(Path("data.h5")) as handle:
    # Eager: read every dataset into memory before the file closes.
    subtree_root = loader.populate(handle, tree, tree.root_handle, lazy=False)
# Tree is fully usable here; handle is closed.

# Lazy loading: keep the handle open while you browse and materialise on demand.
loader = H5pyLoader()
tree = DictTree(DataNode())
handle = loader.load(Path("data.h5"))
try:
    subtree_root = loader.populate(handle, tree, tree.root_handle)  # lazy=True default
    # ... browse the tree, call payload.load() on datasets you need ...
finally:
    loader.unload(handle)

# Materialise then unload — close the file but keep the tree usable.
loader = H5pyLoader()
tree = DictTree(DataNode())
handle = loader.load(Path("data.h5"))
try:
    subtree_root = loader.populate(handle, tree, tree.root_handle)
    materialise_subtree(tree, subtree_root)
finally:
    loader.unload(handle)

# Registry-based usage
registry = LoaderRegistry()
registry.register(get_loader_descriptor())
desc = registry.get("hdf5-h5py-loader")
tree = DictTree(DataNode())
with desc.loader.open(Path("data.h5")) as handle:
    subtree_root = desc.loader.populate(handle, tree, tree.root_handle, lazy=False)

Quick Start — with a before_lock hook

Stamping file_path and arbitrary domain tags is the caller's job, run through the before_lock hook (which fires after the subtree is built but before it is locked):

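path = Path("data.h5")  # Path, loader and tree as in the Quick Start above

def stamp_file_path(tree, root):
    # Called with the new subtree's root after it is built, before it is locked.
    tree.payload(root).enricher_attributes["file_path"] = str(path)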

with loader.open(path) as handle:
    root = loader.populate(handle, tree, tree.root_handle, before_lock=stamp_file_path)

For rule-driven enrichment, pair the hook with vcti-attribute-enricher:

from vcti.attribute_enricher import EnrichRule, apply_rules
from vcti.tree import descendants
from vcti.lookup import Rule

def enrich(tree, root):
    apply_rules(
        descendants(tree, root, include_self=True),
        rules=[
            EnrichRule(set={"file_path": str(path)}),
            EnrichRule(set={"category": "mechanical"},
                       when=(Rule("name", "^=", "stress"),)),
        ],
    )

with loader.open(path) as handle:
    root = loader.populate(handle, tree, tree.root_handle, before_lock=enrich)

vcti-attribute-enricher is an optional sibling package — this loader has no dependency on it.
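For readers who want to see what the two rules in the enrichment example are meant to achieve, the plain-Python loop below is a rough equivalent. It does not use vcti-attribute-enricher, and its semantics are assumptions: that an EnrichRule without `when` applies to every node, that `Rule("name", "^=", "stress")` means "name starts with stress", and that matched keys land in each node's enricher_attributes. Nodes are modelled as plain dicts, and `enrich_plain` is a hypothetical name.

```python
def enrich_plain(nodes, file_path):
    """Hypothetical stand-in for the two EnrichRules in the example above."""
    for node in nodes:
        extra = node.setdefault("enricher_attributes", {})
        # Rule 1: no `when`, so it applies to every node in the subtree.
        extra["file_path"] = file_path
        # Rule 2: only when the name starts with "stress" (assumed ^= semantics).
        name = node.get("name") or ""  # the subtree root's name is None
        if name.startswith("stress"):
            extra["category"] = "mechanical"


nodes = [{"name": None}, {"name": "results"}, {"name": "stress"}, {"name": "ids"}]
enrich_plain(nodes, "data.h5")
print([n["enricher_attributes"].get("category") for n in nodes])
# [None, None, 'mechanical', None]
```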


What the subtree looks like

Given an HDF5 file with this structure:

/  (file_attr="test_value")
├── results/  (solver="NASTRAN")
│   └── stress  (units="MPa"), shape=(3,), dtype=float64
└── ids  shape=(3,), dtype=int64

populate(handle, tree, tree.root_handle) produces this subtree:

| Node | name | Payload type | data | file_attributes | shape / dtype |
|---|---|---|---|---|---|
| subtree root | None | DataNode | None | {file_attr: "test_value"} | |
| results (group) | "results" | DataNode | None | {solver: "NASTRAN"} | |
| stress (lazy) | "stress" | LazyDataNode | (lazy) | {units: "MPa"} | (3,) / float64 |
| ids (lazy) | "ids" | LazyDataNode | (lazy) | {} | (3,) / int64 |

Note that name is a first-class field, and shape / dtype are first-class fields on LazyDataNode — none of these are in file_attributes. The enricher side (enricher_attributes) starts empty; the merged attributes ChainMap reflects only the file's native keys until a before_lock hook (or other enricher) adds to the enricher side.
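The split between the two attribute sides can be pictured with `collections.ChainMap`. Which side wins on a key collision is not stated here; this sketch assumes the enricher side is listed first and therefore shadows the file side, while file_attributes itself is never mutated.

```python
from collections import ChainMap

file_attributes = {"units": "MPa"}   # verbatim from the dataset's obj.attrs
enricher_attributes = {}             # starts empty

# Assumption: enricher keys are consulted first, file keys second.
attributes = ChainMap(enricher_attributes, file_attributes)
print(dict(attributes))              # {'units': 'MPa'}

# A before_lock hook writes to the enricher side only; the view updates live.
enricher_attributes["file_path"] = "data.h5"
print(attributes["file_path"], attributes["units"])  # data.h5 MPa
print(file_attributes)               # {'units': 'MPa'}
```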


API

H5pyLoader

| Method | Description |
|---|---|
| load(path, **options) | Open HDF5 file, return h5py.File handle |
| open(path, **options) | Context manager: loads and auto-unloads |
| populate(handle, tree, parent, *, before_lock=None, lazy=True, **options) | Attach subtree, run hook, lock, return subtree root handle |
| unload(handle) | Close HDF5 file handle (idempotent) |
| can_load(path) | Check extension (.h5, .hdf5) |
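The relationship between load, unload and open in the table can be expressed with `contextlib.contextmanager`. The sketch below is hypothetical: `SketchLoader` and `FakeHandle` stand in for H5pyLoader and h5py.File (no file is opened), and the case-insensitive extension check in `can_load` is an assumption, since the table only lists `.h5` and `.hdf5`.

```python
from contextlib import contextmanager
from pathlib import Path


class FakeHandle:
    """Stand-in for h5py.File that only tracks whether it is open."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


class SketchLoader:
    EXTENSIONS = {".h5", ".hdf5"}

    def can_load(self, path):
        # Assumption: the extension check ignores case.
        return Path(path).suffix.lower() in self.EXTENSIONS

    def load(self, path):
        return FakeHandle(Path(path))

    def unload(self, handle):
        # Idempotent: unloading an already-closed handle is a no-op.
        if not handle.closed:
            handle.close()

    @contextmanager
    def open(self, path):
        handle = self.load(path)
        try:
            yield handle
        finally:
            # Runs on normal exit and when the with-body raises.
            self.unload(handle)


loader = SketchLoader()
with loader.open("data.h5") as h:
    print(h.closed)   # False
print(h.closed)       # True
loader.unload(h)      # second unload: no error
print(loader.can_load("model.HDF5"), loader.can_load("notes.txt"))  # True False
```

Building open() on top of load() and unload() keeps one code path for closing files, which is also why unload() being idempotent matters.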

Helpers

| Name | Description |
|---|---|
| get_loader_descriptor() | Create LoaderDescriptor for registry |
| H5pyValidator | Check h5py availability |
| H5pySetup | No-op setup (h5py needs no config) |

Lazy vs Eager Loading

populate(..., lazy=True) (default) attaches each dataset as a LazyDataNode — its data is None until .load() is called, at which point the closure reads handle[path][:]. Use lazy when:

  • you want to browse the file's structure (names, shapes, dtypes, attributes) before deciding which arrays to materialise, or
  • the file is too large to load entirely into memory.

populate(..., lazy=False) reads every dataset into a DataNode at populate time. Use eager when:

  • the file is small and you want everything loaded immediately, or
  • you cannot guarantee the handle will stay open after populate returns and you do not want to use materialise_subtree.
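A stripped-down model of the two modes: a lazy payload holds a closure that reads on demand, while the eager path reads everything immediately. `LazyPayload`, `attach`, `read_dataset` and the dict standing in for the open file are illustrative names, not the package's classes; `read_dataset` plays the role of `handle[path][:]`, and the eager branch stores data directly where the package would wrap it in a DataNode.

```python
reads = []  # log of simulated dataset reads
fake_file = {"results/stress": [1.0, 2.5, 4.0], "ids": [1, 2, 3]}


def read_dataset(path):
    """Stand-in for handle[path][:]; records that a read happened."""
    reads.append(path)
    return list(fake_file[path])


class LazyPayload:
    """Illustrative stand-in for LazyDataNode: data is None until load()."""

    def __init__(self, path):
        self.data = None
        self._read = lambda: read_dataset(path)  # closure over the path

    def load(self):
        if self.data is None:  # read at most once
            self.data = self._read()
        return self.data


def attach(paths, lazy=True):
    """Lazy: wrap each dataset in a closure. Eager: read everything now."""
    if lazy:
        return {p: LazyPayload(p) for p in paths}
    return {p: read_dataset(p) for p in paths}


lazy_nodes = attach(fake_file)
print(reads)                              # []
print(lazy_nodes["ids"].load(), reads)    # [1, 2, 3] ['ids']

reads.clear()
eager_nodes = attach(fake_file, lazy=False)
print(sorted(reads))                      # ['ids', 'results/stress']
```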

Handle lifetime contract with lazy nodes

Each LazyDataNode produced by populate(..., lazy=True) holds a closure over handle. Once loader.unload(handle) runs, those closures cannot fulfil further .load() calls. Three patterns avoid the problem:

  1. Keep the handle open for the lifetime of the tree.
  2. Materialise then unload: populate(handle, tree, p); materialise_subtree(tree, root); unload(handle). After this, every lazy node is loaded, and the tree is fully usable without the handle.
  3. Use eager mode: populate(..., lazy=False).

materialise_subtree(tree, root_handle) is exported from vcti.fileloader.core and walks the subtree, calling .load() on every LazyDataNode.
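The failure mode and the materialise-then-unload fix can be reproduced without HDF5. `ClosableHandle` is a hypothetical stand-in that raises ValueError once closed (chosen to mirror the ValueError listed under Error Handling), and `materialise` is a simplified, flat-list version of what materialise_subtree is described as doing.

```python
class ClosableHandle:
    """Hypothetical stand-in for an h5py.File that refuses reads once closed."""

    def __init__(self, datasets):
        self._datasets = datasets
        self.closed = False

    def read(self, path):
        if self.closed:
            raise ValueError(f"cannot read {path!r}: handle is closed")
        return list(self._datasets[path])

    def close(self):
        self.closed = True


class LazyPayload:
    def __init__(self, handle, path):
        self.data = None
        self._load = lambda: handle.read(path)  # closure over the handle

    def load(self):
        if self.data is None:
            self.data = self._load()
        return self.data


def materialise(payloads):
    """Simplified materialise_subtree: call load() on every lazy payload."""
    for payload in payloads:
        payload.load()


# Failing order: unload first, then try to load.
h1 = ClosableHandle({"ids": [1, 2, 3]})
node = LazyPayload(h1, "ids")
h1.close()
try:
    node.load()
except ValueError as exc:
    print("failed:", exc)

# Pattern 2 from the list: materialise, then unload.
h2 = ClosableHandle({"ids": [1, 2, 3]})
nodes = [LazyPayload(h2, "ids")]
materialise(nodes)
h2.close()
print(nodes[0].load())  # [1, 2, 3], served from memory
```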


Error Handling

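from pathlib import Path

from vcti.fileloader.core import (
    DataNode,
    LoadError,
    TreeAttachmentError,
    UnloadError,
    UnsupportedFormatError,
)
from vcti.fileloader.hdf5 import H5pyLoader
from vcti.tree import DictTree

loader = H5pyLoader()
tree = DictTree(DataNode())

try:
    with loader.open(Path("data.h5")) as handle:
        subtree_root = loader.populate(handle, tree, tree.root_handle)
except FileNotFoundError:
    ...
except UnsupportedFormatError:
    ...
except TreeAttachmentError:
    # parent is missing, deleted, or structure-locked in `tree`
    ...
except ValueError:
    # populate() was called on a closed handle
    ...
except LoadError:
    # broadest loader error last, so the more specific handlers stay reachable
    ...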

If populate fails partway through (an I/O error during the walk, or an exception in before_lock), the partial subtree is removed before the exception propagates — callers never see a half-built subtree.
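The build, hook, lock sequence and the cleanup-on-failure guarantee can be sketched as a try/except around the build and hook steps. `populate_sketch`, the dict standing in for the tree and the `locked` set are hypothetical; the real loader works against the LockableTree protocol.

```python
def populate_sketch(children, locked, name, build, before_lock=None):
    """Attach `name`, run the hook, lock it; undo the attach on any failure."""
    children[name] = {}                   # 1. attach an empty subtree
    try:
        build(children[name])             # 2. walk the file (may raise I/O errors)
        if before_lock is not None:
            before_lock(children[name])   # 3. hook runs while still mutable
    except BaseException:
        del children[name]                # remove the partial subtree...
        raise                             # ...then let the error propagate
    locked.add(name)                      # 4. lock only after a clean build
    return name


def build(node):
    node["ids"] = [1, 2, 3]


def failing_hook(node):
    raise RuntimeError("hook failed")


tree, locked = {}, set()
try:
    populate_sketch(tree, locked, "data.h5", build, before_lock=failing_hook)
except RuntimeError:
    pass
print(tree, locked)   # {} set()

populate_sketch(tree, locked, "data.h5", build)
print(tree, locked)   # {'data.h5': {'ids': [1, 2, 3]}} {'data.h5'}
```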


Soft Links and Hard Links

HDF5 files can contain soft links (symbolic references to other paths) and hard links (multiple names pointing to the same object). h5py.File.visit() — which this loader uses — follows hard links but does not follow soft links or external links by default:

  • Hard-linked objects appear once in the subtree (at the first path visit() encounters). They are not duplicated.
  • Soft links are silently skipped and will not appear as nodes in the subtree.
  • External links are also skipped.

This behaviour is inherited from h5py/libhdf5 and is not configurable in this loader.


Thread Safety

h5py file handles are not thread-safe. Do not share a single h5py.File handle across threads. Open a separate handle per thread, or serialise access with a lock.

Tree backings (DictTree, ArrayTree, etc.) are likewise not thread-safe. Calling populate on the same tree from multiple threads concurrently is undefined behaviour.
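If a single handle must be shared, the "serialise access with a lock" option looks roughly like this. `SerialisedReader` is a hypothetical wrapper and the dict stands in for an open h5py.File; the point is only that every read goes through the same `threading.Lock`.

```python
import threading


class SerialisedReader:
    """Hypothetical wrapper: one lock guards every access to a shared handle."""

    def __init__(self, handle):
        self._handle = handle
        self._lock = threading.Lock()

    def read(self, path):
        with self._lock:  # only one thread touches the handle at a time
            return list(self._handle[path])


shared = SerialisedReader({"ids": [1, 2, 3], "results/stress": [1.0, 2.5, 4.0]})
results = {}


def worker(path):
    results[path] = shared.read(path)


threads = [threading.Thread(target=worker, args=(p,)) for p in ("ids", "results/stress")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['ids', 'results/stress']
```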


Dependencies

  • h5py (>=3.0)
  • numpy (>=1.24)
  • vcti-fileloader (>=5.1.0) — Loader protocol, SubtreeBuilder, DataNode, LazyDataNode, materialise_subtree (import from vcti.fileloader.core)
  • vcti-tree (>=1.0.0) — LockableTree protocol



Download files


Source Distribution

vcti_fileloader_hdf5-5.1.1.tar.gz (16.9 kB)

Uploaded Source

Built Distribution


vcti_fileloader_hdf5-5.1.1-py3-none-any.whl (10.1 kB)

Uploaded Python 3

File details

Details for the file vcti_fileloader_hdf5-5.1.1.tar.gz.

File metadata

  • Download URL: vcti_fileloader_hdf5-5.1.1.tar.gz
  • Upload date:
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vcti_fileloader_hdf5-5.1.1.tar.gz
Algorithm Hash digest
SHA256 dc11d7a93df584509b12642e9e61e770a985f2f826007ccf641ab1826e9957d1
MD5 8905f9705924529e5195ffddb6312498
BLAKE2b-256 209e8b3d97e7b431d2f56dd22c438b08f8c9fdcf5121ee6bc055a10aca33bb80


Provenance

The following attestation bundles were made for vcti_fileloader_hdf5-5.1.1.tar.gz:

Publisher: release.yml on vcollab/vcti-python-fileloader-hdf5

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vcti_fileloader_hdf5-5.1.1-py3-none-any.whl.

File hashes

Hashes for vcti_fileloader_hdf5-5.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6fdf68de166b30452aaa4d1e280e7d05614a8bc323199e4c6dd1b80028375f46
MD5 26c7269f17955a75f239751aa23a2a37
BLAKE2b-256 cf476cd762ffaa54006217166b0f6c8eb37a5e3fa3e2a11170a51c322ba99492


Provenance

The following attestation bundles were made for vcti_fileloader_hdf5-5.1.1-py3-none-any.whl:

Publisher: release.yml on vcollab/vcti-python-fileloader-hdf5

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
