
HDF5 file loader (vcti.fileloader.hdf5) for the vcti-fileloader framework: attaches a locked subtree (with file-native attributes, optional lazy datasets, and a before_lock hook) under a caller-supplied parent in any LockableTree backing


FileLoader HDF5

HDF5 file loader using h5py — attaches an HDF5 file's content as a locked subtree under a caller-supplied parent in any LockableTree.

Overview

vcti-fileloader-hdf5 is the HDF5 plugin for the vcti-fileloader framework. It implements the Loader protocol with one operation that does the real work: populate(handle, tree, parent) walks the HDF5 hierarchy once via h5py.File.visit() and grafts the file's groups and datasets as a subtree under parent.

The loader pass-through is strict: every node's file_attributes is populated verbatim from the corresponding HDF5 object's obj.attrs — no synthesised keys, no derived storage metadata, no file-path. Application-domain attributes (file paths, derived storage info, category tags) belong in a before_lock hook or a downstream enricher.

Groups become DataNode payloads carrying their HDF5 attributes; datasets become LazyDataNode payloads with shape / dtype populated at construction so consumers can browse without materialising. The returned subtree is structure-locked and payload-locked. The loader is backing-agnostic — it writes against the LockableTree protocol from vcti-tree, so the caller picks the tree implementation (DictTree, vcti-nptree.ArrayTree, or their own).
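The single-pass walk described above can be pictured with a minimal, stdlib-only sketch. It assumes the walk reports names relative to the file root, with each group reported before its members. `SketchNode` and `graft` are hypothetical stand-ins invented for illustration, not vcti-tree or vcti-fileloader types, and the real loader may build its subtree differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node (not a vcti-tree type)."""
    name: Optional[str] = None
    children: list = field(default_factory=list)


def graft(parent, visited_names):
    """Attach one node per visited name under a new, unnamed subtree root."""
    subtree_root = SketchNode()              # stands in for the file's root group
    parent.children.append(subtree_root)
    by_path = {"": subtree_root}             # path -> node already attached
    for full_name in visited_names:          # e.g. "results/stress"
        parent_path, _, leaf = full_name.rpartition("/")
        node = SketchNode(name=leaf)
        # Assumption: the parent group was reported earlier in the walk.
        by_path[parent_path].children.append(node)
        by_path[full_name] = node
    return subtree_root


# Names in a parent-first order for the example file used later in this README.
root = SketchNode(name="app-root")
subtree = graft(root, ["ids", "results", "results/stress"])
```

Keeping a path-to-node index means each name is handled once, without re-walking the partial subtree.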

Installation

pip install "vcti-fileloader-hdf5>=5.1.0"

Quick Start

from pathlib import Path

from vcti.fileloader.core import DataNode, LoaderRegistry, materialise_subtree
from vcti.fileloader.hdf5 import H5pyLoader, get_loader_descriptor
from vcti.tree import DictTree                # or any other LockableTree backing

# Context manager (recommended for the one-shot case)
loader = H5pyLoader()
tree: DictTree[DataNode] = DictTree(DataNode())
with loader.open(Path("data.h5")) as handle:
    # Eager: read every dataset into memory before the file closes.
    subtree_root = loader.populate(handle, tree, tree.root_handle, lazy=False)
# Tree is fully usable here; handle is closed.

# Lazy loading: keep the handle open while you browse and materialise on demand.
loader = H5pyLoader()
tree = DictTree(DataNode())
handle = loader.load(Path("data.h5"))
try:
    subtree_root = loader.populate(handle, tree, tree.root_handle)  # lazy=True default
    # ... browse the tree, call payload.load() on datasets you need ...
finally:
    loader.unload(handle)

# Materialise then unload — close the file but keep the tree usable.
loader = H5pyLoader()
tree = DictTree(DataNode())
handle = loader.load(Path("data.h5"))
try:
    subtree_root = loader.populate(handle, tree, tree.root_handle)
    materialise_subtree(tree, subtree_root)
finally:
    loader.unload(handle)

# Registry-based usage
registry = LoaderRegistry()
registry.register(get_loader_descriptor())
desc = registry.get("hdf5-h5py-loader")
tree = DictTree(DataNode())
with desc.loader.open(Path("data.h5")) as handle:
    subtree_root = desc.loader.populate(handle, tree, tree.root_handle, lazy=False)

Quick Start — with a before_lock hook

Stamping file_path and arbitrary domain tags is the caller's job, run through the before_lock hook (which fires after the subtree is built but before it is locked):

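# Reuses Path, loader, and tree from the Quick Start above.
path = Path("data.h5")

def stamp_file_path(tree, root):
    # Runs after the subtree is built and before it is locked.
    tree.payload(root).enricher_attributes["file_path"] = str(path)

with loader.open(path) as handle:
    root = loader.populate(handle, tree, tree.root_handle, before_lock=stamp_file_path)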

For rule-driven enrichment, pair the hook with vcti-attribute-enricher:

from vcti.attribute_enricher import EnrichRule, apply_rules
from vcti.tree import descendants
from vcti.lookup import Rule

def enrich(tree, root):
    apply_rules(
        descendants(tree, root, include_self=True),
        rules=[
            EnrichRule(set={"file_path": str(path)}),
            EnrichRule(set={"category": "mechanical"},
                       when=(Rule("name", "^=", "stress"),)),
        ],
    )

with loader.open(path) as handle:
    root = loader.populate(handle, tree, tree.root_handle, before_lock=enrich)

vcti-attribute-enricher is an optional sibling package — this loader has no dependency on it.


What the subtree looks like

Given an HDF5 file with this structure:

/  (file_attr="test_value")
├── results/  (solver="NASTRAN")
│   └── stress  (units="MPa"), shape=(3,), dtype=float64
└── ids  shape=(3,), dtype=int64

populate(handle, tree, tree.root_handle) produces this subtree:

Node            | name      | Payload type | data   | file_attributes           | shape / dtype
subtree root    | None      | DataNode     | None   | {file_attr: "test_value"} |
results (group) | "results" | DataNode     | None   | {solver: "NASTRAN"}       |
stress (lazy)   | "stress"  | LazyDataNode | (lazy) | {units: "MPa"}            | (3,) / float64
ids (lazy)      | "ids"     | LazyDataNode | (lazy) | {}                        | (3,) / int64

Note that name is a first-class field, and shape / dtype are first-class fields on LazyDataNode — none of these are in file_attributes. The enricher side (enricher_attributes) starts empty; the merged attributes ChainMap reflects only the file's native keys until a before_lock hook (or other enricher) adds to the enricher side.
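The split between the two attribute sides can be illustrated with the standard library's ChainMap. This is only a sketch using plain dicts. The lookup order (enricher side first) is an assumption; the text above does not say which side wins when a key appears in both.

```python
from collections import ChainMap

# Plain dicts standing in for the two sides of the "stress" node's payload.
file_attributes = {"units": "MPa"}     # verbatim from the HDF5 object
enricher_attributes = {}               # starts empty

# Assumption: the enricher side is consulted first, then the file side.
attributes = ChainMap(enricher_attributes, file_attributes)

before = dict(attributes)              # only the file's native keys

# What a before_lock hook might do: write to the enricher side only.
enricher_attributes["file_path"] = "data.h5"

after = dict(attributes)               # the merged view picks up the new key
```

Because ChainMap is a live view over the underlying dicts, the merged mapping reflects the enrichment without copying, and file_attributes itself is never modified.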


API

H5pyLoader

Method                                                                     | Description
load(path, **options)                                                      | Open HDF5 file, return h5py.File handle
open(path, **options)                                                      | Context manager: loads and auto-unloads
populate(handle, tree, parent, *, before_lock=None, lazy=True, **options)  | Attach subtree, run hook, lock, return subtree root handle
unload(handle)                                                             | Close HDF5 file handle (idempotent)
can_load(path)                                                             | Check extension (.h5, .hdf5)

Helpers

Helper                  | Description
get_loader_descriptor() | Create LoaderDescriptor for registry
H5pyValidator           | Check h5py availability
H5pySetup               | No-op setup (h5py needs no config)

Lazy vs Eager Loading

populate(..., lazy=True) (default) attaches each dataset as a LazyDataNode — its data is None until .load() is called, at which point the closure reads handle[path][:]. Use lazy when:

  • you want to browse the file's structure (names, shapes, dtypes, attributes) before deciding which arrays to materialise, or
  • the file is too large to load entirely into memory.

populate(..., lazy=False) reads every dataset into a DataNode at populate time. Use eager when:

  • the file is small and you want everything loaded immediately, or
  • you cannot guarantee the handle will stay open after populate returns and you do not want to use materialise_subtree.
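The lazy behaviour described above (metadata available immediately, data read on the first .load()) can be sketched with a small stand-in class. `SketchLazyNode` is hypothetical and is not the real LazyDataNode. In particular, caching the result and returning it from load() are assumptions made for this illustration.

```python
from typing import Any, Callable


class SketchLazyNode:
    """Hypothetical stand-in for LazyDataNode; the real class may differ."""

    def __init__(self, shape: tuple, dtype: str, reader: Callable[[], Any]):
        self.shape = shape        # known at construction, no read needed
        self.dtype = dtype
        self.data = None          # stays None until load() is called
        self._reader = reader     # closure that performs the actual read

    def load(self) -> Any:
        # Assumption: the first call reads, later calls reuse the result.
        if self.data is None:
            self.data = self._reader()
        return self.data


reads = []                        # records how often the "file" is touched


def read_stress():
    reads.append("stress")
    return [1.0, 2.0, 3.0]


node = SketchLazyNode(shape=(3,), dtype="float64", reader=read_stress)
shape_before_load = node.shape    # browsing costs no read
data_before_load = node.data      # None: nothing materialised yet
values = node.load()              # first access triggers the read
node.load()                       # no second read in this sketch
```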

Handle lifetime contract with lazy nodes

Each LazyDataNode produced by populate(..., lazy=True) holds a closure over handle. Once loader.unload(handle) runs, those closures cannot fulfil further .load() calls. Three patterns avoid the problem:

  1. Keep the handle open for the lifetime of the tree.
  2. Materialise then unload: populate(handle, tree, p); materialise_subtree(tree, root); unload(handle). After this, every lazy node is loaded, and the tree is fully usable without the handle.
  3. Use eager mode: populate(..., lazy=False).

materialise_subtree(tree, root_handle) is exported from vcti.fileloader.core and walks the subtree, calling .load() on every LazyDataNode.
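A stdlib-only sketch of why pattern 2 works: once every lazy node has been loaded, closing the handle no longer matters, while a node that was never loaded fails afterwards. `FakeHandle`, `LazyNode`, and `materialise_all` are hypothetical stand-ins, and the RuntimeError is the stand-in's choice, not necessarily what the real loader or h5py raises.

```python
class FakeHandle:
    """Stand-in for an open file; reading after close() raises."""

    def __init__(self, datasets):
        self._datasets = datasets
        self.closed = False

    def read(self, path):
        if self.closed:
            raise RuntimeError(f"handle closed; cannot read {path!r}")
        return list(self._datasets[path])

    def close(self):
        self.closed = True


class LazyNode:
    """Stand-in lazy payload holding a closure over the handle."""

    def __init__(self, handle, path):
        self.data = None
        self._read = lambda: handle.read(path)

    def load(self):
        if self.data is None:
            self.data = self._read()
        return self.data


def materialise_all(nodes):
    """Load every node while the handle is still open."""
    for node in nodes:
        node.load()


handle = FakeHandle({"ids": [1, 2, 3], "results/stress": [0.5, 0.7, 0.9]})
nodes = {p: LazyNode(handle, p) for p in ("ids", "results/stress")}

# Pattern 2: materialise first, then close the handle.
materialise_all(nodes.values())
handle.close()
ids_after_close = nodes["ids"].load()   # served from memory, no read

# Counter-example: a node that was never loaded before the close.
late = LazyNode(handle, "ids")
try:
    late.load()
    late_failed = False
except RuntimeError:
    late_failed = True
```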


Error Handling

from vcti.fileloader.core import (
    LoadError,
    UnloadError,
    UnsupportedFormatError,
    TreeAttachmentError,
)

try:
    with loader.open(Path("data.h5")) as handle:
        subtree_root = loader.populate(handle, tree, tree.root_handle)
except FileNotFoundError:
    ...
except UnsupportedFormatError:
    ...
except LoadError:
    ...
except TreeAttachmentError:
    # parent is missing, deleted, or structure-locked in `tree`
    ...
except ValueError:
    # populate() was called on a closed handle
    ...

If populate fails partway through (an I/O error during the walk, or an exception in before_lock), the partial subtree is removed before the exception propagates — callers never see a half-built subtree.
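The all-or-nothing guarantee above can be sketched as a try/except that detaches the partial subtree before re-raising. This is an illustration over nested dicts only. `populate_sketch` and the "subtree" key are hypothetical, and the real loader's cleanup mechanism is not described in this document.

```python
def populate_sketch(tree, parent, names, before_lock=None):
    """Attach a subtree under tree[parent]; detach it again if any step fails."""
    subtree = {}
    tree[parent]["subtree"] = subtree        # attach, then fill in place
    try:
        for name in names:                   # stands in for the file walk
            subtree[name] = {}
        if before_lock is not None:
            before_lock(subtree)             # the hook may raise as well
    except Exception:
        del tree[parent]["subtree"]          # remove the partial subtree
        raise                                # the caller still sees the error
    return subtree


def failing_hook(subtree):
    raise RuntimeError("hook failed")


tree = {"root": {}}
try:
    populate_sketch(tree, "root", ["results", "ids"], before_lock=failing_hook)
    propagated = False
except RuntimeError:
    propagated = True

tree_after_failure = {key: dict(value) for key, value in tree.items()}

# A later, successful call attaches normally.
populate_sketch(tree, "root", ["ids"])
```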


Soft Links and Hard Links

HDF5 files can contain soft links (symbolic references to other paths) and hard links (multiple names pointing to the same object). h5py.File.visit() — which this loader uses — follows hard links but does not follow soft links or external links by default:

  • Hard-linked objects appear once in the subtree (at the first path visit() encounters). They are not duplicated.
  • Soft links are silently skipped and will not appear as nodes in the subtree.
  • External links are also skipped.

This behaviour is inherited from h5py/libhdf5 and is not configurable in this loader.


Thread Safety

h5py file handles are not thread-safe. Do not share a single h5py.File handle across threads. Open a separate handle per thread, or serialise access with a lock.

Tree backings (DictTree, ArrayTree, etc.) are likewise not thread-safe. Calling populate on the same tree from multiple threads concurrently is undefined behaviour.
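The "serialise access with a lock" option for a shared file handle might look like the following stdlib-only sketch. `FakeHandle` is a hypothetical stand-in so the example runs without an HDF5 file. With a real handle, the read inside the `with` block would be the h5py access.

```python
import threading


class FakeHandle:
    """Stand-in for a shared file handle that is not thread-safe."""

    def __init__(self):
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return path


handle = FakeHandle()
handle_lock = threading.Lock()    # one lock guards every use of the handle
results = []


def worker(path):
    # Only one thread at a time may touch the handle.
    with handle_lock:
        results.append(handle.read(path))


threads = [threading.Thread(target=worker, args=(f"ds{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Opening a separate handle per thread avoids the lock entirely and may be preferable when reads dominate.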


Dependencies

  • h5py (>=3.0)
  • numpy (>=1.24)
  • vcti-fileloader (>=5.1.0) — Loader protocol, SubtreeBuilder, DataNode, LazyDataNode, materialise_subtree (import from vcti.fileloader.core)
  • vcti-tree (>=1.0.0) — LockableTree protocol
