h5py-backed HDF5 file loader for the vcti-fileloader framework
FileLoader HDF5
HDF5 file loader using h5py — attaches an HDF5 file's content as a
locked subtree under a caller-supplied parent in any LockableTree.
Overview
vcti-fileloader-hdf5 is the HDF5 plugin for the vcti-fileloader
framework. It implements the Loader protocol with one operation
that does the real work: populate(handle, tree, parent) walks the
HDF5 hierarchy once via h5py.File.visit() and grafts the file's
groups and datasets as a subtree under parent.
The loader pass-through is strict: every node's file_attributes
is populated verbatim from the corresponding HDF5 object's
obj.attrs — no synthesised keys, no derived storage metadata, no
file-path. Application-domain attributes (file paths, derived
storage info, category tags) belong in a before_lock hook or a
downstream enricher.
Groups become DataNode payloads carrying their HDF5 attributes;
datasets become LazyDataNode payloads with shape / dtype
populated at construction so consumers can browse without
materialising. The returned subtree is structure-locked and
payload-locked. The loader is backing-agnostic — it writes against
the LockableTree protocol from vcti-tree, so the caller picks
the tree implementation (DictTree, vcti-nptree.ArrayTree, or
their own).
Installation
    pip install "vcti-fileloader-hdf5>=5.1.1"
Quick Start
    from pathlib import Path

    from vcti.fileloader.core import DataNode, LoaderRegistry, materialise_subtree
    from vcti.fileloader.hdf5 import H5pyLoader, get_loader_descriptor
    from vcti.tree import DictTree  # or any other LockableTree backing

    # Context manager (recommended for the one-shot case)
    loader = H5pyLoader()
    tree: DictTree[DataNode] = DictTree(DataNode())
    with loader.open(Path("data.h5")) as handle:
        # Eager: read every dataset into memory before the file closes.
        subtree_root = loader.populate(handle, tree, tree.root_handle, lazy=False)
    # Tree is fully usable here; handle is closed.

    # Lazy loading: keep the handle open while you browse and materialise on demand.
    loader = H5pyLoader()
    tree = DictTree(DataNode())
    handle = loader.load(Path("data.h5"))
    try:
        subtree_root = loader.populate(handle, tree, tree.root_handle)  # lazy=True default
        # ... browse the tree, call payload.load() on datasets you need ...
    finally:
        loader.unload(handle)

    # Materialise then unload: close the file but keep the tree usable.
    loader = H5pyLoader()
    tree = DictTree(DataNode())
    handle = loader.load(Path("data.h5"))
    try:
        subtree_root = loader.populate(handle, tree, tree.root_handle)
        materialise_subtree(tree, subtree_root)
    finally:
        loader.unload(handle)

    # Registry-based usage
    registry = LoaderRegistry()
    registry.register(get_loader_descriptor())
    desc = registry.get("hdf5-h5py-loader")
    tree = DictTree(DataNode())
    with desc.loader.open(Path("data.h5")) as handle:
        subtree_root = desc.loader.populate(handle, tree, tree.root_handle, lazy=False)
Quick Start — with a before_lock hook
Stamping file_path and arbitrary domain tags is the caller's
job, run through the before_lock hook (which fires after the
subtree is built but before it is locked):
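    path = Path("data.h5")  # the file being loaded; captured by the hook below

    def stamp_file_path(tree, root):
        # Write to the enricher side; file_attributes stays a verbatim copy of obj.attrs.
        tree.payload(root).enricher_attributes["file_path"] = str(path)

    with loader.open(path) as handle:
        root = loader.populate(handle, tree, tree.root_handle, before_lock=stamp_file_path)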
For rule-driven enrichment, pair the hook with
vcti-attribute-enricher:
    from vcti.attribute_enricher import EnrichRule, apply_rules
    from vcti.tree import descendants
    from vcti.lookup import Rule

    def enrich(tree, root):
        apply_rules(
            descendants(tree, root, include_self=True),
            rules=[
                EnrichRule(set={"file_path": str(path)}),
                EnrichRule(set={"category": "mechanical"},
                           when=(Rule("name", "^=", "stress"),)),
            ],
        )

    with loader.open(path) as handle:
        root = loader.populate(handle, tree, tree.root_handle, before_lock=enrich)
vcti-attribute-enricher is an optional sibling package — this
loader has no dependency on it.
What the subtree looks like
Given an HDF5 file with this structure:
/ (file_attr="test_value")
├── results/ (solver="NASTRAN")
│ └── stress (units="MPa"), shape=(3,), dtype=float64
└── ids shape=(3,), dtype=int64
populate(handle, tree, tree.root_handle) produces this subtree:
| Node | name | Payload type | data | file_attributes | shape / dtype |
|---|---|---|---|---|---|
| subtree root | None | DataNode | None | {file_attr: "test_value"} | n/a |
| results (group) | "results" | DataNode | None | {solver: "NASTRAN"} | n/a |
| stress (lazy) | "stress" | LazyDataNode | (lazy) | {units: "MPa"} | (3,) / float64 |
| ids (lazy) | "ids" | LazyDataNode | (lazy) | {} | (3,) / int64 |
Note that name is a first-class field, and shape / dtype are
first-class fields on LazyDataNode — none of these are in
file_attributes. The enricher side (enricher_attributes) starts
empty; the merged attributes ChainMap reflects only the file's
native keys until a before_lock hook (or other enricher) adds to
the enricher side.
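The two-sided attribute model can be pictured with the standard library's ChainMap. This is a minimal sketch with plain dicts standing in for one node's attribute stores. The lookup order (enricher side first) is an assumption made for illustration, not something taken from vcti-fileloader.

```python
from collections import ChainMap

# Plain dicts standing in for one node's two attribute stores.
file_attributes = {"units": "MPa"}   # verbatim copy of the HDF5 attrs
enricher_attributes = {}             # empty until a hook writes to it

# Assumption: enricher keys are consulted first, then the file's keys.
attributes = ChainMap(enricher_attributes, file_attributes)

before = dict(attributes)            # only the file's native keys

# What a before_lock hook would do: write to the enricher side only.
enricher_attributes["file_path"] = "data.h5"

after = dict(attributes)             # merged view now shows both sides
```

Because the hook writes only to the enricher dict, the file-side dict stays an untouched copy of what the HDF5 object carried.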
API
H5pyLoader
| Method | Description |
|---|---|
| `load(path, **options)` | Open HDF5 file, return h5py.File handle |
| `open(path, **options)` | Context manager: loads on entry, unloads on exit |
| `populate(handle, tree, parent, *, before_lock=None, lazy=True, **options)` | Attach subtree, run hook, lock, return subtree root handle |
| `unload(handle)` | Close HDF5 file handle (idempotent) |
| `can_load(path)` | Check extension (.h5, .hdf5) |
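
An extension check of the kind can_load describes can be sketched with pathlib. The helper name has_hdf5_suffix is hypothetical, and the case-insensitive comparison is an assumption; the real can_load may behave differently.

```python
from pathlib import Path

# The two extensions listed for can_load.
HDF5_SUFFIXES = {".h5", ".hdf5"}


def has_hdf5_suffix(path):
    """Hypothetical helper: True when the path ends in .h5 or .hdf5.

    Assumption: the comparison ignores case.
    """
    return Path(path).suffix.lower() in HDF5_SUFFIXES
```

A check like this never opens the file, so it says nothing about whether the contents are valid HDF5.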
Helpers
| Name | Description |
|---|---|
| `get_loader_descriptor()` | Create LoaderDescriptor for registry |
| `H5pyValidator` | Check h5py availability |
| `H5pySetup` | No-op setup (h5py needs no config) |
Lazy vs Eager Loading
populate(..., lazy=True) (default) attaches each dataset as a
LazyDataNode — its data is None until .load() is called, at
which point the closure reads handle[path][:]. Use lazy when:
- you want to browse the file's structure (names, shapes, dtypes, attributes) before deciding which arrays to materialise, or
- the file is too large to load entirely into memory.
populate(..., lazy=False) reads every dataset into a DataNode at
populate time. Use eager when:
- the file is small and you want everything loaded immediately, or
- you cannot guarantee the handle will stay open after populate returns and you do not want to use materialise_subtree.
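
The difference between the two modes can be illustrated without h5py. In this sketch a dict plays the role of the open file and every read is logged. SketchNode, make_reader and populate_sketch are hypothetical stand-ins, and eager mode is simplified to "call load() immediately" rather than building a separate node type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SketchNode:
    """Stand-in for a dataset node; not the library's LazyDataNode."""
    name: str
    shape: tuple
    dtype: str
    reader: Callable[[], Any]
    data: Optional[Any] = None

    def load(self):
        # Read once, on demand, then keep the result on the node.
        if self.data is None:
            self.data = self.reader()
        return self.data


# A dict plays the role of the open file: path -> (dtype, values).
fake_file = {
    "results/stress": ("float64", [1.0, 2.0, 3.0]),
    "ids": ("int64", [10, 20, 30]),
}
reads = []  # log of every path actually read


def make_reader(path):
    def read():
        reads.append(path)
        return list(fake_file[path][1])
    return read


def populate_sketch(lazy=True):
    nodes = {}
    for path, (dtype, values) in fake_file.items():
        name = path.rsplit("/", 1)[-1]
        node = SketchNode(name, (len(values),), dtype, make_reader(path))
        if not lazy:
            node.load()  # eager: read at populate time
        nodes[path] = node
    return nodes


nodes_lazy = populate_sketch(lazy=True)
reads_after_lazy_populate = list(reads)       # nothing has been read yet
shape_seen = nodes_lazy["results/stress"].shape
stress = nodes_lazy["results/stress"].load()  # first and only read so far
reads_after_one_load = list(reads)

reads.clear()
nodes_eager = populate_sketch(lazy=False)
reads_after_eager_populate = list(reads)      # every dataset was read
```

Shape and dtype are set at construction in both modes, which is what lets a consumer browse a lazily populated tree without triggering any reads.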
Handle lifetime contract with lazy nodes
Each LazyDataNode produced by populate(..., lazy=True) holds a
closure over handle. Once loader.unload(handle) runs, those
closures cannot fulfil further .load() calls. Three patterns avoid
the problem:
- Keep the handle open for the lifetime of the tree.
- Materialise then unload: populate(handle, tree, p); materialise_subtree(tree, root); unload(handle). After this, every lazy node is loaded, and the tree is fully usable without the handle.
- Use eager mode: populate(..., lazy=False).
materialise_subtree(tree, root_handle) is exported from
vcti.fileloader.core and walks the subtree, calling .load() on
every LazyDataNode.
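
The lifetime contract follows directly from the closure. The sketch below uses a hypothetical FakeHandle that raises once closed, plus a hypothetical LazyLeaf whose reader closes over it. The ValueError is a choice made for this sketch; the exception a real lazy node raises after unload is not specified here.

```python
class FakeHandle:
    """Stand-in for an open file handle; refuses reads once closed."""

    def __init__(self, datasets):
        self._datasets = datasets
        self.closed = False

    def __getitem__(self, path):
        if self.closed:
            raise ValueError("handle is closed")  # sketch-only choice
        return self._datasets[path]

    def close(self):
        self.closed = True


class LazyLeaf:
    """Stand-in lazy node: the reader is a closure over the handle."""

    def __init__(self, handle, path):
        self.data = None
        self._read = lambda: list(handle[path])

    def load(self):
        if self.data is None:
            self.data = self._read()
        return self.data


def materialise_all(leaves):
    # Same idea as materialise_subtree: call load() on every lazy node.
    for leaf in leaves:
        leaf.load()


# Pattern to avoid: close first, then try to load.
h1 = FakeHandle({"ids": [10, 20, 30]})
leaf1 = LazyLeaf(h1, "ids")
h1.close()
try:
    leaf1.load()
    load_after_close_failed = False
except ValueError:
    load_after_close_failed = True

# Materialise then unload: data survives the close.
h2 = FakeHandle({"ids": [10, 20, 30]})
leaf2 = LazyLeaf(h2, "ids")
materialise_all([leaf2])
h2.close()
data_after_close = leaf2.load()  # served from the cached copy
```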
Error Handling
    from vcti.fileloader.core import (
        LoadError,
        UnloadError,
        UnsupportedFormatError,
        TreeAttachmentError,
    )

    try:
        with loader.open(Path("data.h5")) as handle:
            subtree_root = loader.populate(handle, tree, tree.root_handle)
    except FileNotFoundError:
        ...
    except UnsupportedFormatError:
        ...
    except LoadError:
        ...
    except TreeAttachmentError:
        # parent is missing, deleted, or structure-locked in `tree`
        ...
    except ValueError:
        # populate() was called on a closed handle
        ...
If populate fails partway through (an I/O error during the walk,
or an exception in before_lock), the partial subtree is removed
before the exception propagates — callers never see a half-built
subtree.
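
The build, hook, lock order and the rollback guarantee can be sketched on a toy dict-based tree. Everything here (attach_with_rollback, the node-id scheme, the "locked" flag) is hypothetical and only illustrates the ordering; it is not how the loader or SubtreeBuilder is implemented.

```python
def attach_with_rollback(tree, parent, names, before_lock=None):
    """Attach a subtree under parent; undo everything if any step fails."""
    root_id = f"{parent}/subtree"
    created = []
    try:
        # 1. Build the subtree (unlocked).
        tree[root_id] = {"children": [], "locked": False}
        created.append(root_id)
        tree[parent]["children"].append(root_id)
        for name in names:
            node_id = f"{root_id}/{name}"
            tree[node_id] = {"children": [], "locked": False}
            created.append(node_id)
            tree[root_id]["children"].append(node_id)
        # 2. Run the hook while the subtree is still writable.
        if before_lock is not None:
            before_lock(tree, root_id)
        # 3. Lock every node that was created.
        for node_id in created:
            tree[node_id]["locked"] = True
        return root_id
    except Exception:
        # Roll back: drop the new nodes and detach the root from parent.
        for node_id in created:
            tree.pop(node_id, None)
        if root_id in tree[parent]["children"]:
            tree[parent]["children"].remove(root_id)
        raise


tree = {"root": {"children": [], "locked": False}}


def failing_hook(tree, root_id):
    raise RuntimeError("hook failed")


try:
    attach_with_rollback(tree, "root", ["results", "ids"], before_lock=failing_hook)
except RuntimeError:
    pass
keys_after_failure = sorted(tree)
children_after_failure = list(tree["root"]["children"])

seen_locked_flags = []


def recording_hook(tree, root_id):
    # Record whether the subtree root is locked when the hook runs.
    seen_locked_flags.append(tree[root_id]["locked"])


root_id = attach_with_rollback(tree, "root", ["results", "ids"], before_lock=recording_hook)
```

After the failing call the tree holds only its original root, and in the successful call the hook observes an unlocked subtree that is locked by the time the function returns.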
Soft Links and Hard Links
HDF5 files can contain soft links (symbolic references to other
paths) and hard links (multiple names pointing to the same
object). h5py.File.visit() — which this loader uses — follows
hard links but does not follow soft links or external links by
default:
- Hard-linked objects appear once in the subtree (at the first path visit() encounters). They are not duplicated.
- Soft links are silently skipped and will not appear as nodes in the subtree.
- External links are also skipped.
This behaviour is inherited from h5py/libhdf5 and is not configurable in this loader.
Thread Safety
h5py file handles are not thread-safe. Do not share a single
h5py.File handle across threads. Open a separate handle per
thread, or serialise access with a lock.
Tree backings (DictTree, ArrayTree, etc.) are likewise not
thread-safe. Calling populate on the same tree from multiple
threads concurrently is undefined behaviour.
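
Serialising access with a lock can look like the sketch below. SerialisedReader is a hypothetical wrapper, and a dict stands in for the shared handle so the example runs without h5py. With a real handle, the read inside the lock would be something like handle[path][:].

```python
import threading


class SerialisedReader:
    """Hypothetical wrapper: every read of the shared handle takes one lock."""

    def __init__(self, handle):
        self._handle = handle
        self._lock = threading.Lock()

    def read(self, path):
        # Only one thread at a time may touch the underlying handle.
        with self._lock:
            return list(self._handle[path])


shared = SerialisedReader({"ids": [10, 20, 30]})  # dict stands in for the handle
results = []
results_lock = threading.Lock()


def worker():
    values = shared.read("ids")
    with results_lock:
        results.append(values)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock covers only the read itself, so threads still run their own processing in parallel once they hold a private copy of the data.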
Dependencies
- h5py (>=3.0)
- numpy (>=1.24)
- vcti-fileloader (>=5.1.0): Loader protocol, SubtreeBuilder, DataNode, LazyDataNode, materialise_subtree (import from vcti.fileloader.core)
- vcti-tree (>=1.0.0): LockableTree protocol