# FileLoader HDF5

HDF5 file loader using h5py — tree extraction, node metadata, and dataset loading for the vcti-fileloader framework.
## When to Use This Loader

Use vcti-fileloader-hdf5 when you need to inspect the structure of an
HDF5 file — groups, datasets, attributes — without reading every dataset
array into memory upfront. The separated loading design lets you:

- Browse the tree hierarchy first, then fetch only the datasets you need.
- Retrieve node metadata (names, types, byte sizes) for display or filtering before committing to a full data load.
- Load attributes selectively by node ID instead of scanning the whole file.

If you only need raw array access without tree/metadata introspection, use h5py directly.
## Installation

```shell
pip install "vcti-fileloader-hdf5>=1.0.0"
```

(The quotes keep the `>=` specifier from being interpreted as a shell redirect.)
## Quick Start

```python
from pathlib import Path

from vcti.fileloader import LoaderRegistry
from vcti.fileloader_hdf5 import H5pyLoader, get_loader_descriptor

# Context manager (recommended)
loader = H5pyLoader()
with loader.open(Path("data.h5")) as handle:
    tree = loader.load_tree(handle)
    info = loader.load_node_info(handle)
    node = loader.load_dataset(handle, node_id=2)

# Manual load/unload
loader = H5pyLoader()
handle = loader.load(Path("data.h5"))
try:
    tree = loader.load_tree(handle)
finally:
    loader.unload(handle)

# Registry-based usage
registry = LoaderRegistry()
registry.register(get_loader_descriptor())
desc = registry.get("hdf5-h5py-loader")
with desc.loader.open(Path("data.h5")) as handle:
    tree = desc.loader.load_tree(handle)
```
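To follow along, you can create a `data.h5` whose layout matches the Example Output tables below. This is a sketch: the `ids` values are arbitrary placeholders (only the dtype and element count matter for the byte sizes shown later).

```python
import h5py
import numpy as np

# Build a small data.h5 matching the example hierarchy: a top-level
# "ids" dataset (3 x int64) and a "results/" group holding a "stress"
# dataset (3 x float64) with a "units" attribute.
with h5py.File("data.h5", "w") as f:
    f.create_dataset("ids", data=np.array([10, 20, 30], dtype=np.int64))
    stress = f.create_dataset("results/stress",
                              data=np.array([1.0, 2.0, 3.0]))
    stress.attrs["units"] = "MPa"
```

h5py creates the intermediate `results` group automatically when given the nested path.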
## Example Output

### load_tree() — structured array

Each row represents a node in the HDF5 hierarchy. Pointers use node IDs (0 = no link).

| id | parent_id | first_child_id | prev_sibling_id | next_sibling_id | node |
|---|---|---|---|---|---|
| 1 | 0 | 2 | 0 | 0 | / (root) |
| 2 | 1 | 4 | 0 | 3 | results/ |
| 3 | 1 | 0 | 2 | 0 | ids |
| 4 | 2 | 0 | 0 | 0 | results/stress |
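The pointer fields make sibling-linked traversal straightforward. A minimal sketch, assuming the field names shown above and using a hand-built array in place of a real `load_tree()` result:

```python
import numpy as np

# Stand-in for a load_tree() result, mirroring the table above
# (0 means "no link" in every pointer field).
tree = np.array(
    [(1, 0, 2, 0, 0),   # /
     (2, 1, 4, 0, 3),   # results/
     (3, 1, 0, 2, 0),   # ids
     (4, 2, 0, 0, 0)],  # results/stress
    dtype=[("id", "i8"), ("parent_id", "i8"), ("first_child_id", "i8"),
           ("prev_sibling_id", "i8"), ("next_sibling_id", "i8")],
)
by_id = {int(row["id"]): row for row in tree}

def children(node_id):
    """Yield child IDs: follow first_child_id, then next_sibling_id links."""
    child = int(by_id[node_id]["first_child_id"])
    while child != 0:
        yield child
        child = int(by_id[child]["next_sibling_id"])

# children(1) -> [2, 3]; children(2) -> [4]
```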
### load_node_info() — structured array

| id | name | type | size | note |
|---|---|---|---|---|
| 1 | / | group | 0 | |
| 2 | results | group | 0 | |
| 3 | ids | dataset | 24 | 3 × int64 = 24 bytes |
| 4 | results/stress | dataset | 24 | 3 × float64 = 24 bytes |
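Because the metadata comes back as a structured array, you can filter with plain NumPy before loading any data. A sketch, assuming the field names shown above and using a hand-built stand-in array:

```python
import numpy as np

# Stand-in for a load_node_info() result, mirroring the table above.
info = np.array(
    [(1, "/", "group", 0),
     (2, "results", "group", 0),
     (3, "ids", "dataset", 24),
     (4, "results/stress", "dataset", 24)],
    dtype=[("id", "i8"), ("name", "U64"), ("type", "U16"), ("size", "i8")],
)

# Keep only datasets, then only those at or above a size threshold.
datasets = info[info["type"] == "dataset"]
big = datasets[datasets["size"] >= 24]
# big["id"] now holds the candidate IDs to pass to load_dataset()
```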
### load_dataset() — DataNode

```python
node = loader.load_dataset(handle, node_id=4)
node.data        # np.array([1.0, 2.0, 3.0])
node.attributes  # {'units': 'MPa', 'type': 'dataset', 'shape': (3,), 'dtype': 'float64'}
```
## API

### H5pyLoader

| Method | Description |
|---|---|
| `load(path, **options)` | Open HDF5 file, return h5py.File handle |
| `open(path, **options)` | Context manager — loads and auto-unloads |
| `unload(data)` | Close HDF5 file and clear cached mappings |
| `can_load(path)` | Check extension (.h5, .hdf5, .he5) |
| `load_tree(data)` | Tree structure as structured array |
| `load_node_info(data)` | Node metadata (id, name, type, size) |
| `load_attributes(data, node_ids)` | Attributes dict per node |
| `load_dataset(data, node_id)` | DataNode with array + attributes |
### Helpers

| Helper | Description |
|---|---|
| `get_loader_descriptor()` | Create LoaderDescriptor for registry |
| `H5pyValidator` | Check h5py availability |
| `H5pySetup` | No-op setup (h5py needs no config) |
## Error Handling

The loader raises specific exceptions for different failure modes:

```python
from pathlib import Path

from vcti.fileloader import LoadError, UnloadError, UnsupportedFormatError
from vcti.fileloader_hdf5 import H5pyLoader

loader = H5pyLoader()
try:
    with loader.open(Path("data.h5")) as handle:
        node = loader.load_dataset(handle, node_id=99)
except FileNotFoundError:
    # File does not exist at the given path
    ...
except UnsupportedFormatError:
    # File exists but is not a valid HDF5 file
    ...
except LoadError:
    # Other failure during file open (e.g., permissions)
    ...
except KeyError:
    # Node ID not found in load_dataset
    ...
except ValueError:
    # File handle is closed
    ...
```
## Performance

### Node map caching
On the first call to any load method, the loader walks the HDF5 hierarchy
once via h5py.File.visit() to build a bidirectional path-to-ID /
ID-to-path mapping. This mapping is cached per file handle (via
WeakKeyDictionary) and reused by all subsequent calls — load_tree,
load_node_info, load_attributes, load_dataset — so you never pay
for a second traversal.
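The caching pattern described above can be sketched with the standard library's `WeakKeyDictionary`. Here `build_node_map` and `get_node_map` are hypothetical stand-ins for the loader's internals, not its actual API:

```python
from weakref import WeakKeyDictionary

# Cache keyed by the file handle itself: when the handle is closed and
# garbage-collected, its cached node map is dropped automatically.
_node_maps = WeakKeyDictionary()

def get_node_map(handle, build_node_map):
    """Return the cached node map for this handle, building it once."""
    try:
        return _node_maps[handle]
    except KeyError:
        _node_maps[handle] = node_map = build_node_map(handle)
        return node_map
```

This is why repeated calls to `load_tree`, `load_node_info`, and friends on the same handle never re-walk the hierarchy.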
### Memory overhead
The node map stores two Python dicts (path string and integer ID per node).
Rough overhead: ~200-300 bytes per node. For a file with 100,000 nodes,
expect ~20-30 MB for the mapping alone. The structured arrays returned by
load_tree and load_node_info add ~20 bytes and ~300 bytes per node
respectively.
### Traversal time
h5py.File.visit() is backed by HDF5's C-level H5Literate, so
traversal is fast — typically < 1 second for 100K nodes on local SSD.
The bottleneck for large files is usually dataset I/O, not tree walking.
### Filtered vs. full attribute loading

- `load_attributes(handle)` — reads attributes for every node. Use this when you need a complete picture (e.g., building a search index).
- `load_attributes(handle, node_ids=np.array([2, 5]))` — reads only the specified nodes. Prefer this when you know which nodes you need, as it avoids touching unrelated HDF5 objects.
### Full array loading
load_dataset() reads the entire dataset into memory via obj[:]. For
very large datasets (multi-GB), consider using h5py slicing directly on
the file handle instead.
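A sketch of the direct-slicing alternative, using a small temporary file in place of a multi-GB one (file path and dataset name are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("results/stress", data=np.arange(1000.0))

with h5py.File(path, "r") as f:
    dset = f["results/stress"]   # lazy handle; no array data read yet
    chunk = dset[100:110]        # HDF5 reads only these 10 elements
```

The slice goes through HDF5's partial I/O, so memory use scales with the slice, not the dataset.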
Thread Safety
h5py file handles are not thread-safe. Do not share a single
h5py.File handle across threads. Instead, open a separate handle per
thread or serialize access with a lock.
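One way to keep a handle per thread is `threading.local`. The names `thread_handle` and `open_handle` below are illustrative, not part of the loader's API; `open_handle` is any zero-argument callable that opens a fresh handle:

```python
import threading

# Thread-local storage: each thread sees its own "handle" attribute.
_local = threading.local()

def thread_handle(open_handle):
    """Return this thread's handle, opening one on first use."""
    if not hasattr(_local, "handle"):
        _local.handle = open_handle()
    return _local.handle
```

In real use, `open_handle` might wrap `H5pyLoader().load(path)`; each thread then works against its own h5py.File, so no lock is needed for reads.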
## Dependencies
- h5py (>=3.0)
- numpy (>=1.24)
- vcti-fileloader (>=1.0.0)
- vcti-array-tree (>=1.0.0) — DataNode
## File details

### vcti_fileloader_hdf5-1.0.0.tar.gz

- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `885b4935b9e8a03959a7defc9d79d37bd3965620c8e335f09073716202c4bf1a` |
| MD5 | `a7dac71692a662b002bf0c3e468352e7` |
| BLAKE2b-256 | `b35b5626c404cfbb52f7c7e7e881d6b9d42121dd8d4706caba114304118472b2` |
### Provenance

Attestation for vcti_fileloader_hdf5-1.0.0.tar.gz:

- Publisher: publish.yml on vcollab/vcti-python-fileloader-hdf5
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject digest: 885b4935b9e8a03959a7defc9d79d37bd3965620c8e335f09073716202c4bf1a
- Sigstore transparency entry: 1193196272
- Permalink: vcollab/vcti-python-fileloader-hdf5@efeca56e7f962c1e6175bc74b7ba8e4b328bc73d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vcollab
- Access: private
- Token issuer: https://token.actions.githubusercontent.com
- Runner environment: github-hosted
- Publication workflow: publish.yml@efeca56e7f962c1e6175bc74b7ba8e4b328bc73d
- Trigger event: workflow_dispatch
### vcti_fileloader_hdf5-1.0.0-py3-none-any.whl

- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1e2e1c8e5525cb4392cb394d5f4566625a764539ae35e9ced1da36810f3ecd8b` |
| MD5 | `dd6e2ea29e7cdfee6b82d76ec2893044` |
| BLAKE2b-256 | `371dc7fd2e74f2add08a8a0c4e951aee18c58af59dd428072b263d7c50863568` |
### Provenance

Attestation for vcti_fileloader_hdf5-1.0.0-py3-none-any.whl:

- Publisher: publish.yml on vcollab/vcti-python-fileloader-hdf5
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject digest: 1e2e1c8e5525cb4392cb394d5f4566625a764539ae35e9ced1da36810f3ecd8b
- Sigstore transparency entry: 1193196333
- Permalink: vcollab/vcti-python-fileloader-hdf5@efeca56e7f962c1e6175bc74b7ba8e4b328bc73d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vcollab
- Access: private
- Token issuer: https://token.actions.githubusercontent.com
- Runner environment: github-hosted
- Publication workflow: publish.yml@efeca56e7f962c1e6175bc74b7ba8e4b328bc73d
- Trigger event: workflow_dispatch