File loader framework (vcti.fileloader.core): loaders attach a locked subtree (with layered file/enricher attributes) into a shared LockableTree under a caller-supplied parent. Plugin loaders live under the vcti.fileloader.* namespace (hdf5, json, numpy, ...).

vcti-fileloader

A protocol-based framework for loading file content into a shared tree. Loaders attach a locked subtree under a caller-supplied parent handle in any LockableTree backing — they do not own the tree.

Install with pip install vcti-fileloader; import from vcti.fileloader.core. The vcti.fileloader package is a namespace shared with the loader plugin packages (vcti.fileloader.hdf5, vcti.fileloader.json, vcti.fileloader.numpy, …), each of which is its own PyPI distribution.

This package is fully typed (py.typed) and safe for strict type checkers (mypy --strict, pyright).

Overview

Applications that work with simulation and CAE data need to load many file formats — HDF5, VTK, OpenFOAM, JSON, CSV, proprietary binary, etc. The data from each file is hierarchical (groups, datasets, attributes), but the trees often have to combine: a workflow may load several files into a single browseable structure.

vcti-fileloader defines a uniform protocol that every loader plugin implements: populate(handle, tree, parent) builds the file's content as a new subtree under parent and locks it before returning. Loaders pass through the file's native attributes verbatim into a read-only side of the node payload (file_attributes); a separate mutable side (enricher_attributes) is reserved for post-load enrichment by the caller. The framework codes against the LockableTree protocol from vcti-tree, so the caller picks the backing — DictTree for simple cases, ArrayTree (from vcti-nptree) for file-structure-scale workloads, or any other conforming implementation.

┌─────────────────────────────────────────────────────┐
│                  Application Code                   │
│        (owns LockableTree, uses Loader protocol)    │
└──────────────┬──────────────────────┬───────────────┘
               │                      │
       ┌───────▼───────┐      ┌───────▼───────┐
       │  HDF5 Loader  │      │  JSON Loader  │  ...
       │  (plugin pkg) │      │  (plugin pkg) │
       └───────────────┘      └───────────────┘
                  attach subtrees into the tree

Key Concepts

Loader (Protocol)

Any class that implements these four methods plus validator and setup attributes satisfies the protocol — no base class inheritance required (PEP 544 structural subtyping):

Method                                                            Purpose
can_load(path)                                                    Lightweight check: can this loader handle the file?
load(path, **options)                                             Open a file and return an opaque handle
populate(handle, tree, parent, *, before_lock=None, **options)    Build the file's subtree under parent; lock and return its handle
unload(handle)                                                    Release file handles and memory (idempotent)

populate is generic in the tree handle type H, so the same loader works against any LockableTree[DataNode, H] backing.

Each loader also carries optional validator and setup hooks:

  • LoaderValidator.validate() — returns True if all runtime dependencies (e.g., h5py) are available.
  • LoaderSetup.setup() — configures paths, environment variables, or component versions before first use.
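As a rough illustration of the structural-subtyping point, the sketch below defines a stand-in protocol with the four methods and two hook attributes listed above, plus a toy class that satisfies it without inheriting from anything. The names LoaderLike, AlwaysOk, NoSetup, and TextLoader are hypothetical and the signatures are simplified; the real definitions live in vcti.fileloader.core and may differ.

```python
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LoaderLike(Protocol):
    """Hypothetical stand-in for the Loader protocol (simplified signatures)."""

    validator: Any
    setup: Any

    def can_load(self, path: Path) -> bool: ...
    def load(self, path: Path, **options: Any) -> Any: ...
    def populate(self, handle: Any, tree: Any, parent: Any, *,
                 before_lock: Any = None, **options: Any) -> Any: ...
    def unload(self, handle: Any) -> None: ...


class AlwaysOk:
    def validate(self) -> bool:
        return True  # no runtime dependencies to check in this toy


class NoSetup:
    def setup(self) -> None:
        pass  # nothing to configure in this toy


class TextLoader:
    """Toy loader: no base class, yet it matches LoaderLike structurally."""

    validator = AlwaysOk()
    setup = NoSetup()

    def can_load(self, path):
        return path.suffix == ".txt"

    def load(self, path, **options):
        return {"path": path, "open": True}

    def populate(self, handle, tree, parent, *, before_lock=None, **options):
        raise NotImplementedError("tree building is omitted in this sketch")

    def unload(self, handle):
        handle["open"] = False  # idempotent: repeated calls are harmless


loader = TextLoader()
```

A runtime isinstance check against a runtime_checkable protocol only verifies that the members exist, not that their signatures match; a static type checker is what enforces the full contract.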

before_lock hook

populate accepts an optional callback (tree, subtree_root) -> None that fires inside the transaction after the loader attaches the file's content and before the locks are applied. Use it for attribute enrichment, computing derived payload state, or validation. Any exception raised by the hook triggers rollback of the partial subtree, so failures are atomic.

The hook does not know about, and is not coupled to, any particular enrichment library — it is just a callable. The package vcti-attribute-enricher provides one (rule-driven) enricher you can wire in via this hook; your own callbacks work equally well.
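Because before_lock takes a single callable, an application that wants several enrichment or validation steps can fold them into one. The following is a minimal sketch of that idea; compose_hooks, tag_source, and require_name are hypothetical helpers (not part of the package), and a plain dict stands in for the tree.

```python
from typing import Any, Callable

Hook = Callable[[Any, Any], None]


def compose_hooks(*hooks: Hook) -> Hook:
    """Return one (tree, root) callable that runs each hook in order.

    An exception from any hook propagates immediately, so later hooks
    are skipped and the caller sees the failure.
    """
    def run_all(tree: Any, root: Any) -> None:
        for hook in hooks:
            hook(tree, root)
    return run_all


# Toy demonstration: a dict of node id -> attribute dict stands in for a tree.
def tag_source(tree, root):
    tree[root]["source"] = "simulation.h5"


def require_name(tree, root):
    if "name" not in tree[root]:
        raise ValueError("subtree root has no name")


toy_tree = {"root": {"name": "stress"}}
compose_hooks(tag_source, require_name)(toy_tree, "root")
```

With a real loader, the composed callable would be passed as before_lock=compose_hooks(...); an exception from any step would then trigger the rollback described above.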

SubtreeBuilder

A transactional helper for implementing populate. It owns a single subtree under a caller-supplied parent and guarantees:

  • Scope enforcement. Writes are rejected if their parent is not inside this builder's subtree — loaders cannot accidentally mutate the rest of the tree.
  • Pre-commit hook. A before_commit callable (the implementation side of the loader's before_lock) runs after content is built and before locks fire.
  • Commit-on-success. Normal exit from the with block locks the subtree (structure + payload).
  • Rollback-on-failure. An exception during the build or during the pre-commit hook removes the partial subtree before propagating.
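The four guarantees above can be pictured with a small context manager. This is a toy sketch, not the package's SubtreeBuilder: ToyTree and ToySubtreeBuilder are hypothetical, node handles are plain integers, and "locking" is just a flag on each node record.

```python
from typing import Any


class ToyTree:
    """Minimal stand-in for a lockable tree: node id -> record."""

    def __init__(self) -> None:
        self.nodes = {0: {"parent": None, "payload": "root", "locked": False}}
        self._next = 1

    def add(self, parent: int, payload: Any) -> int:
        node_id = self._next
        self._next += 1
        self.nodes[node_id] = {"parent": parent, "payload": payload, "locked": False}
        return node_id


class ToySubtreeBuilder:
    def __init__(self, tree, parent, payload, before_commit=None):
        self.tree = tree
        self.parent = parent
        self.payload = payload
        self.before_commit = before_commit
        self.owned = set()
        self.root = None

    def __enter__(self):
        self.root = self.tree.add(self.parent, self.payload)
        self.owned.add(self.root)
        return self

    def add(self, parent, payload):
        # Scope enforcement: only nodes created by this builder may be parents.
        if parent not in self.owned:
            raise PermissionError(f"node {parent} is outside this builder's subtree")
        node_id = self.tree.add(parent, payload)
        self.owned.add(node_id)
        return node_id

    def _rollback(self):
        for node_id in self.owned:
            del self.tree.nodes[node_id]
        self.owned.clear()

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self._rollback()          # build failed: remove partial subtree
            return False              # let the original exception propagate
        try:
            if self.before_commit is not None:
                self.before_commit(self.tree, self.root)  # pre-commit hook
        except BaseException:
            self._rollback()          # hook failed: same rollback path
            raise
        for node_id in self.owned:    # commit: lock everything we built
            self.tree.nodes[node_id]["locked"] = True
        return False


tree = ToyTree()
with ToySubtreeBuilder(tree, parent=0, payload="file") as builder:
    group = builder.add(builder.root, "group")
    builder.add(group, "dataset")
```

After the with block exits normally, the three new nodes are locked and the pre-existing root is untouched; had the body or the hook raised, the tree would be back to its prior state.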

DataNode and LazyDataNode

Tree payloads. A DataNode carries four pieces of state, plus a merged view (attributes) over the two attribute layers:

Field / property       Description
data                   Primary payload: NumPy array, parsed dict, None, anything.
name                   File-internal identifier (HDF5 basename, NPZ archive key). None when not applicable.
file_attributes        Read-only Mapping view of the file's native attributes (loader-set, verbatim).
enricher_attributes    Mutable dict where post-load enrichers (or before_lock hooks) write.
attributes             ChainMap merged view, enricher first. Read here for portable rules; writes go to enricher_attributes.

A LazyDataNode adds an on-demand loader callback plus pre-load shape and dtype fields, so consumers can filter or display a dataset without materialising it.
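The layered attribute model can be mimicked with the standard library alone. The sketch below is an assumption-laden illustration, not the package's DataNode: ToyNode, ToyLazyNode, and the materialise method are hypothetical names, and plain lists stand in for NumPy arrays.

```python
from collections import ChainMap
from types import MappingProxyType


class ToyNode:
    """Hypothetical stand-in illustrating the two attribute layers."""

    def __init__(self, data=None, name=None, file_attributes=None):
        self.data = data
        self.name = name
        # Read-only view over the file's native attributes.
        self.file_attributes = MappingProxyType(dict(file_attributes or {}))
        # Mutable layer reserved for enrichers and before_lock hooks.
        self.enricher_attributes = {}

    @property
    def attributes(self):
        # Enricher layer first, so it shadows file values on lookup.
        # ChainMap writes always land in the first mapping.
        return ChainMap(self.enricher_attributes, self.file_attributes)


class ToyLazyNode(ToyNode):
    """Adds an on-demand loader plus shape/dtype known before loading."""

    def __init__(self, loader, shape, dtype, **kwargs):
        super().__init__(**kwargs)
        self._loader = loader
        self.shape = shape
        self.dtype = dtype

    def materialise(self):
        if self.data is None:
            self.data = self._loader()  # read the payload only once
        return self.data


node = ToyNode(name="stress", file_attributes={"units": "Pa"})
node.attributes["units"] = "MPa"  # lands in enricher_attributes

loads = []


def read_dataset():
    loads.append("read")
    return [0.1, 0.2, 0.3]


lazy = ToyLazyNode(read_dataset, shape=(3,), dtype="float64", name="displacement")
```

Here the file's original "Pa" value stays intact in file_attributes while the merged view reports the enricher's "MPa", and the lazy node can be filtered by shape or dtype before any data is read.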

LoaderDescriptor and LoaderRegistry

LoaderDescriptor wraps a Loader instance with metadata — a unique id, a human-readable name, and filterable attributes (typically {"supported_formats": ["hdf5-file"]} pointing at descriptor IDs from vcti-path-format-descriptors).

LoaderRegistry is a typed registry of LoaderDescriptor entries. Register loaders at startup, then look them up by id or query by attributes at runtime.
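To make the register-then-query flow concrete, here is a toy registry built on a dict. ToyDescriptor, ToyRegistry, and the supporting method are hypothetical; the real LoaderRegistry builds on vcti-plugin-catalog and its query API may look different.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyDescriptor:
    """Hypothetical descriptor: a loader plus lookup metadata."""

    id: str
    name: str
    loader: Any
    attributes: dict = field(default_factory=dict)


class ToyRegistry:
    def __init__(self):
        self._items = {}

    def register(self, desc):
        if desc.id in self._items:
            raise ValueError(f"duplicate descriptor id: {desc.id}")
        self._items[desc.id] = desc

    def get(self, desc_id):
        return self._items[desc_id]

    def supporting(self, fmt):
        """Return descriptors whose supported_formats include fmt."""
        return [
            d for d in self._items.values()
            if fmt in d.attributes.get("supported_formats", ())
        ]


registry = ToyRegistry()
registry.register(ToyDescriptor(
    id="hdf5-h5py-loader",
    name="HDF5 Loader (h5py)",
    loader=object(),  # placeholder for a real Loader instance
    attributes={"supported_formats": ["hdf5-file"]},
))
registry.register(ToyDescriptor(
    id="json-loader",
    name="JSON Loader",
    loader=object(),
    attributes={"supported_formats": ["json-file"]},
))

hdf5_candidates = registry.supporting("hdf5-file")
```

Querying by format identifier rather than by loader id lets the application pick a loader from the detected file format without hard-coding which plugin is installed.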

Lifecycle Contracts

  1. Validate / setup: call validator.validate() and setup.setup() once before the first load().
  2. Check: call can_load(path) before load() to prevent UnsupportedFormatError.
  3. Load: loader.load(path) opens the file and returns a handle.
  4. Populate: loader.populate(handle, tree, parent, before_lock=...) grafts the file's subtree under parent, optionally runs the hook, then locks the subtree. Returns the subtree root handle.
  5. Unload: loader.unload(handle) releases resources. Idempotent. If the loader attached LazyDataNodes, their closures may hold the handle, so call materialise_subtree(tree, root) first if the tree must remain usable after unload.
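Steps 2 to 5 can be wrapped in a small helper so that unload always runs. This is a sketch under assumptions: load_into and RecordingLoader are hypothetical, validation and setup are assumed to have happened already, a list stands in for the tree, and ValueError is used where a real application would likely rely on UnsupportedFormatError.

```python
from pathlib import Path


def load_into(loader, path, tree, parent, before_lock=None):
    """Run check -> load -> populate -> unload for one file (sketch)."""
    if not loader.can_load(path):
        raise ValueError(f"{path} is not supported by this loader")
    handle = loader.load(path)
    try:
        return loader.populate(handle, tree, parent, before_lock=before_lock)
    finally:
        # Always release the file, even if populate or the hook raised.
        loader.unload(handle)


class RecordingLoader:
    """Fake loader that records the order in which it is called."""

    def __init__(self):
        self.calls = []

    def can_load(self, path):
        self.calls.append("can_load")
        return Path(path).suffix == ".h5"

    def load(self, path, **options):
        self.calls.append("load")
        return object()

    def populate(self, handle, tree, parent, *, before_lock=None, **options):
        self.calls.append("populate")
        tree.append(("subtree", parent))
        return len(tree) - 1

    def unload(self, handle):
        self.calls.append("unload")


demo_loader = RecordingLoader()
demo_tree = []
root = load_into(demo_loader, Path("simulation.h5"), demo_tree, parent=None)
```

Unloading in the finally clause suits eagerly loaded data; if the loader attaches lazy nodes, the caller would either materialise them before the helper returns or defer the unload.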

Installation

pip install "vcti-fileloader>=5.1.0"

Or declare it in pyproject.toml:

dependencies = [
    "vcti-fileloader>=5.1.0",
]

Quick Start

from pathlib import Path

from vcti.tree import DictTree, descendants
from vcti.fileloader.core import DataNode, LoaderDescriptor, LoaderRegistry

# At startup
registry = LoaderRegistry()
registry.register(LoaderDescriptor(
    id="hdf5-h5py-loader",
    name="HDF5 Loader (h5py)",
    loader=my_h5py_loader,  # placeholder: a Loader instance from a plugin package or your own
    attributes={"supported_formats": ["hdf5-file"]},
))

# At runtime
desc = registry.get("hdf5-h5py-loader")
desc.loader.validator.validate()
desc.loader.setup.setup()

# Application owns the tree
tree: DictTree[DataNode] = DictTree(DataNode())

handle = desc.loader.load(Path("simulation.h5"))
try:
    subtree_root = desc.loader.populate(handle, tree, tree.root_handle)
    # subtree is structure-locked and payload-locked
    for h in descendants(tree, subtree_root):
        node = tree.payload(h)
        if node.name == "stress":
            ...
finally:
    desc.loader.unload(handle)

Quick Start — with a before_lock hook

from pathlib import Path

from vcti.attribute_enricher import EnrichRule, apply_rules
from vcti.lookup import Rule
from vcti.tree import descendants

path = Path("simulation.h5")

def enrich(tree, root):
    # Runs inside the populate transaction, before the subtree is locked.
    apply_rules(
        descendants(tree, root, include_self=True),
        rules=[
            EnrichRule(set={"file_path": str(path)}),
            EnrichRule(set={"category": "mechanical"},
                       when=(Rule("name", "^=", "stress"),)),
        ],
    )

# desc and tree are set up as in the Quick Start above.
handle = desc.loader.load(path)
try:
    root = desc.loader.populate(handle, tree, tree.root_handle, before_lock=enrich)
finally:
    desc.loader.unload(handle)

vcti-attribute-enricher is an optional package — the framework itself has no dependency on it. The before_lock argument accepts any callable (tree, root) -> None; your own callback works just as well.

Error Handling

All exceptions inherit from LoaderError:

Exception                 When to raise / catch
LoaderError               Base: catches any loader failure
LoadError                 File cannot be opened or parsed
UnloadError               Resource cleanup failed
UnsupportedFormatError    Loader does not recognise the file format
ValidationError           validator.validate() detected missing dependencies
SetupError                setup.setup() could not configure the environment
TreeAttachmentError       populate() cannot attach: parent is missing, deleted, or structure-locked

TreeAttachmentError translates the named tree exceptions from vcti-tree (HandleError, InactiveNodeError, StructureLockedError) into a single fileloader-domain failure type. The underlying tree exception is preserved on __cause__.
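The translation described here is ordinary exception chaining. In the sketch below the exception classes are local stand-ins that only reuse the names from this section (the real ones come from vcti.fileloader.core and vcti-tree), and attach is a hypothetical helper.

```python
# Local stand-ins; the real classes are imported from the packages.
class LoaderError(Exception):
    pass


class TreeAttachmentError(LoaderError):
    pass


class HandleError(Exception):
    pass


class InactiveNodeError(Exception):
    pass


class StructureLockedError(Exception):
    pass


def attach(tree_operation):
    """Run a tree operation, translating tree errors into one loader error."""
    try:
        return tree_operation()
    except (HandleError, InactiveNodeError, StructureLockedError) as exc:
        # "from exc" stores the original exception on __cause__.
        raise TreeAttachmentError(f"cannot attach subtree: {exc}") from exc


def locked_parent():
    raise StructureLockedError("parent is structure-locked")


try:
    attach(locked_parent)
except LoaderError as err:
    caught = err
```

Callers can therefore catch LoaderError alone and still inspect caught.__cause__ when they need to know which tree condition was responsible.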

What this package does NOT do

  • No concrete loaders. Actual file reading (HDF5, JSON, NPY, etc.) lives in separate loader plugin packages.
  • No tree implementation. Backings come from vcti-tree (DictTree), vcti-nptree (ArrayTree), or third parties.
  • No attribute enrichment. Enrichment is run via the optional before_lock hook by the caller, using whatever callable they like (e.g., vcti-attribute-enricher).
  • No data transformation. Data is returned as-is from the loader.
  • No caching. Caching strategies belong at the application level.

Further Reading

  • Common Patterns — Loader implementation, the SubtreeBuilder, the before_lock hook, validator/setup patterns, error handling.
  • Design & Concepts — Architecture, protocol rationale, layered attribute model, locking model.

Dependencies

  • numpy (>=1.24) — DataNode.__eq__
  • vcti-plugin-catalog (>=1.0.0) — Descriptor and Registry base classes
  • vcti-tree (>=1.0.0) — LockableTree protocol, generic algorithms, named exceptions

Versioning

This package follows Semantic Versioning. Breaking changes to the Loader protocol or DataNode shape will only occur in major version bumps. Downstream loader plugins should pin to a compatible major version (e.g., vcti-fileloader>=5.0,<6).

Source Distribution

vcti_fileloader-5.1.0.tar.gz (29.4 kB view details)

Uploaded Source

Built Distribution

vcti_fileloader-5.1.0-py3-none-any.whl (20.9 kB view details)

Uploaded Python 3

File details

Details for the file vcti_fileloader-5.1.0.tar.gz.

File metadata

  • Download URL: vcti_fileloader-5.1.0.tar.gz
  • Size: 29.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vcti_fileloader-5.1.0.tar.gz
Algorithm Hash digest
SHA256 a16d29ca17a20dff96cb2398f8fbe7419ca9b4b7206c6f40ad9b9678c443818e
MD5 cf5b0900fdc5a2f7bc0f19756f652fe3
BLAKE2b-256 c909aef7bcea74814a8fa8df78d046b6f14d3ed0bef2c97e2810e0f98eed3a67

Provenance

The following attestation bundles were made for vcti_fileloader-5.1.0.tar.gz:

Publisher: release.yml on vcollab/vcti-python-fileloader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vcti_fileloader-5.1.0-py3-none-any.whl.

File hashes

Hashes for vcti_fileloader-5.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 91d6a8f13c0e7d03284aedfd719d9cc3660dac5d60871d481b7865ae15af4dc8
MD5 259e318bb675ad4d8ff35d4683c606e7
BLAKE2b-256 03ee671b136c05200d3449fe3ab44c74f21c25edc5e844ba681a47a76156efef

Provenance

The following attestation bundles were made for vcti_fileloader-5.1.0-py3-none-any.whl:

Publisher: release.yml on vcollab/vcti-python-fileloader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
