File loader framework with pluggable descriptors, validators, and a registry for format-specific data loading
vcti-fileloader
A protocol-based framework for loading hierarchical scientific and engineering data from files. It defines a standard interface that all file-format loaders must implement, and a registry for discovering and managing them at runtime.
This package is fully typed (py.typed) and safe for strict type checkers
(mypy --strict, pyright).
Why this package exists
Applications that work with simulation and CAE data need to load many file formats — HDF5, VTK, OpenFOAM, proprietary binary, etc. Each format has its own library, its own API, and its own way of representing a tree of nodes, metadata, and heavy data arrays.
vcti-fileloader solves this by defining a single, uniform protocol that every loader plugin implements. Application code programs against the protocol, not the format. Adding support for a new file format means writing a new loader plugin — no changes to application code.
┌─────────────────────────────────────────────────────┐
│                  Application Code                   │
│          (uses Loader protocol + Registry)          │
└──────────────┬──────────────────────┬───────────────┘
               │                      │
       ┌───────▼───────┐      ┌───────▼───────┐
       │  HDF5 Loader  │      │  VTK Loader   │  ... (one per format)
       │ (plugin pkg)  │      │ (plugin pkg)  │
       └───────────────┘      └───────────────┘
Key Concepts
Loader (Protocol)
The central interface. Any class that implements these methods satisfies the protocol — no base class inheritance required (PEP 544 structural subtyping):
| Method | Purpose |
|---|---|
| `load(path, **options)` | Open a file and return an opaque data handle |
| `unload(data)` | Release file handles and memory (idempotent) |
| `can_load(path)` | Lightweight check: can this loader handle the file? |
| `load_tree(data)` | Extract the node hierarchy as a NumPy structured array |
| `load_node_info(data)` | Extract lightweight node metadata (name, type) |
| `load_attributes(data, node_ids)` | Extract key-value attributes per node |
| `load_dataset(data, node_id)` | Extract a heavy data array as a `DataNode` |
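
As a rough illustration of structural subtyping, the sketch below declares a protocol with the seven methods from the table and a loader class that satisfies it without inheriting from anything. `LoaderSketch` and `InMemoryLoader` are hypothetical names, and the parameter and return types are assumptions; the package's real `Loader` protocol may differ in detail.

```python
from pathlib import Path
from typing import Any, Protocol, runtime_checkable

NodeID = int  # the README describes NodeID as an alias for int


@runtime_checkable
class LoaderSketch(Protocol):
    """Illustrative stand-in for the Loader protocol; signatures are assumed."""

    def load(self, path: Path, **options: Any) -> Any: ...
    def unload(self, data: Any) -> None: ...
    def can_load(self, path: Path) -> bool: ...
    def load_tree(self, data: Any) -> Any: ...
    def load_node_info(self, data: Any) -> Any: ...
    def load_attributes(self, data: Any, node_ids: list[NodeID]) -> Any: ...
    def load_dataset(self, data: Any, node_id: NodeID) -> Any: ...


class InMemoryLoader:
    """Hypothetical loader: no base class, it only provides the right methods."""

    def load(self, path, **options):
        return {"path": Path(path), "open": True}

    def unload(self, data):
        data["open"] = False  # repeating this call is harmless

    def can_load(self, path):
        return Path(path).suffix == ".mem"

    def load_tree(self, data):
        # (node id, parent id) pairs; a real loader returns a NumPy structured array
        return [(0, -1)]

    def load_node_info(self, data):
        return {0: ("root", "group")}

    def load_attributes(self, data, node_ids):
        return {node_id: {} for node_id in node_ids}

    def load_dataset(self, data, node_id):
        return []


loader = InMemoryLoader()
print(isinstance(loader, LoaderSketch))  # True
```

The `isinstance` check on a `runtime_checkable` protocol only confirms that the methods exist; signature compatibility is verified by a static type checker such as mypy or pyright.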
Each loader also carries optional validator and setup hooks:
- `LoaderValidator.validate()` returns `True` if all runtime dependencies (e.g., h5py, vtk) are available.
- `LoaderSetup.setup()` configures paths, environment variables, or component versions before first use.
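
A minimal sketch of what such hooks might look like, assuming only the `validate()` and `setup()` method names given above. The class names `ModuleValidator` and `EnvSetup`, their constructors, and the boolean return value of `setup()` are assumptions made for illustration.

```python
import importlib.util
import os


class ModuleValidator:
    """Hypothetical validator: reports whether the named modules are importable."""

    def __init__(self, *modules: str) -> None:
        self.modules = modules

    def validate(self) -> bool:
        # find_spec returns None when a top-level module cannot be found,
        # so nothing is actually imported by this check.
        return all(
            importlib.util.find_spec(name) is not None for name in self.modules
        )


class EnvSetup:
    """Hypothetical setup hook: exports environment variables before first use."""

    def __init__(self, **env: str) -> None:
        self.env = env

    def setup(self) -> bool:
        os.environ.update(self.env)
        return True


validator = ModuleValidator("json")  # use e.g. "h5py" for an HDF5 loader
setup = EnvSetup(EXAMPLE_LOADER_HOME="/opt/example")  # made-up variable name
print(validator.validate(), setup.setup())
```

Checking with `find_spec` keeps validation cheap because the dependency is located but not imported. The sketch is meant for top-level module names only.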
LoaderDescriptor
Wraps a Loader instance with registry metadata — a unique id, a
human-readable name, and filterable attributes (e.g., supported_formats).
LoaderRegistry
A typed registry of LoaderDescriptor entries. Register loaders at startup,
then look them up by id or query by attributes at runtime.
Key behaviours (inherited from vcti-plugin-catalog):
- `register()` raises `DuplicateEntryError` if the id already exists.
- `get()` raises `EntryNotFoundError` if the id is not found.
- `find()` returns `None` instead of raising for missing ids.
- The `lookup` property provides attribute-based filtering via `vcti-lookup`.
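
The behaviours listed above can be pictured with a toy, dictionary-backed registry. This is a stand-in written for illustration, not the `vcti-plugin-catalog` implementation: `ToyRegistry`, `ToyDescriptor`, and the `supporting()` helper are hypothetical, and the real attribute filtering goes through the `lookup` property, whose API is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


class DuplicateEntryError(Exception):
    """Stand-in for the exception raised on a duplicate id."""


class EntryNotFoundError(Exception):
    """Stand-in for the exception raised on an unknown id."""


@dataclass
class ToyDescriptor:
    id: str
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)


class ToyRegistry:
    def __init__(self) -> None:
        self._entries: dict[str, ToyDescriptor] = {}

    def register(self, desc: ToyDescriptor) -> None:
        if desc.id in self._entries:
            raise DuplicateEntryError(desc.id)
        self._entries[desc.id] = desc

    def get(self, entry_id: str) -> ToyDescriptor:
        try:
            return self._entries[entry_id]
        except KeyError:
            raise EntryNotFoundError(entry_id) from None

    def find(self, entry_id: str) -> Optional[ToyDescriptor]:
        return self._entries.get(entry_id)  # None when the id is missing

    def supporting(self, fmt: str) -> list[ToyDescriptor]:
        # Hypothetical helper that mimics filtering on `supported_formats`.
        return [
            d for d in self._entries.values()
            if fmt in d.attributes.get("supported_formats", [])
        ]


registry = ToyRegistry()
registry.register(ToyDescriptor(
    id="hdf5-h5py-loader",
    name="HDF5 Loader (h5py)",
    attributes={"supported_formats": ["hdf5-file"]},
))
print(registry.find("missing-loader"))                    # None
print([d.id for d in registry.supporting("hdf5-file")])   # ['hdf5-h5py-loader']
```

Use `get()` when a missing loader is a programming error and `find()` when its absence is an expected case the caller handles.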
NodeID
Type alias (int) for node identifiers used across the protocol, exported
from the package for use in type annotations.
DataNode
load_dataset() returns a DataNode from vcti-array-tree. A DataNode has:
- `.data`: the NumPy array containing the heavy data.
- `.attributes`: a `dict[str, Any]` of metadata for the node.
See the vcti-array-tree documentation for full details.
Data Flow
Register ──► Discover ──► Validate & Setup ──► Load ──► Query ──► Unload
- **Register**: Each loader plugin registers a `LoaderDescriptor` with the shared `LoaderRegistry`.
- **Discover**: Application code looks up a loader by id or filters by attributes (e.g., find all loaders that support `"hdf5-file"`).
- **Validate & Setup**: Call `validator.validate()` and `setup.setup()` to ensure the runtime environment is ready.
- **Load**: `loader.load(path)` opens the file and returns an opaque handle.
- **Query**: Use `load_tree`, `load_node_info`, `load_attributes`, and `load_dataset` to extract structure and data from the handle.
- **Unload**: `loader.unload(handle)` releases resources. This is idempotent: calling it twice on the same handle must not raise.
Lifecycle Contracts
- Call `can_load(path)` before `load()` to avoid `UnsupportedFormatError`.
- Call `validator.validate()` and `setup.setup()` before the first `load()`.
- `unload()` is idempotent: safe to call multiple times on the same handle.
- After `unload()`, calling any `load_*` method on that handle is undefined behaviour.
- `load()` may be called multiple times with different paths; each returns an independent handle.
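
One way an application might enforce these contracts is to wrap the load/unload pair in a context manager, so that `unload()` always runs and the handle is never used afterwards. The helper `opened()` and the `RecordingLoader` stub below are hypothetical and not part of the package; the sketch raises `ValueError` where real code would more likely raise `UnsupportedFormatError`.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def opened(loader, path: Path, **options):
    """Yield a handle from loader.load() and always release it afterwards."""
    if not loader.can_load(path):
        raise ValueError(f"{loader!r} cannot handle {path}")
    handle = loader.load(path, **options)
    try:
        yield handle
    finally:
        loader.unload(handle)  # runs even if the body raises


class RecordingLoader:
    """Stub loader that records the order in which it is called."""

    def __init__(self):
        self.calls = []

    def can_load(self, path):
        self.calls.append("can_load")
        return True

    def load(self, path, **options):
        self.calls.append("load")
        return object()

    def unload(self, data):
        self.calls.append("unload")


stub = RecordingLoader()
with opened(stub, Path("simulation.h5")) as handle:
    stub.calls.append("query")  # load_tree(), load_dataset(), ... would go here
print(stub.calls)  # ['can_load', 'load', 'query', 'unload']
```

Because the handle only exists inside the `with` block, calling a `load_*` method after `unload()` becomes hard to do by accident.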
Installation
pip install "vcti-fileloader>=1.0.0"
In pyproject.toml dependencies
dependencies = [
    "vcti-fileloader>=1.0.0",
]
Quick Start
from pathlib import Path

from vcti.fileloader import LoaderDescriptor, LoaderRegistry

# At startup: register available loaders
registry = LoaderRegistry()
registry.register(LoaderDescriptor(
    id="hdf5-h5py-loader",
    name="HDF5 Loader (h5py)",
    loader=my_h5py_loader,  # implements Loader protocol
    attributes={"supported_formats": ["hdf5-file"]},
))

# At runtime: discover, validate, load
desc = registry.get("hdf5-h5py-loader")
desc.loader.validator.validate()  # check dependencies
desc.loader.setup.setup()         # configure environment

handle = desc.loader.load(Path("simulation.h5"))
tree = desc.loader.load_tree(handle)                 # node hierarchy
info = desc.loader.load_node_info(handle)            # node names and types
attrs = desc.loader.load_attributes(handle)          # per-node attributes
node = desc.loader.load_dataset(handle, node_id=1)   # heavy data array
desc.loader.unload(handle)                           # release resources
Error Handling
All exceptions inherit from LoaderError, so callers can catch broadly
or handle specific failure modes:
| Exception | When to raise / catch |
|---|---|
| `LoaderError` | Base class; catches any loader failure |
| `LoadError` | File cannot be opened or parsed (I/O errors, corrupt files) |
| `UnloadError` | Resource cleanup failed |
| `UnsupportedFormatError` | Loader does not recognise the file format. Prefer `can_load()` first |
| `ValidationError` | `validator.validate()` detected missing dependencies |
| `SetupError` | `setup.setup()` could not configure the environment |
Distinguishing LoadError vs UnsupportedFormatError: Use
UnsupportedFormatError when the loader does not recognise the format at
all (wrong extension, unknown magic bytes). Use LoadError when the
format is recognised but the content cannot be read (truncated file,
incompatible version, permission error).
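
To make the distinction concrete, here is a small sketch of how a loader might choose between the two exceptions. The exception classes are local stand-ins so the example runs on its own, and `read_payload()` and `classify()` are hypothetical helpers. The signature check is simplified: it only looks at offset 0, and the "truncated" test is a toy condition.

```python
import tempfile
from pathlib import Path


class LoaderError(Exception):
    """Stand-in for the package's base exception."""


class LoadError(LoaderError):
    """Stand-in: format recognised, content unreadable."""


class UnsupportedFormatError(LoaderError):
    """Stand-in: format not recognised at all."""


HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def read_payload(path: Path) -> bytes:
    try:
        raw = path.read_bytes()
    except OSError as exc:  # missing file, permission error, ...
        raise LoadError(f"cannot read {path}: {exc}") from exc
    if not raw.startswith(HDF5_SIGNATURE):
        raise UnsupportedFormatError(f"{path} has no HDF5 signature")
    if len(raw) == len(HDF5_SIGNATURE):
        raise LoadError(f"{path} is truncated: nothing follows the signature")
    return raw


def classify(path: Path) -> str:
    try:
        read_payload(path)
    except UnsupportedFormatError:
        return "unsupported"
    except LoadError:
        return "load-error"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "good.h5")
    good.write_bytes(HDF5_SIGNATURE + b"\x00" * 16)
    text = Path(tmp, "notes.txt")
    text.write_bytes(b"plain text")
    cut = Path(tmp, "cut.h5")
    cut.write_bytes(HDF5_SIGNATURE)
    results = [classify(p) for p in (good, text, cut, Path(tmp, "missing.h5"))]

print(results)  # ['ok', 'unsupported', 'load-error', 'load-error']
```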
Error handling example
from pathlib import Path

from vcti.fileloader import (
    LoaderError,
    LoadError,
    UnsupportedFormatError,
    ValidationError,
    SetupError,
)

# `desc` is the LoaderDescriptor obtained from the registry (see Quick Start)

# Validate before loading
if not desc.loader.validator.validate():
    raise ValidationError("Missing h5py; install with: pip install h5py")
if not desc.loader.setup.setup():
    raise SetupError("Could not configure HDF5 library paths")

# Load with error handling
path = Path("data.h5")
if not desc.loader.can_load(path):
    print(f"Loader {desc.id} cannot handle {path}")
else:
    handle = None  # stays None if load() raises
    try:
        handle = desc.loader.load(path)
        tree = desc.loader.load_tree(handle)
    except UnsupportedFormatError as e:
        print(f"Unrecognised file format: {e}")
    except LoadError as e:
        print(f"Failed to read file: {e}")
    except LoaderError as e:
        print(f"Unexpected loader error: {e}")
    finally:
        if handle is not None:
            desc.loader.unload(handle)  # release resources only if load() succeeded
What this package does NOT do
- No concrete loaders — This is the interface only. Actual file reading (HDF5, VTK, etc.) lives in separate loader plugin packages.
- No data transformation — Data is returned as-is from the loader.
- No caching — Caching strategies belong at the application level.
Further Reading
- Common Patterns — Validator/setup implementation, multi-loader registration, error handling, and naming conventions.
- Design & Concepts — Architecture, protocol rationale, and package boundaries.
Dependencies
- numpy (>=1.24)
- vcti-plugin-catalog (>=1.0.0): Descriptor and Registry base classes
- vcti-array-tree (>=1.0.0): `DataNode` returned by `load_dataset`
Versioning
This package follows Semantic Versioning. Breaking
changes to the Loader protocol (adding required methods, changing
signatures) will only occur in major version bumps. Downstream loader
plugins should pin to a compatible major version (e.g., vcti-fileloader>=1.0,<2).