File loader framework with pluggable descriptors, validators, and a registry for format-specific data loading

vcti-fileloader

A protocol-based framework for loading hierarchical scientific and engineering data from files. It defines a standard interface that all file-format loaders must implement, and a registry for discovering and managing them at runtime.

This package is fully typed (py.typed) and safe for strict type checkers (mypy --strict, pyright).

Why this package exists

Applications that work with simulation and CAE data need to load many file formats — HDF5, VTK, OpenFOAM, proprietary binary, etc. Each format has its own library, its own API, and its own way of representing a tree of nodes, metadata, and heavy data arrays.

vcti-fileloader solves this by defining a single, uniform protocol that every loader plugin implements. Application code programs against the protocol, not the format. Adding support for a new file format means writing a new loader plugin — no changes to application code.

┌─────────────────────────────────────────────────────┐
│                  Application Code                   │
│         (uses Loader protocol + Registry)           │
└──────────────┬──────────────────────┬───────────────┘
               │                      │
       ┌───────▼───────┐      ┌───────▼───────┐
       │  HDF5 Loader  │      │  VTK Loader   │  ... (one per format)
       │  (plugin pkg) │      │  (plugin pkg) │
       └───────────────┘      └───────────────┘

Key Concepts

Loader (Protocol)

The central interface. Any class that implements these methods satisfies the protocol — no base class inheritance required (PEP 544 structural subtyping):

Method                           Purpose
load(path, **options)            Open a file and return an opaque data handle
unload(data)                     Release file handles and memory (idempotent)
can_load(path)                   Lightweight check: can this loader handle the file?
load_tree(data)                  Extract the node hierarchy as a NumPy structured array
load_node_info(data)             Extract lightweight node metadata (name, type)
load_attributes(data, node_ids)  Extract key-value attributes per node
load_dataset(data, node_id)      Extract a heavy data array as a DataNode
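
As a rough illustration of structural subtyping, the sketch below defines a cut-down stand-in protocol and a toy class that satisfies it without inheriting from anything. MiniLoader and TextLoader are hypothetical names used only here; the real Loader protocol has more methods, and its exact signatures may differ.

```python
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MiniLoader(Protocol):
    """Hypothetical, cut-down stand-in for the Loader protocol."""

    def can_load(self, path: Path) -> bool: ...
    def load(self, path: Path, **options: Any) -> Any: ...
    def unload(self, data: Any) -> None: ...


class TextLoader:
    """Toy loader: no base class, it simply provides the required methods."""

    def can_load(self, path: Path) -> bool:
        return path.suffix == ".txt"

    def load(self, path: Path, **options: Any) -> Any:
        # The "opaque handle" here is just a dict owned by the loader.
        return {"path": path, "open": True}

    def unload(self, data: Any) -> None:
        # Idempotent: closing an already-closed handle is harmless.
        data["open"] = False


loader = TextLoader()
# runtime_checkable only verifies that the methods exist, not their signatures.
is_loader = isinstance(loader, MiniLoader)
handle = loader.load(Path("notes.txt"))
loader.unload(handle)
loader.unload(handle)  # second call must not raise
```

A static type checker performs the stricter version of this check: passing a TextLoader where a MiniLoader is expected type-checks only if the method signatures are compatible.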

Each loader also carries optional validator and setup hooks:

  • LoaderValidator.validate() — returns True if all runtime dependencies (e.g., h5py, vtk) are available.
  • LoaderSetup.setup() — configures paths, environment variables, or component versions before first use.
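
One possible shape for these hooks is sketched below, using only the standard library. ModuleValidator, EnvDefaultsSetup, and the TOY_LOADER_PLUGIN_DIR variable are hypothetical names invented for this example; the only assumption taken from this README is that validate() and setup() return a truthy value on success.

```python
import os
from importlib.util import find_spec


class ModuleValidator:
    """Hypothetical validator: passes when every named top-level module is importable."""

    def __init__(self, *modules: str) -> None:
        self.modules = modules

    def missing(self) -> list:
        # find_spec() returns None for a top-level module that cannot be found.
        return [name for name in self.modules if find_spec(name) is None]

    def validate(self) -> bool:
        return not self.missing()


class EnvDefaultsSetup:
    """Hypothetical setup hook: fills in environment defaults without overriding."""

    def __init__(self, defaults: dict) -> None:
        self.defaults = defaults

    def setup(self) -> bool:
        for key, value in self.defaults.items():
            os.environ.setdefault(key, value)  # keeps any value the user already set
        return True


validator = ModuleValidator("json", "no_such_module_xyz_123")
missing_modules = validator.missing()

setup_hook = EnvDefaultsSetup({"TOY_LOADER_PLUGIN_DIR": "/opt/toy/plugins"})
setup_ok = setup_hook.setup()
```

Reporting the missing module names separately from the boolean result makes it easier to build a useful error message when validation fails.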

LoaderDescriptor

Wraps a Loader instance with registry metadata — a unique id, a human-readable name, and filterable attributes (e.g., supported_formats).

LoaderRegistry

A typed registry of LoaderDescriptor entries. Register loaders at startup, then look them up by id or query by attributes at runtime.

Key behaviours (inherited from vcti-plugin-catalog):

  • register() raises DuplicateEntryError if the id already exists.
  • get() raises EntryNotFoundError if the id is not found.
  • find() returns None instead of raising for missing ids.
  • lookup property provides attribute-based filtering via vcti-lookup.
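
The first three behaviours can be modelled with a small dict-backed stand-in. This is only a sketch of the semantics listed above, not the vcti-plugin-catalog implementation: ToyRegistry and ToyDescriptor are hypothetical, the two exception classes are local stand-ins that reuse the real names, and attribute-based lookup is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


class DuplicateEntryError(Exception):
    """Local stand-in for the error raised on a duplicate id."""


class EntryNotFoundError(Exception):
    """Local stand-in for the error raised on an unknown id."""


@dataclass(frozen=True)
class ToyDescriptor:
    id: str
    name: str
    loader: Any
    attributes: dict = field(default_factory=dict)


class ToyRegistry:
    def __init__(self) -> None:
        self._entries: dict = {}

    def register(self, desc: ToyDescriptor) -> None:
        if desc.id in self._entries:
            raise DuplicateEntryError(desc.id)
        self._entries[desc.id] = desc

    def get(self, entry_id: str) -> ToyDescriptor:
        try:
            return self._entries[entry_id]
        except KeyError:
            raise EntryNotFoundError(entry_id) from None

    def find(self, entry_id: str) -> Optional[ToyDescriptor]:
        return self._entries.get(entry_id)  # None when the id is unknown


registry = ToyRegistry()
registry.register(ToyDescriptor("hdf5-h5py-loader", "HDF5 Loader (h5py)", loader=object()))

missing = registry.find("vtk-loader")  # no exception for a missing id
try:
    registry.register(ToyDescriptor("hdf5-h5py-loader", "Duplicate", loader=object()))
    duplicate_rejected = False
except DuplicateEntryError:
    duplicate_rejected = True
```

In application code, find() suits optional plugins where absence is normal, while get() suits loaders the application cannot run without.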

NodeID

Type alias (int) for node identifiers used across the protocol, exported from the package for use in type annotations.

DataNode

load_dataset() returns a DataNode from vcti-array-tree. A DataNode has:

  • .data — the NumPy array containing the heavy data.
  • .attributes — a dict[str, Any] of metadata for the node.

See the vcti-array-tree documentation for full details.

Data Flow

Register ──► Discover ──► Validate & Setup ──► Load ──► Query ──► Unload
  1. Register: each loader plugin registers a LoaderDescriptor with the shared LoaderRegistry.
  2. Discover: application code looks up a loader by id or filters by attributes (e.g., find all loaders that support "hdf5-file").
  3. Validate & Setup: call validator.validate() and setup.setup() to ensure the runtime environment is ready.
  4. Load: loader.load(path) opens the file and returns an opaque handle.
  5. Query: use load_tree, load_node_info, load_attributes, and load_dataset to extract structure and data from the handle.
  6. Unload: loader.unload(handle) releases resources. This is idempotent, so calling it twice on the same handle must not raise.
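
When several loaders are registered, the Discover step can also fall back on can_load() to choose among candidates. The sketch below shows that idea with toy loaders; SuffixLoader and pick_loader are hypothetical helpers, not part of the package.

```python
from pathlib import Path
from typing import Any, Iterable, Optional


class SuffixLoader:
    """Toy loader that claims files by suffix (a stand-in for a real plugin)."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def can_load(self, path: Path) -> bool:
        return path.suffix == self.suffix

    def load(self, path: Path, **options: Any) -> dict:
        return {"path": path, "loader": self.suffix}

    def unload(self, data: dict) -> None:
        data.clear()  # clearing an empty dict is fine, so this is idempotent


def pick_loader(loaders: Iterable[Any], path: Path) -> Optional[Any]:
    """Return the first loader whose can_load() accepts the path, else None."""
    for loader in loaders:
        if loader.can_load(path):
            return loader
    return None


candidates = [SuffixLoader(".h5"), SuffixLoader(".vtk")]
chosen = pick_loader(candidates, Path("simulation.vtk"))
unknown = pick_loader(candidates, Path("report.pdf"))
```

Because can_load() is described as a lightweight check, running it across all candidates should be cheap compared with attempting a full load() on each.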

Lifecycle Contracts

  • Call can_load(path) before load() to prevent UnsupportedFormatError.
  • Call validator.validate() and setup.setup() before the first load().
  • unload() is idempotent — safe to call multiple times on the same handle.
  • After unload(), calling any load_* method on that handle is undefined.
  • load() may be called multiple times with different paths; each returns an independent handle.
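
One way to honour these contracts in application code is to wrap load() and unload() in a context manager, so the handle is released even if a query raises. The sketch below is an application-level helper under that assumption; opened and CountingLoader are hypothetical names, and the package itself is not stated to provide such a helper.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator


class CountingLoader:
    """Toy loader that tracks open handles so the lifecycle can be observed."""

    def __init__(self) -> None:
        self.open_handles = set()
        self._next = 0

    def load(self, path: Path, **options: Any) -> int:
        self._next += 1
        self.open_handles.add(self._next)
        return self._next

    def unload(self, data: int) -> None:
        # discard() ignores missing members, which makes unload idempotent.
        self.open_handles.discard(data)


@contextmanager
def opened(loader: Any, path: Path, **options: Any) -> Iterator[Any]:
    """Yield a handle and always unload it, even if the body raises."""
    handle = loader.load(path, **options)
    try:
        yield handle
    finally:
        loader.unload(handle)


loader = CountingLoader()
with opened(loader, Path("a.h5")) as h1, opened(loader, Path("b.h5")) as h2:
    handles_while_open = set(loader.open_handles)  # two independent handles
loader.unload(h1)  # already released by the context manager; must not raise
```

If load() itself raises, no handle exists yet, so the helper deliberately calls load() outside the try block and lets the exception propagate without attempting an unload.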

Installation

pip install "vcti-fileloader>=1.0.0"

In pyproject.toml dependencies

dependencies = [
    "vcti-fileloader>=1.0.0",
]

Quick Start

from pathlib import Path

from vcti.fileloader import LoaderDescriptor, LoaderRegistry

# At startup: register available loaders
registry = LoaderRegistry()
registry.register(LoaderDescriptor(
    id="hdf5-h5py-loader",
    name="HDF5 Loader (h5py)",
    loader=my_h5py_loader,                              # implements Loader protocol
    attributes={"supported_formats": ["hdf5-file"]},
))

# At runtime: discover, validate, load
desc = registry.get("hdf5-h5py-loader")
desc.loader.validator.validate()                         # check dependencies
desc.loader.setup.setup()                                # configure environment

handle = desc.loader.load(Path("simulation.h5"))
tree   = desc.loader.load_tree(handle)                   # node hierarchy
info   = desc.loader.load_node_info(handle)              # node names and types
attrs  = desc.loader.load_attributes(handle, node_ids=[1])  # per-node attributes
node   = desc.loader.load_dataset(handle, node_id=1)     # heavy data array
desc.loader.unload(handle)                               # release resources

Error Handling

All exceptions inherit from LoaderError, so callers can catch broadly or handle specific failure modes:

Exception               When to raise / catch
LoaderError             Base class: catches any loader failure
LoadError               File cannot be opened or parsed (I/O errors, corrupt files)
UnloadError             Resource cleanup failed
UnsupportedFormatError  Loader does not recognise the file format. Prefer can_load() first
ValidationError         validator.validate() detected missing dependencies
SetupError              setup.setup() could not configure the environment

Distinguishing LoadError vs UnsupportedFormatError: Use UnsupportedFormatError when the loader does not recognise the format at all (wrong extension, unknown magic bytes). Use LoadError when the format is recognised but the content cannot be read (truncated file, incompatible version, permission error).
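
From a loader author's point of view, the distinction might look like the sketch below. The exception classes are local stand-ins that mirror the hierarchy described above, open_checked and classify are hypothetical helpers, and MIN_SIZE is an arbitrary threshold used only to simulate a truncated file. The signature check assumes the HDF5 signature sits at offset 0, which is the common case.

```python
import tempfile
from pathlib import Path


class LoaderError(Exception):
    """Local stand-in for the package's base exception."""


class LoadError(LoaderError):
    """Stand-in: format recognised, content unreadable."""


class UnsupportedFormatError(LoaderError):
    """Stand-in: format not recognised at all."""


HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature
MIN_SIZE = 16  # hypothetical threshold, only to simulate truncation


def open_checked(path: Path) -> bytes:
    try:
        raw = path.read_bytes()
    except OSError as exc:
        # Missing file, permission problem, other I/O failure.
        raise LoadError(f"cannot read {path}: {exc}") from exc
    if not raw.startswith(HDF5_SIGNATURE):
        raise UnsupportedFormatError(f"{path} does not look like HDF5")
    if len(raw) < MIN_SIZE:
        raise LoadError(f"{path} is truncated")
    return raw


def classify(path: Path) -> str:
    try:
        open_checked(path)
        return "ok"
    except UnsupportedFormatError:
        return "unsupported"
    except LoadError:
        return "load-error"


with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp, "notes.h5")
    text_file.write_bytes(b"plain text, wrong magic bytes")
    short_file = Path(tmp, "short.h5")
    short_file.write_bytes(HDF5_SIGNATURE)
    results = (classify(text_file), classify(short_file), classify(Path(tmp, "missing.h5")))
```

Note that the file extension plays no part here: a .h5 file with the wrong magic bytes is reported as unsupported, while a file with the right signature but too little content is a load failure.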

Error handling example

from pathlib import Path

from vcti.fileloader import (
    LoaderError,
    LoadError,
    UnsupportedFormatError,
    ValidationError,
    SetupError,
)

# Validate before loading
if not desc.loader.validator.validate():
    raise ValidationError("Missing h5py — install with: pip install h5py")

if not desc.loader.setup.setup():
    raise SetupError("Could not configure HDF5 library paths")

# Load with error handling
path = Path("data.h5")
if not desc.loader.can_load(path):
    print(f"Loader {desc.id} cannot handle {path}")
else:
    handle = None  # stays None if load() itself raises
    try:
        handle = desc.loader.load(path)
        tree = desc.loader.load_tree(handle)
    except LoadError as e:
        print(f"Failed to read file: {e}")
    except LoaderError as e:
        print(f"Unexpected loader error: {e}")
    finally:
        if handle is not None:
            desc.loader.unload(handle)  # release only a handle that was actually opened

What this package does NOT do

  • No concrete loaders — This is the interface only. Actual file reading (HDF5, VTK, etc.) lives in separate loader plugin packages.
  • No data transformation — Data is returned as-is from the loader.
  • No caching — Caching strategies belong at the application level.

Further Reading

  • Common Patterns — Validator/setup implementation, multi-loader registration, error handling, and naming conventions.
  • Design & Concepts — Architecture, protocol rationale, and package boundaries.

Versioning

This package follows Semantic Versioning. Breaking changes to the Loader protocol (adding required methods, changing signatures) will only occur in major version bumps. Downstream loader plugins should pin to a compatible major version (e.g., vcti-fileloader>=1.0,<2).
