Data-scope abstraction for grouping related data sources under one managed lifecycle, with format-aware loader resolution.

Data Scope

Overview

vcti-data-scope provides a small framework for tying related data sources together — files, folders, and (in future) other source kinds — under one named, lifecycle-managed object. Each source is added with an explicit format identifier; the scope resolves a loader for it via a LoaderRegistry (from vcti-fileloader), and the user opens and closes the whole collection together.

The framework is pluggable: a DataScope is the base type, with one concrete subclass shipping today (PathsGroup for file and folder sources) and additional types possible later (positional file arrays, parameter sweeps, streaming sources, etc.). v1 is read-only after open and uses a strict load policy for required sources; optional sources can fail without aborting the scope.

Installation

pip install "vcti-data-scope>=1.0.0"

In requirements.txt

vcti-data-scope>=1.0.0

In pyproject.toml dependencies

dependencies = [
    "vcti-data-scope>=1.0.0",
]

Preparing a registry

PathsGroup resolves loaders from a LoaderRegistry (from vcti-fileloader). The registry must be populated with descriptors before the scope is used. Each descriptor's attributes["supported_formats"] list is what the scope matches against the format_id you pass when adding a source.

from vcti.fileloader import LoaderRegistry, LoaderDescriptor

registry = LoaderRegistry()
registry.register(
    LoaderDescriptor(
        id="hdf5-h5py",
        name="HDF5 (h5py)",
        loader=H5pyLoader(),  # your loader implementing the Loader protocol
        attributes={"supported_formats": ["hdf5-file"]},
    )
)
# ...register one descriptor per loader you need

Most callers wrap this setup in a helper module so application code receives a ready-to-use registry.
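
The matching rule described above can be pictured with plain dictionaries. This is a minimal sketch, not vcti-fileloader's implementation: the dictionary layout and the matching_descriptors helper are hypothetical stand-ins that only show how a format_id is compared against each descriptor's supported_formats list.

```python
# Illustration only: plain dicts stand in for LoaderDescriptor objects.
descriptors = [
    {"id": "hdf5-h5py", "attributes": {"supported_formats": ["hdf5-file"]}},
    {"id": "text-plain", "attributes": {"supported_formats": ["text-log", "text-file"]}},
]


def matching_descriptors(descriptors, format_id):
    """Return the ids of descriptors whose supported_formats contains format_id."""
    return [
        d["id"]
        for d in descriptors
        if format_id in d["attributes"].get("supported_formats", [])
    ]


hdf5_matches = matching_descriptors(descriptors, "hdf5-file")
unknown_matches = matching_descriptors(descriptors, "abaqus-inp")
```

A format_id that no descriptor lists produces an empty match, which is the situation a populated registry is meant to prevent.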


Quick Start (context manager)

When scope usage is confined to a single block, the context manager is the most concise form. The scope closes automatically on exit, even if an exception is raised inside the block.

from pathlib import Path

from vcti.datascope import PathsGroup

with PathsGroup("brake-squeal", registry=registry) as scope:
    scope.add_path_source(
        name="solver_input",
        path=Path("model.inp"),
        format_id="abaqus-inp",
    )
    scope.add_path_source(
        name="solver_output",
        path=Path("sol103.h5"),
        format_id="hdf5-file",
    )
    scope.add_path_source(
        name="solver_log",
        path=Path("run.log"),
        format_id="text-log",
        required=False,
    )

    scope.load()
    assert scope.is_valid

    # Reach into per-source loaders for typed access:
    h5_loader = scope.sources["solver_output"].loader
    # ... use h5_loader's typed API ...
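
The close-on-exit guarantee follows from Python's context manager protocol. The sketch below assumes nothing about PathsGroup internals: ManagedSketch is a hypothetical stand-in that shows how an __exit__ method can close a resource and still let the exception propagate.

```python
class ManagedSketch:
    """Hypothetical stand-in for a scope; not the library's class."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()    # runs whether or not the block raised
        return False    # False means the exception is not suppressed


scope = ManagedSketch()
try:
    with scope:
        raise RuntimeError("failure inside the block")
except RuntimeError:
    propagated = True
else:
    propagated = False
```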

Usage without the context manager

When the scope's lifetime spans function boundaries — for example, a scope owned by a long-lived service, an interactive session, or a class attribute — open and close it explicitly. The contract is the same as the context-manager form; only the syntax differs.

Plain open / close

from pathlib import Path

from vcti.datascope import PathsGroup

scope = PathsGroup("brake-squeal", registry=registry)
scope.add_path_source("solver_input", Path("model.inp"), format_id="abaqus-inp")
scope.add_path_source("solver_output", Path("sol103.h5"), format_id="hdf5-file")

if not scope.is_valid:
    raise RuntimeError("scope not loadable — some required source is unavailable")
scope.load()
try:
    # ... use scope.sources["..."].loader ...
    ...
finally:
    scope.close()

scope.close() is idempotent and best-effort: it walks every source, closes the ones that are loaded, and logs (rather than raises) on per-source close failures. It is always safe to call — including before load() and after a failed load().
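
One way to picture an idempotent, best-effort close is the loop below. It is a sketch under stated assumptions rather than the library's code: FakeSource and close_all are hypothetical names, and marking a source as closed after a failed close is an assumption made here so that repeat calls become no-ops.

```python
import logging

logger = logging.getLogger("scope-sketch")


class FakeSource:
    """Hypothetical stand-in for a scope source; not the library's class."""

    def __init__(self, name, fail_on_close=False):
        self.name = name
        self.is_loaded = False
        self.fail_on_close = fail_on_close

    def close(self):
        if self.fail_on_close:
            raise OSError(f"cannot close {self.name}")
        self.is_loaded = False


def close_all(sources):
    """Close every loaded source; log failures instead of raising them."""
    failures = []
    for src in sources:
        if not src.is_loaded:
            continue  # skipping unloaded sources is what makes repeat calls safe
        try:
            src.close()
        except Exception as exc:
            logger.warning("closing %r failed: %s", src.name, exc)
            failures.append(src.name)
            src.is_loaded = False  # assumption: treat the source as closed anyway
    return failures


sources = [FakeSource("a"), FakeSource("b", fail_on_close=True)]
before_load = close_all(sources)   # safe before anything is loaded
for s in sources:
    s.is_loaded = True
first = close_all(sources)         # one failure is logged, none is raised
second = close_all(sources)        # nothing left to close
```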

As an attribute of a long-lived object

from vcti.datascope import PathsGroup


class AnalysisSession:
    def __init__(self, registry):
        self._scope = PathsGroup("session", registry=registry)

    def open(self, model_path, output_path):
        self._scope.add_path_source("input", model_path, format_id="abaqus-inp")
        self._scope.add_path_source("output", output_path, format_id="hdf5-file")
        self._scope.load()

    def close(self):
        self._scope.close()

    @property
    def output_loader(self):
        return self._scope.sources["output"].loader

Reopening after close

After close(), the scope may be reopened. Optional sources that failed previously have their last_error cleared and are retried on the next load(). Sources cannot be added or removed while the scope is open (DataScopeStateError); add or remove before calling load() again.

scope.load()
# ... use ...
scope.close()

# ... later ...
scope.load()   # re-opens; failed optionals get another chance
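
The rule that sources can only change while the scope is closed can be modeled as a small state guard. This is a sketch, not the library's implementation: StateErrorSketch is a hypothetical stand-in for DataScopeStateError, and LifecycleSketch tracks only the open/closed flag.

```python
class StateErrorSketch(RuntimeError):
    """Hypothetical stand-in for DataScopeStateError."""


class LifecycleSketch:
    """Hypothetical model of the open/closed guard; not the library's class."""

    def __init__(self):
        self.names = []
        self.is_loaded = False

    def add(self, name):
        if self.is_loaded:
            raise StateErrorSketch("cannot add sources while the scope is open")
        self.names.append(name)

    def load(self):
        self.is_loaded = True

    def close(self):
        self.is_loaded = False


scope = LifecycleSketch()
scope.add("solver_input")
scope.load()
try:
    scope.add("solver_log")     # rejected: the scope is open
    rejected = False
except StateErrorSketch:
    rejected = True
scope.close()
scope.add("solver_log")         # allowed again once the scope is closed
scope.load()                    # reopen with the updated source list
```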

Working with optional sources

Sources added with required=False do not abort load() on failure; their failure is recorded and the scope continues:

import logging

log = logging.getLogger(__name__)

scope.load()

if not scope.is_valid:
    raise RuntimeError("scope is not in a usable state")

for src in scope.failed_optional_sources.values():
    log.warning("optional source %r unavailable: %s", src.name, src.last_error)
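
The strict-for-required, lenient-for-optional policy can be sketched as a single loop. The names below (SourceSketch, load_all, opener) are hypothetical and chosen for illustration; the real load() may do more, such as cleaning up sources that were already opened.

```python
class SourceSketch:
    """Hypothetical stand-in for a source record; not the library's class."""

    def __init__(self, name, opener, required=True):
        self.name = name
        self.opener = opener        # callable that raises on failure
        self.required = required
        self.is_loaded = False
        self.last_error = None


def load_all(sources):
    """Raise on a required failure; record an optional failure and continue."""
    for src in sources:
        src.last_error = None       # a fresh attempt clears the earlier failure
        try:
            src.opener()
            src.is_loaded = True
        except Exception as exc:
            if src.required:
                raise               # a required failure aborts the whole load
            src.last_error = exc    # an optional failure is only recorded


def missing():
    raise FileNotFoundError("run.log")


sources = [
    SourceSketch("solver_output", opener=lambda: None),
    SourceSketch("solver_log", opener=missing, required=False),
]
load_all(sources)
failed_optional = {s.name: s.last_error for s in sources if s.last_error is not None}
```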

is_valid is a property (written scope.is_valid, without parentheses) that acts as a pre-flight check: "could this scope be loaded right now?" Specifically:

  • Empty scope (no sources) — invalid.
  • Every required source's own is_valid is True — scope is valid.
  • A loaded scope short-circuits to True without re-checking — load() would have raised on any required failure, so reaching is_loaded already proves validity. While unloaded, the check is re-run on every call (no caching), so a moved or deleted file is detected immediately.

is_loaded answers a different question — "has load() actually completed?" Use is_valid before opening to confirm readiness; use is_loaded after opening to confirm the lifecycle finished.
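
The three validity rules listed above can be modeled in a few lines. ScopeSketch is a hypothetical model written for this illustration, with each source reduced to a callable that reports its own validity; it is not how PathsGroup is implemented.

```python
class ScopeSketch:
    """Hypothetical model of the validity rules; not the library's class."""

    def __init__(self):
        self.sources = []           # list of (required, validity_check) pairs
        self.is_loaded = False

    def add(self, check, required=True):
        self.sources.append((required, check))

    @property
    def is_valid(self):
        if self.is_loaded:
            return True             # a successful load already proved validity
        if not self.sources:
            return False            # an empty scope is invalid
        # Evaluated on every access while unloaded: nothing is cached.
        return all(check() for required, check in self.sources if required)


file_present = {"model.inp": True}
scope = ScopeSketch()
empty_valid = scope.is_valid
scope.add(lambda: file_present["model.inp"])
scope.add(lambda: False, required=False)    # optional sources do not count
valid_with_file = scope.is_valid
file_present["model.inp"] = False           # simulate the file being deleted
valid_after_delete = scope.is_valid
```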


Disambiguating between loaders that share a format

When several registered loaders declare the same format_id (e.g. two HDF5 readers for different solvers), pass extra_rules to narrow the selection. Rules are vcti.lookup.Rule instances applied alongside the implicit supported_formats contains <format_id> rule:

from pathlib import Path

from vcti.lookup import Rule

scope.add_path_source(
    name="solver_output",
    path=Path("sol103.h5"),
    format_id="hdf5-file",
    extra_rules=[Rule("solver", "==", "nastran")],
)

If no descriptor matches, add_path_source raises ValueError at the point of registration — not later at load() time.
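
The narrowing step can be sketched with plain data. Everything below is an assumption made for illustration: descriptors are dictionaries, rules are (attribute, operator, value) tuples rather than vcti.lookup.Rule instances, and resolve is a hypothetical helper that returns every remaining candidate instead of choosing one.

```python
import operator

# Only the operators needed for this illustration.
OPS = {"==": operator.eq, "!=": operator.ne}

descriptors = [
    {"id": "hdf5-nastran",
     "attributes": {"supported_formats": ["hdf5-file"], "solver": "nastran"}},
    {"id": "hdf5-abaqus",
     "attributes": {"supported_formats": ["hdf5-file"], "solver": "abaqus"}},
]


def resolve(descriptors, format_id, extra_rules=()):
    """Filter by format first, then by each extra rule; fail early if empty."""
    candidates = [
        d for d in descriptors
        if format_id in d["attributes"].get("supported_formats", [])
    ]
    for attr, op, value in extra_rules:
        candidates = [
            d for d in candidates
            if attr in d["attributes"] and OPS[op](d["attributes"][attr], value)
        ]
    if not candidates:
        raise ValueError(f"no loader matches format {format_id!r}")
    return [d["id"] for d in candidates]


ambiguous = resolve(descriptors, "hdf5-file")
narrowed = resolve(descriptors, "hdf5-file", [("solver", "==", "nastran")])
```

Raising at resolution time mirrors the behavior described above, where a missing match surfaces when the source is added rather than when the scope is loaded.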


See docs/design.md for the conceptual model and docs/api.md for the API reference.

