FileLoader NumPy

NumPy-backed NPY, NPZ, and CSV file loaders for the vcti-fileloader framework.

Overview

vcti-fileloader-numpy ships three loader plugins for the vcti-fileloader framework, all implemented against NumPy:

  • NpyLoader — loads .npy single-array files into a one-node subtree. Supports memory-mapped reads via the mmap_mode option.
  • NpzLoader — loads .npz archives (zip-of-NPY) into a multi-child subtree, one child per array name. Defaults to LazyDataNode children so callers can browse shape / dtype without materialising; pass lazy=False for eager reads.
  • CsvLoader — loads delimited-text files (.csv / .tsv / .txt) via numpy.genfromtxt into a one-node subtree. Passes through any genfromtxt keyword; when names=True is used, the loader stamps the column names on the subtree root as file_attributes["columns"].

All three implement the vcti.fileloader.core.Loader protocol and write against the LockableTree protocol from vcti-tree, so the caller picks the backing.

Installation

pip install "vcti-fileloader-numpy>=1.0.1"

Or declare it in pyproject.toml dependencies:

dependencies = [
    "vcti-fileloader-numpy>=1.0.1",
]

Quick Start

NPY

from pathlib import Path

from vcti.fileloader.core import DataNode
from vcti.fileloader.numpy import NpyLoader
from vcti.tree import DictTree

loader = NpyLoader()
tree: DictTree[DataNode] = DictTree(DataNode())
with loader.open(Path("data.npy")) as handle:
    root = loader.populate(handle, tree, tree.root_handle)

[child] = list(tree.children(root))
print(tree.payload(child).data)        # the loaded array

Memory-map a large file instead of reading into memory:

arr = loader.load(Path("big.npy"), mmap_mode="r")
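
For context, the mmap_mode option presumably corresponds to the parameter of the same name on numpy.load; that forwarding is an assumption here, not something this README states. A minimal NumPy-only sketch of what a memory-mapped read returns:

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "big.npy"
    np.save(path, np.arange(1_000_000, dtype=np.float64))

    # mmap_mode="r" maps the file read-only instead of reading it into RAM.
    arr = np.load(path, mmap_mode="r")
    is_memmap = isinstance(arr, np.memmap)

    # Slicing touches only the part of the file backing the requested range.
    window_sum = float(arr[10:20].sum())

    # Release the mapping before the temporary directory is removed.
    del arr
```

The result behaves like an ordinary array for indexing and reductions, but the data stays on disk until it is accessed.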

NPZ

from pathlib import Path

from vcti.fileloader.core import DataNode, materialise_subtree
from vcti.fileloader.numpy import NpzLoader
from vcti.tree import DictTree

loader = NpzLoader()
tree: DictTree[DataNode] = DictTree(DataNode())

with loader.open(Path("data.npz")) as handle:
    root = loader.populate(handle, tree, tree.root_handle)   # lazy=True default
    materialise_subtree(tree, root)                          # read everything
# `handle` is closed here, but the tree is still usable.

for child in tree.children(root):
    p = tree.payload(child)
    print(p.name, p.shape, p.dtype)

For eager reads (no closure, no handle-lifetime concern):

with loader.open(Path("data.npz")) as handle:
    root = loader.populate(handle, tree, tree.root_handle, lazy=False)

CSV / TSV

from pathlib import Path

from vcti.fileloader.core import DataNode
from vcti.fileloader.numpy import CsvLoader
from vcti.tree import DictTree

loader = CsvLoader()
tree: DictTree[DataNode] = DictTree(DataNode())

with loader.open(Path("data.csv"), delimiter=",") as handle:
    root = loader.populate(handle, tree, tree.root_handle)

With a header row producing a structured array:

with loader.open(Path("data.csv"), delimiter=",", names=True) as handle:
    root = loader.populate(handle, tree, tree.root_handle)
# Root's file_attributes["columns"] now lists the column names.
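
The column names come from NumPy itself: with names=True, numpy.genfromtxt reads the first row as field names and returns a structured array. The sketch below uses only NumPy and an in-memory file; how CsvLoader copies the names onto the root is not shown and is not assumed here.

```python
import io

import numpy as np

text = "x,y,label\n1.0,2.0,3\n4.0,5.0,6\n"

# names=True turns the header row into the field names of a structured array.
table = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)

columns = list(table.dtype.names)   # the names a loader could record
x_values = table["x"].tolist()      # fields are addressed by column name
```

Here columns holds the three header names in file order, and each field defaults to a float dtype because no dtype was passed.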

Subtree shapes

Loader                             | Root payload                                 | Children
-----------------------------------|----------------------------------------------|----------------------------------------
NpyLoader                          | empty DataNode                               | 1 × DataNode(data=<array>)
NpzLoader (lazy)                   | empty DataNode                               | N × LazyDataNode(name=key, shape, dtype)
NpzLoader (eager)                  | empty DataNode                               | N × DataNode(name=key, data=<array>)
CsvLoader (homogeneous)            | empty DataNode                               | 1 × DataNode(data=<array>)
CsvLoader (structured, names=True) | DataNode(file_attributes={"columns": [...]}) | 1 × DataNode(data=<array>)

Handle lifetime contract (NPZ lazy nodes)

NpzLoader.populate(..., lazy=True) attaches LazyDataNodes whose closures hold the open NpzFile handle. Once loader.unload(handle) runs, those closures cannot fulfil further .load() calls. Three patterns avoid the problem:

  1. Keep the handle open for the lifetime of the tree.
  2. Materialise then unload: call materialise_subtree(tree, root) before unload. Every lazy node loads, and the tree is fully usable without the handle.
  3. Use eager mode: populate(..., lazy=False).

materialise_subtree(tree, root_handle) is exported from vcti.fileloader.core.
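
The contract above is the general behaviour of any closure over an open archive. As a rough illustration, the sketch below uses the standard-library zipfile module as a stand-in for NpzFile (an .npz file is a zip archive). The helper make_lazy_reader is hypothetical and is not part of vcti-fileloader.

```python
import io
import zipfile

# Build a small in-memory zip archive standing in for an .npz file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.npy", b"first")
    archive.writestr("b.npy", b"second")


def make_lazy_reader(archive: zipfile.ZipFile, name: str):
    """Return a closure that reads `name` only when called (hypothetical helper)."""
    return lambda: archive.read(name)


buffer.seek(0)
archive = zipfile.ZipFile(buffer, "r")
lazy = {name: make_lazy_reader(archive, name) for name in archive.namelist()}

# Pattern 2: materialise every entry while the archive is still open.
materialised = {name: read() for name, read in lazy.items()}
archive.close()

# The materialised data outlives the handle; the closures do not.
try:
    lazy["a.npy"]()
    closed_read_failed = False
except ValueError:
    closed_read_failed = True
```

The same trade-off applies to the three patterns: either the handle outlives every lazy read, or the data is copied out before the handle is released.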


Error handling

from vcti.fileloader.core import (
    LoadError,
    UnsupportedFormatError,
    TreeAttachmentError,
)

try:
    with loader.open(Path("data.npy")) as handle:
        root = loader.populate(handle, tree, tree.root_handle)
except FileNotFoundError:
    ...
except UnsupportedFormatError:
    # File extension is not recognised by this loader
    ...
except LoadError:
    # File could not be parsed
    ...
except TreeAttachmentError:
    # parent is missing, deleted, or structure-locked in `tree`
    ...

If populate fails partway (a parse error during NPZ traversal, or an exception in before_lock), the partial subtree is removed before the exception propagates — callers never see a half-built subtree.
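
One way to picture this all-or-nothing behaviour is a try/except around the build that removes the new subtree before re-raising. The sketch below uses a toy tree class written for this example; ToyTree and this populate function are hypothetical stand-ins, not the vcti-tree or vcti-fileloader implementation.

```python
class ToyTree:
    """Tiny parent -> children mapping; a stand-in, not the vcti-tree API."""

    def __init__(self) -> None:
        self.children: dict[str, list[str]] = {"root": []}

    def add(self, parent: str, name: str) -> str:
        self.children[parent].append(name)
        self.children[name] = []
        return name

    def remove_subtree(self, parent: str, name: str) -> None:
        for child in list(self.children[name]):
            self.remove_subtree(name, child)
        del self.children[name]
        self.children[parent].remove(name)


def populate(tree: ToyTree, parent: str, entries: dict[str, bytes]) -> str:
    """Attach a subtree for `entries`, removing it again if any entry fails."""
    subtree_root = tree.add(parent, "subtree")
    try:
        for name, raw in entries.items():
            if not raw:
                raise ValueError(f"cannot parse {name!r}")
            tree.add(subtree_root, name)
    except Exception:
        tree.remove_subtree(parent, subtree_root)  # roll back the partial build
        raise
    return subtree_root


tree = ToyTree()
try:
    populate(tree, "root", {"a": b"ok", "b": b""})
except ValueError as exc:
    error_message = str(exc)
```

After the failed call the toy tree contains only its original root, even though the first entry had already been attached when the second one failed.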


What this package does NOT do

  • Pandas-flavoured CSV. This loader uses numpy.genfromtxt, which is fast and needs no dependency beyond NumPy but lacks the schema inference and rich missing-value handling of pandas.read_csv. A separate vcti-fileloader-pandas is the right home for that.
  • Schema validation. The loaders accept whatever NumPy parses. Validation belongs in a before_lock hook or downstream pass.
  • Streaming reads. numpy.load and numpy.genfromtxt read the whole file (or array, for NPZ keys). Out-of-memory cases need a custom loader or mmap_mode for NPY.
  • Attribute synthesis. No file_path, no derived storage metadata. Stamp those via the before_lock hook or a downstream enricher (e.g. vcti-attribute-enricher).
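
As an illustration of the schema-validation point, a downstream pass can be written in plain NumPy. The function validate_columns below is a hypothetical helper that assumes float columns; this README does not show how a before_lock hook is registered, so the sketch does not wire one up.

```python
import io

import numpy as np


def validate_columns(table: np.ndarray, required: tuple[str, ...]) -> list[str]:
    """Return the problems found in a structured array (empty list if none)."""
    problems = []
    names = table.dtype.names or ()
    for column in required:
        if column not in names:
            problems.append(f"missing column: {column}")
        elif np.isnan(table[column]).any():  # assumes a float column
            problems.append(f"missing values in column: {column}")
    return problems


# The second data row has an empty "y" field, which genfromtxt fills with NaN.
text = "x,y\n1.0,2.0\n3.0,\n"
table = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)
problems = validate_columns(table, required=("x", "y", "z"))
```

Returning a list of problems rather than raising lets the caller decide whether a given issue should reject the file or only be logged.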

Dependencies

  • numpy (>=1.26)
  • vcti-fileloader (>=5.1.0) — Loader protocol, SubtreeBuilder, DataNode, LazyDataNode, materialise_subtree (import from vcti.fileloader.core)
  • vcti-tree (>=1.0.0) — LockableTree protocol
