NumPy-backed NPY, NPZ, and CSV file loaders for the vcti-fileloader framework
FileLoader NumPy
Overview
vcti-fileloader-numpy ships three loader plugins for the
vcti-fileloader framework, all implemented against NumPy:
- NpyLoader: loads .npy single-array files into a one-node subtree. Supports memory-mapped reads via the mmap_mode option.
- NpzLoader: loads .npz archives (zip-of-NPY) into a multi-child subtree, one child per array name. Defaults to LazyDataNode children so callers can browse shape/dtype without materialising; pass lazy=False for eager reads.
- CsvLoader: loads delimited-text files (.csv / .tsv / .txt) via numpy.genfromtxt into a one-node subtree. Passes through any genfromtxt keyword; when names=True is used, the loader stamps the column names on the subtree root as file_attributes["columns"].
All three implement the vcti.fileloader.core.Loader protocol and
write against the LockableTree protocol from vcti-tree, so the
caller picks the backing.
Installation
pip install "vcti-fileloader-numpy>=1.0.1"
In pyproject.toml dependencies:
dependencies = [
    "vcti-fileloader-numpy>=1.0.1",
]
Quick Start
NPY
from pathlib import Path

from vcti.fileloader.core import DataNode
from vcti.fileloader.numpy import NpyLoader
from vcti.tree import DictTree

loader = NpyLoader()
tree: DictTree[DataNode] = DictTree(DataNode())

with loader.open(Path("data.npy")) as handle:
    root = loader.populate(handle, tree, tree.root_handle)

[child] = list(tree.children(root))
print(tree.payload(child).data)  # the loaded array
Memory-map a large file instead of reading into memory:
arr = loader.load(Path("big.npy"), mmap_mode="r")
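For context, NumPy's own numpy.load accepts the same mmap_mode values. The sketch below uses plain NumPy only and assumes that NpyLoader forwards the option unchanged, which the text above implies but does not spell out. The helper name open_memmapped and the temporary file are illustrative.

```python
import tempfile
from pathlib import Path

import numpy as np


def open_memmapped(path: Path) -> np.ndarray:
    """Open an .npy file read-only without reading its contents into memory."""
    # mmap_mode="r" maps the file; data is read from disk only when accessed.
    return np.load(path, mmap_mode="r")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "big.npy"
    np.save(path, np.arange(1_000_000, dtype=np.float64))

    arr = open_memmapped(path)
    is_memmap = isinstance(arr, np.memmap)
    window_sum = float(arr[10:13].sum())  # touches only a small slice of the file
    del arr  # drop the mapping before the temporary directory is removed
```

A read-only mapping raises on writes, which is usually the safer choice when the array is shared through a tree.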
NPZ
from pathlib import Path

from vcti.fileloader.core import DataNode, materialise_subtree
from vcti.fileloader.numpy import NpzLoader
from vcti.tree import DictTree

loader = NpzLoader()
tree: DictTree[DataNode] = DictTree(DataNode())

with loader.open(Path("data.npz")) as handle:
    root = loader.populate(handle, tree, tree.root_handle)  # lazy=True default
    materialise_subtree(tree, root)  # read everything

# `handle` is closed here, but the tree is still usable.
for child in tree.children(root):
    p = tree.payload(child)
    print(p.name, p.shape, p.dtype)
For eager reads (no closure, no handle-lifetime concern):
with loader.open(Path("data.npz")) as handle:
    root = loader.populate(handle, tree, tree.root_handle, lazy=False)
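To illustrate how shape and dtype can be known without materialising an array, the sketch below reads only the NPY header of each member of an .npz archive, using the standard zipfile module and numpy.lib.format. This is one possible technique and not necessarily how NpzLoader does it; peek_npz is a hypothetical helper name.

```python
import io
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def peek_npz(source) -> dict:
    """Return {array name: (shape, dtype)} by reading only each NPY header."""
    info = {}
    with zipfile.ZipFile(source) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as fp:
                version = npy_format.read_magic(fp)
                if version == (1, 0):
                    shape, _fortran, dtype = npy_format.read_array_header_1_0(fp)
                elif version == (2, 0):
                    shape, _fortran, dtype = npy_format.read_array_header_2_0(fp)
                else:
                    continue  # other header versions are out of scope for this sketch
            info[member[: -len(".npy")]] = (shape, dtype)
    return info


# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
np.savez(buffer, a=np.zeros((3, 4), dtype=np.float32), b=np.arange(5, dtype=np.int64))
buffer.seek(0)
summary = peek_npz(buffer)
```

Only the header of each member is parsed here; the array payload itself is never decoded into a NumPy array.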
CSV / TSV
from pathlib import Path

from vcti.fileloader.core import DataNode
from vcti.fileloader.numpy import CsvLoader
from vcti.tree import DictTree

loader = CsvLoader()
tree: DictTree[DataNode] = DictTree(DataNode())

with loader.open(Path("data.csv"), delimiter=",") as handle:
    root = loader.populate(handle, tree, tree.root_handle)
With a header row producing a structured array:
with loader.open(Path("data.csv"), delimiter=",", names=True) as handle:
    root = loader.populate(handle, tree, tree.root_handle)

# Root's file_attributes["columns"] now lists the column names.
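As a point of reference, this is what numpy.genfromtxt itself returns with names=True, shown with plain NumPy on an in-memory string. The sample data is made up, and the assumption is that the column list stamped on the root corresponds to the structured array's dtype.names.

```python
from io import StringIO

import numpy as np

text = "x,y\n1.0,2.0\n3.0,4.0\n"

# names=True takes field names from the first row, giving a structured array.
table = np.genfromtxt(StringIO(text), delimiter=",", names=True)

columns = list(table.dtype.names)  # the kind of list stamped on the subtree root
x_values = table["x"].tolist()     # fields are addressed by column name
```

Without names=True the same input would instead parse as a homogeneous 2-D float array, with the header row read as missing values unless skip_header is set.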
Subtree shapes
| Loader | Root payload | Children |
|---|---|---|
| NpyLoader | empty DataNode | 1 × DataNode(data=<array>) |
| NpzLoader (lazy) | empty DataNode | N × LazyDataNode(name=key, shape, dtype) |
| NpzLoader (eager) | empty DataNode | N × DataNode(name=key, data=<array>) |
| CsvLoader (homogeneous) | empty DataNode | 1 × DataNode(data=<array>) |
| CsvLoader (structured, names=True) | DataNode(file_attributes={"columns": [...]}) | 1 × DataNode(data=<array>) |
Handle lifetime contract (NPZ lazy nodes)
NpzLoader.populate(..., lazy=True) attaches LazyDataNodes whose
closures hold the open NpzFile handle. Once
loader.unload(handle) runs, those closures cannot fulfil further
.load() calls. Three patterns avoid the problem:
- Keep the handle open for the lifetime of the tree.
- Materialise then unload: call materialise_subtree(tree, root) before unload. Every lazy node loads, and the tree is fully usable without the handle.
- Use eager mode: populate(..., lazy=False).
materialise_subtree(tree, root_handle) is exported from
vcti.fileloader.core.
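The lifetime issue can be reproduced with plain NumPy. In the sketch below, LazyArray is a hypothetical stand-in for a lazy node (it is not the package's LazyDataNode): it caches the result of a closure that reads from an open NpzFile. An array materialised before the archive is closed stays available; one that was never loaded is expected to fail afterwards.

```python
import io
from typing import Callable, Optional

import numpy as np


class LazyArray:
    """Hypothetical lazy node: caches the result of a loader closure."""

    def __init__(self, loader: Callable[[], np.ndarray]) -> None:
        self._loader = loader
        self._data: Optional[np.ndarray] = None

    def load(self) -> np.ndarray:
        if self._data is None:
            self._data = self._loader()  # needs the archive to still be open
        return self._data


buffer = io.BytesIO()
np.savez(buffer, a=np.arange(3), b=np.ones(2))
buffer.seek(0)

archive = np.load(buffer)
# Bind each key as a default argument so every closure reads its own array.
nodes = {key: LazyArray(lambda key=key: archive[key]) for key in archive.files}

nodes["a"].load()  # materialised while the archive is open
archive.close()

a_after_close = nodes["a"].load().tolist()  # served from the cache
try:
    nodes["b"].load()  # never materialised, so it needs the closed archive
    b_failed = False
except Exception:
    b_failed = True
```

Materialising every node before closing, as the second pattern describes, amounts to calling load() on each node while the archive is still open.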
Error handling
from vcti.fileloader.core import (
    LoadError,
    UnsupportedFormatError,
    TreeAttachmentError,
)

try:
    with loader.open(Path("data.npy")) as handle:
        root = loader.populate(handle, tree, tree.root_handle)
except FileNotFoundError:
    ...
except UnsupportedFormatError:
    # File extension is not recognised by this loader
    ...
except LoadError:
    # File could not be parsed
    ...
except TreeAttachmentError:
    # parent is missing, deleted, or structure-locked in `tree`
    ...
If populate fails partway (a parse error during NPZ traversal, or
an exception in before_lock), the partial subtree is removed
before the exception propagates — callers never see a half-built
subtree.
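The all-or-nothing behaviour described above follows a common pattern: attach the subtree root, build under a try block, and remove the root again if anything raises. The sketch below shows that pattern on a hypothetical TinyTree; it is not the package's implementation, and none of these names come from vcti-fileloader or vcti-tree.

```python
from typing import Callable, Dict, List


class TinyTree:
    """Hypothetical minimal tree: node id -> list of child ids."""

    def __init__(self) -> None:
        self.children: Dict[int, List[int]] = {0: []}
        self._next_id = 1

    def add_child(self, parent: int) -> int:
        node = self._next_id
        self._next_id += 1
        self.children[node] = []
        self.children[parent].append(node)
        return node

    def remove_subtree(self, parent: int, node: int) -> None:
        for child in list(self.children[node]):
            self.remove_subtree(node, child)
        self.children[parent].remove(node)
        del self.children[node]


def populate_atomically(
    tree: TinyTree, parent: int, build: Callable[[TinyTree, int], None]
) -> int:
    """Attach a subtree root, run build, and undo everything if build raises."""
    root = tree.add_child(parent)
    try:
        build(tree, root)
    except Exception:
        tree.remove_subtree(parent, root)  # roll back the partial subtree
        raise
    return root


def failing_build(tree: TinyTree, root: int) -> None:
    tree.add_child(root)  # one child is attached before the failure
    raise ValueError("simulated parse error")


tree = TinyTree()
try:
    populate_atomically(tree, 0, failing_build)
except ValueError:
    pass
remaining = dict(tree.children)  # only the original root is left
```

Re-raising after the rollback keeps the original exception visible to the caller while leaving the tree as it was before the call.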
What this package does NOT do
- Pandas-flavoured CSV. This loader uses numpy.genfromtxt, which is fast and dependency-free but lacks the schema inference and rich missing-value handling of pandas.read_csv. A separate vcti-fileloader-pandas is the right home for that.
- Schema validation. The loaders accept whatever NumPy parses. Validation belongs in a before_lock hook or downstream pass.
- Streaming reads. numpy.load and numpy.genfromtxt read the whole file (or array, for NPZ keys). Out-of-memory cases need a custom loader or mmap_mode for NPY.
- Attribute synthesis. No file_path, no derived storage metadata. Stamp those via the before_lock hook or a downstream enricher (e.g. vcti-attribute-enricher).
Dependencies
- numpy (>=1.26)
- vcti-fileloader (>=5.1.0): Loader protocol, SubtreeBuilder, DataNode, LazyDataNode, materialise_subtree (import from vcti.fileloader.core)
- vcti-tree (>=1.0.0): LockableTree protocol