Built-in file format descriptors (HDF5, CAX, JSON, NPY, NPZ, CSV) for the vcti-path-format identification framework
Path Format Descriptors
Built-in file format descriptors for the vcti-path-format identification framework.
Overview
vcti-path-format-descriptors ships ready-made FormatDescriptor
instances for the file formats VCTI tooling needs to recognize: HDF5,
VCollab CAX, JSON, NumPy NPY/NPZ, and CSV. Each descriptor is a
self-contained factory function that wires the appropriate magic-byte
and/or extension validators onto a HeuristicEvaluator, tags the
result with attributes from the shared vocabulary
(vcti-path-format-attributes), and returns it for registration with a
FormatRegistry. The package is the plugin layer between the
format-agnostic framework and the shared attribute vocabulary:
applications register the descriptors they need (or all of them at
once) and let FormatIdentifier do the identification.
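
To make the GATE / EVIDENCE split concrete, here is a minimal, self-contained toy model using only the standard library. It does not use the framework's classes: `Role`, `Validator`, `magic_bytes`, `extension`, `evaluate`, and the `"DEFINITE"` level name are all hypothetical names invented for illustration. The only behavior taken from this document is that a descriptor without a GATE validator tops out at LIKELY.

```python
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable, Optional


class Role(Enum):
    GATE = "gate"          # must pass, otherwise the format is ruled out
    EVIDENCE = "evidence"  # supporting signal only


@dataclass(frozen=True)
class Validator:
    role: Role
    check: Callable[[Path], bool]


def magic_bytes(signature: bytes) -> Callable[[Path], bool]:
    """Build a check that compares the first bytes of a file to a signature."""
    def check(path: Path) -> bool:
        with path.open("rb") as fh:
            return fh.read(len(signature)) == signature
    return check


def extension(*suffixes: str) -> Callable[[Path], bool]:
    """Build a check that compares the file suffix, case-insensitively."""
    def check(path: Path) -> bool:
        return path.suffix.lower() in suffixes
    return check


def evaluate(path: Path, validators: list[Validator]) -> Optional[str]:
    """Toy scoring rule (an assumption, not the framework's algorithm)."""
    gates = [v for v in validators if v.role is Role.GATE]
    evidence = [v for v in validators if v.role is Role.EVIDENCE]
    if any(not v.check(path) for v in gates):
        return None            # a failed gate rules the format out
    if gates:
        return "DEFINITE"      # hypothetical level; evidence is ignored here
    if any(v.check(path) for v in evidence):
        return "LIKELY"        # extension-only descriptors stop at LIKELY
    return None


HDF5_TOY = [
    Validator(Role.GATE, magic_bytes(b"\x89HDF\r\n\x1a\n")),
    Validator(Role.EVIDENCE, extension(".h5", ".hdf5")),
]
JSON_TOY = [Validator(Role.EVIDENCE, extension(".json"))]

with tempfile.TemporaryDirectory() as tmp:
    h5 = Path(tmp) / "data.h5"
    h5.write_bytes(b"\x89HDF\r\n\x1a\n" + b"\x00" * 8)
    js = Path(tmp) / "data.json"
    js.write_text("{}")
    print(evaluate(h5, HDF5_TOY), evaluate(js, JSON_TOY))
```

A real evaluator would presumably let passing EVIDENCE refine the result of a passed GATE; the toy skips that to keep the two roles easy to tell apart.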
Installation
```shell
pip install "vcti-path-format-descriptors>=1.2.0"
```
In pyproject.toml dependencies
```toml
dependencies = [
    "vcti-path-format-descriptors>=1.2.0",
]
```
Quick Start
```python
from pathlib import Path

from vcti.pathformat import FormatRegistry, FormatIdentifier
from vcti.pathformat.descriptors import register_all_formats

# Register all built-in format descriptors
registry = FormatRegistry()
register_all_formats(registry)

# Identify a file
identifier = FormatIdentifier(registry)
results = identifier.identify_file_format(Path("data.h5"))
```
Individual descriptors
```python
from vcti.pathformat import FormatRegistry
from vcti.pathformat.descriptors import (
    get_cax_file_descriptor,
    get_csv_file_descriptor,
    get_hdf5_file_descriptor,
    get_json_file_descriptor,
    get_npy_file_descriptor,
    get_npz_file_descriptor,
)

# Register only the descriptors the application needs
registry = FormatRegistry()
registry.register(get_hdf5_file_descriptor())
registry.register(get_cax_file_descriptor())
registry.register(get_json_file_descriptor())
registry.register(get_npy_file_descriptor())
registry.register(get_npz_file_descriptor())
registry.register(get_csv_file_descriptor())
```
Built-in Formats
HDF5
| Property | Value |
|---|---|
| ID | hdf5-file |
| Signature | \x89HDF\r\n\x1a\n (8 bytes) |
| Extensions | .h5, .hdf5 |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=hdf5 |
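
For readers who want to see what the magic-byte GATE amounts to, the sketch below checks the 8-byte signature from the table using only the standard library. The function names are hypothetical and this is not the package's implementation. It looks at offset 0 only; the HDF5 format also allows the signature at offsets 512, 1024, and so on when a user block is present, and this document does not say whether the descriptor looks that far.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8 bytes, as listed in the table


def starts_with_hdf5_signature(header: bytes) -> bool:
    """True if the given leading bytes begin with the HDF5 signature."""
    return header.startswith(HDF5_SIGNATURE)


def file_has_hdf5_signature(path: Path) -> bool:
    """Read only the first 8 bytes of the file and compare them."""
    with path.open("rb") as fh:
        return starts_with_hdf5_signature(fh.read(len(HDF5_SIGNATURE)))
```

Reading a fixed-size prefix keeps the check cheap even for multi-gigabyte files.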
VCollab CAX
| Property | Value |
|---|---|
| ID | vcti-cax |
| Signature | \x89VCF\r\n\x1a\n (8 bytes) |
| Validators | Magic bytes (GATE) |
| Attributes | path_type=file, structure=binary, generator=VCollab |
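
The CAX signature has the same shape as the PNG and HDF5 signatures: a byte above 0x7F, three ASCII letters, then `\r\n\x1a\n`. The PNG specification explains that layout as a way to detect transfers that strip the high bit or rewrite line endings. Whether CAX adopted it for the same reason is an assumption; the sketch below only demonstrates that such damage does break the signature. The helper names are hypothetical.

```python
CAX_SIGNATURE = b"\x89VCF\r\n\x1a\n"  # 8 bytes, as listed in the table


def crlf_to_lf(data: bytes) -> bytes:
    """Simulate a text-mode transfer that rewrites CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


original = CAX_SIGNATURE + b"payload"
for damage in (crlf_to_lf, strip_high_bit):
    damaged = damage(original)
    # The signature no longer matches, so a magic-byte gate would reject it.
    print(damage.__name__, damaged.startswith(CAX_SIGNATURE))
```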
JSON
| Property | Value |
|---|---|
| ID | json-file |
| Signature | none (text format) |
| Extensions | .json |
| Validators | Extension (EVIDENCE) |
| Attributes | path_type=file, structure=json |
| Best confidence | LIKELY (no GATE) |
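
Because the JSON descriptor relies on the extension alone, an application that needs more certainty could follow up with a content check of its own. The sketch below is one such check using the standard `json` module; it is not part of this package, and `parses_as_json` is a hypothetical helper name.

```python
import json


def parses_as_json(data: bytes) -> bool:
    """True if the bytes decode and parse as a JSON document."""
    try:
        json.loads(data)  # accepts bytes and detects UTF-8/16/32
    except ValueError:    # covers JSONDecodeError and UnicodeDecodeError
        return False
    return True
```

Parsing reads the whole document, so this suits small files or a deliberate second pass rather than bulk identification.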
NumPy NPY
| Property | Value |
|---|---|
| ID | npy-file |
| Signature | \x93NUMPY (6 bytes) |
| Extensions | .npy |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=binary |
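
In the NPY format the 6-byte magic string is followed by a major and a minor version byte and then a little-endian header length (2 bytes in version 1.x, 4 bytes in versions 2.x and 3.x). The sketch below parses that preamble with the standard library only. It goes further than the descriptor's magic-byte check, and `parse_npy_preamble` is a hypothetical name.

```python
import struct
from typing import Optional, Tuple

NPY_MAGIC = b"\x93NUMPY"  # 6 bytes, as listed in the table


def parse_npy_preamble(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, header_len), or None if this is not NPY."""
    if len(data) < 8 or not data.startswith(NPY_MAGIC):
        return None
    major, minor = data[6], data[7]
    if major == 1:
        fmt = "<H"   # version 1.x: unsigned 16-bit header length
    elif major in (2, 3):
        fmt = "<I"   # versions 2.x and 3.x: unsigned 32-bit header length
    else:
        return None  # unknown major version
    if len(data) < 8 + struct.calcsize(fmt):
        return None  # truncated before the header length field
    (header_len,) = struct.unpack_from(fmt, data, 8)
    return major, minor, header_len
```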
NumPy NPZ
| Property | Value |
|---|---|
| ID | npz-file |
| Signature | PK\x03\x04 (ZIP local file header, 4 bytes) |
| Extensions | .npz |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=binary |
Note: the magic bytes are the standard ZIP local file header, so the
GATE passes for any archive that begins with that header. The .npz
extension is what distinguishes NumPy archives from other ZIP-family
formats; any future ZIP-family descriptor must likewise rely on its own
extension to avoid ambiguous matches.
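
The ambiguity described in the note can be reproduced with the standard `zipfile` module: the same bytes are a plausible `.npz` or a generic ZIP depending only on the file name. This is an illustrative sketch, not the package's logic. `classify_zip_family` and `members_look_like_npy` are hypothetical names, and the member-name check is an optional extra that relies on `numpy.savez` storing each array as a `.npy` member.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # 4 bytes, as listed in the table


def make_zip(members: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP archive from member name to payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for member, payload in members.items():
            zf.writestr(member, payload)
    return buf.getvalue()


def classify_zip_family(name: str, data: bytes) -> str:
    """Magic bytes decide 'ZIP or not'; the extension decides which kind."""
    if not data.startswith(ZIP_LOCAL_HEADER):
        return "not-zip"
    return "npz" if name.lower().endswith(".npz") else "other-zip"


def members_look_like_npy(data: bytes) -> bool:
    """Optional content check: every archive member is named *.npy."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
    return bool(names) and all(n.endswith(".npy") for n in names)


archive = make_zip({"arr_0.npy": b"placeholder"})
print(classify_zip_family("weights.npz", archive))
print(classify_zip_family("weights.zip", archive))
```

An archive with no members starts with the end-of-central-directory record (`PK\x05\x06`) instead, so it would not match the local file header signature at all.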
CSV
| Property | Value |
|---|---|
| ID | csv-file |
| Signature | none (text format) |
| Extensions | .csv |
| Validators | Extension (EVIDENCE) |
| Attributes | path_type=file, structure=csv |
| Best confidence | LIKELY (no GATE) |
Dependencies
- vcti-path-format (>=1.0.0) — format identification framework
- vcti-path-format-attributes (>=1.1.0) — domain vocabulary enums