Built-in file format descriptors (HDF5, CAX, JSON, NPY, NPZ, CSV) for the vcti-path-format identification framework

Path Format Descriptors

Built-in file format descriptors for the vcti-path-format identification framework.

Overview

vcti-path-format-descriptors ships ready-made FormatDescriptor instances for the file formats VCTI tooling needs to recognize: HDF5, VCollab CAX, JSON, NumPy NPY/NPZ, and CSV. Each descriptor is built by a self-contained factory function that wires the appropriate magic-byte and/or extension validators onto a HeuristicEvaluator, tags the result with attributes from the shared vocabulary (vcti-path-format-attributes), and returns it for registration with a FormatRegistry. The package is the plugin layer between the format-agnostic framework and the shared attribute vocabulary: applications register the descriptors they need (or all of them at once) and let FormatIdentifier do the identification.
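The format tables later in this document label each validator as GATE or EVIDENCE. As a rough illustration of how such a combination might behave, here is a standalone sketch that does not use the vcti-path-format API. `Verdict`, `check_magic`, `check_extension`, and `evaluate` are hypothetical names, and the rule that a failed gate rejects outright while evidence only adds support is an assumption made for this example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass(frozen=True)
class Verdict:
    """Hypothetical result type; not part of the vcti-path-format API."""
    accepted: bool      # False when the gate check failed (or nothing matched)
    evidence_hits: int  # how many evidence checks agreed


def check_magic(path: Path, signature: bytes) -> bool:
    """Return True if the file starts with the given byte signature."""
    with path.open("rb") as fh:
        return fh.read(len(signature)) == signature


def check_extension(path: Path, extensions: Tuple[str, ...]) -> bool:
    """Return True if the file suffix (case-insensitive) is in the list."""
    return path.suffix.lower() in extensions


def evaluate(
    path: Path, signature: Optional[bytes], extensions: Tuple[str, ...]
) -> Verdict:
    """Combine an optional gate (magic bytes) with evidence (extension)."""
    # Assumed semantics: when a gate exists and fails, reject immediately.
    if signature is not None and not check_magic(path, signature):
        return Verdict(accepted=False, evidence_hits=0)
    hits = 1 if check_extension(path, extensions) else 0
    # Without a gate, the extension is the only basis for acceptance.
    accepted = signature is not None or hits > 0
    return Verdict(accepted=accepted, evidence_hits=hits)
```

Under these assumptions, a file with the right signature but an unexpected extension is still accepted (with zero evidence hits), while a file with the right extension but the wrong leading bytes is rejected.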

Installation

pip install "vcti-path-format-descriptors>=1.2.0"

In pyproject.toml dependencies

dependencies = [
    "vcti-path-format-descriptors>=1.2.0",
]
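To confirm which version ended up in the environment, the standard library's `importlib.metadata` can be queried. This is a small sketch; `installed_version` is a hypothetical helper name, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("vcti-path-format-descriptors"))
```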

Quick Start

from pathlib import Path

from vcti.pathformat import FormatRegistry, FormatIdentifier
from vcti.pathformat.descriptors import register_all_formats

# Register all built-in format descriptors
registry = FormatRegistry()
register_all_formats(registry)

# Identify a file
identifier = FormatIdentifier(registry)
results = identifier.identify_file_format(Path("data.h5"))

Individual descriptors

from vcti.pathformat import FormatRegistry
from vcti.pathformat.descriptors import (
    get_cax_file_descriptor,
    get_csv_file_descriptor,
    get_hdf5_file_descriptor,
    get_json_file_descriptor,
    get_npy_file_descriptor,
    get_npz_file_descriptor,
)

registry = FormatRegistry()
registry.register(get_hdf5_file_descriptor())
registry.register(get_cax_file_descriptor())
registry.register(get_json_file_descriptor())
registry.register(get_npy_file_descriptor())
registry.register(get_npz_file_descriptor())
registry.register(get_csv_file_descriptor())

Built-in Formats

HDF5

| Property | Value |
|---|---|
| ID | hdf5-file |
| Signature | \x89HDF\r\n\x1a\n (8 bytes) |
| Extensions | .h5, .hdf5 |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=hdf5 |
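One detail worth knowing about the HDF5 signature: the HDF5 file format allows an optional user block before the superblock, in which case the signature sits at byte offset 512, 1024, 2048, and so on (doubling) rather than at offset 0. Whether the built-in descriptor looks beyond offset 0 is not stated here. The sketch below is an independent standard-library check an application could run itself; `find_hdf5_signature` and its `max_offset` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(path: Path, max_offset: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the HDF5 signature, or None if not found.

    Probes offset 0, then 512, 1024, 2048, ... up to max_offset.
    """
    offset = 0
    with path.open("rb") as fh:
        while offset <= max_offset:
            fh.seek(offset)
            chunk = fh.read(len(HDF5_SIGNATURE))
            if chunk == HDF5_SIGNATURE:
                return offset
            if len(chunk) < len(HDF5_SIGNATURE):
                # Reached end of file; later offsets cannot match either.
                return None
            offset = 512 if offset == 0 else offset * 2
    return None
```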

VCollab CAX

| Property | Value |
|---|---|
| ID | vcti-cax |
| Signature | \x89VCF\r\n\x1a\n (8 bytes) |
| Validators | Magic bytes (GATE) |
| Attributes | path_type=file, structure=binary, generator=VCollab |

JSON

| Property | Value |
|---|---|
| ID | json-file |
| Signature | none (text format) |
| Extensions | .json |
| Validators | Extension (EVIDENCE) |
| Attributes | path_type=file, structure=json |
| Best confidence | LIKELY (no GATE) |
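Because the JSON descriptor has no magic-byte gate, a match says only that the file name ends in .json. An application that needs more certainty could parse the content itself. The helper below is a sketch using the standard library; `is_parseable_json` is a hypothetical name and is not part of this package.

```python
import json
from pathlib import Path


def is_parseable_json(path: Path) -> bool:
    """Return True if the file decodes as UTF-8 and parses as JSON.

    Reads the whole file, so this is best kept for small inputs.
    """
    try:
        with path.open("r", encoding="utf-8") as fh:
            json.load(fh)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True
```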

NumPy NPY

| Property | Value |
|---|---|
| ID | npy-file |
| Signature | \x93NUMPY (6 bytes) |
| Extensions | .npy |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=binary |
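For context on what follows the 6-byte magic string: in the NPY format the next two bytes are the major and minor version, followed by a little-endian header length (2 bytes for version 1.x, 4 bytes for 2.x and 3.x) and a text header describing dtype and shape. The sketch below reads that header with the standard library only; `read_npy_header` is a hypothetical helper, not part of this package or of NumPy.

```python
import struct
from pathlib import Path
from typing import Tuple

NPY_MAGIC = b"\x93NUMPY"


def read_npy_header(path: Path) -> Tuple[Tuple[int, int], str]:
    """Return ((major, minor), header_text) for an NPY file.

    Raises ValueError if the magic string is missing or the version is unknown.
    """
    with path.open("rb") as fh:
        if fh.read(len(NPY_MAGIC)) != NPY_MAGIC:
            raise ValueError("not an NPY file: magic string missing")
        version = fh.read(2)
        if len(version) != 2:
            raise ValueError("truncated NPY file: version bytes missing")
        major, minor = version[0], version[1]
        if major == 1:
            length_format = "<H"  # 2-byte little-endian header length
        elif major in (2, 3):
            length_format = "<I"  # 4-byte little-endian header length
        else:
            raise ValueError(f"unsupported NPY version {major}.{minor}")
        raw_length = fh.read(struct.calcsize(length_format))
        if len(raw_length) != struct.calcsize(length_format):
            raise ValueError("truncated NPY file: header length missing")
        (header_len,) = struct.unpack(length_format, raw_length)
        # Version 3.x headers are UTF-8; earlier versions use latin1.
        encoding = "utf-8" if major == 3 else "latin1"
        return (major, minor), fh.read(header_len).decode(encoding)
```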

NumPy NPZ

| Property | Value |
|---|---|
| ID | npz-file |
| Signature | PK\x03\x04 (ZIP local file header, 4 bytes) |
| Extensions | .npz |
| Validators | Magic bytes (GATE) + Extension (EVIDENCE) |
| Attributes | path_type=file, structure=binary |

Note: the magic bytes are the standard ZIP local file header. The .npz extension is what distinguishes NumPy archives from other ZIP-family formats; any future ZIP-family descriptors must coordinate on the extension.
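The overlap is easy to demonstrate with the standard library. The sketch below writes an ordinary ZIP archive and an NPZ-named archive (with a placeholder member rather than real array data) and compares their first four bytes; `first_bytes` and the file names are made up for the example.

```python
import tempfile
import zipfile
from pathlib import Path

ZIP_LOCAL_HEADER = b"PK\x03\x04"


def first_bytes(path: Path, count: int = 4) -> bytes:
    """Return the first `count` bytes of a file."""
    with path.open("rb") as fh:
        return fh.read(count)


with tempfile.TemporaryDirectory() as tmp:
    plain_zip = Path(tmp) / "notes.zip"
    npz_like = Path(tmp) / "arrays.npz"
    for target, member in ((plain_zip, "readme.txt"), (npz_like, "x.npy")):
        with zipfile.ZipFile(target, "w") as archive:
            archive.writestr(member, b"placeholder")

    # Both archives begin with the same ZIP local file header.
    same_magic = first_bytes(plain_zip) == first_bytes(npz_like) == ZIP_LOCAL_HEADER
    # Only the suffix tells them apart at this level of inspection.
    suffixes = (plain_zip.suffix, npz_like.suffix)

print(same_magic, suffixes)
```

Since the leading bytes are identical, a magic-byte check alone cannot separate the two files; the extension carries that distinction.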

CSV

| Property | Value |
|---|---|
| ID | csv-file |
| Signature | none (text format) |
| Extensions | .csv |
| Validators | Extension (EVIDENCE) |
| Attributes | path_type=file, structure=csv |
| Best confidence | LIKELY (no GATE) |

Dependencies
