HDF5 tools and semantic metadata for agentic workflows
Agentic HDF5
A set of expansions and tools for Claude Code that enable AI agents to work at a high level with HDF5 data and files. Provides 10 MCP tools, 14 skills, support for semantic metadata, and natural language search over vectorized semantic metadata.
Prerequisites
Claude Code must be installed. The MCP server is fetched and run via uvx, so uv must also be installed — all Python dependencies (h5py, numpy, matplotlib, etc.) are resolved automatically.
Installation
Run these slash commands inside a Claude Code session to register the plugin marketplace and install the plugin:
# Add the marketplace (one-time)
/plugin marketplace add mattjala/agentic-hdf5
# Install the plugin
/plugin install ahdf5-plugin@agentic-hdf5
This gives you all 14 skills and 10 MCP tools automatically.
For development/testing, clone the repo and load the plugin directly:
git clone https://github.com/mattjala/agentic-hdf5.git
claude --plugin-dir ./agentic-hdf5/plugin
Architecture
Agentic HDF5 is composed of two complementary layers — tools and skills — that can be used independently or together.
Tools
Tools are Python functions that agents call to perform concrete operations on HDF5 files. They handle the actual file I/O: reading metadata, rechunking datasets, applying compression filters, writing semantic metadata, generating visualizations, and running semantic searches. Tools live in the tools/ directory and are registered in tools/tool_catalog.json.
| Tool | Description |
|---|---|
| `get_object_metadata` | Inspect dataset/group properties (shape, dtype, chunks, compression) |
| `rechunk_dataset` | Modify chunk layout (larger, smaller, exact dimensions, contiguous) |
| `apply_filter_dataset` | Apply or remove compression filters (gzip, szip, shuffle, etc.) |
| `visualize` | Generate plots from datasets (line, heatmap, histogram, contour, etc.) |
| `read_semantic_metadata` | Read semantic metadata (SMD) from an HDF5 object |
| `write_semantic_metadata` | Write or update SMD on a single object |
| `collect_objects_for_smd` | Scan a file for objects missing SMD |
| `write_smd_batch` | Write SMD to multiple objects in a single transaction |
| `vectorize_semantic_metadata` | Embed all SMD into vector representations for search |
| `query_semantic_metadata` | Natural language semantic search over vectorized SMD |
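To illustrate the kind of inspection `get_object_metadata` performs, here is a minimal standalone sketch using h5py directly. The function name and return shape are illustrative assumptions, not the tool's actual interface:

```python
import h5py

def object_metadata(path: str, name: str) -> dict:
    """Report layout properties of one HDF5 dataset, similar in
    spirit to what an agent sees from a metadata-inspection tool."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        return {
            "shape": dset.shape,
            "dtype": str(dset.dtype),
            "chunks": dset.chunks,            # None for contiguous layout
            "compression": dset.compression,  # e.g. "gzip", or None
        }
```

The tool layer wraps operations like this so the agent never has to hand-write h5py calls.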
Skills
Skills are curated knowledge documents that teach the agent how and when to apply HDF5 best practices. They are loaded on-demand when a user's request matches the skill's domain, giving the agent expert-level guidance without bloating its context on every interaction. Skills live in .claude/skills/.
| Skill | Domain |
|---|---|
| `hdf5-chunking` | Chunk layout strategies and optimization |
| `hdf5-filters` | Compression and filter selection |
| `hdf5-io` | General I/O performance tuning |
| `hdf5-cloud-optimized` | Cloud/S3 access, paged aggregation, ros3 VFD |
| `hdf5-core-vfd` | In-memory file driver |
| `hdf5-parallel` | MPI-IO and parallel HDF5 |
| `hdf5-swmr` | Single Writer Multiple Reader access |
| `hdf5-vds` | Virtual datasets across multiple files |
| `hdf5-vol-usage` | Using VOL connectors (DAOS, Async, Cache, REST) |
| `hdf5-vol-dev` | Developing custom VOL connectors |
| `hdf5-visualization` | Plot type selection and matplotlib guidance |
| `hdf5-scientific-publishing` | DOIs, Zenodo/Dataverse, FAIR data practices |
| `hdf5-omni-selective` | OMNI file creation for selective data download |
| `hdf5-optimization` | General HDF5 optimization scripts |
How They Work Together
Tools and skills are designed to complement each other but neither requires the other:
- Skills alone — An agent can use skill knowledge to advise on HDF5 best practices (e.g., recommending a chunk layout) without modifying any files.
- Tools alone — An agent can call tools to inspect, optimize, or annotate files using the tool's built-in logic, without loading any skill context.
- Skills + Tools — The most powerful mode. A skill provides the agent with expert knowledge (e.g., chunking strategies for cloud access patterns), and the agent then uses tools to apply that knowledge to specific files (e.g., rechunking a dataset with the recommended layout).
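The third mode can be made concrete with a sketch: a chunking skill recommends a layout, and a tool-style operation applies it. This copy-and-recreate approach is an illustrative assumption about how `rechunk_dataset` might work, not its actual implementation:

```python
import h5py

def rechunk(path: str, name: str, new_chunks: tuple) -> None:
    """Recreate a dataset with a new chunk layout, preserving its
    data and attributes (sketch; loads the full dataset into memory)."""
    with h5py.File(path, "a") as f:
        data = f[name][...]
        attrs = dict(f[name].attrs)
        del f[name]
        dset = f.create_dataset(name, data=data, chunks=new_chunks)
        for key, value in attrs.items():
            dset.attrs[key] = value
```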
Semantic Metadata (SMD)
Semantic metadata attributes (ahdf5-smd-*) attach human-readable, structured descriptions to HDF5 objects — describing what data represents, its provenance, units, and scientific significance. SMD bridges the gap between raw array data and human understanding.
See docs/semantic-metadata.md for the full specification.
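Since SMD lives in ordinary HDF5 attributes, it can be written and read with plain h5py. The `ahdf5-smd-` prefix comes from the project; the specific field names below (`description`, `units`) are illustrative assumptions, not the spec:

```python
import h5py

def write_smd(path: str, obj: str, fields: dict) -> None:
    """Attach semantic metadata fields as ahdf5-smd-* attributes."""
    with h5py.File(path, "a") as f:
        for key, value in fields.items():
            f[obj].attrs[f"ahdf5-smd-{key}"] = value

def read_smd(path: str, obj: str) -> dict:
    """Collect all ahdf5-smd-* attributes from one object."""
    with h5py.File(path, "r") as f:
        return {
            k.removeprefix("ahdf5-smd-"): f[obj].attrs[k]
            for k in f[obj].attrs
            if k.startswith("ahdf5-smd-")
        }
```

See docs/semantic-metadata.md for the actual field vocabulary.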
Vectorized Semantic Metadata (VSMD)
VSMD converts text-based SMD into vector embeddings stored directly in the HDF5 file, enabling natural language search over datasets. An agent (or user) can query "temperature measurements in Celsius" and retrieve the most semantically relevant objects — without needing to know paths or attribute names.
See docs/vectorized-semantic-metadata.md for the design document.
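The retrieval step behind such a query can be sketched with plain cosine similarity: given a query embedding and the per-object SMD embeddings already stored in the file, rank objects by similarity. The embedding model itself and the on-disk layout are out of scope here, so this function takes precomputed vectors:

```python
import numpy as np

def rank_by_similarity(query_vec, smd_vectors, object_paths, k=3):
    """Return the k object paths whose stored SMD embeddings are
    most cosine-similar to the query embedding."""
    q = np.asarray(query_vec, dtype=float)
    v = np.asarray(smd_vectors, dtype=float)
    q = q / np.linalg.norm(q)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    scores = v @ q                       # cosine similarity per object
    top = np.argsort(scores)[::-1][:k]   # indices of best matches first
    return [(object_paths[i], float(scores[i])) for i in top]
```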
Testing
python -m pytest tests/
Agent Tool Selection Evaluation
The tests/agent_tool_selection/ suite evaluates whether Claude models correctly identify the right HDF5 tool from natural language prompts. Run from a normal terminal (not inside Claude Code):
pytest -m agent tests/agent_tool_selection/ -v --model haiku
| Date | Model | Parameters | Score |
|---|---|---|---|
| 2026-03-16 | Claude Opus 4.6 | Not disclosed | 7/7 (100%) |
| 2026-03-16 | Claude Sonnet 4.6 | Not disclosed | 7/7 (100%) |
| 2026-03-16 | Claude Haiku 4.5 | Not disclosed | 7/7 (100%) |
| 2026-03-16 | Claude 3 Haiku | ~20B (est.) | 7/7 (100%) |
See tests/agent_tool_selection/RESULTS.md for full methodology and detailed results across prompt modes.
File details
Details for the file agentic_hdf5-0.2.0.tar.gz.
File metadata
- Download URL: agentic_hdf5-0.2.0.tar.gz
- Upload date:
- Size: 36.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab0ee4647a5668743f1cf6025aa6b23f0e1d03ec089082c165cddb7749a495cb |
| MD5 | bccf96c68246891982c1108eaa827a27 |
| BLAKE2b-256 | d18b4de8a2680576d987f448f882d427d5fd0a85409683c390ea1f7082633dff |
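After downloading, you can check the archive against the published SHA256 digest with the standard library; a minimal sketch:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file in 64 KiB blocks and return its hex SHA256
    digest, for comparison against the published hash."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()
```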
Provenance
The following attestation bundles were made for agentic_hdf5-0.2.0.tar.gz:
Publisher: publish-pypi.yml on mattjala/agentic-hdf5
Attestation:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentic_hdf5-0.2.0.tar.gz
- Subject digest: ab0ee4647a5668743f1cf6025aa6b23f0e1d03ec089082c165cddb7749a495cb
- Sigstore transparency entry: 1148284056
- Sigstore integration time:
- Permalink: mattjala/agentic-hdf5@cc2875ed34ce8e8a98747845f773114129997dd2
- Branch / Tag: refs/heads/main
- Owner: https://github.com/mattjala
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@cc2875ed34ce8e8a98747845f773114129997dd2
- Trigger Event: workflow_dispatch
File details
Details for the file agentic_hdf5-0.2.0-py3-none-any.whl.
File metadata
- Download URL: agentic_hdf5-0.2.0-py3-none-any.whl
- Upload date:
- Size: 42.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cad7bcd8bfc1f6033609547e2790fff32510b881207149c7f42da7181eb4046c |
| MD5 | 7ae5a8570164754339d42224d417f48c |
| BLAKE2b-256 | 42c7ad69c4c958a1e4acd4a14bf8fc47cb67894d6db1375155e9611bad6ae0ec |
Provenance
The following attestation bundles were made for agentic_hdf5-0.2.0-py3-none-any.whl:
Publisher: publish-pypi.yml on mattjala/agentic-hdf5
Attestation:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentic_hdf5-0.2.0-py3-none-any.whl
- Subject digest: cad7bcd8bfc1f6033609547e2790fff32510b881207149c7f42da7181eb4046c
- Sigstore transparency entry: 1148284082
- Sigstore integration time:
- Permalink: mattjala/agentic-hdf5@cc2875ed34ce8e8a98747845f773114129997dd2
- Branch / Tag: refs/heads/main
- Owner: https://github.com/mattjala
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@cc2875ed34ce8e8a98747845f773114129997dd2
- Trigger Event: workflow_dispatch