A command-line tool to convert HDF5 files to markdown format

HDF5 to Markdown Converter

A command-line tool that converts HDF5 files to AI-friendly Markdown with a key-value structure. It renders a file's structure, metadata, and actual data in a format optimized for both human readability and AI consumption.

Features

  • AI-friendly key-value format - Structured output optimized for AI parsing
  • Smart data subsetting - Preview large datasets with configurable row/column limits
  • Multiple sampling strategies - Choose how to sample data: first, uniform, or edges
  • Flexible data preview - Include or exclude actual data values
  • Complete metadata - Display file structure, groups, datasets, and attributes
  • External link support - Detect and display HDF5 external links
  • Compression info - Show dataset compression and chunking details

Installation

# Clone the repository
git clone https://github.com/hyoklee/h5md.git
cd h5md

# Install in development mode
pip install -e .

Or install directly from GitHub:

pip install git+https://github.com/hyoklee/h5md.git

Usage

Command Line

Basic conversion (uses defaults: 10 rows/cols, 'first' sampling):

h5md input.h5

This will create input.md in the same directory.

Custom output path:

h5md input.h5 -o output.md

Control data subsetting:

# Limit to 5 rows and 5 columns
h5md input.h5 --max-rows 5 --max-cols 5

# Show all data (use carefully with large files!)
h5md input.h5 --max-rows 0 --max-cols 0

# Metadata only (no data values)
h5md input.h5 --no-data

Choose sampling strategy:

# Take first N items (default)
h5md input.h5 --sampling first

# Sample uniformly across dataset
h5md input.h5 --sampling uniform

# Show first and last items (useful for ranges)
h5md input.h5 --sampling edges

Combined options:

h5md data.h5 -o output.md --max-rows 20 --max-cols 10 --sampling edges
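
The three sampling strategies differ only in which indices they keep along a dimension. The following is a minimal sketch of that idea, not h5md's actual implementation: the function name `sample_indices` is hypothetical, and the exact spacing used for `uniform` and the head/tail split used for `edges` are assumptions.

```python
def sample_indices(n: int, limit: int, strategy: str = "first") -> list[int]:
    """Return the indices to keep from a dimension of length n.

    A limit of 0 (or less) means "no limit", mirroring `--max-rows 0` above.
    """
    if limit <= 0 or n <= limit:
        return list(range(n))
    if strategy == "first":
        # Keep the leading items only.
        return list(range(limit))
    if strategy == "uniform":
        if limit == 1:
            return [0]
        # Evenly spaced positions from the first index to the last one.
        return [(i * (n - 1)) // (limit - 1) for i in range(limit)]
    if strategy == "edges":
        # Assumed split: the head gets the extra item when limit is odd.
        head = (limit + 1) // 2
        tail = limit - head
        return list(range(head)) + list(range(n - tail, n))
    raise ValueError(f"unknown sampling strategy: {strategy!r}")


print(sample_indices(100, 5, "first"))    # [0, 1, 2, 3, 4]
print(sample_indices(100, 5, "uniform"))  # [0, 24, 49, 74, 99]
print(sample_indices(100, 5, "edges"))    # [0, 1, 2, 98, 99]
```

`first` is the cheapest choice, `uniform` gives a rough picture of the whole dataset, and `edges` is handy for checking where a monotonic range starts and ends.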

Python API

from h5md import HDF5Converter

# Basic conversion with defaults
converter = HDF5Converter()
markdown_content = converter.convert('input.h5', 'output.md')

# Advanced: customize subsetting and sampling
converter = HDF5Converter(
    max_rows=20,           # Limit to 20 rows per dataset
    max_cols=15,           # Limit to 15 columns per dataset
    sampling_strategy="edges",  # Show first and last items
    include_data_preview=True   # Include actual data values
)
markdown_content = converter.convert('data.h5', 'output.md')

# Metadata only (no data values)
converter = HDF5Converter(include_data_preview=False)
markdown_content = converter.convert('data.h5', 'metadata.md')
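
To try the converter without real data, a small test file can be written with h5py. The sketch below assumes h5py and numpy are installed (both are listed under Requirements). The helper name `write_example` is hypothetical; the group, dataset, and attribute names mirror the sample output later in this README, while the values are random and the gzip/chunk settings are arbitrary choices for illustration.

```python
import h5py
import numpy as np


def write_example(path: str) -> None:
    """Write a small HDF5 file with attributes, a group, and two datasets."""
    rng = np.random.default_rng(0)  # fixed seed so reruns produce the same file
    with h5py.File(path, "w") as f:
        # File-level attributes
        f.attrs["title"] = "Sample Scientific Dataset"
        f.attrs["version"] = "1.0"

        grp = f.create_group("measurements")
        grp.attrs["description"] = "Experimental measurements"

        # 1-D dataset with its own attributes
        temp = grp.create_dataset(
            "temperature", data=rng.normal(22.0, 1.5, size=100)
        )
        temp.attrs["sensor"] = "TH-100"
        temp.attrs["unit"] = "Celsius"

        # 2-D dataset, chunked and gzip-compressed
        corr = grp.create_dataset(
            "correlation_matrix",
            data=rng.random((50, 20)),
            chunks=(10, 20),
            compression="gzip",
        )
        corr.attrs["description"] = "Correlation coefficients"


if __name__ == "__main__":
    write_example("example.h5")
```

Running `h5md example.h5` afterwards should produce `example.md` next to the file, as described under Command Line.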

Output Format

The generated markdown uses an AI-friendly key-value structure that includes:

  1. File-level attributes - Metadata about the HDF5 file
  2. Group hierarchy - Nested structure with group attributes
  3. Dataset properties - Shape, data type, size, compression, chunks
  4. Dataset attributes - Custom metadata for each dataset
  5. Data preview - Actual data values in key-value format (configurable)
  6. External links - Target file and path information
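
To make the data preview item concrete, here is a minimal sketch of how a 1-D preview in this key-value style could be rendered. It is not taken from h5md's source: `render_preview` is a hypothetical helper, it covers only the `first` strategy, and the line layout is copied from the sample output shown in the next section.

```python
def render_preview(values, max_rows=10, strategy="first"):
    """Render a 1-D sequence as key-value bullet lines, truncated to max_rows.

    max_rows <= 0 means "show everything". Only 'first' sampling is sketched.
    """
    total = len(values)
    shown = list(values) if max_rows <= 0 else list(values[:max_rows])
    lines = [f"- `index_{i}`: `{v}`" for i, v in enumerate(shown)]
    if len(shown) < total:
        # Footer telling the reader how much of the dataset was omitted.
        lines.append(
            f"- *(showing {len(shown)} of {total} rows using '{strategy}' sampling)*"
        )
    return "\n".join(lines)


print(render_preview([1.5, 2.5, 3.5], max_rows=2))
```

The footer line matters for AI consumption: it states explicitly that the listing is a subset, so a reader does not mistake the preview for the full dataset.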

Sample Key-Value Markdown Output

# HDF5 File Structure: example.h5

## Attributes

- **title:** `Sample Scientific Dataset` (type: `str`)
- **version:** `1.0` (type: `str`)

## Group: /measurements

### Attributes

- **description:** `Experimental measurements` (type: `str`)

### Dataset: temperature

#### Properties

- **Shape:** `(100,)`
- **Data Type:** `float64`
- **Size:** `100` elements

**Data (Key-Value Format):**

- `index_0`: `22.935992117831265`
- `index_1`: `23.308188819527796`
- `index_2`: `20.582239974390227`
- `index_3`: `20.184652272470018`
- `index_4`: `23.397532910900622`
- *(showing 5 of 100 rows using 'first' sampling)*

#### Attributes

- **sensor:** `TH-100` (type: `str`)
- **unit:** `Celsius` (type: `str`)

### Dataset: correlation_matrix

#### Properties

- **Shape:** `(50, 20)`
- **Data Type:** `float64`
- **Size:** `1000` elements

**Data (Key-Value Format):**

- **Row 0:**
  - `col_0`: `0.175408510335`
  - `col_1`: `0.367993360963`
  - `col_2`: `0.361122287567`
- **Row 1:**
  - `col_0`: `0.504039513844`
  - `col_1`: `0.817406445579`
  - `col_2`: `0.900514954273`
- *(showing 2 of 50 rows, 3 of 20 cols using 'first' sampling)*

#### Attributes

- **description:** `Correlation coefficients` (type: `str`)

This format is designed to be:

  • Parseable - Clear structure for AI to extract information
  • Readable - Easy for humans to understand
  • Scalable - Smart subsetting prevents overwhelming output from large datasets
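
As an illustration of the "parseable" property, the attribute lines in the sample above follow a regular pattern that a short script can read back. The sketch below uses only the standard library; the regular expressions are inferred from the sample output rather than from a published grammar, and `parse_attributes` is a hypothetical helper, not part of h5md.

```python
import re

# Matches lines such as: - **unit:** `Celsius` (type: `str`)
ATTR_RE = re.compile(
    r"^- \*\*(?P<key>.+?):\*\* `(?P<value>.*)` \(type: `(?P<type>[^`]+)`\)$"
)
# Matches "## Group: /measurements" and "### Dataset: temperature"
HEADING_RE = re.compile(r"^(#{2,3}) (Group|Dataset): (.+)$")

SAMPLE = """\
# HDF5 File Structure: example.h5

## Attributes

- **title:** `Sample Scientific Dataset` (type: `str`)

## Group: /measurements

### Attributes

- **description:** `Experimental measurements` (type: `str`)

### Dataset: temperature

#### Properties

- **Shape:** `(100,)`

#### Attributes

- **unit:** `Celsius` (type: `str`)
"""


def parse_attributes(markdown: str) -> dict[str, dict[str, str]]:
    """Map each group or dataset name to its attribute key/value pairs.

    File-level attributes are stored under "/". Property lines (Shape,
    Data Type, Size) carry no type suffix and are skipped.
    """
    result: dict[str, dict[str, str]] = {}
    section = "/"
    for raw in markdown.splitlines():
        line = raw.strip()
        heading = HEADING_RE.match(line)
        if heading:
            section = heading.group(3)
            continue
        attr = ATTR_RE.match(line)
        if attr:
            result.setdefault(section, {})[attr["key"]] = attr["value"]
    return result


print(parse_attributes(SAMPLE))
```

Values come back as strings; converting them according to the captured `type` field is left out to keep the sketch short.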

Requirements

  • Python 3.10+
  • h5py
  • numpy

License

BSD 3-Clause License

Copyright (c) 2025, Joe Lee
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

