Automatic type mapping for Python objects to HDF5

Maintainers

capslockwizard

Project description

h5typer

h5typer is a lightweight Python package that provides seamless type conversion when saving and loading nested Python dictionaries to/from HDF5 files. It automatically handles the complexity of mapping Python types (numpy arrays, pandas DataFrames/Series, None, etc.) to HDF5-compatible formats.

Originally developed as part of the HiC-SCA package for Hi-C analysis, h5typer has been extracted as a standalone, reusable package that can be used in any Python project requiring HDF5 I/O with automatic type handling.

Features

Automatic Type Conversion: Transparently handles numpy arrays, pandas DataFrames/Series, strings, None, and standard Python types
Nested Dictionary Support: Save and load arbitrarily nested dictionary structures
Robust: Handles edge cases like empty arrays, None values, and object dtypes
Simple API: Just two main functions - save_data() and load_data()
Efficient: Uses HDF5's hierarchical storage for fast I/O

Why h5typer?

Standard HDF5 libraries like h5py require manual handling of type conversions:

# Without h5typer - manual type handling
import h5py
import numpy as np

with h5py.File('data.h5', 'w') as f:
    f.create_dataset('array', data=np.array([1, 2, 3]))
    f.create_dataset('string', data='hello'.encode('utf-8'))  # Must encode
    # Complex types like pandas DataFrames require custom serialization

# With h5typer - automatic!
import h5typer
import numpy as np
import pandas as pd

data = {
    'array': np.array([1, 2, 3]),
    'string': 'hello',
    'dataframe': pd.DataFrame({'A': [1, 2]})
}
h5typer.save_data('data.h5', data)

h5typer handles all the boilerplate, letting you focus on your data.
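To make the boilerplate concrete, here is a minimal sketch of the manual round trip for a single string using plain h5py. It shows h5py usage only, not h5typer's internals. The file name `scratch.h5` and the dataset name are placeholders, and the file is opened with the in-memory `core` driver so nothing is written to disk.

```python
import h5py

# In-memory HDF5 file: the name is a placeholder and nothing reaches the disk.
with h5py.File('scratch.h5', 'w', driver='core', backing_store=False) as f:
    # Writing: the string is encoded to bytes by hand.
    f.create_dataset('string', data='hello'.encode('utf-8'))

    # Reading: the scalar dataset comes back as bytes and must be decoded by hand.
    raw = f['string'][()]
    text = raw.decode('utf-8')

print(text)  # hello
```

Every string in a nested dictionary would need the same encode and decode pair, which is the kind of repetition the package is meant to remove.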

Installation

# Install from source
git clone https://github.com/iQLS-MMS/h5typer.git
cd h5typer
pip install .

Requirements

Python >= 3.7
numpy >= 1.19.0
pandas >= 1.0.0
h5py >= 3.0.0

Quick Start

import h5typer
import numpy as np
import pandas as pd

# Create some data
data = {
    'experiment': {
        'results': np.array([1, 2, 3, 4, 5]),
        'metadata': {
            'name': 'Test Experiment',
            'date': '2025-10-27',
            'valid': True
        },
        'dataframe': pd.DataFrame({
            'A': [1, 2, 3],
            'B': [4, 5, 6]
        }),
        'empty_value': None
    }
}

# Save to HDF5
h5typer.save_data('output.h5', data)

# Load from HDF5
loaded_data = h5typer.load_data('output.h5')

# Access your data - it's exactly as you saved it!
print(loaded_data['experiment']['results'])  # array([1, 2, 3, 4, 5])
print(loaded_data['experiment']['metadata']['name'])  # 'Test Experiment'
print(loaded_data['experiment']['dataframe'])  # Original DataFrame

API Reference

Functions

`save_data(filename, data_dict, update=False)`

Save a nested dictionary to an HDF5 file.

Parameters:

filename (str): Path to the HDF5 file
data_dict (dict): Nested dictionary to save
update (bool, optional): If True, update existing file; if False, overwrite. Default: False

Example:

import h5typer
import numpy as np

data = {'key': 'value', 'array': np.array([1, 2, 3])}
h5typer.save_data('mydata.h5', data)

# Update existing file
more_data = {'new_key': 'new_value'}
h5typer.save_data('mydata.h5', more_data, update=True)
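The difference between overwriting and updating resembles the difference between h5py's `'w'` and `'a'` file modes. The sketch below demonstrates those two modes with plain h5py. Whether `save_data` is implemented this way is an assumption; the README does not say. The temporary path and dataset names are placeholders.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), 'modes.h5')

# Mode 'w' creates the file, truncating it if it already exists.
with h5py.File(path, 'w') as f:
    f.create_dataset('key', data=1)

# Mode 'a' opens the file read/write and keeps what is already there.
with h5py.File(path, 'a') as f:
    f.create_dataset('new_key', data=2)

with h5py.File(path, 'r') as f:
    after_append = sorted(f.keys())

# Opening with 'w' again discards the earlier contents.
with h5py.File(path, 'w') as f:
    f.create_dataset('only_key', data=3)

with h5py.File(path, 'r') as f:
    after_overwrite = sorted(f.keys())

print(after_append)     # ['key', 'new_key']
print(after_overwrite)  # ['only_key']
```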

`load_data(filename)`

Load a nested dictionary from an HDF5 file.

Parameters:

filename (str): Path to the HDF5 file

Returns:

dict: The loaded nested dictionary

Example:

import h5typer

data = h5typer.load_data('mydata.h5')
print(data['key'])  # 'value'

Class API

For more control, use the H5Typer class directly:

from h5typer import H5Typer

# Create instance
io_handler = H5Typer()

# Save data (my_dict can be any nested dictionary)
my_dict = {'key': 'value'}
io_handler.save_data('output.h5', my_dict)

# Load data
loaded = io_handler.load_data('output.h5')

Supported Types

h5typer automatically handles the following Python types:

Python Type	HDF5 Storage	Notes
`numpy.ndarray`	Dataset	All dtypes supported, including object arrays
`pandas.DataFrame`	Group with datasets	Index, columns, and values preserved
`pandas.Series`	Group with datasets	Index and values preserved
`dict`	Group	Nested dictionaries become HDF5 groups
`str`	Dataset (bytes)	UTF-8 encoded
`int`, `float`	Dataset	Stored as numpy scalars
`None`	h5py.Empty	Preserved on load
`list`, `tuple`	Dataset	Converted to numpy arrays, all elements must be of the same type
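The same-type requirement for lists and tuples can be checked before saving. The helper below is a hypothetical sketch, not part of h5typer, and it uses the strictest reading of "same type" (identical Python classes). The check is worth doing because NumPy silently coerces mixed elements to a common dtype, which may not be what was intended.

```python
def is_homogeneous(seq):
    """Return True if every element of seq has exactly the same Python type."""
    return len({type(item) for item in seq}) <= 1


print(is_homogeneous([1, 2, 3]))        # True
print(is_homogeneous((1.0, 2.5)))       # True
print(is_homogeneous([1, 'two', 3.0]))  # False
print(is_homogeneous([]))               # True (nothing to disagree)
```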

Type Conversion Details

NumPy Arrays

import numpy as np

# String arrays
str_array = np.array(['a', 'b', 'c'])
# Automatically converted to bytes for HDF5

# Float64 arrays
float_array = np.array([1.0, 5.0, 3.0], dtype=np.float64)
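The bytes conversion for string arrays can be reproduced with NumPy alone. h5py's documentation notes that NumPy's fixed-width Unicode dtype (kind `U`) is not supported, so such arrays have to become byte strings (kind `S`) before storage. The sketch below uses `np.char.encode` and `np.char.decode` as one possible way to do that; it is not necessarily the call h5typer makes.

```python
import numpy as np

str_array = np.array(['a', 'b', 'c'])
print(str_array.dtype.kind)  # U  (fixed-width Unicode)

# Encode each element to UTF-8 bytes; the result is a bytes array.
encoded = np.char.encode(str_array, 'utf-8')
print(encoded.dtype.kind)    # S  (fixed-width bytes)

# Decoding restores the original strings.
decoded = np.char.decode(encoded, 'utf-8')
print(decoded.tolist())      # ['a', 'b', 'c']
```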

Pandas Objects

import pandas as pd

# DataFrames - index, columns, and values all preserved
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Series - index and values preserved
series = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
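Storing a DataFrame as a group of datasets implies splitting it into its index, columns, and values. The sketch below shows that decomposition with pandas alone, as an illustration of the idea rather than h5typer's actual layout. The dictionary keys are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Split the frame into three plain arrays, one per stored component.
parts = {
    'index': df.index.to_numpy(),
    'columns': df.columns.to_numpy(),
    'values': df.to_numpy(),
}

# Rebuild a frame from the three arrays.
rebuilt = pd.DataFrame(
    parts['values'], index=parts['index'], columns=parts['columns']
)

print(rebuilt)
```

One limitation of this simple scheme: `to_numpy()` on a frame with mixed column dtypes returns a single array of a common dtype, so per-column dtypes would need to be stored separately.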

None Values

data = {
    'value': None,
}
# None values are preserved through save/load
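The Supported Types table lists `h5py.Empty` as the storage for `None`. The sketch below shows how an empty dataset behaves in plain h5py and how a loader could map it back to `None`. The mapping step is an assumption about how such a loader might work. The file is in-memory and its name is a placeholder.

```python
import h5py

with h5py.File('scratch.h5', 'w', driver='core', backing_store=False) as f:
    # An empty dataset has a dtype but no shape and no data.
    f.create_dataset('value', data=h5py.Empty('f'))

    stored = f['value'][()]
    is_empty = isinstance(stored, h5py.Empty)
    shape = f['value'].shape

# A loader could translate the empty marker back to None.
restored = None if is_empty else stored

print(is_empty, shape, restored)  # True None None
```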

Advanced Usage

Nested Structures

import h5typer
import numpy as np

data = {
    'level1': {
        'level2': {
            'level3': {
                'deep_array': np.array([1, 2, 3])
            }
        }
    }
}

h5typer.save_data('nested.h5', data)
loaded = h5typer.load_data('nested.h5')
# Structure is preserved
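HDF5 addresses objects with slash-separated paths, so a nested dictionary corresponds naturally to a tree of groups. The helper below is a hypothetical sketch (not an h5typer function) that lists the path each leaf value would occupy, assuming every nested dict becomes a group and every other value becomes a dataset.

```python
def iter_paths(tree, prefix=''):
    """Yield (path, value) pairs for every leaf of a nested dictionary."""
    for key, value in tree.items():
        path = f'{prefix}/{key}'
        if isinstance(value, dict):
            # A nested dict corresponds to a group; recurse into it.
            yield from iter_paths(value, path)
        else:
            # Anything else is a leaf that would be stored as a dataset.
            yield path, value


data = {'level1': {'level2': {'level3': {'deep_array': [1, 2, 3]}}}}
paths = [path for path, _ in iter_paths(data)]
print(paths)  # ['/level1/level2/level3/deep_array']
```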

Updating Files

import h5typer

# Initial save
h5typer.save_data('data.h5', {'key1': 'value1'})

# Add more data
h5typer.save_data('data.h5', {'key2': 'value2'}, update=True)

# Load gets both keys
data = h5typer.load_data('data.h5')
# {'key1': 'value1', 'key2': 'value2'}

Integer Keys

import h5typer

# Integer keys are preserved
data = {
    100: 'hundred',
    200: 'two hundred',
    'string_key': 'value'
}

h5typer.save_data('intkeys.h5', data)
loaded = h5typer.load_data('intkeys.h5')
print(loaded[100])  # 'hundred' - key is still an integer
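HDF5 object names are strings, so preserving integer keys requires recording the key type somewhere. The README does not describe how h5typer does this. The sketch below is a purely illustrative scheme with hypothetical helper names and a hypothetical `int:` / `str:` prefix convention, showing one way the round trip could work.

```python
def encode_key(key):
    """Turn an int or str key into a string name that records its type."""
    if isinstance(key, bool) or not isinstance(key, (int, str)):
        raise TypeError(f'unsupported key type: {type(key).__name__}')
    tag = 'int' if isinstance(key, int) else 'str'
    return f'{tag}:{key}'


def decode_key(name):
    """Invert encode_key: strip the tag and restore the original type."""
    tag, _, text = name.partition(':')
    return int(text) if tag == 'int' else text


keys = [100, 200, 'string_key', '100']
names = [encode_key(k) for k in keys]
print(names)  # ['int:100', 'int:200', 'str:string_key', 'str:100']
print([decode_key(n) for n in names] == keys)  # True
```

Tagging every key keeps the integer `100` and the string `'100'` distinct, which a plain `str(key)` conversion would not.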

Integration with HiC-SCA

h5typer is used by HiC-SCA for all HDF5 I/O operations:

# In HiC-SCA
from hicsca import HiCSCA

# Process Hi-C data
hicsca = HiCSCA("sample.hic", resolutions=[100000])
hicsca.process_all_chromosomes()

# Save results (uses h5typer internally)
hicsca.to_hdf5("results.h5")

# Load results (uses h5typer internally)
hicsca_loaded = HiCSCA.from_hdf5("results.h5")

The integration is transparent - HiC-SCA uses h5typer for automatic type mapping of complex nested dictionaries containing numpy arrays, pandas DataFrames, and metadata.

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Changelog

Version 0.1.0

Initial release
Support for numpy arrays, pandas DataFrames/Series, None values
Nested dictionary support
Integer key preservation
Update functionality


Release history

0.2.0 (Nov 18, 2025)
0.2.0b1, pre-release (Nov 10, 2025), this version
0.1.0, yanked (Nov 10, 2025)

Download files

Source Distribution

h5typer-0.2.0b1.tar.gz (9.8 kB)

Uploaded Nov 10, 2025 Source

Built Distribution

h5typer-0.2.0b1-py3-none-any.whl (8.9 kB)

Uploaded Nov 10, 2025 Python 3

File details

Details for the file h5typer-0.2.0b1.tar.gz.

File metadata

Download URL: h5typer-0.2.0b1.tar.gz
Upload date: Nov 10, 2025
Size: 9.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for h5typer-0.2.0b1.tar.gz
Algorithm	Hash digest
SHA256	`dbae91c26afced4e662335a3ab9f00e0b4911af98b6215e39de1a39dea0b2018`
MD5	`49b9627d7e37ac2ffccd796a8245ea76`
BLAKE2b-256	`9a81f66d8d6745d71f3edb11973bfb34941181dddefbbfc6d7fb6c4e703d24b7`
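A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The helper names below are hypothetical, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `matches('h5typer-0.2.0b1.tar.gz', 'dbae91c26afced4e662335a3ab9f00e0b4911af98b6215e39de1a39dea0b2018')` should return `True` if the download is intact.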

Provenance

The following attestation bundles were made for h5typer-0.2.0b1.tar.gz:

Publisher: pypi-publish.yml on iQLS-MMS/h5typer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: h5typer-0.2.0b1.tar.gz
- Subject digest: dbae91c26afced4e662335a3ab9f00e0b4911af98b6215e39de1a39dea0b2018
- Sigstore transparency entry: 686183140
- Sigstore integration time: Nov 10, 2025
Source repository:
- Permalink: iQLS-MMS/h5typer@a3442bc279588f4d00c58e9255c89f9f0af22eba
- Branch / Tag: refs/tags/v0.2.0b1
- Owner: https://github.com/iQLS-MMS
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@a3442bc279588f4d00c58e9255c89f9f0af22eba
- Trigger Event: release

File details

Details for the file h5typer-0.2.0b1-py3-none-any.whl.

File metadata

Download URL: h5typer-0.2.0b1-py3-none-any.whl
Upload date: Nov 10, 2025
Size: 8.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for h5typer-0.2.0b1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5ec6734705dd8a42cb8cb95cab9614758324933794bc76555bbf085a320c4574`
MD5	`9e931ae658441f15250f026df9916c0b`
BLAKE2b-256	`6409bc31289cc4d2879be8255f6bf2f0a64e77964b97e011d3ae683335fc44f9`

Provenance

The following attestation bundles were made for h5typer-0.2.0b1-py3-none-any.whl:

Publisher: pypi-publish.yml on iQLS-MMS/h5typer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: h5typer-0.2.0b1-py3-none-any.whl
- Subject digest: 5ec6734705dd8a42cb8cb95cab9614758324933794bc76555bbf085a320c4574
- Sigstore transparency entry: 686183142
- Sigstore integration time: Nov 10, 2025
Source repository:
- Permalink: iQLS-MMS/h5typer@a3442bc279588f4d00c58e9255c89f9f0af22eba
- Branch / Tag: refs/tags/v0.2.0b1
- Owner: https://github.com/iQLS-MMS
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@a3442bc279588f4d00c58e9255c89f9f0af22eba
- Trigger Event: release
