Skip to main content

Automatic type mapping for Python objects to HDF5

Project description

h5typer

Automatic type mapping for Python objects to HDF5

h5typer is a lightweight Python package that provides seamless type conversion when saving and loading nested Python dictionaries to/from HDF5 files. It automatically handles the complexity of mapping Python types (numpy arrays, pandas DataFrames/Series, None, etc.) to HDF5-compatible formats.

Originally developed as part of the HiC-SCA package for Hi-C analysis, h5typer has been extracted as a standalone, reusable package that can be used in any Python project requiring HDF5 I/O with automatic type handling.

Features

  • Automatic Type Conversion: Transparently handles numpy arrays, pandas DataFrames/Series, strings, None, and standard Python types
  • Nested Dictionary Support: Save and load arbitrarily nested dictionary structures
  • Robust: Handles edge cases like empty arrays, None values, and object dtypes
  • Simple API: Just two main functions - save_data() and load_data()
  • Efficient: Uses HDF5's hierarchical storage for fast I/O

Why h5typer?

Standard HDF5 libraries like h5py require manual handling of type conversions:

# Without h5typer - manual type handling
import h5py

with h5py.File('data.h5', 'w') as f:
    f.create_dataset('array', data=np.array([1, 2, 3]))
    f.create_dataset('string', data='hello'.encode('utf-8'))  # Must encode
    # Complex types like pandas DataFrames require custom serialization

# With h5typer - automatic!
import h5typer

data = {
    'array': np.array([1, 2, 3]),
    'string': 'hello',
    'dataframe': pd.DataFrame({'A': [1, 2]})
}
h5typer.save_data('data.h5', data)

h5typer handles all the boilerplate, letting you focus on your data.

Installation

# Install from source
git clone https://github.com/iQLS-MMS/h5typer.git
cd h5typer
pip install .


# Install from PyPI
pip install h5typer

Requirements

  • Python >= 3.10
  • numpy >= 1.19.0
  • pandas >= 1.0.0
  • h5py >= 3.0.0

Quick Start

import h5typer
import numpy as np
import pandas as pd

# Create some data
data = {
    'experiment': {
        'results': np.array([1, 2, 3, 4, 5]),
        'metadata': {
            'name': 'Test Experiment',
            'date': '2025-10-27',
            'valid': True
        },
        'dataframe': pd.DataFrame({
            'A': [1, 2, 3],
            'B': [4, 5, 6]
        }),
        'empty_value': None
    }
}

# Save to HDF5
h5typer.save_data('output.h5', data)

# Load from HDF5
loaded_data = h5typer.load_data('output.h5')

# Access your data - it's exactly as you saved it!
print(loaded_data['experiment']['results'])  # array([1, 2, 3, 4, 5])
print(loaded_data['experiment']['metadata']['name'])  # 'Test Experiment'
print(loaded_data['experiment']['dataframe'])  # Original DataFrame

API Reference

Functions

save_data(filename, data_dict, update=False)

Save a nested dictionary to an HDF5 file.

Parameters:

  • filename (str): Path to the HDF5 file
  • data_dict (dict): Nested dictionary to save
  • update (bool, optional): If True, update existing file; if False, overwrite. Default: False

Example:

import h5typer

data = {'key': 'value', 'array': np.array([1, 2, 3])}
h5typer.save_data('mydata.h5', data)

# Update existing file
more_data = {'new_key': 'new_value'}
h5typer.save_data('mydata.h5', more_data, update=True)

load_data(filename)

Load a nested dictionary from an HDF5 file.

Parameters:

  • filename (str): Path to the HDF5 file

Returns:

  • dict: The loaded nested dictionary

Example:

import h5typer

data = h5typer.load_data('mydata.h5')
print(data['key'])  # 'value'

Class API

For more control, use the H5Typer class directly:

from h5typer import H5Typer

# Create instance
io_handler = H5Typer()

# Save data
io_handler.save_data('output.h5', my_dict)

# Load data
loaded = io_handler.load_data('output.h5')

Supported Types

h5typer automatically handles the following Python types:

Python Type HDF5 Storage Notes
numpy.ndarray Dataset All dtypes supported, excluding object arrays
pandas.DataFrame Group with datasets Index, columns, and values preserved
pandas.Series Group with datasets Index and values preserved
dict Group Nested dictionaries become HDF5 groups
str Dataset (bytes) UTF-8 encoded
int, float Dataset Stored as numpy scalars
None h5py.Empty Preserved on load
list, tuple Dataset Converted to numpy arrays, all elements must be of the same type

Type Conversion Details

NumPy Arrays

# String arrays
str_array = np.array(['a', 'b', 'c'])
# Automatically converted to bytes for HDF5

# Float64 arrays
float_array = np.array([1.0, 5.0, 3.0], dtype=np.float64)

Pandas Objects

# DataFrames - index, columns, and values all preserved
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Series - index and values preserved
series = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

None Values

data = {
    'value': None,
}
# None values are preserved through save/load

Advanced Usage

Nested Structures

data = {
    'level1': {
        'level2': {
            'level3': {
                'deep_array': np.array([1, 2, 3])
            }
        }
    }
}

h5typer.save_data('nested.h5', data)
loaded = h5typer.load_data('nested.h5')
# Structure is preserved

Updating Files

# Initial save
h5typer.save_data('data.h5', {'key1': 'value1'})

# Add more data
h5typer.save_data('data.h5', {'key2': 'value2'}, update=True)

# Load gets both keys
data = h5typer.load_data('data.h5')
# {'key1': 'value1', 'key2': 'value2'}

Integer Keys

# Integer keys are preserved
data = {
    100: 'hundred',
    200: 'two hundred',
    'string_key': 'value'
}

h5typer.save_data('intkeys.h5', data)
loaded = h5typer.load_data('intkeys.h5')
print(loaded[100])  # 'hundred' - key is still an integer

Integration with HiC-SCA

h5typer is used by HiC-SCA for all HDF5 I/O operations:

# In HiC-SCA
from hicsca import HiCSCA

# Process Hi-C data
hicsca = HiCSCA("sample.hic", resolutions=[100000])
hicsca.process_all_chromosomes()

# Save results (uses h5typer internally)
hicsca.to_hdf5("results.h5")

# Load results (uses h5typer internally)
hicsca_loaded = HiCSCA.from_hdf5("results.h5")

The integration is transparent - HiC-SCA uses h5typer for automatic type mapping of complex nested dictionaries containing numpy arrays, pandas DataFrames, and metadata.

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

h5typer-0.2.0.tar.gz (9.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

h5typer-0.2.0-py3-none-any.whl (8.7 kB view details)

Uploaded Python 3

File details

Details for the file h5typer-0.2.0.tar.gz.

File metadata

  • Download URL: h5typer-0.2.0.tar.gz
  • Upload date:
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for h5typer-0.2.0.tar.gz
Algorithm Hash digest
SHA256 642f1777a799b74023d696d8402753b2a708dbda019f777be6e01dab6af9fad3
MD5 b3081d8d2c6def29556add155ce376eb
BLAKE2b-256 30b4fa2f6143b4d79722720423600bfcec01e7633f7c603cadb12275288a22e4

See more details on using hashes here.

Provenance

The following attestation bundles were made for h5typer-0.2.0.tar.gz:

Publisher: pypi-publish.yml on iQLS-MMS/h5typer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file h5typer-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: h5typer-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 8.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for h5typer-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3835a95177bab7c060dfa485021397e624e646b57a8f3c6cc0e295117ed088eb
MD5 03705b26e9cea6c9e70e4ac2d7ff5936
BLAKE2b-256 342903d81937c825ff1ac3866003b7380b4e26e6b7a37f300f152021fc10c401

See more details on using hashes here.

Provenance

The following attestation bundles were made for h5typer-0.2.0-py3-none-any.whl:

Publisher: pypi-publish.yml on iQLS-MMS/h5typer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page