h5typer
Automatic type mapping for Python objects to HDF5
h5typer is a lightweight Python package that provides seamless type conversion when saving and loading nested Python dictionaries to/from HDF5 files. It automatically handles the complexity of mapping Python types (numpy arrays, pandas DataFrames/Series, None, etc.) to HDF5-compatible formats.
Originally developed as part of the HiC-SCA package for Hi-C analysis, h5typer has been extracted as a standalone, reusable package that can be used in any Python project requiring HDF5 I/O with automatic type handling.
Features
- Automatic Type Conversion: Transparently handles numpy arrays, pandas DataFrames/Series, strings, None, and standard Python types
- Nested Dictionary Support: Save and load arbitrarily nested dictionary structures
- Robust: Handles edge cases like empty arrays, None values, and object dtypes
- Simple API: Just two main functions, save_data() and load_data()
- Efficient: Uses HDF5's hierarchical storage for fast I/O
Why h5typer?
Standard HDF5 libraries like h5py require manual handling of type conversions:
```python
# Without h5typer: manual type handling
import h5py
import numpy as np
import pandas as pd

with h5py.File('data.h5', 'w') as f:
    f.create_dataset('array', data=np.array([1, 2, 3]))
    f.create_dataset('string', data='hello'.encode('utf-8'))  # Encoded by hand; h5py 3 reads strings back as bytes
    # Complex types like pandas DataFrames require custom serialization

# With h5typer: automatic!
import h5typer

data = {
    'array': np.array([1, 2, 3]),
    'string': 'hello',
    'dataframe': pd.DataFrame({'A': [1, 2]})
}
h5typer.save_data('data.h5', data)
```
h5typer handles all the boilerplate, letting you focus on your data.
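To make that boilerplate concrete, here is a minimal sketch of storing a single DataFrame with plain h5py. The helpers write_dataframe and read_dataframe are hypothetical, and the layout (one group holding values, columns, and index datasets) is only an assumption loosely modeled on the Supported Types table, not h5typer's actual on-disk format.

```python
import h5py
import pandas as pd


def write_dataframe(group: h5py.Group, name: str, df: pd.DataFrame) -> h5py.Group:
    """Hypothetical helper: store a numeric DataFrame as a group of three datasets."""
    sub = group.create_group(name)
    # Assumes every column shares one numeric dtype, so to_numpy() is not an object array
    sub.create_dataset("values", data=df.to_numpy())
    # h5py cannot store NumPy 'U' arrays, so labels are written as variable-length strings
    sub.create_dataset("columns", data=[str(c) for c in df.columns],
                       dtype=h5py.string_dtype())
    sub.create_dataset("index", data=df.index.to_numpy())
    return sub


def read_dataframe(sub: h5py.Group) -> pd.DataFrame:
    """Hypothetical helper: rebuild the DataFrame written by write_dataframe."""
    # h5py 3 returns strings as bytes unless the dataset is wrapped with asstr()
    columns = list(sub["columns"].asstr()[:])
    return pd.DataFrame(sub["values"][:], columns=columns, index=sub["index"][:])


# In-memory HDF5 file (driver="core", backing_store=False), so nothing is written to disk
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
    write_dataframe(f, "df", df)
    restored = read_dataframe(f["df"])

print(restored)
```

A frame with mixed column dtypes would not fit a single 2-D dataset and would need per-column storage; that kind of bookkeeping is what a wrapper library is meant to absorb.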
Installation
```bash
# Install from source
git clone https://github.com/iQLS-MMS/h5typer.git
cd h5typer
pip install .

# Install from PyPI
pip install h5typer
```
Requirements
- Python >= 3.10
- numpy >= 1.19.0
- pandas >= 1.0.0
- h5py >= 3.0.0
Quick Start
```python
import h5typer
import numpy as np
import pandas as pd

# Create some data
data = {
    'experiment': {
        'results': np.array([1, 2, 3, 4, 5]),
        'metadata': {
            'name': 'Test Experiment',
            'date': '2025-10-27',
            'valid': True
        },
        'dataframe': pd.DataFrame({
            'A': [1, 2, 3],
            'B': [4, 5, 6]
        }),
        'empty_value': None
    }
}

# Save to HDF5
h5typer.save_data('output.h5', data)

# Load from HDF5
loaded_data = h5typer.load_data('output.h5')

# Access your data: it comes back exactly as you saved it
print(loaded_data['experiment']['results'])           # [1 2 3 4 5]
print(loaded_data['experiment']['metadata']['name'])  # Test Experiment
print(loaded_data['experiment']['dataframe'])         # Original DataFrame
```
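To check a round trip like this in your own tests, a recursive comparison that understands arrays and pandas objects is handy, because == on arrays or DataFrames does not return a single boolean. The deep_equal helper below is a hypothetical sketch, not part of h5typer; you would call it on the dictionary you saved and the one load_data returns.

```python
import numpy as np
import pandas as pd


def deep_equal(a, b) -> bool:
    """Hypothetical helper: recursively compare nested dicts of common data types."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(deep_equal(a[k], b[k]) for k in a)
    if isinstance(a, (pd.DataFrame, pd.Series)):
        # .equals compares shape and elements; NaNs in the same location count as equal
        return type(a) is type(b) and a.equals(b)
    if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
        return np.array_equal(np.asarray(a), np.asarray(b))
    return bool(a == b)


saved = {"x": {"arr": np.array([1, 2, 3]), "df": pd.DataFrame({"A": [1, 2]}), "none": None}}
reloaded = {"x": {"arr": np.array([1, 2, 3]), "df": pd.DataFrame({"A": [1, 2]}), "none": None}}
print(deep_equal(saved, reloaded))  # True
```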
API Reference
Functions
save_data(filename, data_dict, update=False)
Save a nested dictionary to an HDF5 file.
Parameters:
- filename (str): Path to the HDF5 file
- data_dict (dict): Nested dictionary to save
- update (bool, optional): If True, update the existing file; if False, overwrite it. Default: False
Example:
```python
import h5typer
import numpy as np

data = {'key': 'value', 'array': np.array([1, 2, 3])}
h5typer.save_data('mydata.h5', data)

# Update existing file
more_data = {'new_key': 'new_value'}
h5typer.save_data('mydata.h5', more_data, update=True)
```
load_data(filename)
Load a nested dictionary from an HDF5 file.
Parameters:
- filename (str): Path to the HDF5 file
Returns:
- dict: The loaded nested dictionary
Example:
```python
import h5typer

data = h5typer.load_data('mydata.h5')
print(data['key'])  # value
```
Class API
For more control, use the H5Typer class directly:
```python
from h5typer import H5Typer

# Create instance
io_handler = H5Typer()

# Save data (my_dict can be any nested dictionary)
my_dict = {'key': 'value'}
io_handler.save_data('output.h5', my_dict)

# Load data
loaded = io_handler.load_data('output.h5')
```
Supported Types
h5typer automatically handles the following Python types:
| Python Type | HDF5 Storage | Notes |
|---|---|---|
| numpy.ndarray | Dataset | All dtypes supported except object arrays |
| pandas.DataFrame | Group with datasets | Index, columns, and values preserved |
| pandas.Series | Group with datasets | Index and values preserved |
| dict | Group | Nested dictionaries become HDF5 groups |
| str | Dataset (bytes) | UTF-8 encoded |
| int, float | Dataset | Stored as numpy scalars |
| None | h5py.Empty | Preserved on load |
| list, tuple | Dataset | Converted to numpy arrays; all elements must be of the same type |
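The leaf rows of this table amount to a type dispatch. The sketch below illustrates that idea in plain NumPy with a hypothetical to_storable function that turns a Python leaf into something h5py can write. It is not h5typer's internal code, covers only the str, int/float, list/tuple, and ndarray rows, and the error messages are illustrative.

```python
import numpy as np


def _encode_unicode(arr: np.ndarray) -> np.ndarray:
    # h5py has no mapping for NumPy 'U' arrays, so encode them to fixed-width bytes
    return np.char.encode(arr, "utf-8") if arr.dtype.kind == "U" else arr


def to_storable(value):
    """Hypothetical dispatch mirroring the str, int/float, list/tuple and ndarray rows."""
    if isinstance(value, str):
        return np.bytes_(value.encode("utf-8"))  # str -> UTF-8 encoded bytes
    if isinstance(value, (int, float)):
        return np.asarray(value)[()]  # int/float -> NumPy scalar
    if isinstance(value, (list, tuple)):
        # NumPy would silently coerce [1, "a"] to strings, so check element types first
        if len({type(v) for v in value}) > 1:
            raise TypeError("all list/tuple elements must have the same type")
        value = np.asarray(value)  # list/tuple -> ndarray, handled by the next branch
    if isinstance(value, np.ndarray):
        if value.dtype == object:
            raise TypeError("object arrays are not supported")
        return _encode_unicode(value)
    raise TypeError(f"unsupported leaf type: {type(value).__name__}")
```

Checking element types up front matters because np.asarray rarely fails on mixed input; it usually picks a common dtype instead, which would hide the "same type" requirement.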
Type Conversion Details
NumPy Arrays
```python
import numpy as np

# String arrays
str_array = np.array(['a', 'b', 'c'])
# Automatically converted to bytes for HDF5

# Float64 arrays
float_array = np.array([1.0, 5.0, 3.0], dtype=np.float64)
```
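The string conversion is needed because h5py has no HDF5 equivalent for NumPy's fixed-width Unicode ('U') dtype. This minimal sketch shows one way to do the round trip with NumPy's np.char helpers. h5typer may use a different mechanism internally. Note that the byte width can exceed the character width for non-ASCII text.

```python
import numpy as np

str_array = np.array(["a", "b", "é"])         # dtype '<U1': h5py cannot write this directly
encoded = np.char.encode(str_array, "utf-8")  # dtype 'S2': "é" takes two bytes in UTF-8
decoded = np.char.decode(encoded, "utf-8")    # back to a Unicode array after loading

print(encoded.dtype, decoded.dtype)
```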
Pandas Objects
```python
import pandas as pd

# DataFrames: index, columns, and values all preserved
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Series: index and values preserved
series = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
```
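Preserving index, columns, and values for a frame whose columns have different dtypes is simplest if each column becomes its own array, because a single 2-D dataset forces one dtype on everything. The sketch below is hypothetical: frame_to_arrays, arrays_to_frame, and the col_0, col_1 naming are illustrative assumptions, not h5typer's layout. It only shows that such a layout round-trips in memory before any HDF5 writing happens.

```python
import numpy as np
import pandas as pd


def frame_to_arrays(df: pd.DataFrame) -> dict:
    """Hypothetical: split a DataFrame into plain arrays, one per column."""
    arrays = {
        "index": df.index.to_numpy(),
        "columns": np.array([str(c) for c in df.columns]),  # labels coerced to str
    }
    for i in range(df.shape[1]):
        # Positional access, so each column keeps its own dtype
        arrays[f"col_{i}"] = df.iloc[:, i].to_numpy()
    return arrays


def arrays_to_frame(arrays: dict) -> pd.DataFrame:
    """Hypothetical: rebuild the DataFrame produced by frame_to_arrays."""
    columns = arrays["columns"].tolist()
    data = {name: arrays[f"col_{i}"] for i, name in enumerate(columns)}
    return pd.DataFrame(data, index=arrays["index"])


df = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]}, index=["x", "y"])
restored = arrays_to_frame(frame_to_arrays(df))
print(restored.dtypes)
```

Because column labels are stringified here, integer column names would come back as strings; a complete implementation would also need to record the label type.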
None Values
```python
data = {
    'value': None,
}
# None values are preserved through save/load
```
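The Supported Types table lists h5py.Empty as the storage for None. This minimal sketch shows how that can work with h5py directly. An Empty dataset has a dtype but no data, so reading it back yields an h5py.Empty object that can be mapped to None. The float placeholder dtype and the helper names write_none and read_leaf are assumptions for illustration.

```python
import h5py


def write_none(group: h5py.Group, name: str) -> None:
    # Null-dataspace dataset: it has a dtype ("f", chosen arbitrarily here) but no data
    group.create_dataset(name, data=h5py.Empty("f"))


def read_leaf(dset: h5py.Dataset):
    value = dset[()]
    # Reading a null-dataspace dataset returns an h5py.Empty instance
    return None if isinstance(value, h5py.Empty) else value


with h5py.File("none_demo.h5", "w", driver="core", backing_store=False) as f:
    write_none(f, "value")
    loaded = read_leaf(f["value"])

print(loaded)
```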
Advanced Usage
Nested Structures
```python
import h5typer
import numpy as np

data = {
    'level1': {
        'level2': {
            'level3': {
                'deep_array': np.array([1, 2, 3])
            }
        }
    }
}
h5typer.save_data('nested.h5', data)
loaded = h5typer.load_data('nested.h5')
# Structure is preserved
```
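Nested dictionaries map naturally onto HDF5 groups. The sketch below shows the general technique, not h5typer's implementation: a recursive walk with plain h5py creates one group per dict and one dataset per leaf. This simplified version assumes string keys and array-like leaves only.

```python
import h5py
import numpy as np


def save_tree(group: h5py.Group, tree: dict) -> None:
    """Write each dict as a group and each leaf as a dataset."""
    for key, value in tree.items():
        if isinstance(value, dict):
            save_tree(group.create_group(key), value)
        else:
            group.create_dataset(key, data=np.asarray(value))


def load_tree(group: h5py.Group) -> dict:
    """Inverse walk: groups become dicts, datasets are read fully into memory."""
    return {
        key: load_tree(item) if isinstance(item, h5py.Group) else item[()]
        for key, item in group.items()
    }


data = {"level1": {"level2": {"level3": {"deep_array": np.array([1, 2, 3])}}}}
with h5py.File("nested_demo.h5", "w", driver="core", backing_store=False) as f:
    save_tree(f, data)  # an h5py.File is itself the root Group
    loaded = load_tree(f)

print(loaded["level1"]["level2"]["level3"]["deep_array"])  # [1 2 3]
```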
Updating Files
```python
import h5typer

# Initial save
h5typer.save_data('data.h5', {'key1': 'value1'})

# Add more data
h5typer.save_data('data.h5', {'key2': 'value2'}, update=True)

# Load gets both keys
data = h5typer.load_data('data.h5')
# {'key1': 'value1', 'key2': 'value2'}
```
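The difference between the default overwrite and update=True can be pictured with h5py's file modes. Mode "w" truncates an existing file, while "a" opens it for reading and writing and creates it if missing. The sketch assumes that update maps onto these two modes, which the source does not confirm. save_flat and load_flat are hypothetical stand-ins that handle top-level string values only.

```python
import os
import tempfile

import h5py


def save_flat(filename: str, data: dict, update: bool = False) -> None:
    """Hypothetical: mode "a" keeps existing objects, mode "w" discards them."""
    with h5py.File(filename, "a" if update else "w") as f:
        for key, value in data.items():
            if key in f:
                del f[key]  # replace an existing key instead of failing on create
            f.create_dataset(key, data=value, dtype=h5py.string_dtype())


def load_flat(filename: str) -> dict:
    with h5py.File(filename, "r") as f:
        # asstr() decodes the stored UTF-8 bytes back into Python str
        return {key: f[key].asstr()[()] for key in f}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.h5")
    save_flat(path, {"key1": "value1"})
    save_flat(path, {"key2": "value2"}, update=True)
    merged = load_flat(path)
    save_flat(path, {"key3": "value3"})  # update=False: the file is truncated first
    overwritten = load_flat(path)

print(merged, overwritten)
```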
Integer Keys
```python
import h5typer

# Integer keys are preserved
data = {
    100: 'hundred',
    200: 'two hundred',
    'string_key': 'value'
}
h5typer.save_data('intkeys.h5', data)
loaded = h5typer.load_data('intkeys.h5')
print(loaded[100])  # hundred (the key is still an integer)
```
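HDF5 link names are always strings, so preserving integer keys requires recording the original key type somewhere. One possible scheme is sketched below: the hypothetical helpers encode_key and decode_key prefix integer keys with a marker. h5typer's actual scheme is not documented here and may differ (it could, for example, use HDF5 attributes instead).

```python
INT_PREFIX = "__int__:"  # hypothetical marker; any prefix unlikely to appear in real keys works


def encode_key(key) -> str:
    """Turn a dict key into an HDF5-safe name, tagging integers so they can be restored."""
    if isinstance(key, bool) or not isinstance(key, (int, str)):
        raise TypeError(f"unsupported key type: {type(key).__name__}")
    if isinstance(key, int):
        return f"{INT_PREFIX}{key}"
    # "/" is the HDF5 path separator, and a string already carrying the marker would be ambiguous
    if key.startswith(INT_PREFIX) or "/" in key:
        raise ValueError(f"key {key!r} cannot be stored unambiguously")
    return key


def decode_key(name: str):
    """Inverse of encode_key."""
    if name.startswith(INT_PREFIX):
        return int(name[len(INT_PREFIX):])
    return name


data = {100: "hundred", 200: "two hundred", "string_key": "value"}
names = [encode_key(k) for k in data]
restored_keys = [decode_key(n) for n in names]
print(names)          # ['__int__:100', '__int__:200', 'string_key']
print(restored_keys)  # [100, 200, 'string_key']
```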
Integration with HiC-SCA
h5typer is used by HiC-SCA for all HDF5 I/O operations:
```python
# In HiC-SCA
from hicsca import HiCSCA

# Process Hi-C data
hicsca = HiCSCA("sample.hic", resolutions=[100000])
hicsca.process_all_chromosomes()

# Save results (uses h5typer internally)
hicsca.to_hdf5("results.h5")

# Load results (uses h5typer internally)
hicsca_loaded = HiCSCA.from_hdf5("results.h5")
```
The integration is transparent: HiC-SCA uses h5typer for automatic type mapping of complex nested dictionaries containing numpy arrays, pandas DataFrames, and metadata.
License
MIT License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Download files
File details
Details for the file h5typer-0.2.0.tar.gz.
File metadata
- Download URL: h5typer-0.2.0.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 642f1777a799b74023d696d8402753b2a708dbda019f777be6e01dab6af9fad3 |
| MD5 | b3081d8d2c6def29556add155ce376eb |
| BLAKE2b-256 | 30b4fa2f6143b4d79722720423600bfcec01e7633f7c603cadb12275288a22e4 |
Provenance
The following attestation bundles were made for h5typer-0.2.0.tar.gz:
Publisher: pypi-publish.yml on iQLS-MMS/h5typer
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: h5typer-0.2.0.tar.gz
- Subject digest: 642f1777a799b74023d696d8402753b2a708dbda019f777be6e01dab6af9fad3
- Sigstore transparency entry: 707173285
- Sigstore integration time:
- Permalink: iQLS-MMS/h5typer@7ab2f89dc9c7d11be2add4e6ef61661c66120f47
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/iQLS-MMS
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@7ab2f89dc9c7d11be2add4e6ef61661c66120f47
- Trigger Event: release
File details
Details for the file h5typer-0.2.0-py3-none-any.whl.
File metadata
- Download URL: h5typer-0.2.0-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3835a95177bab7c060dfa485021397e624e646b57a8f3c6cc0e295117ed088eb |
| MD5 | 03705b26e9cea6c9e70e4ac2d7ff5936 |
| BLAKE2b-256 | 342903d81937c825ff1ac3866003b7380b4e26e6b7a37f300f152021fc10c401 |
Provenance
The following attestation bundles were made for h5typer-0.2.0-py3-none-any.whl:
Publisher: pypi-publish.yml on iQLS-MMS/h5typer
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: h5typer-0.2.0-py3-none-any.whl
- Subject digest: 3835a95177bab7c060dfa485021397e624e646b57a8f3c6cc0e295117ed088eb
- Sigstore transparency entry: 707173287
- Sigstore integration time:
- Permalink: iQLS-MMS/h5typer@7ab2f89dc9c7d11be2add4e6ef61661c66120f47
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/iQLS-MMS
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@7ab2f89dc9c7d11be2add4e6ef61661c66120f47
- Trigger Event: release