Compute xxHash digest of hdf5 datasets
Project description
hdf5-xxh
Compute an xxHash digest of the datasets in an HDF5® file.
Motivation
For regression testing, it is sometimes useful to check for the strict equality of numerical data stored in an HDF5 file. A reference copy of the HDF5 file could be saved, but this is not always desirable, especially if the file is very large. Hashing the HDF5 file itself does not work either because, for various reasons, two HDF5 files are not necessarily byte-for-byte identical even if the stored data is the same. This small utility computes a hash digest of the datasets stored in the HDF5 file, which allows an easy check for strict equality without storing a complete copy of the data.
Installation
The package is available on PyPI:
pip install h5xxhsum
Usage
$ h5xxhsum foo.h5
e92417e2e9a3425cffbe35fddc5f21a3 foo.h5
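For regression testing, the digest printed above can be compared with a stored reference value. The following is a minimal sketch, not part of the package: the helper names `parse_digest_line` and `digest_matches` are hypothetical, and the code assumes the output format is one `<digest> <filename>` line per file, as in the example above.

```python
import subprocess


def parse_digest_line(line):
    """Split one '<digest> <filename>' output line into its two fields."""
    digest, _, filename = line.strip().partition(" ")
    return digest, filename.strip()


def digest_matches(path, expected_digest):
    """Run h5xxhsum on `path` and compare its digest with a stored reference.

    Assumes the `h5xxhsum` command is on PATH and prints the digest first.
    """
    result = subprocess.run(
        ["h5xxhsum", path], capture_output=True, text=True, check=True
    )
    digest, _ = parse_digest_line(result.stdout.splitlines()[0])
    return digest == expected_digest
```

A test could then store only the 32-character reference digest and call `digest_matches("foo.h5", reference)` instead of keeping a full copy of the file.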
Chunked storage
HDF5 supports chunked storage. This utility implements a flag for controlling how the hash digest is computed:
- --no-chunked: the whole dataset is loaded in memory and hashed
- --chunked: the hash is computed incrementally, loading a chunk at a time, with the iter_chunks() method.
--chunked is faster, but the hash digest depends not only on the data itself but also on the chunk size/layout.
By contrast, --no-chunked is slower but independent of the storage layout.
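The layout dependence can be illustrated without HDF5 at all. The sketch below is an illustration under stated assumptions, not the utility's implementation: it uses `hashlib.blake2b` from the standard library as a stand-in for xxHash, a nested list as a stand-in for a 2D dataset, and the hypothetical helpers `whole_digest` and `chunked_digest`. Feeding the hash one rectangular chunk at a time changes the order in which bytes reach the hash, so the digest changes with the chunk shape even though the data is the same.

```python
import hashlib
import struct


def whole_digest(rows):
    """Hash the full array in row-major order, ignoring any chunking."""
    h = hashlib.blake2b(digest_size=16)  # stand-in for xxHash
    for row in rows:
        h.update(struct.pack(f"<{len(row)}d", *row))
    return h.hexdigest()


def chunked_digest(rows, chunk_shape):
    """Hash the array incrementally, one rectangular chunk at a time."""
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunk_shape
    h = hashlib.blake2b(digest_size=16)  # stand-in for xxHash
    for r0 in range(0, n_rows, c_rows):
        for c0 in range(0, n_cols, c_cols):
            # Bytes of one chunk, row by row within the chunk.
            for row in rows[r0:r0 + c_rows]:
                block = row[c0:c0 + c_cols]
                h.update(struct.pack(f"<{len(block)}d", *block))
    return h.hexdigest()


# A 4x4 array holding the values 0.0 .. 15.0.
data = [[float(4 * i + j) for j in range(4)] for i in range(4)]

same_layout = chunked_digest(data, (2, 2)) == chunked_digest(data, (2, 2))
other_layout = chunked_digest(data, (2, 2)) == chunked_digest(data, (1, 4))
print(same_layout, other_layout)
```

In this sketch, chunks that span whole rows reproduce the row-major byte stream, so `chunked_digest(data, (1, 4))` equals `whole_digest(data)`, while 2x2 chunks give a different digest for the same data.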
Caveat emptor
I wrote this small utility for personal use, so there is no guarantee that the API will remain stable. However, I believe it fills a small but useful niche. Please feel free to open an issue if you think it can be improved.
File details
Details for the file h5xxhsum-0.1.0.tar.gz.
File metadata
- Download URL: h5xxhsum-0.1.0.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d92b5969fe9d5cebb9a4b84e5ecba1a089186a4035b6b6d48aa0f56d96568cc1 |
| MD5 | 143434e82b26e49c657cd330c8cc7a6e |
| BLAKE2b-256 | cf0a23efed7d111d7e06b0f6e8a5f3016d03cceb02a4ca5058f5187acace95fa |
Provenance
The following attestation bundles were made for h5xxhsum-0.1.0.tar.gz:
Publisher: python-publish.yml on miccoli/hdf5-xxh
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: h5xxhsum-0.1.0.tar.gz
- Subject digest: d92b5969fe9d5cebb9a4b84e5ecba1a089186a4035b6b6d48aa0f56d96568cc1
- Sigstore transparency entry: 186666849
- Sigstore integration time:
- Permalink: miccoli/hdf5-xxh@75f654f68f7d40824d0c326b47ae7e948d556d82
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/miccoli
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@75f654f68f7d40824d0c326b47ae7e948d556d82
- Trigger Event: release
File details
Details for the file h5xxhsum-0.1.0-py3-none-any.whl.
File metadata
- Download URL: h5xxhsum-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee1a10297a4fa356dbabf402b994c3b0060a85ca3c09eb7a5550254b64193c60 |
| MD5 | 75b0a24f47c5d9d3ae7be46c2e21ebc0 |
| BLAKE2b-256 | cfd314448b36c7865f253bd045f4b0fee2083ef104cc533e047907bdeabe4384 |
Provenance
The following attestation bundles were made for h5xxhsum-0.1.0-py3-none-any.whl:
Publisher: python-publish.yml on miccoli/hdf5-xxh
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: h5xxhsum-0.1.0-py3-none-any.whl
- Subject digest: ee1a10297a4fa356dbabf402b994c3b0060a85ca3c09eb7a5550254b64193c60
- Sigstore transparency entry: 186666851
- Sigstore integration time:
- Permalink: miccoli/hdf5-xxh@75f654f68f7d40824d0c326b47ae7e948d556d82
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/miccoli
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@75f654f68f7d40824d0c326b47ae7e948d556d82
- Trigger Event: release