Compute xxHash digest of hdf5 datasets

hdf5-xxh

Compute an xxHash digest of the datasets in an HDF5® file.

Motivation

For regression testing, it is sometimes useful to check for strict equality of the numerical data stored in an HDF5 file. A reference copy of the HDF5 file could be saved, but this is not always desirable, especially if the file is very large. Hashing the HDF5 file itself does not help because, for various reasons, two HDF5 files are not byte-for-byte identical even if the stored data is the same. This small utility instead computes a hash digest of the datasets stored in the HDF5 file, which enables an easy check for strict equality without storing a complete copy of the data.

Installation

The package is available on PyPI:

pip install h5xxhsum

Usage

$ h5xxhsum foo.h5
e92417e2e9a3425cffbe35fddc5f21a3  foo.h5
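
For regression testing, the digest can be compared against a stored reference value. The following is only a sketch: it assumes the command prints one "DIGEST  FILENAME" line per file, as in the example above, and the helper names parse_digest_line, dataset_digest and matches_reference are hypothetical, not part of the package.

```python
import subprocess


def parse_digest_line(line):
    """Split one 'DIGEST  FILENAME' output line into its two fields."""
    digest, filename = line.strip().split(maxsplit=1)
    return digest, filename


def dataset_digest(path):
    """Run h5xxhsum on `path` and return the digest it reports.

    Assumes h5xxhsum is on PATH and that the first output line
    has the 'DIGEST  FILENAME' form shown in the usage example.
    """
    result = subprocess.run(
        ["h5xxhsum", path], capture_output=True, text=True, check=True
    )
    return parse_digest_line(result.stdout.splitlines()[0])[0]


def matches_reference(path, reference_digest):
    """Return True if the file's datasets hash to the stored reference."""
    return dataset_digest(path) == reference_digest
```

In a test suite, only the short reference digest has to be kept under version control instead of the full reference file.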

Chunked storage

HDF5 supports chunked storage. This utility implements a flag for controlling how the hash digest is computed:

  • --no-chunked: the whole dataset is loaded in memory and hashed
  • --chunked: the hash is computed incrementally, loading a chunk at a time, with the iter_chunks() method.

--chunked is faster, but the hash digest depends not only on the data itself but also on the chunk size/layout. In contrast, --no-chunked is slower but independent of the storage layout.
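
One plausible reason for the layout dependence is that a streaming hash sees the bytes in the order they are fed to it, and chunk-by-chunk traversal of a multidimensional dataset visits the elements in a different order than a row-major read of the whole array. The sketch below illustrates this with the standard library only. It is not the utility's implementation: hashlib.blake2b stands in for xxHash, a list of lists stands in for a dataset, and the function names are made up for the illustration.

```python
import hashlib
import struct


def whole_digest(rows):
    """Hash a 2-D table in row-major order, as if loaded in one piece."""
    h = hashlib.blake2b(digest_size=16)  # stand-in for xxHash
    for row in rows:
        h.update(struct.pack(f"<{len(row)}d", *row))
    return h.hexdigest()


def chunked_digest(rows, chunk_shape):
    """Hash the same table one rectangular chunk at a time."""
    chunk_rows, chunk_cols = chunk_shape
    h = hashlib.blake2b(digest_size=16)
    for r0 in range(0, len(rows), chunk_rows):
        for c0 in range(0, len(rows[0]), chunk_cols):
            # Feed the elements of this chunk only, row by row.
            for row in rows[r0:r0 + chunk_rows]:
                part = row[c0:c0 + chunk_cols]
                h.update(struct.pack(f"<{len(part)}d", *part))
    return h.hexdigest()


data = [[float(4 * r + c) for c in range(4)] for r in range(4)]

reference = whole_digest(data)
full_row_chunks = chunked_digest(data, (1, 4))  # same byte order as row-major
square_chunks = chunked_digest(data, (2, 2))    # different byte order

print(reference == full_row_chunks)
print(reference == square_chunks)
```

With chunks that span full rows the byte stream is identical to the row-major one, so the two digests agree. With 2×2 chunks the same values reach the hash in a different order and the digest changes, even though the stored data is the same.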

Caveat emptor

I wrote this small utility for personal use, so there is no guarantee that the API will remain stable. However, I believe it fills a small but useful niche. Please feel free to open an issue if you think it can be improved.
