A library to compress ESRF data and reduce their footprint
ESRF Data Compressor
ESRF Data Compressor is a command-line tool and Python library designed to compress large ESRF HDF5 datasets (3D volumes) and verify data consistency via SSIM. The default compression backend uses Blosc2 + Grok (JPEG2000).
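For readers unfamiliar with SSIM (structural similarity index), the sketch below computes a simplified, single-window form of it over two flattened frames. It is an illustration of the metric only, not the tool's own check: the function name `global_ssim`, the `data_range` default, and the sample values are assumptions, and real implementations usually average SSIM over sliding windows.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equally sized, flattened frames.

    Hypothetical helper for illustration; not part of esrf-data-compressor.
    """
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    n = len(x)
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


raw = [10, 50, 90, 130]      # made-up pixel values
lossy = [12, 48, 91, 129]    # the same frame after a lossy round trip
print(global_ssim(raw, raw))    # identical frames score 1.0
print(global_ssim(raw, lossy))  # slightly below 1.0
```

A score close to 1.0 indicates that the compressed frame preserves the structure of the raw frame; how close is "close enough" depends on the experiment.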
Features
- Discover raw HDF5 dataset files under an experiment's RAW_DATA
  - Goes through the HDF5 Virtual Datasets to find the data to compress
  - Allows filtering scan by scan based on the value of a key
- Slice-by-slice compression
  - Uses Blosc2 + Grok (JPEG2000) on every slice of each 3D dataset (axis 0)
  - User-configurable compression ratio (e.g. --cratio 10)
- Parallel execution
  - Automatically factors CPU cores into worker processes × per-process threads
  - By default, each worker runs up to 2 Blosc2 threads (or falls back to 1 thread if fewer than 2 cores are available)
- Non-destructive workflow
  - compress writes compressed files either:
    - next to each source as <basename>_<compression_method>.h5 (--layout sibling), or
    - under a mirrored RAW_DATA_COMPRESSED tree using the same source file names, while copying non-compressed folders/files (--layout mirror, default)
  - check computes SSIM (first and last frames) and writes a report
  - overwrite (optional) swaps out the raw frame file (irreversible)
- Four simple CLI subcommands
  - compress-hdf5 list: Show all raw HDF5 files to be processed
  - compress-hdf5 compress: Generate compressed files (sibling or mirrored layout)
  - compress-hdf5 check: Produce a per-dataset SSIM report between raw & compressed
  - compress-hdf5 overwrite: Atomically replace each raw frame file (irreversible)
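The worker × thread split described under "Parallel execution" can be pictured with the minimal sketch below. It is one plausible reading of that description, not the tool's actual code: the function name `plan_parallelism` and its parameters are hypothetical, and the real factoring rule may differ.

```python
import os


def plan_parallelism(total_cores=None, max_threads_per_worker=2):
    """Split a core count into (worker processes, threads per worker).

    Hypothetical helper for illustration; not part of esrf-data-compressor.
    """
    if max_threads_per_worker < 1:
        raise ValueError("max_threads_per_worker must be at least 1")
    if total_cores is None:
        total_cores = os.cpu_count() or 1  # cpu_count() may return None
    total_cores = max(1, total_cores)
    # Use the per-worker thread cap when enough cores exist, else 1 thread.
    threads = max_threads_per_worker if total_cores >= max_threads_per_worker else 1
    # Give each worker its share of cores, keeping at least one worker.
    workers = max(1, total_cores // threads)
    return workers, threads


print(plan_parallelism(8))  # (4, 2)
print(plan_parallelism(1))  # (1, 1)
```

With an odd core count this sketch leaves one core unused (3 cores give 1 worker with 2 threads), a trade-off accepted here to keep the rule simple.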
Installation
From PyPI
pip install esrf-data-compressor
Once installed, the compress-hdf5 command will be available in your PATH.
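To confirm from a script that the command is reachable, a small check like the following may help. It only relies on the standard library; the helper name `find_cli` is an assumption made for this example.

```python
import shutil


def find_cli(name="compress-hdf5"):
    """Return the path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_cli()
print(path if path else "compress-hdf5 not found; is the right environment active?")
```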
From Source (for development)
git clone https://gitlab.esrf.fr/dau/esrf-data-compressor.git
cd esrf-data-compressor
# (Optional) Create & activate a virtual environment
python -m venv venv
source venv/bin/activate
# Install build dependencies & the package itself
pip install .
Documentation
Full documentation is available online: ESRF Data Compressor Docs
Contributing & Development
- Clone the repository:
    git clone https://gitlab.esrf.fr/dau/esrf-data-compressor.git
    cd esrf-data-compressor
- Install dependencies (in a virtual environment):
    python -m venv venv
    source venv/bin/activate
    pip install -e ".[dev]"
- Run tests with coverage:
    pytest -v --cov=esrf_data_compressor --cov-report=term-missing
- Style:
    black .
    flake8 .
    ruff check .
- Build docs (Sphinx + pydata theme):
    sphinx-build doc build/html
License
This project is licensed under the MIT License. See LICENSE for full text.
Changelog
All noteworthy changes are recorded in CHANGELOG.md. Version 0.1.0 marks the first public release with:
- Initial implementation of Blosc2 + Grok (JPEG2000) compression for 3D HDF5 datasets.
- SSIM-based integrity check (first & last slice).
- Four-command CLI (compress-hdf5 list, compress-hdf5 compress, compress-hdf5 check, compress-hdf5 overwrite).
- Parallelism with worker×thread auto-factoring.
For more details, see the full history in CHANGELOG.md.
File details
Details for the file esrf_data_compressor-0.2.1.tar.gz.
File metadata
- Download URL: esrf_data_compressor-0.2.1.tar.gz
- Upload date:
- Size: 25.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08658acec629139e50eaf649ad462e765a1578075cf4d9a6821dc31905c1bd4a |
| MD5 | 2032607736a0a213c1fb4d466e1b0450 |
| BLAKE2b-256 | 26c9b76509b4cb76f33b64673f655b414cf6d3e78f3d7dda6cf5faa24222c580 |
File details
Details for the file esrf_data_compressor-0.2.1-py3-none-any.whl.
File metadata
- Download URL: esrf_data_compressor-0.2.1-py3-none-any.whl
- Upload date:
- Size: 29.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dff0fea0d14cb2aef7b4170ed6dfc4f65e6fd343d2677e07caeded2107eae4f1 |
| MD5 | 5a6040c0ef88e10bf934e58f94920813 |
| BLAKE2b-256 | 4c17d43ce8751307450e0dd617e677b5ca8a1360c32738b43eed469fb07fe30c |
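A downloaded file can be checked against the SHA256 digests listed above with a short standard-library script. This is a sketch under the assumption that the files sit in the current directory under the names shown; the helper names `sha256_of` and `verify` are made up for this example.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "esrf_data_compressor-0.2.1.tar.gz":
        "08658acec629139e50eaf649ad462e765a1578075cf4d9a6821dc31905c1bd4a",
    "esrf_data_compressor-0.2.1-py3-none-any.whl":
        "dff0fea0d14cb2aef7b4170ed6dfc4f65e6fd343d2677e07caeded2107eae4f1",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("esrf_data_compressor-0.2.1.tar.gz", EXPECTED_SHA256["esrf_data_compressor-0.2.1.tar.gz"])` should return `True` for an intact download.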