A package for HDF5-based chunked arrays

A minimal package for saving and reading large HDF5-based chunked arrays.

This package was developed in the Portugues lab for volumetric calcium imaging data. split_dataset is used extensively in the calcium imaging analysis package fimpy, and the microscope control libraries sashimi and brunoise save their files as split datasets.

The napari-split-dataset package supports the visualization of SplitDatasets in napari.

Why use split datasets?

Split datasets are numpy-like arrays saved over multiple h5 files. The concept of split datasets is similar to that of e.g. zarr arrays; however, relying on h5 files allows for partial reading even within a single file, which is crucial for visualizing volumetric time series, the main application split_dataset has been developed for (see this discussion on the limitations of zarr arrays).

Structure of a split dataset

A split dataset is a folder containing multiple, numbered h5 files (one file per chunk) and a metadata json file with information on the shape of the full dataset and of its chunks. The h5 files are saved using the flammkuchen library (formerly deepdish). Each file contains a dictionary with the data stored under the stack keyword.
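As a sketch of the chunk arithmetic this layout implies, the snippet below maps a global time index to the chunk file holding it. The metadata field names (shape_full, shape_block) and the zero-padded file naming are assumptions for illustration, not the exact schema; check the json file of an actual dataset.

```python
# Hypothetical metadata, mirroring the json file that sits next to the
# numbered h5 chunks (field names are an assumption, not the exact schema):
metadata = {
    "shape_full": [100, 30, 512, 512],   # (t, z, y, x) of the whole array
    "shape_block": [20, 30, 512, 512],   # one chunk covers 20 time points
}


def locate(t_index, meta):
    """Map a global time index to (chunk number, index inside that chunk)."""
    block_len = meta["shape_block"][0]
    return divmod(t_index, block_len)


chunk, offset = locate(47, metadata)
# With 20 time points per chunk, index 47 falls in chunk 2 at offset 7,
# i.e. (assuming zero-padded naming) inside a file like 0002.h5.
```

This is exactly the bookkeeping that SplitDataset objects hide behind numpy-style indexing.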

SplitDataset objects can then be instantiated from the dataset path, and numpy-style indexing can then be used to load data as numpy arrays. Any number of dimensions and block sizes is supported in principle; the package has been used mainly with 3D and 4D arrays.

Minimal example

# Open a split dataset via a SplitDataset object:
from split_dataset import SplitDataset

ds = SplitDataset(path_to_dataset)  # path to the folder of h5 chunks

# Retrieve data in an interval:
data_array = ds[n_start:n_end, :, :, :]

Creating split datasets

New split datasets can be created with the split_dataset.save_to_split_dataset function, provided that the original data are fully loaded in memory. Alternatively, e.g. for time-lapse acquisitions, a split dataset can be saved one chunk at a time: it is enough to save correctly formatted .h5 files with flammkuchen, together with the corresponding json metadata file describing the full split dataset shape (this is what happens in sashimi).
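A minimal sketch of that chunk-at-a-time workflow is below. The metadata field names, the file naming, and the stack key are assumptions based on the description above, not the exact schema the library or sashimi uses; the actual flammkuchen save call is left as a comment.

```python
import numpy as np

data = np.zeros((10, 4, 8, 8))  # a small (t, z, y, x) stack
n_per_chunk = 5                 # time points per h5 file

# Split along the time axis, one chunk per future h5 file:
chunks = [data[i:i + n_per_chunk] for i in range(0, data.shape[0], n_per_chunk)]
for i, chunk in enumerate(chunks):
    # Each chunk would be written with flammkuchen, e.g.:
    # fl.save(f"{i:04d}.h5", {"stack": chunk})
    pass

# Metadata describing the full dataset and its chunking
# (field names are hypothetical, for illustration only):
metadata = {
    "shape_full": list(data.shape),
    "shape_block": [n_per_chunk, *data.shape[1:]],
}
# json.dump(metadata, open("stack_metadata.json", "w"))
```

With the h5 files and the metadata json in one folder, the result can be opened as a SplitDataset like any other dataset.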

TODO

  • provide utilities for partial saving of split datasets
  • support for more advanced indexing (support for step and vector indexing)
  • support for cropping a SplitDataset
  • support for resolution and frequency metadata

History

0.4.0 (2021-03-23)

  • Added support for using a SplitDataset as the data of a napari layer.

...

0.1.0 (2020-05-06)

  • First release on PyPI.

Credits

Part of this package was inspired by Cookiecutter and this template.

