
Keras-style data iterator for images contained in dataset files such as hdf5 or PIL readable files. Images can be contained in several files.


Dataset Iterator

This repo contains Keras iterator classes for multi-channel (time-lapse) images stored in dataset files such as HDF5.

Dataset structure:

One dataset file can contain several sub-datasets (dataset_name0, dataset_name1, etc.). The iterator goes through all of them as if they were concatenated.

.
├── ...
├── dataset_name0                    
│   ├── channel0          
│   ├── channel1
│   └── ...
├── dataset_name1                    
│   ├── channel0          
│   ├── channel1
│   └── ...
└── ...

Each dataset contains channels (channel0, channel1, ...) that must all have the same shape. All datasets must have the same number of channels, and the shape must be equal among datasets, except along the batch axis (the number of samples).
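
The two rules above (matching shapes, and iteration over sub-datasets as if they were concatenated) can be illustrated without opening any file. The sketch below is not the library's code: `check_shapes` and `locate` are hypothetical helpers, the shapes are made-up values, and the batch axis is assumed to be the first axis.

```python
from bisect import bisect_right
from itertools import accumulate


def check_shapes(datasets):
    """Return the shared per-sample shape, or raise if the datasets disagree.

    `datasets` maps a dataset name to {channel name: shape tuple}.
    Assumption: the first axis is the batch axis and may differ between datasets.
    """
    n_channels = None
    sample_shape = None
    for name, channels in datasets.items():
        shapes = set(channels.values())
        if len(shapes) != 1:
            raise ValueError(f"{name}: channels differ in shape: {shapes}")
        shape = shapes.pop()
        if n_channels is None:
            n_channels, sample_shape = len(channels), shape[1:]
        elif len(channels) != n_channels or shape[1:] != sample_shape:
            raise ValueError(f"{name}: incompatible with the other datasets")
    return sample_shape


def locate(sizes, index):
    """Map a global sample index to (dataset position, local index)."""
    offsets = list(accumulate(sizes))  # cumulative end offset of each dataset
    if not offsets or not 0 <= index < offsets[-1]:
        raise IndexError(index)
    position = bisect_right(offsets, index)
    start = offsets[position - 1] if position > 0 else 0
    return position, index - start


# Made-up shapes: 100 and 40 samples of 256x256 images, two channels each.
datasets = {
    "dataset_name0": {"channel0": (100, 256, 256), "channel1": (100, 256, 256)},
    "dataset_name1": {"channel0": (40, 256, 256), "channel1": (40, 256, 256)},
}
sample_shape = check_shapes(datasets)
sizes = [next(iter(channels.values()))[0] for channels in datasets.values()]

print(sample_shape)        # (256, 256)
print(locate(sizes, 99))   # (0, 99): last sample of dataset_name0
print(locate(sizes, 100))  # (1, 0): first sample of dataset_name1
```

With this mapping, an iterator only needs the total number of samples (140 here) and can fetch any sample from the right sub-dataset on demand.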

Groups

There can be more folder levels, for instance to keep train and test sets in the same file:

.
├── ...
├── experiment1                    
│   ├── train          
│   │   ├── raw
│   │   └── labels
│   └── test   
│       ├── raw
│       └── labels
├── experiment2                    
│   ├── train          
│   │   ├── raw
│   │   └── labels
│   └── test   
│       ├── raw
│       └── labels
└── ...

To build one iterator per set, pass the group name as group_keyword:

from dataset_iterator import MultiChannelIterator

train_it = MultiChannelIterator(dataset_file_path=file_path, channel_keywords=["/raw", "/labels"], group_keyword="train")
test_it = MultiChannelIterator(dataset_file_path=file_path, channel_keywords=["/raw", "/labels"], group_keyword="test")
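
To make the role of channel_keywords and group_keyword more concrete, here is a minimal sketch of keyword-based path selection on the tree above. The matching rule is an assumption made for illustration (a path is kept when it ends with the channel keyword and contains the group keyword), and `select_channel_paths` is a hypothetical helper, not part of the library.

```python
def select_channel_paths(all_paths, channel_keywords, group_keyword=None):
    """Group dataset paths by channel keyword, optionally filtered by group.

    Assumed rule (hypothetical, for illustration only): a path is kept when
    it ends with the channel keyword and contains the group keyword.
    """
    selected = {}
    for keyword in channel_keywords:
        selected[keyword] = sorted(
            path
            for path in all_paths
            if path.endswith(keyword)
            and (group_keyword is None or group_keyword in path)
        )
    return selected


# Paths mirroring the example tree above.
paths = [
    "experiment1/train/raw", "experiment1/train/labels",
    "experiment1/test/raw", "experiment1/test/labels",
    "experiment2/train/raw", "experiment2/train/labels",
    "experiment2/test/raw", "experiment2/test/labels",
]

train = select_channel_paths(paths, ["/raw", "/labels"], group_keyword="train")
print(train["/raw"])     # ['experiment1/train/raw', 'experiment2/train/raw']
print(train["/labels"])  # ['experiment1/train/labels', 'experiment2/train/labels']
```

Sorting the paths keeps the raw and labels lists aligned, so the n-th entry of each list belongs to the same experiment.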

Image formats

  • These iterators use an object of class DatasetIO to access the data.
  • There is currently an implementation of DatasetIO for .h5 files (H5pyIO), as well as one for datasets composed of multiple image files supported by Pillow (MultipleFileIO).
  • Datasets from different files can also be combined:
    • if a dataset is split into several files that contain the same channels, use ConcatenateDatasetIO
    • if a dataset has its channels in different files, use MultipleDatasetIO
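
The difference between the two ways of combining files can be shown on toy data. The sketch below does not use the library's classes: each "file" is modelled as a plain dict mapping a channel name to a list of samples, and `concatenate_sources` and `merge_channel_sources` are hypothetical functions that only mimic the two situations described above.

```python
def concatenate_sources(sources):
    """Same channels in every source: append the samples channel by channel."""
    channels = set(sources[0])
    if any(set(source) != channels for source in sources):
        raise ValueError("all sources must expose the same channels")
    return {
        channel: [sample for source in sources for sample in source[channel]]
        for channel in sources[0]
    }


def merge_channel_sources(sources):
    """Different channels in each source: gather the channels side by side."""
    merged = {}
    for source in sources:
        overlap = merged.keys() & source.keys()
        if overlap:
            raise ValueError(f"duplicate channels: {sorted(overlap)}")
        merged.update(source)
    return merged


# Case 1: one dataset split over two files that hold the same channels.
part_a = {"raw": ["r0", "r1"], "labels": ["l0", "l1"]}
part_b = {"raw": ["r2"], "labels": ["l2"]}
print(concatenate_sources([part_a, part_b]))
# {'raw': ['r0', 'r1', 'r2'], 'labels': ['l0', 'l1', 'l2']}

# Case 2: one dataset whose channels live in different files.
raw_file = {"raw": ["r0", "r1", "r2"]}
labels_file = {"labels": ["l0", "l1", "l2"]}
print(merge_channel_sources([raw_file, labels_file]))
# {'raw': ['r0', 'r1', 'r2'], 'labels': ['l0', 'l1', 'l2']}
```

Both cases end up exposing the same channels and samples; what changes is whether the files are joined along the sample axis or along the channel axis.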

Demo

See this notebook for a demo:

