Building HDF datasets for machine learning.

bci-dataset

Python library for organizing multiple EEG datasets using HDF.
Supports EEGLAB data!

This library was created as a tool for combining datasets for the major BCI paradigms, with deep learning in mind.

Installation

pip install bci-dataset

How to Use

Add EEG Data

Supported Formats

  • EEGLAB(.set)
    • Epoching (epoch splitting) on EEGLAB is required.
  • numpy(ndarray)

Common Setup

from bci_dataset import DatasetUpdater

fpath = "./dataset.hdf"
fs = 500  # sampling rate
updater = DatasetUpdater(fpath, fs=fs)
updater.remove_hdf()  # delete the HDF file if it already exists

Add EEGLAB Data

import numpy as np

labels = ["left", "right"]
eeglab_list = ["./sample.set"]  # paths of EEGLAB (.set) files

# add EEGLAB (.set) files
for fp in eeglab_list:
    updater.add_eeglab(fp, labels)

Add NumPy Data

# dummy data
dummy_data = np.ones((12, 6000))  # channels × samples
dummy_indexes = [0, 1000, 2000, 3000, 4000, 5000]  # start index of each trial
dummy_labels = ["left", "right"] * 3  # label of each trial
dummy_size = 500  # number of samples per trial

updater.add_numpy(dummy_data, dummy_indexes, dummy_labels, dummy_size)
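The indexes and trial size presumably define how trials are cut from the continuous array; a minimal NumPy sketch of that slicing (the actual add_numpy internals may differ):

```python
import numpy as np

data = np.ones((12, 6000))                   # channels × samples
indexes = [0, 1000, 2000, 3000, 4000, 5000]  # start index of each trial
size = 500                                   # samples per trial

# each trial is a channels × size window starting at its index
trials = [data[:, i:i + size] for i in indexes]
# 6 trials, each of shape (12, 500)
```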

Apply Preprocessing

If the "preprocess" method is executed again with the same group name, the existing group with that name is deleted before preprocessing runs, so the group is effectively overwritten.

"""
preprocessing example
bx : ch × samples
"""
def prepro_func(bx:np.ndarray): 
    x = bx[12:15,:]
    return StandardScaler().fit_transform(x.T).T
updater.preprocess("custom",prepro_func)

Contents of HDF

Note that "dataset" in the tree below refers to the HDF dataset class.

hdf file
├ origin : group / raw data
│ ├ 1 : dataset
│ ├ 2 : dataset
│ ├ 3 : dataset
│ ├ 4 : dataset
│ ├ 5 : dataset
│ └ …
└ prepro : group / data after preprocessing
  ├ custom : group / "custom" is any group name
  │ ├ 1 : dataset
  │ ├ 2 : dataset
  │ ├ 3 : dataset
  │ ├ 4 : dataset
  │ ├ 5 : dataset
  │ └ …
  └ custom2 : group
    └ ...omit (1,2,3,4,…)
  • Check the contents with software such as HDFView.
  • Use h5py or similar to read the HDF file.

import h5py

with h5py.File(fpath, "r") as h5:
    fs = h5["prepro/custom"].attrs["fs"]
    dataset_size = h5["prepro/custom"].attrs["count"]
    dataset79 = h5["prepro/custom/79"][()]  # channels × samples
    dataset79_label = h5["prepro/custom/79"].attrs["label"]

Merge Dataset

In order to merge, "dataset_name" must be set on each source.
If the channel order differs between datasets, it can be aligned by specifying ch_indexes.

The sources' preprocessing groups are not inherited; in other words, preprocess() must be executed again after the merge.

Example: Merge source1 and source2 datasets

target = DatasetUpdater("new_dataset.h5", fs=fs)
target.remove_hdf()  # reset hdf
s1 = DatasetUpdater("source1.h5", fs=fs, dataset_name="source1")
s2 = DatasetUpdater("source2.h5", fs=fs, dataset_name="source2")
s1_ch_indexes = [1, 60, 10, 5]  # channel indexes to use
target.merge_hdf(s1, ch_indexes=s1_ch_indexes)
target.merge_hdf(s2)
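For intuition, aligning channel order with ch_indexes amounts to integer-array (fancy) indexing on the channel axis; a sketch of that selection (merge_hdf's internals may differ):

```python
import numpy as np

data = np.arange(64 * 10).reshape(64, 10)  # 64 channels × 10 samples
ch_indexes = [1, 60, 10, 5]                # desired channel order
aligned = data[ch_indexes, :]              # pick and reorder channels
# aligned.shape == (4, 10); row 0 is original channel 1, row 1 is channel 60, ...
```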

Pull requests / Issues

Pull requests and issues are welcome.

