
Building HDF datasets for machine learning.

Project description

bci-dataset

Python library for organizing multiple EEG datasets using HDF.
Supports EEGLAB data!

*This library was created as a tool for combining datasets across the major BCI paradigms for deep learning.

Installation

pip install bci-dataset

How to Use

Add EEG Data

Supported Formats

  • EEGLAB (.set)
    • Epoching (epoch splitting) in EEGLAB is required.
  • NumPy (ndarray)

Common Setup

from bci_dataset import DatasetUpdater

fpath = "./dataset.hdf"
fs = 500  # sampling rate
updater = DatasetUpdater(fpath, fs=fs)
updater.remove_hdf()  # delete the HDF file if it already exists

Add EEGLAB Data

import numpy as np

labels = ["left", "right"]
eeglab_list = ["./sample.set"]  # path list of EEGLAB files

# add EEGLAB (.set) files
for fp in eeglab_list:
    updater.add_eeglab(fp, labels)

Add NumPy Data

# dummy data
dummy_data = np.ones((12, 6000))  # channels × samples
dummy_indexes = [0, 1000, 2000, 3000, 4000, 5000]  # start index of each trial
dummy_labels = ["left", "right"] * 3  # label of each trial
dummy_size = 500  # number of samples per trial

updater.add_numpy(dummy_data, dummy_indexes, dummy_labels, dummy_size)
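
The indexes and size arguments presumably describe fixed-length windows: trial i covers dummy_size samples starting at dummy_indexes[i]. As an illustrative sketch (plain NumPy slicing, not a library call):

# illustrative only: how indexes/size map to trials
for idx, label in zip(dummy_indexes, dummy_labels):
    trial = dummy_data[:, idx:idx + dummy_size]  # ch × samples (12 × 500)
    print(label, trial.shape)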

Apply Preprocessing

If the "preprocess" method is executed again with the same group name, the already created group with the specified name is deleted once before preprocessing.

"""
preprocessing example
bx : ch × samples
"""
def prepro_func(bx:np.ndarray): 
    x = bx[12:15,:]
    return StandardScaler().fit_transform(x.T).T
updater.preprocess("custom",prepro_func)
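
As noted above, calling preprocess again with the same group name deletes and recreates that group, so a modified function can simply be re-applied:

# re-running with the same name overwrites prepro/custom
updater.preprocess("custom", prepro_func)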

Contents of HDF

Note that "dataset" in the tree below refers to the HDF dataset (class).

hdf file
├ origin : group / raw data
│ ├ 1 : dataset
│ ├ 2 : dataset
│ ├ 3 : dataset
│ ├ 4 : dataset
│ ├ 5 : dataset
│ └ …
└ prepro : group / data after preprocessing
  ├ custom : group / "custom" is any group name
  │ ├ 1 : dataset
  │ ├ 2 : dataset
  │ ├ 3 : dataset
  │ ├ 4 : dataset
  │ ├ 5 : dataset
  │ └ …
  └ custom2 : group
    └ ...omit (1,2,3,4,…)
  • Check the contents with software such as HDFView.
  • Use "h5py" or similar to read the HDF file.
    import h5py

    with h5py.File(fpath) as h5:
        fs = h5["prepro/custom"].attrs["fs"]
        dataset_size = h5["prepro/custom"].attrs["count"]
        dataset79 = h5["prepro/custom/79"][()]  # ch × samples
        dataset79_label = h5["prepro/custom/79"].attrs["label"]
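
To build training arrays from this layout, one can iterate over the numbered datasets. A minimal sketch, assuming trials are numbered consecutively from 1 and that the "count" attribute gives the number of trials (both suggested, but not confirmed, by the tree above):

import h5py
import numpy as np

with h5py.File(fpath) as h5:
    group = h5["prepro/custom"]
    n = group.attrs["count"]
    # trials × ch × samples; stacking works because trials share one shape
    X = np.stack([group[str(i)][()] for i in range(1, n + 1)])
    y = [group[str(i)].attrs["label"] for i in range(1, n + 1)]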

Merge Dataset

In order to merge, "dataset_name" must be set.
If the channel order differs between datasets, it can be aligned by specifying ch_indexes.

The sources' preprocessing groups are not inherited; in other words, preprocess() must be executed after the merge.

Example: Merge source1 and source2 datasets

    target = DatasetUpdater("new_dataset.h5", fs=fs)
    target.remove_hdf()  # reset hdf
    s1 = DatasetUpdater("source1.h5", fs=fs, dataset_name="source1")
    s2 = DatasetUpdater("source2.h5", fs=fs, dataset_name="source2")
    s1_ch_indexes = [1, 60, 10, 5]  # channel indexes to use
    target.merge_hdf(s1, ch_indexes=s1_ch_indexes)
    target.merge_hdf(s2)
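
Because preprocessing groups are not carried over by the merge, a pass such as prepro_func above would then be re-run on the merged file:

    # preprocessing is not inherited from the sources, so run it on the target
    target.preprocess("custom", prepro_func)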

Pull requests / Issues

If you need anything, feel free to open a pull request or an issue.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bci-dataset-1.0.0.tar.gz (6.1 kB)

Built Distribution

bci_dataset-1.0.0-py3-none-any.whl (7.7 kB)

File details

Details for the file bci-dataset-1.0.0.tar.gz.

File metadata

  • Download URL: bci-dataset-1.0.0.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for bci-dataset-1.0.0.tar.gz
  • SHA256: ae2bb40ddad32bd50d086fe59dcbb79b0f5ae7236098a71a9b4df3f8e17d2bac
  • MD5: cfa5cfc7513e9709fa8cb9561dae8b86
  • BLAKE2b-256: 874609f647c7cfb4f2edaba7cc27f579367957b6a741003a740cb6bd7bb65504


File details

Details for the file bci_dataset-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: bci_dataset-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 7.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for bci_dataset-1.0.0-py3-none-any.whl
  • SHA256: 3b6d91987022a9c17d0bf82c72ea4a296fff4bb6d7a991a1907ea12644b95464
  • MD5: f9505a4577c2ab39a8c569bcd125bc83
  • BLAKE2b-256: 923bf090a01733627b99edd22119b66cb3b9a5894ee7107ddea51f206beaff70

