Building HDF datasets for machine learning.

bci-dataset

Python library for organizing multiple EEG datasets using HDF.
Supports EEGLAB data!

This library was created as a tool for combining datasets for the major BCI paradigms, with deep learning in mind.

Installation

pip install bci-dataset

How to Use

Add EEG Data

Supported Formats

  • EEGLAB(.set)
    • Epoching (epoch splitting) on EEGLAB is required.
  • numpy(ndarray)

Common Setup

from bci_dataset import DatasetUpdater

fpath = "./dataset.hdf"
fs = 500  # sampling rate
updater = DatasetUpdater(fpath, fs=fs)
updater.remove_hdf()  # delete the HDF file if it already exists

Add EEGLAB Data

import numpy as np

labels = ["left", "right"]
eeglab_list = ["./sample.set"]  # paths of EEGLAB (.set) files

# add EEGLAB (.set) files
for fp in eeglab_list:
    updater.add_eeglab(fp, labels)

Add NumPy Data

# dummy data
dummy_data = np.ones((12, 6000))  # channels × samples
dummy_indexes = [0, 1000, 2000, 3000, 4000, 5000]  # start index of each trial
dummy_labels = ["left", "right"] * 3  # label of each trial
dummy_size = 500  # number of samples per trial

updater.add_numpy(dummy_data, dummy_indexes, dummy_labels, dummy_size)
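The indexes and trial size presumably define how trials are cut from the continuous array; a minimal NumPy sketch of that slicing (the actual add_numpy internals may differ):

```python
import numpy as np

data = np.ones((12, 6000))                   # channels × samples
indexes = [0, 1000, 2000, 3000, 4000, 5000]  # start index of each trial
size = 500                                   # samples per trial

# each trial is a channels × size window starting at its index
trials = [data[:, i:i + size] for i in indexes]
# 6 trials, each of shape (12, 500)
```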

Apply Preprocessing

If the "preprocess" method is executed again with the same group name, the existing group with that name is deleted before preprocessing runs, so the group is effectively overwritten.

"""
preprocessing example
bx : ch × samples
"""
def prepro_func(bx:np.ndarray): 
    x = bx[12:15,:]
    return StandardScaler().fit_transform(x.T).T
updater.preprocess("custom",prepro_func)

Contents of HDF

Note that "dataset" in the tree below refers to the HDF dataset class.

hdf file
├ origin : group / raw data
│ ├ 1 : dataset
│ ├ 2 : dataset
│ ├ 3 : dataset
│ ├ 4 : dataset
│ ├ 5 : dataset
│ └ …
└ prepro : group / data after preprocessing
  ├ custom : group / "custom" is any group name
  │ ├ 1 : dataset
  │ ├ 2 : dataset
  │ ├ 3 : dataset
  │ ├ 4 : dataset
  │ ├ 5 : dataset
  │ └ …
  └ custom2 : group
    └ ...omit (1,2,3,4,…)
  • Check the contents with software such as HDFView.
  • Use h5py or similar to read the HDF file.

import h5py

with h5py.File(fpath, "r") as h5:
    fs = h5["prepro/custom"].attrs["fs"]
    dataset_size = h5["prepro/custom"].attrs["count"]
    dataset79 = h5["prepro/custom/79"][()]  # channels × samples
    dataset79_label = h5["prepro/custom/79"].attrs["label"]

Merge Dataset

In order to merge, "dataset_name" must be set on each source.
If the channel order differs between datasets, it can be aligned by specifying ch_indexes.

The sources' preprocessing groups are not inherited; in other words, preprocess() must be executed again after the merge.

Example: Merge source1 and source2 datasets

target = DatasetUpdater("new_dataset.h5", fs=fs)
target.remove_hdf()  # reset hdf
s1 = DatasetUpdater("source1.h5", fs=fs, dataset_name="source1")
s2 = DatasetUpdater("source2.h5", fs=fs, dataset_name="source2")
s1_ch_indexes = [1, 60, 10, 5]  # channel indexes to use
target.merge_hdf(s1, ch_indexes=s1_ch_indexes)
target.merge_hdf(s2)
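For intuition, aligning channel order with ch_indexes amounts to integer-array (fancy) indexing on the channel axis; a sketch of that selection (merge_hdf's internals may differ):

```python
import numpy as np

data = np.arange(64 * 10).reshape(64, 10)  # 64 channels × 10 samples
ch_indexes = [1, 60, 10, 5]                # desired channel order
aligned = data[ch_indexes, :]              # pick and reorder channels
# aligned.shape == (4, 10); row 0 is original channel 1, row 1 is channel 60, ...
```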

Pull requests / Issues

Pull requests and issues are welcome.

